About this Function()
The provided Custom Excel Formula script is a mix of web scraping script and a json fetch procedure that fetches PAN (Permanent Account Number) data from the website of Inland Revenue Department of Nepal (https://ird.gov.np/pan-search). It does the following:
- References necessary libraries: requests for making HTTP requests and MSXML2.ServerXMLHTTP.6.0 for parsing HTML.
- Sends a GET request to the user GUI URL (https://ird.gov.np/pan-search) to obtain a CSRF token, which is used for authentication.
- Parses the HTML response to extract the CSRF token.
- Defines an API URL (api_url) where the script will send POST requests to fetch PAN data.
- Defines constants like the maximum number of retry attempts (max_retries).
- Defines a function fetch_pan_data to fetch PAN data for a given PAN number using a POST request to the API. It retries the request if the JSON response is 0.
Okay Drake, Explain it:

Here is how this is implemented in Excel
The VBA code of the custom Function defines two functions, GetPanInfo and ConvertUnicodeEscape, for interacting with a website and processing JSON responses containing PAN (Permanent Account Number) information.
The GetPanInfo() function is a VBA function designed for retrieving PAN (Permanent Account Number) information from a website. It takes three parameters: pan (the PAN number to query), field (an optional argument to specify which PAN information to retrieve, with a default of “trade_Name_Eng”), and maxRetries (an optional argument to set the maximum retry attempts, defaulting to 3). The function utilizes the MSXML2.ServerXMLHTTP.6.0 object to send HTTP requests, starting with a GET request to obtain a CSRF token from a specified URL. After obtaining the CSRF token, it enters a retry loop for sending a POST request to the API with the PAN number and CSRF token. If the POST request is successful, it checks if the JSON response is “0”; if so, it retries up to the specified maximum retries. If the retry limit is reached, it returns an appropriate message. Depending on the field parameter, the function extracts and processes specific fields from the JSON response, such as address or tax clearance. Additionally, it calls the ConvertUnicodeEscape function to handle Unicode escape sequences in the JSON response before returning the result.
Here is what you can achieve using this tool:

Steps to Install in Excel
Step 1: Download the Excel Addin File from this link.
Step 2: Install the Addin File
(i) Open your “Microsoft Excel”
(ii) Go to “Files:
(ii) Go to “Options”
(iv) Go to “Add-ins” tab in the new window
(v) At the bottom at “Manage” Section choose “Excel Add-ins” and click “Go”
(vi) Click “Browse” and browse and select the file downloaded from Step 1
(vii) Click “OK” and return to Excel Spreadsheet
Step 3: You can start to use the tool
Step 4: The Function
=GetPanInfo(pan, field[optional], retries[optional])
PAN: Select the cell with the PAN number or input a string.
field: This is optional. If you want details other than the PAN Name then you can use this argument. This field supports the following arguments: eff_Reg_Date, filing_Period, account_Status, trade_Name_Nep, trade_Name_Eng, address, office_Name, tax_Clearance
retries: The maximum number of POST json request attempts if the IRD database doesn’t respond instantly.
For those who love Python and Multithreading
The provided Python script demonstrates Python’s strength in web scraping and parallel processing. It fetches PAN data from a government website efficiently, thanks to libraries like requests and BeautifulSoup. Utilizing multithreading, it concurrently processes multiple PAN numbers, boosting retrieval speed. The script’s retry mechanism highlights Python’s adaptability in handling complex scenarios, while data sorting and CSV file management showcase Python’s data manipulation capabilities. Its flexibility allows adjustments for different use cases, and it recognizes the importance of error handling in production environments. Additionally, it emphasizes ethical web scraping practices and permissions, making it a comprehensive example of Python’s web scraping prowess.
##### COURTESY OF CRITICAL SPAGHETTI #####
import requests
from bs4 import BeautifulSoup
import csv
import threading
# Define the user GUI URL
user_gui_url = "https://ird.gov.np/pan-search"
# Create a session to maintain cookies and session data
session = requests.Session()
# Send a GET request to the user GUI URL to obtain a CSRF token
response = session.get(user_gui_url)
# Parse the HTML response to extract the CSRF token
soup = BeautifulSoup(response.text, "html.parser")
csrf_token = soup.find("input", {"name": "_token"})["value"]
# Define the API URL
api_url = "https://ird.gov.np/statstics/getPanSearch"
# Maximum number of retry attempts
max_retries = 5
# Define a lock to ensure thread-safe writing to the CSV file
csv_lock = threading.Lock()
# Function to fetch PAN data for a given PAN number
def fetch_pan_data(pan_number, data_list):
# Define the form data with the current PAN number and the obtained CSRF token
form_data = {
"_token": csrf_token,
"captcha": "4", # No adjustment needed here, IRD's CAPTCHA implementation doesn't use CAPTCHA referencing at server side, You can leave it at any number
"pan": str(pan_number) # Convert the PAN number to a string
}
# Initialize a counter for retries
retry_count = 0
# Send a POST request to the API using the session with the obtained CSRF token
while retry_count < max_retries:
api_response = session.post(api_url, json=form_data)
# Check if the request was successful (status code 200)
if api_response.status_code == 200:
json_data = api_response.json()
# Check if the JSON data is 0 (retry condition)
if json_data == 0:
print(f"JSON data is 0 for PAN {pan_number}, retrying (attempt {retry_count + 1})...")
retry_count += 1
else:
# JSON data is not 0, add the response data to the list
print(f"Received response for PAN {pan_number}")
data_list.append({"PAN Number": form_data["pan"], "JSON Response": json_data})
break
else:
print(f"Request to API for PAN {pan_number} failed with status code:", api_response.status_code)
break # Exit the loop if the status code is not 200
if retry_count == max_retries:
print(f"Reached maximum retry attempts for PAN {pan_number}, exiting.")
# Create a list to store thread objects
threads = []
# Create a list to store the data
data_list = []
# Loop through PAN numbers from 100000000 to 100000025 and start a thread for each
for pan_number in range(100000000, 100000026):
thread = threading.Thread(target=fetch_pan_data, args=(pan_number, data_list))
threads.append(thread)
thread.start()
# Wait for all threads to finish
for thread in threads:
thread.join()
# Sort the data list by PAN number
sorted_data = sorted(data_list, key=lambda x: x["PAN Number"])
# Define the CSV filename
csv_filename = "pan_responses_sorted.csv"
# Write the sorted data to the CSV file
with open(csv_filename, mode='w', newline='', encoding='utf-8') as csv_file:
csv_writer = csv.writer(csv_file)
csv_writer.writerow(["PAN Number", "JSON Response"])
for row in sorted_data:
csv_writer.writerow([row["PAN Number"], row["JSON Response"]])
print("CSV file sorted by PAN number has been created:", csv_filename)
Error “This file type is not supported in protected view”.
This probably is an IT security issue. Try enabling macro access from trust center settings in Excel and also go to the properties of the downloaded file and check the unblock radio key at the bottom of the properties window.
Worked Well but slower for large data volumes. Can it be done for company registrar data ?
Yes, the implementation in Excel is kinda slow. If this is commercial purpose, i have other alternatives as well – but there is compensation tied to it. OCR also should be dooable. I haven’t researched but it can be done theoretically.
Brilliant work! Can you guide me through this function with Tally? I have written some codes, but these codes are not working. If you want to help me, please reply to gsu143@gmail.com.