Google Lens on Steroids: With FTP and Multithreading

Necessity meets resourcefulness! Let’s unveil a reasonably good alternative to Google’s Cloud Vision API, one that leverages freely available tools: FTP, multithreaded Python code, and services that Google generously provides to the public. This fusion offers a cost-effective solution, affectionately known as the “poor man’s alternative,” that lets users harness the capabilities of Google Lens without breaking the bank (assuming you have one to break; I, for one, don’t). Google’s Cloud Vision API undoubtedly offers a wealth of image processing capabilities, but it comes at a cost. In this alternative, we employ freely available tools and services to achieve similar OCR results without incurring hefty expenses.

Prerequisites

FTP: The Gateway to Image Hosting

Our journey commences with FTP, a widely supported and cost-free protocol for transferring files. With FTP, we can bridge the gap between local storage and a remote server by uploading images to a hosted folder that is reachable over the web, which Google Lens needs in order to fetch them. Keep in mind that plain FTP sends credentials and files unencrypted, so use FTPS or SFTP if your host offers it. For this purpose I am using my own shared hosting storage to host the images that need to be digitized, along with the FTP access provided by the web hosting service.
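To make the upload step concrete, here is a minimal sketch of it in isolation. The function name upload_and_get_url and the web_root parameter are my own illustrative choices, and the sketch assumes that the folder you upload to is served at https://<web_root>/, which depends entirely on how your hosting is set up. The file name is percent-encoded so that names with spaces still produce a valid URL.

```python
import os
from urllib.parse import quote


def upload_and_get_url(ftp, local_path, web_root):
    """Upload local_path over an already logged-in FTP connection.

    `ftp` is any object with a storbinary() method, such as ftplib.FTP
    or ftplib.FTP_TLS. `web_root` is the host and path under which the
    upload folder is publicly served (an assumption about your hosting).
    """
    name = os.path.basename(local_path)
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {name}", fh)
    # Percent-encode the file name so spaces and similar characters are safe
    return f"https://{web_root.strip('/')}/{quote(name)}"


# Hypothetical usage with an encrypted connection (host and paths are placeholders):
#
#   from ftplib import FTP_TLS
#   with FTP_TLS("ftp.example.com") as ftp:
#       ftp.login(user="me", passwd="secret")
#       ftp.prot_p()                      # encrypt the data channel too
#       ftp.cwd("example.com/lens")
#       print(upload_and_get_url(ftp, "scan 01.jpg", "example.com/lens"))
```

Passing the connection in, rather than opening it inside the function, means one login can be reused for a whole folder of images.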

Multithreading Python Code: Maximizing Efficiency

To further amplify our resourcefulness, we employ multithreaded Python code. This lets us process several images concurrently, which significantly improves throughput. In our case, we run multiple instances of Google Chrome in parallel to speed up digitizing with the free Google Lens service. The speed-up is not unlimited: it grows with the number of Chrome instances only until the CPU or memory is saturated, so the number of concurrent instances is capped.
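The idea of capping concurrency can be tried out without Chrome at all. The sketch below is only an illustration: run_bounded and its parameters are hypothetical names, and a short sleep stands in for a browser session. It starts one thread per task but lets only `limit` of them do work at the same time, and it reports the highest number of tasks that were ever running together.

```python
import threading
import time


def run_bounded(n_tasks, limit, work=lambda: time.sleep(0.02)):
    """Run n_tasks threads, allowing at most `limit` inside `work` at once.

    Returns the peak number of tasks that were running simultaneously.
    """
    slots = threading.Semaphore(limit)      # one slot per allowed session
    state_lock = threading.Lock()           # protects the counters below
    counters = {"running": 0, "peak": 0}

    def worker():
        with slots:                          # blocks while all slots are taken
            with state_lock:
                counters["running"] += 1
                counters["peak"] = max(counters["peak"], counters["running"])
            work()                           # stand-in for a browser session
            with state_lock:
                counters["running"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters["peak"]


if __name__ == "__main__":
    print("peak concurrency:", run_bounded(n_tasks=10, limit=3))
```

However many tasks you queue, the reported peak never exceeds the limit, which is exactly what keeps a modest machine from drowning in browser windows.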

Google Lens: The Unstoppable OCR Engine

At the heart of our alternative lies Google Lens, a versatile and free tool provided by Google. Leveraging this formidable OCR engine, we get accurate text extraction from images and maintain high quality without the added expense. Here is the URL that lets you “lens” an image that is already online: https://lens.google.com/uploadbyurl?url=<your_url>
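Since the image address travels inside another URL as a query parameter, it is safer to percent-encode it than to concatenate it as-is; otherwise a file name containing a space, & or # can cut the address short. A small sketch, where lens_url is a hypothetical helper name and the endpoint is the one quoted above:

```python
from urllib.parse import quote, urlencode

LENS_ENDPOINT = "https://lens.google.com/uploadbyurl"


def lens_url(image_url):
    """Build the Google Lens address for an image that is already online."""
    # quote_via=quote encodes spaces as %20 instead of '+'
    return f"{LENS_ENDPOINT}?{urlencode({'url': image_url}, quote_via=quote)}"


if __name__ == "__main__":
    print(lens_url("https://example.com/lens/page 1.jpg"))
```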

The Symphony

Once we have these prerequisites, we can orchestrate these free tools and services and watch some resourcefulness unfold. This solution offers a balance between cost savings and performance. This “poor man’s alternative” to the Cloud Vision API is a testament to “if it ain’t broke, don’t fix it”: it just works.

Understanding the Code

How it works

This Python script uploads images to an FTP server and then performs Optical Character Recognition (OCR) on these images using Google Lens through Selenium. The OCR results are saved to text files with the same name as the images.

It imports necessary Python libraries and sets up locks and semaphores for thread synchronization.
The upload_image_to_ftp function connects to an FTP server, uploads an image from a local path, generates a URL for the uploaded image, and prints the URL.
The process_image function uses a new instance of Google Chrome (controlled by Selenium) to perform OCR on an image. It opens the Google Lens URL, clicks the text option, clicks “select all text”, and then extracts the OCR result, which is printed to the console and saved to a text file.
FTP server credentials and the local image folder path are provided.
The script lists all files in the specified image folder with certain file extensions (e.g., .jpg, .jpeg, .png, .gif, .bmp), uploads each image to the FTP server, and creates a separate thread for processing each image using the process_image function.
The script then waits for all threads to finish processing before exiting.
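The file-handling part of these steps can be sketched on its own. The helper names list_images and ocr_output_path are illustrative and not part of the script; the extension list is the one mentioned above. Unlike a plain endswith check, this version lowercases the name first, so files such as SCAN.JPG straight from a camera or scanner are picked up too.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def list_images(folder):
    """Return the sorted paths of all image files directly inside `folder`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)      # case-insensitive match
        and os.path.isfile(os.path.join(folder, name))  # skip sub-folders
    )


def ocr_output_path(image_path):
    """Text file that sits next to the image and shares its base name."""
    return os.path.splitext(image_path)[0] + ".txt"
```

For example, an image at image/page1.png would get its OCR result written to image/page1.txt.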

Things to keep in mind

Ensure that you have the necessary Python libraries available. ftplib, os, and threading ship with Python, so selenium is the only one you need to install. Additionally, you’ll need the Chrome WebDriver for Selenium, which should be installed and accessible in your system’s PATH.
You may need to adjust the number of concurrent Chrome instances (chrome_semaphore) based on your system’s capabilities and resource constraints. In my case, I am using my humble Ryzen 5, for which I limited the Chrome processes to 20, and that used my CPU almost completely.
The script assumes that the Google Lens URL and the XPaths are accessible as shown in the code. However, I may fail to maintain this in the future, and Google’s website structure may change over time, so make sure that the XPaths and elements you are interacting with remain valid.
Make sure you have the necessary permissions and access to the FTP server with the provided credentials.
The script assumes that your image files are located in the specified local folder and that they have the supported file extensions.
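Before launching twenty browsers, a quick pre-flight check can save some head-scratching. This is only a sketch: preflight is a hypothetical helper, and it checks just two of the points above, namely whether the selenium package can be imported and whether an executable named chromedriver is on the PATH. A missing chromedriver is not necessarily fatal, since recent Selenium 4 releases can locate or download a matching driver themselves.

```python
import importlib.util
import shutil


def preflight():
    """Report which prerequisites appear to be available on this machine."""
    return {
        # True if `import selenium` would succeed
        "selenium": importlib.util.find_spec("selenium") is not None,
        # True if a chromedriver executable is reachable through PATH
        "chromedriver": shutil.which("chromedriver") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```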

The Code Itself: with necessary comments

            #### COURTESY OF CRITICAL SPAGHETTI ####

from ftplib import FTP
import os
import threading
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Lock to synchronize access to the file system
file_lock = threading.Lock()

# Semaphore to limit the number of concurrent Chrome instances
chrome_semaphore = threading.Semaphore(20)  # Maximum number of Chrome sessions that run in parallel; adjust as needed

# Function to upload an image to an FTP server and get the URL
def upload_image_to_ftp(image_path, ftp_host, ftp_username, ftp_password, ftp_upload_dir):
    filename = os.path.basename(image_path)
    # Connect to the FTP server; the context manager closes the connection even if the upload fails
    with FTP(ftp_host) as ftp:
        ftp.login(user=ftp_username, passwd=ftp_password)
        # Change to the upload directory
        ftp.cwd(ftp_upload_dir)
        # Open the local image file for binary read
        with open(image_path, 'rb') as file:
            # Upload the image to the server
            ftp.storbinary(f'STOR {filename}', file)
    # Generate the URL of the uploaded image
    # (this assumes the FTP upload directory doubles as the public host and path, as it does on my shared hosting)
    uploaded_url = f'https://{ftp_upload_dir}/{filename}'
    # Print the URL to the console
    print(f'Uploaded Image URL: {uploaded_url}')
    return uploaded_url

# Function to perform OCR on an image using a new instance of Chrome and save the result to a text file
def process_image(image_url, image_path):
    with chrome_semaphore:
        # Set up Chrome 
        driver_options = Options()
        # driver_options.add_argument("--headless")  # Run Chrome in headless mode
        driver = webdriver.Chrome(options=driver_options)

        try:
            driver.get("https://lens.google.com/uploadbyurl?url=" + str(image_url))
            # One explicit wait object, reused below (up to 10 seconds per condition)
            wait = WebDriverWait(driver, 10)
            # Wait until the text option is clickable, then click it (the page may still be loading)
            wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/c-wiz/div/div[2]/div/c-wiz/div/div[1]/div/div[3]/div/div/span[2]"))).click()
            # Wait for "select all text" to load
            wait.until(EC.visibility_of_element_located((By.XPATH, '/html/body/c-wiz/div/div[2]/div/c-wiz/div/div[2]/c-wiz/div/div/div/div[2]/div[1]/div/div/div/div[2]/div/button/span')))
            # Click on "select all text"
            driver.find_element(By.XPATH, "/html/body/c-wiz/div/div[2]/div/c-wiz/div/div[2]/c-wiz/div/div/div/div[2]/div[1]/div/div/div/div[2]").click()
            # Wait for the "copy text" element
            wait.until(EC.visibility_of_element_located((By.XPATH, '/html/body/c-wiz/div/div[2]/div/c-wiz/div/div[2]/c-wiz/div/div/div/div[2]/div[1]/div/div/div[1]/div/div/div/div/div[1]/span/span/button/span[1]')))
            # Obtain the result text
            result_text = driver.find_element(By.XPATH, "/html/body/c-wiz/div/div[2]/div/c-wiz/div/div[2]/c-wiz/div/div/span/div/div[2]").text
            print(result_text)

            # Save the result to a text file with the same name as the image, using a lock to prevent conflicts
            txt_filename = os.path.splitext(image_path)[0] + '.txt'
            with file_lock:
                with open(txt_filename, 'w', encoding='utf-8') as txt_file:
                    txt_file.write(result_text)
            print(f"Saved OCR result to {txt_filename}")
        finally:
            # Close the Chrome instance
            driver.quit()

# Provide the FTP server credentials and image folder location
ftp_host = 'eg.vox.sushilparajuli.com'
ftp_username = 'eg.sushilpa'
ftp_password = 'eg.passwordgoeshere'
ftp_upload_dir = 'eg.vox.sushilparajuli.com/lens'  # Replace with the desired upload directory
image_folder = r'C:\Users\SushilParajuli\Desktop\ftp\image'  # Replace with the path to your image folder

# List to store threads
threads = []

# List all files in the image folder and upload each one
for filename in os.listdir(image_folder):
    if filename.lower().endswith(('.jpg', '.jpeg', '.png', '.gif', '.bmp')):  # lower() also matches .JPG, .PNG, etc.
        image_path = os.path.join(image_folder, filename)
        uploaded_url = upload_image_to_ftp(image_path, ftp_host, ftp_username, ftp_password, ftp_upload_dir)
        # Create a thread for processing each image
        thread = threading.Thread(target=process_image, args=(uploaded_url, image_path))
        threads.append(thread)
        thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
