This one is a teaser. It still needs to be hosted on a better server before it can be served to a large user base. It's a Tesseract OCR tool built on an open platform. This is basically the final nail in the coffin for the troubles of “Legacy/Unicode Devanagari Text”.
It scans your image or PDF and runs it through Tesseract, an open source OCR engine. Using Tesseract's language and training data, it recognizes the text in your image, PDF, or even handwritten pages and produces Unicode text. It doesn't matter whether the original was typed in a legacy font like “Preeti” or in Unicode: OCR reads the shapes of the letters on the page, not the underlying character codes, so both work equally well.
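If you want to try the same idea on your own machine, here is a minimal sketch of the recognition step. It is not the code behind this tool; it simply calls the standard Tesseract command-line program from Python. The function names `build_tesseract_command` and `ocr_image` are made up for illustration, and the default language code `nep` (Nepali) is an assumption: you would need that language data installed, or you could pass another code such as `hin` for Hindi. Tesseract itself takes images as input, so a PDF would have to be converted to images first.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="nep"):
    """Return the argument list for a Tesseract CLI call that prints text to stdout."""
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing it to a file. The "nep" default is an assumption.
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="nep"):
    """Run Tesseract on one image and return the recognized Unicode text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


# Example (hypothetical file name):
# text = ocr_image("scanned_page.png")
```

The result is ordinary Unicode text, no matter which font the scanned page was printed in.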
Feel free to use this in any way you see fit, as I do not own it. :)
Courtesy of the open source OCR engine, Tesseract:
GitHub – tesseract-ocr/tesseract: Tesseract Open Source OCR Engine