Add Library to Project
We are going to load Tesseract.js from the CDN given below for optical character recognition. It is the only CDN build I found that works in the browser, so use it until you find a newer one.
Show Image on Upload
This part of the application is optional. It only displays the uploaded image that is about to be scanned for character recognition. Attach an onchange handler to the file selector; it runs only when a file is selected. Get the file from the selector. A File is already a Blob, so it can be passed straight to URL.createObjectURL() to create an object URL for that file (wrapping it with the Blob() constructor first also works but is not required). Now set that URL as the src of the img tag and the image will be displayed.
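The preview step can be sketched roughly as follows. The function name attachPreview and the parameter names fileInput and img are made up for this illustration; pass in whatever elements your page uses. Revoking the previous object URL is an extra I added to release memory, not something the steps above require.

```javascript
// Minimal sketch: show a preview of the selected image.
// `fileInput` is the <input type="file"> element, `img` is the <img> element.
// Both names are hypothetical.
function attachPreview(fileInput, img) {
  let currentUrl = null;

  fileInput.addEventListener("change", () => {
    const file = fileInput.files[0];
    if (!file) return; // nothing selected (e.g. the dialog was cancelled)

    // Release the URL of the previous preview, if there was one.
    if (currentUrl) URL.revokeObjectURL(currentUrl);

    // A File is a Blob, so it can be turned into an object URL directly.
    currentUrl = URL.createObjectURL(file);
    img.src = currentUrl;
  });
}
```

In the page you would call it once, for example attachPreview(document.querySelector("input[type=file]"), document.querySelector("img")).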
Text Recognition Process
This process starts when the user clicks a button. Before recognition, clear the output textarea. Create an instance of the TesseractWorker class and store it in a variable. Then call recognize() as a method of that instance and pass the selected file to it as the argument.
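Here is a small sketch of that click handler. It assumes the build of Tesseract.js you loaded exposes the TesseractWorker class described above, and that the result handed to then() has a text property. The function name wireRecognizeButton and all parameter names are hypothetical.

```javascript
// Minimal sketch: start recognition when the button is clicked.
// `worker` is expected to be a TesseractWorker instance (created by the caller).
// `button`, `fileInput` and `output` are hypothetical names for the page elements.
function wireRecognizeButton(button, fileInput, output, worker) {
  button.addEventListener("click", () => {
    const file = fileInput.files[0];
    if (!file) return; // no image selected yet

    output.value = ""; // clear the textarea before a new run

    // recognize() takes the selected file and returns a thenable job.
    worker.recognize(file).then((result) => {
      output.value = result.text; // assumption: recognized text is in `result.text`
    });
  });
}
```

On the page, the worker would be created once, for example with new Tesseract.TesseractWorker() if your build puts the class on a global Tesseract object, and then passed in as the last argument.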
The object returned by recognize() also has a progress() method, which we use to track the current progress of the whole recognition process. It reports the status of each component that loads for the process, and it reports the progress of text recognition as a number from 0 to 1. We use an if condition to show the progress value for text recognition only. For the loading steps, we just show the status.
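The if condition can be sketched like this. It assumes each progress update is an object with a status string and a progress number, and that the status during recognition reads "recognizing text"; log the updates from your own build to confirm the exact wording. The function name formatProgress is made up for this example.

```javascript
// Minimal sketch: turn a progress update into a line of text for the page.
// Assumption: `update` looks like { status: "recognizing text", progress: 0.42 }.
function formatProgress(update) {
  if (update.status === "recognizing text") {
    // Recognition itself: show the 0..1 value as a percentage.
    return `${update.status}: ${Math.round(update.progress * 100)}%`;
  }
  // Loading steps: show only the status.
  return update.status;
}
```

It would be called from the progress callback, for example worker.recognize(file).progress((p) => { statusElement.textContent = formatProgress(p); }), where statusElement is whatever element shows the status on your page.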
In the then() callback, we receive the complete result as the response. It contains the recognized text, the individual words and characters, a confidence level for each, and much more. Inspect the returned object to find whatever you want to use.
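As a rough example of reading that object, the sketch below pulls out a few fields. The property names (text, confidence, words, and text and confidence on each word) and the 0 to 100 confidence scale are assumptions based on the description above, so log the result in your console and adjust. The function name summarizeResult and the minConfidence threshold are made up for this illustration.

```javascript
// Minimal sketch: pick a few useful values out of the recognition result.
// Assumed shape: { text, confidence, words: [{ text, confidence }, ...] }.
function summarizeResult(result, minConfidence = 60) {
  const words = result.words || [];
  return {
    text: result.text.trim(),
    confidence: result.confidence,
    wordCount: words.length,
    // Words the engine was unsure about, worth checking by hand.
    uncertainWords: words
      .filter((word) => word.confidence < minConfidence)
      .map((word) => word.text),
  };
}
```

Inside then(), you could call summarizeResult(result) and show the text in the textarea while listing the uncertain words somewhere else on the page.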
Optical character recognition is a great project for beginners who want to build a portfolio and learn something new as a web developer. Tesseract.js makes recognition in the browser easy.