The language data used by Tesseract.js by default is stored in this repo. Default language data is not something we actively manage/edit, but rather we inherit the default language data from the main Tesseract project.
Looking in this repo, it appears that no tgl (Tagalog) data exists for the LSTM model (the default). Your options for recognizing Tagalog are therefore the following:
1. Use the Legacy model (oem value 0), which does support this language. To do so, change the call to: await createWorker(["eng", "tgl"], 0)
2. Search online to see whether anybody has published an LSTM Tagalog .traineddata file, or train one yourself, and then make Tesseract.js use that custom language data by setting the langPath option.
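The two options above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: buildWorkerArgs and recognizeTagalog are hypothetical helper names, and any langPath URL passed in is a placeholder you would replace with wherever your own .traineddata files are hosted.

```javascript
// Sketch only. OEM numbering follows Tesseract: 0 = Legacy, 1 = LSTM only.
const LEGACY_OEM = 0;
const LSTM_OEM = 1;

// Hypothetical helper: pick createWorker arguments for a language list.
// With a custom langPath, use the LSTM model and load data from that path;
// without one, fall back to the Legacy model and the default data.
function buildWorkerArgs(langs, customLangPath) {
  if (customLangPath) {
    // Assumption: every language in `langs` must be available under
    // customLangPath, not just the one missing from the default data.
    return [langs, LSTM_OEM, { langPath: customLangPath }];
  }
  return [langs, LEGACY_OEM];
}

// Hypothetical wrapper: recognize an image and return the extracted text.
async function recognizeTagalog(image, customLangPath) {
  // Loaded lazily so the helper above can be used without the library.
  const { createWorker } = require('tesseract.js');
  const args = buildWorkerArgs(['eng', 'tgl'], customLangPath);
  const worker = await createWorker(...args);
  try {
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

Calling recognizeTagalog(image) with no second argument uses the Legacy model. Passing a langPath switches to custom LSTM data instead. If the hosted file is not gzipped, the gzip: false option would likely also need to be set.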
Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
"tesseract.js": "^5.1.0",
Describe the bug
Running createWorker with tgl language results in error.
Uncaught Error: Error: Network error while fetching https://cdn.jsdelivr.net/npm/@tesseract.js-data/TGL/4.0.0_best_int/TGL.traineddata.gz. Response code: 404
at createWorker.js:247:1
at worker.onmessage (onMessage.js:3:1)
To Reproduce
await createWorker(["eng", "TGL"]);
Expected behavior
TGL language can be used
Device Version:
Windows 11
Chrome, Node 18.15