PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.
Updated May 28, 2024 - Python
CCExtractor - Official version maintained by the core team
FastAPI server for classifying documents and extracting data
Web scraper for extracting data from online newspapers
Tesseract-based OCR for Android
Android document scanning app
Tesseract Open Source OCR Engine (main repository)
Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.
⚡ Extracting the Machine Readable Zone (MRZ) from passport or other document images.
6 MB Tesseract (with English training data) to fit inside AWS Lambda
Docker Image with latest Tesseract OCR Version 5.x.x built from sources
Build "Dictionary of the Old Danish Language" into easier-to-use data formats
The open-sourced version of the award-winning Qiqqa research management tool for Windows (a bleeding edge dev fork). File any issues you find in the main repo issue tracker at https://github.com/jimmejardine/qiqqa-open-source/issues
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Personal assistant built using Python libraries. Its features include sending emails, optical text recognition, dynamic news reporting with API integration, a to-do list generator, opening any website by voice command, playing music, Wikipedia search, and a dictionary with intelligent sensing (i.e. automatic spell checking)…