API Books Crawler: OCR + LaTeX for mathematics, physics and chemistry books

I scanned my entire library and have all the books in PDF format. Most of them can't be found on the internet. In addition to books on history, art and fiction, I also have books on medicine, mathematics, physics, chemistry, etc.

I need to process my PDF books to build a database, then use ChatGPT to get answers from it. I have to use an API to crawl all the books.

The BIG problem: right now, to complete this task and get good results, I'm using Mathpix Snip (`mathpix.com`) to scan each PDF file with OCR and LaTeX output, then saving each file as txt. Mathpix Snip is the best tool that can convert mathematics, physics and chemistry books into LaTeX and save them as txt files. After that, I connect the txt database to the OpenAI API. Simple, but...

**The problem: it takes a lot of time!**

So, I need all these steps to be done much faster, in one pass: scanning the PDFs and converting them to txt files to build a single database of 500 books, so that I can start using ChatGPT focused only on my database.
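To make the "in one pass" idea concrete, here is a minimal sketch of a batch job that converts a whole folder of PDFs in parallel. It is only an illustration under assumptions: `convert_library` and its parameters are hypothetical names, and `convert_pdf` is a placeholder for whatever OCR + LaTeX call is actually used (it is not a real Mathpix function).

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def convert_library(
    pdf_dir: Path,
    txt_dir: Path,
    convert_pdf: Callable[[Path], str],
    max_workers: int = 4,
) -> list[Path]:
    """Convert every PDF in pdf_dir to a .txt file in txt_dir.

    convert_pdf is a placeholder (assumption): it takes a PDF path and
    returns the recognized text with LaTeX, e.g. by calling an OCR service.
    Books that already have a .txt output are skipped, so the job can be
    re-run after an interruption without redoing finished books.
    """
    txt_dir.mkdir(parents=True, exist_ok=True)
    pending = [
        pdf
        for pdf in sorted(pdf_dir.glob("*.pdf"))
        if not (txt_dir / (pdf.stem + ".txt")).exists()
    ]

    def process(pdf: Path) -> Path:
        out = txt_dir / (pdf.stem + ".txt")
        out.write_text(convert_pdf(pdf), encoding="utf-8")
        return out

    # Threads suit this job because each conversion mostly waits on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, pending))
```

With 500 books, the number of workers would have to respect whatever rate limits the OCR service imposes; the value 4 above is an arbitrary default.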

So, I need a plugin that scans the PDFs, converts them with OCR + LaTeX, uses the OpenAI API, and crawls all the books to build a database, so that I can then prompt ChatGPT for answers. Is it possible to create such a plugin for AI?
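For the "database, then prompt" part, a rough sketch of the usual approach is to split each txt file into chunks and send only the most relevant chunks along with the question. The code below is a toy illustration under assumptions: `chunk_text` and `top_chunks` are hypothetical helpers, and the word-overlap scoring is a simple stand-in for the embedding-based search a real setup would more likely use.

```python
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of up to `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return up to k chunks that share the most words with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))

    def score(chunk: str) -> int:
        counts = Counter(re.findall(r"\w+", chunk.lower()))
        return sum(counts[w] for w in q_words)

    ranked = sorted(chunks, key=score, reverse=True)
    # Drop chunks that share no words with the question at all.
    return [c for c in ranked[:k] if score(c) > 0]
```

The selected chunks would then be pasted into the prompt ahead of the question, so the model answers from the books rather than from its general knowledge.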
Mar 7, 2024
Maybe you can try HelloRAG.ai. It is going to be released in late March.
Mar 7, 2024
Thanks, I bookmarked that page. Will it be free?