After a very busy couple of months, including a move to Silicon Valley (!), I’m pleased to say that plot2txt now offers a few more API methods, intended to help with search as much as data mining.
The new methods allow the user to automatically create a searchable collection of figures and tables, identified and extracted from papers uploaded to a gateway endpoint.
The gateway separates an input PDF into pages, which are then analyzed for figure and table content using AWS Lambda functions. Detected text and reverse-engineered figure data are stored in database tables, ready for search. Database entries also point to images of the discovered figures and tables, which are stored in the cloud.
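To make the storage and search step a little more concrete, here is a minimal sketch of how detected text and image pointers could be indexed and queried. It is not the plot2txt implementation: the `items` table, its columns, the `build_index` and `search` helpers, and the example.com URLs are all hypothetical, and an in-memory SQLite database stands in for whatever database the service actually uses.

```python
import sqlite3

# Hypothetical schema: one row per figure or table detected on a page.
# Table and column names are assumptions made for illustration only.
SCHEMA = """
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    paper TEXT NOT NULL,
    page INTEGER NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('figure', 'table')),
    detected_text TEXT NOT NULL,
    image_url TEXT NOT NULL
)
"""


def build_index(records):
    """Create an in-memory database and load the detected items into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO items (paper, page, kind, detected_text, image_url) "
        "VALUES (?, ?, ?, ?, ?)",
        records,
    )
    conn.commit()
    return conn


def search(conn, term):
    """Return (paper, page, kind, image_url) rows whose text contains term."""
    pattern = f"%{term}%"  # substring match; LIKE ignores ASCII case in SQLite
    rows = conn.execute(
        "SELECT paper, page, kind, image_url FROM items "
        "WHERE detected_text LIKE ? ORDER BY paper, page",
        (pattern,),
    )
    return rows.fetchall()


# Made-up records standing in for the output of the page analysis step.
records = [
    ("paper_a.pdf", 3, "figure", "Absorbance vs wavelength (nm)",
     "https://example.com/paper_a/p3_fig1.png"),
    ("paper_a.pdf", 5, "table", "Sample, Temperature (K), Yield",
     "https://example.com/paper_a/p5_tab1.png"),
    ("paper_b.pdf", 2, "figure", "Temperature dependence of resistivity",
     "https://example.com/paper_b/p2_fig1.png"),
]

conn = build_index(records)
hits = search(conn, "temperature")
for paper, page, kind, image_url in hits:
    print(paper, page, kind, image_url)
```

Each hit carries the image pointer along with the paper and page, so a search result can link straight to the stored picture of the figure or table. A production service would more likely use a full-text index than a `LIKE` scan; the substring match here just keeps the sketch short.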