Figure Search Engine II

For the time being, the figure search engine pipeline and API methods are freely available for testing.

With the key, you can upload PDFs for processing by the backend lambda functions, which use machine learning. The particular method below identifies figures in PDFs, detects text, recovers data values from pixel positions, and populates a database, making figures searchable and their data potentially reusable. For example, you could test with the PDF available for this arXiv article. To upload:

curl --request POST -H "Authorization: Token my_token" -H "Accept: application/pdf" -H "Content-Type: application/pdf" -H "Filename: test.pdf" --data-binary "@0001311.pdf"

where 0001311.pdf is the name of the file to upload, and the Filename header field gives the preferred name for storage in the database. After uploading one or more PDFs, you can search on the x and y label names, as well as on the filename field, and limit the search by x and y data ranges. For example, referring again to the previous document:

curl --request GET -H "Authorization: Token my_token" -H "Accept: application/json" -H "Content-Type: application/json"

Here we’ve asked for all figures with x and y labels that contain the substring ‘erg’, and in this case we receive the following (extract):


Again, this is just an extract, but note that both figures and data are returned in the response, e.g., this image of the detected figure found on page 4 of the input PDF:

You can also narrow your search using a query string. For example, here we limit results to those where the minimum x value exceeds 6100 and the filename contains the phrase ‘test’:

curl --request GET -H "Authorization: Token my_token" -H "Accept: application/json" -H "Content-Type: application/json"\&xmin=6100\&source=test
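
The same two requests can also be assembled from a script. What follows is only a minimal sketch in Python, not the service's own client: BASE_URL is a hypothetical placeholder (the real endpoint is not reproduced here), and the function names build_upload_request and build_search_request are made up for illustration. The headers mirror the curl commands above, and xmin and source are the only query parameters taken from the example; anything else you pass is an assumption on your part. The functions build the requests without sending them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: substitute the real endpoint of the service.
BASE_URL = "https://example.invalid/figures/"


def build_upload_request(pdf_bytes, stored_name, token, base_url=BASE_URL):
    """Build (but do not send) the POST request that uploads one PDF."""
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/pdf",
        "Content-Type": "application/pdf",
        "Filename": stored_name,  # preferred name for storage in the database
    }
    return Request(base_url, data=pdf_bytes, headers=headers, method="POST")


def build_search_request(token, base_url=BASE_URL, **filters):
    """Build (but do not send) a GET search request.

    Keyword arguments become the query string; filters set to None are skipped.
    """
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{base_url}?{query}" if query else base_url
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return Request(url, headers=headers, method="GET")


# Mirror the narrowed search above: minimum x above 6100, filename contains 'test'.
search = build_search_request("my_token", xmin=6100, source="test")
print(search.full_url)
# -> https://example.invalid/figures/?xmin=6100&source=test
```

To actually send either request, pass it to urllib.request.urlopen once BASE_URL points at the real service. For an upload, read the PDF in binary mode first, e.g. open("0001311.pdf", "rb").read(), which corresponds to curl's --data-binary option.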