Building on last month’s post, I’m making several of the API methods discussed on our website freely available for a limited time. For example, you can GET arXiv figure data with the following curl command; the query string must, of course, correspond to data in our database:
curl -k 'https://plot2txt-staging.us-east-1.elasticbeanstalk.com/record?type=figure&source=astro&label=temp'
Here we’re requesting figures from arXiv PDFs that have ‘astro’ in the source URL and at least one label containing the substring ‘temp’. To keep costs down, responses are limited to 5 database items per query, but you can iterate through the database by using the last high-resolution timestamp in a response as the starting key (skey) for the next query, like so:
curl -k 'https://plot2txt-staging.us-east-1.elasticbeanstalk.com/record?type=figure&source=astro&label=temp&skey=1526848905565'
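If you would rather script the iteration than paste timestamps by hand, here is a minimal Python sketch of the same loop. The response format is not described in this post, so the sketch assumes each response is a JSON list of items and that each item stores its timestamp under a key called "timestamp"; both are guesses, and the helper names (build_url, fetch_json, iter_records) are made up for illustration. It also does not know whether skey is inclusive, so the first item of a page may repeat the last item of the previous one.

```python
import json
import ssl
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://plot2txt-staging.us-east-1.elasticbeanstalk.com/record"


def build_url(params, skey=None):
    """Build the query URL, appending skey when a starting key is given."""
    query = dict(params)
    if skey is not None:
        query["skey"] = skey
    return BASE_URL + "?" + urlencode(query)


def fetch_json(url):
    """GET a URL and decode JSON, skipping certificate checks like curl -k."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=context) as response:
        return json.load(response)


def iter_records(params, fetch=fetch_json, timestamp_key="timestamp", max_pages=10):
    """Yield items page by page, feeding the last timestamp back in as skey.

    ASSUMPTION: each page is a list of dicts and timestamp_key names the
    field holding the high resolution timestamp. Adjust to the real response.
    """
    skey = None
    for _ in range(max_pages):
        items = fetch(build_url(params, skey))
        if not items:
            break  # an empty page is treated as the end of the results
        yield from items
        next_skey = items[-1][timestamp_key]
        if next_skey == skey:
            break  # the key did not advance, so stop rather than loop forever
        skey = next_skey
```

Calling list(iter_records({"type": "figure", "source": "astro", "label": "temp"})) would then walk up to max_pages pages of 5 items each. The max_pages cap is there to keep a runaway loop from hammering the free tier.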
Besides the source arXiv URL, e.g. “https://arxiv.org/abs/astro-ph/0007095”, you will receive the figure with the detected substring, as well as a fit to the pixels using mixture modeling. Where possible, numerical data from the axes is used to invert the pixel positions. Last month I effectively gave a method for bulk download of figure images; you could, of course, also curl the URLs given in the responses to the queries above one at a time.
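To make the pixel inversion step concrete, here is a small Python sketch of one way such a mapping can work. It is not the code behind the API: it assumes a linear axis and two tick labels whose pixel positions and numeric values are both known, and the function name make_axis_inverter is made up for illustration. A logarithmic axis would need the values log-transformed first.

```python
def make_axis_inverter(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    ASSUMPTION: the axis is linear, so two (pixel, value) tick pairs
    are enough to fix the mapping.
    """
    if pixel_a == pixel_b:
        raise ValueError("the two reference ticks must sit at different pixels")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_data(pixel):
        # Linear interpolation (or extrapolation) from the reference ticks
        return value_a + (pixel - pixel_a) * scale

    return to_data


# x axis: the tick labelled 0 sits at pixel 100, the tick labelled 10 at pixel 500
x_to_data = make_axis_inverter(100, 0.0, 500, 10.0)

# y axis: image rows grow downward, so the larger value has the smaller pixel
y_to_data = make_axis_inverter(400, 0.0, 0, 100.0)
```

With these two functions, a fitted point at pixel (300, 200) maps to roughly (5, 50) in data coordinates.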
Let us know if you’re interested in a paid tier or a high-throughput REST API for building a figure or table search engine from your own PDF document collection.