got data?

For some amount of time (i.e., until the bandwidth costs add up 🙂) you can download the figures that were extracted from arXiv documents during the development and testing of the search API described in prior posts. If you have the AWS CLI installed, getting the figure metadata (from which you can create download URLs) is as simple as:

aws s3api list-objects --bucket textextract1 > foo.txt

after which you should have a large (> 20 MB) JSON array, with entries like the following:

{
    "LastModified": "2018-06-07T06:45:09.000Z",
    "ETag": "\"5834e3c3714674f480a02a93265a6a79\"",
    "StorageClass": "STANDARD",
    "Key": "aebff15fc0edf12bc500be3f9233369da564c50e38b83d643621f956e2f44cae/quant-ph0010117-000016_qc_out_0.png",
    "Size": 35081
}
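
To turn these entries into download URLs, something like the following Python sketch might work. It rests on assumptions not stated in this post: that the bucket is publicly readable over S3's virtual-hosted-style endpoint (https://<bucket>.s3.amazonaws.com/<key>), and that the listing was saved to foo.txt as above. The helper names (load_entries, figure_url, first_urls) are made up for illustration.

```python
import json
from urllib.parse import quote

BUCKET = "textextract1"  # bucket name taken from the CLI command above


def load_entries(path):
    """Return the list of object entries from a saved list-objects dump."""
    with open(path) as f:
        data = json.load(f)
    # The AWS CLI usually wraps the entries in a top-level "Contents" key;
    # a bare JSON array is accepted as well.
    if isinstance(data, dict):
        return data.get("Contents", [])
    return data


def figure_url(key, bucket=BUCKET):
    """Build a download URL for one key.

    ASSUMPTION: the bucket allows public reads via the virtual-hosted-style
    endpoint. The key is percent-encoded, with "/" left intact.
    """
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def first_urls(path, n=5):
    """Return download URLs for the first n PNG entries in the listing."""
    entries = load_entries(path)
    pngs = [e["Key"] for e in entries if e["Key"].endswith(".png")]
    return [figure_url(k) for k in pngs[:n]]
```

Calling first_urls("foo.txt") should then give a handful of URLs to try in a browser or with curl; if they return an access error, the public-read assumption does not hold and the AWS CLI would be needed for the downloads as well.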

An example figure image from this dataset (~ 67k images) is attached.