This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Creating a Data Set

MikeJohnPage edited this page May 3, 2019 · 16 revisions

Original Strategy

The original strategy was to use Twitter's Premium API to build a large data set (i.e., millions of observations), because this project was granted access to Twitter's Premium Sandbox API. On closer inspection of the documentation, however, it appears that at most 5,000 tweets per month can be retrieved from the Search Tweets: Full-archive endpoint (i.e., tweets since 2006). The Premium Sandbox API is limited to 50 requests per month to this endpoint, and each request returns a maximum of 100 tweets (50 × 100 = 5,000).
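The arithmetic behind that ceiling can be checked with a short sketch. The two quota figures come from the paragraph above; the one-million-tweet target is only an illustrative assumption standing in for "millions of observations", and the function names are hypothetical.

```python
# Quota figures as stated above for the Premium Sandbox full-archive endpoint.
REQUESTS_PER_MONTH = 50
TWEETS_PER_REQUEST = 100


def monthly_tweet_ceiling(requests_per_month: int, tweets_per_request: int) -> int:
    """Upper bound on tweets retrievable per month."""
    return requests_per_month * tweets_per_request


def months_to_collect(target_tweets: int, ceiling: int) -> int:
    """Whole months needed to reach the target (ceiling division)."""
    return -(-target_tweets // ceiling)


ceiling = monthly_tweet_ceiling(REQUESTS_PER_MONTH, TWEETS_PER_REQUEST)
print(ceiling)  # 5000

# Illustrative target only: one million tweets.
print(months_to_collect(1_000_000, ceiling))  # 200
```

Even in the best case, where every request returns a full 100 tweets, a data set of that size would take many years to assemble at this rate.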

When making either data or count requests, there is often more data than can be returned in a single response. In that case the response includes a 'next' token, provided as a root-level JSON attribute. Whenever a 'next' token is present, there is additional data to retrieve, so further API requests are needed.
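The 'next' token pagination described above could be handled with a loop along the following lines. This is a minimal sketch, not the project's script: `fetch_page` is a hypothetical callable that performs one API request and returns the decoded JSON response, the `"results"` key is an assumption about where the tweets sit in that response, and the default cap of 50 requests mirrors the monthly Sandbox limit mentioned earlier.

```python
from typing import Callable, Dict, List, Optional


def collect_all_pages(
    fetch_page: Callable[[Optional[str]], Dict],
    max_requests: int = 50,
) -> List[dict]:
    """Follow 'next' tokens until none is returned or the request budget runs out.

    fetch_page is a hypothetical function: it takes the current 'next' token
    (None for the first request) and returns the parsed JSON response.
    """
    tweets: List[dict] = []
    next_token: Optional[str] = None

    for _ in range(max_requests):
        response = fetch_page(next_token)
        # Assumption: tweets are listed under a root-level "results" key.
        tweets.extend(response.get("results", []))
        # The 'next' token is a root-level attribute; its absence means
        # there is no more data to retrieve.
        next_token = response.get("next")
        if next_token is None:
            break

    return tweets


def fake_fetch_page(next_token: Optional[str]) -> Dict:
    """Stand-in for a real API call, serving three canned pages."""
    pages = {
        None: {"results": [{"id": 1}, {"id": 2}], "next": "a"},
        "a": {"results": [{"id": 3}], "next": "b"},
        "b": {"results": [{"id": 4}]},
    }
    return pages[next_token]


print(len(collect_all_pages(fake_fetch_page)))  # 4
```

Passing the fetcher in as an argument keeps the loop testable without network access; a real fetcher would send the token back to the endpoint as part of the next request.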

One way to overcome the monthly request limit is to upgrade from the Sandbox to a paid Premium API tier. Retrieving 1.25 million tweets would cost approximately €1700, which is not feasible at this stage without funding.

Current Strategy

The current strategy is to use Twitter's standard 7-day search API to retrieve tweets matched by keyword and by known user accounts. The maximum rate limit is 450 requests per 15-minute window. The script could be automated to run continuously using a scheduler such as cron.
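One way a continuously running script might stay inside that rate limit is to space requests evenly across the window. The sketch below assumes nothing about the actual collection script: `run_throttled` and its parameters are hypothetical names, and each task stands in for one search request.

```python
import time
from typing import Callable, Iterable, List

# Rate limit as stated above: 450 requests per 15-minute window.
MAX_REQUESTS = 450
WINDOW_SECONDS = 15 * 60


def min_interval(max_requests: int = MAX_REQUESTS,
                 window_seconds: int = WINDOW_SECONDS) -> float:
    """Smallest even spacing between requests that respects the limit."""
    return window_seconds / max_requests


def run_throttled(
    tasks: Iterable[Callable[[], object]],
    interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[object]:
    """Run each task, keeping at least `interval` seconds between task starts."""
    results: List[object] = []
    last_start = None
    for task in tasks:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last start.
            wait = interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(task())
    return results


print(min_interval())  # 2.0
```

With these figures the spacing works out to one request every two seconds. The `sleep` and `clock` parameters exist only so the pacing can be exercised in tests without real delays.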
