Twitter API Drill Down Analysis

Social media is a major part of our day-to-day life. We share our ideas, dreams and, most importantly, our views through these media. These views can be extremely helpful for brands that wish to better understand their customer base and propose new offers or products driven by customer insight.

In this rather technical article, we explore how to extract information from one of the major social media platforms, Twitter, using its developer API.

Initial Notes

  • We have only included end-points that can be used in the context of social data mining or analysis. We are therefore not interested in ads, posting, or modifying users' timelines.
  • There are 3 main types of plans. They differ mainly in the type of content they can access and the managed options they offer, rather than in rate limits. We delve into the pricing and accessibility options of each individual service later, but overall these plans are:
    • Standard: the free option. It can be used for testing an integration or validating a concept, and can even be stretched to accommodate some simple research services. However, it might not be suitable for a production-level product with a large number of users.
    • Enterprise: offers the highest level of access and reliability to those who depend on Twitter data. It features, among other things, enterprise-grade APIs, tailored packages and annual contracts, dedicated account managers, and technical support. Such plans are usually priced on a case-by-case basis.
    • Premium: strikes a balance between the standard and enterprise plans and is especially suitable for startups with a limited number of users. It offers scalable access to increased data, a free sandbox (the ability to test the app on a free tier), flexible month-to-month contracts, and forum access. Even in this plan there is no single explicit price; the overall monthly cost is calculated based on the request rate.
  • Two services are not covered by this document: the metrics service and the batch-tweets service. Both have no documentation and are only available to enterprises as a user-tailored service on contact.

Authentication types

There are 2 main types of authentication in the Twitter API: user authentication and app authentication. The 2 types differ considerably:

  • A user-authenticated request is made on behalf of a Twitter @user. To authenticate this type of request you need a real user or bot account, with an e-mail address and all.
  • An app-authenticated request is made on behalf of a Twitter app. An app is always owned by a single user. The app provides the base context for using the Twitter API, including the consumer and access tokens.
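As an illustration of app authentication, the sketch below builds the two headers involved, assuming the OAuth 2 client-credentials flow used for application-only access: a Basic header built from the consumer key and secret is used to request a bearer token, and the bearer token is then attached to every app-authenticated request. The function names and the credentials in the usage note are placeholders, not part of any library.

```python
import base64
from urllib.parse import quote


def basic_credentials(consumer_key: str, consumer_secret: str) -> str:
    """Authorization header value used when requesting a bearer token."""
    # URL-encode each part, join them with a colon, then base64-encode the pair.
    pair = f"{quote(consumer_key, safe='')}:{quote(consumer_secret, safe='')}"
    return "Basic " + base64.b64encode(pair.encode("ascii")).decode("ascii")


def bearer_header(bearer_token: str) -> dict:
    """Header attached to each app-authenticated request once a token is issued."""
    return {"Authorization": f"Bearer {bearer_token}"}
```

For example, `basic_credentials("my-key", "my-secret")` yields the header value to send with the token request, and `bearer_header(token)` is merged into the headers of later calls.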

The rate limits differ between the 2 methods, sometimes greatly. However, each method has its own bucket: if a user account X sends both user-authenticated and app-authenticated requests to a single end-point, the effective rate limit for that end-point becomes the user rate limit plus the app rate limit.
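The separate buckets can be modelled with a small bookkeeping sketch. It assumes the client tracks the remaining quota itself and always spends from the fuller bucket; the `Bucket` class and `pick_bucket` helper are hypothetical names, and the 900/300 figures are the statuses/lookup limits quoted in this article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    name: str
    remaining: int  # requests left in the current window


def pick_bucket(buckets: List[Bucket]) -> Optional[Bucket]:
    """Spend one request from the bucket with the most quota left."""
    available = [b for b in buckets if b.remaining > 0]
    if not available:
        return None  # both buckets are exhausted until the window resets
    best = max(available, key=lambda b: b.remaining)
    best.remaining -= 1
    return best


buckets = [Bucket("user", 900), Bucket("app", 300)]
served = 0
while pick_bucket(buckets) is not None:
    served += 1
print(served)  # 1200 requests in one window: 900 (user) + 300 (app)
```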

One trivial loophole to get higher rate limits is to have a single user with multiple apps. All of these apps can then call the same end-point using their own app keys, effectively multiplying your rate limit. This method fails for 2 reasons:

  • According to Twitter's app policy, creating multiple applications for “identical, similar, or substantially overlapping use cases, or for the purpose of circumventing Twitter’s API usage limits, is not permitted.” Furthermore, “Offering the same service to multiple end-users using multiple API applications is not permitted. For example, if your service allows brands to monitor and respond to mentions, you should use a single application for all your end-users (who each authenticate with your app using OAuth). Each authenticated user is subject to per-user and per-app API usage limits as applicable.” The single exception to this policy is having separate dev, staging, and production environments. As far as we know, Twitter does not have a formal review process when a new app is created. However, an existing app that violates these policies can be suspended.
  • Furthermore, Twitter enforces a limit of 10 apps per user account. It is possible to extend the number of apps beyond that, but this requires the apps to comply with the aforementioned policies.

Endpoints Overview

In the following table, we explore the details of end-points that we believe are usable by a social media analysis app. Here are a few points to keep in mind before you start reading the details:

  • The rate limits can differ between the user and app authentication schemes. However, both are calculated over a fixed time window, after which the request quota resets. For example, if an end-point has a 15-minute request window and a rate limit of 900, you can perform 900 requests every 15 minutes, i.e. 86.4k requests per day (900 × 96 windows).
  • The rate limit of a workflow is bounded by the lowest rate limit among its steps. For example, say you want to search the latest tweets and then inspect certain aspects of them. A search service may have a rate limit of, say, 50 RPW (requests per window), and each request returns 100 tweet ids, i.e. 5,000 ids per window. To get the details of these tweets you then need to call the statuses/show end-point, which handles one tweet per request and is limited to 900 requests per window, so at most 900 of those tweets can be detailed in that window.
  • To get a sense of these end-points, consider a simple example of finding people who tweeted negatively about a certain brand. The workflow might look something like the following:
    • Use the search service to retrieve the ids of tweets related to a query term (usually the brand name or its aliases).
    • Get the details of the tweets from the retrieved ids using the statuses/lookup service, then process them to get the sentiment.
    • Use the author id from the tweet details to gather more information about the author, such as interests, likes, …
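The rate-limit arithmetic in the points above can be checked with a few lines. This is a sketch: the 50 RPW search limit is the illustrative value used in the text, not a documented figure, and the helper names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def requests_per_day(limit_per_window: int, window_minutes: int = 15) -> int:
    """Daily quota of an end-point whose quota resets every window."""
    return limit_per_window * (MINUTES_PER_DAY // window_minutes)


def tweets_per_window(search_rpw: int, ids_per_search: int,
                      detail_rpw: int, tweets_per_detail_call: int) -> int:
    """Tweets a search-then-detail workflow can fully process per window."""
    found = search_rpw * ids_per_search
    detailed = detail_rpw * tweets_per_detail_call
    return min(found, detailed)  # the slower step is the bottleneck


print(requests_per_day(900))                 # 86400
print(tweets_per_window(50, 100, 900, 1))    # 900: one tweet per detail call
print(tweets_per_window(50, 100, 900, 100))  # 5000: batches of 100 per detail call
```

With a batch end-point that returns 100 tweets per request, the detail step stops being the bottleneck and the search step limits the workflow instead.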
| End-point | Description | Example request and response | Rate limit (user authentication) | Rate limit (app authentication) | Request window | Availability | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GET statuses/show/:id | Returns a single Tweet object, specified by the id parameter. The Tweet’s author is also embedded within the Tweet. This end-point can be used to extract the details of a single tweet. | here | 900 | 900 | 15 mins | All plans | While the Tweet object does include a geo field, in the free subscriptions (standard, premium sandbox) this field is only filled if the author manually adds a geo-location. However, it is possible to get a near-exact geo-location in the paid subscriptions by using the enrichment services. |
| GET statuses/lookup | Returns up to 100 fully detailed Tweet objects per request, where the ids of these tweets are specified in the request. This can be seen as the batch version of the previous end-point. | here | 900 | 300 | 15 mins | All plans | You must be following a protected user to be able to see their most recent Tweets. If you don’t follow a protected user, their status will be removed. The order of Tweet IDs may not match the order of Tweets in the returned array. If a requested Tweet is unknown or deleted, that Tweet will not be returned in the results list. |
| GET statuses/retweets/:id | Given a tweet id, returns the Tweet objects of the latest 100 retweets of that tweet. | here | 75 | 300 | 15 mins | All plans | |
| GET statuses/retweeters/ids | Returns a collection of up to 100 user IDs belonging to users who have retweeted the Tweet specified by the id parameter. | here | 75 | 300 | 15 mins | All plans | This end-point is extremely similar to the previous one, and indeed you can extract the user ids from the previous end-point. Its only viable use case is to save rate limits when only the user information is needed, for example to extract demographics. |
| GET favorites/list | Returns the 20 most recent Tweets liked by the authenticating or specified user. | here | 75 | 75 | 15 mins | All plans | It is possible to specify the user id in this end-point. This can help us better identify the users that, say, liked or disliked a brand, mainly by reviewing their likes and thus their interests. It can also help in telling bots from legitimate users. |
| Search API | Returns a collection of relevant Tweets matching a specified query. | here | Based on plan | Based on plan | Based on plan | All plans, with major differences between them; see the search service section for details | Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface, and even if they are, there might be a time gap between the tweet's publish time and its indexing time. Results can be filtered by {query, geo-location, language, publish date, tweet id}. Results can be ordered in one of 3 ways: popular (return only the most popular results), recent (return only the most recent results), or mixed (include both popular and real-time results). Specific query rules and regexes can be used to filter the results; their availability depends on the account pricing plan. |
| Decahose stream | Delivers a 10% random sample of the realtime Twitter Firehose through a streaming connection. This is accomplished via a realtime sampling algorithm which randomly selects the data, while still allowing for the expected low-latency delivery of data as it is sent through the firehose by Twitter. | here | N/A | N/A | N/A | Enterprise | This service can be used to find trends/events and similar emerging use-cases. A free version of this end-point is available with much more limited capabilities. |
| PowerTrack API | A streaming API that allows the app to read a stream of tweets that conform to certain rules. It provides customers with the ability to filter the full Twitter firehose and only receive the data that they or their customers are interested in. This is accomplished by applying the PowerTrack filtering language to match Tweets based on a wide variety of attributes, including user attributes, geo-location, language, and many others. Using PowerTrack rules to filter Tweets ensures that customers receive all of the data, and only the data, they need for their app. | here | N/A | N/A | N/A | Enterprise | This is the ideal API for real customers since it allows full tracking of tweets related to a customer. |
| Standard Streaming API | The free version of the PowerTrack API. It can also read streams of tweets, but with limited features; see the section below on streaming services. | here | See the section below | See the section below | See the section below | All plans | |
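Because statuses/lookup accepts up to 100 ids per request, a list of ids returned by a search has to be split into batches before the detail calls. A minimal sketch, assuming ids are handled as strings and passed as a comma-separated `id` parameter; the helper names are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence

LOOKUP_BATCH_SIZE = 100  # statuses/lookup returns up to 100 tweets per request


def batch_ids(tweet_ids: Sequence[str],
              size: int = LOOKUP_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(tweet_ids), size):
        yield list(tweet_ids[start:start + size])


def lookup_params(batch: Sequence[str]) -> Dict[str, str]:
    """Query parameters for one batched lookup request."""
    return {"id": ",".join(batch)}


ids = [str(n) for n in range(250)]
print([len(b) for b in batch_ids(ids)])  # [100, 100, 50]
```

Each batch costs a single request, so 250 tweets consume 3 requests of the window instead of 250 calls to statuses/show.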

Search service

There are 3 plans for the search service based on account type (standard, premium, and enterprise). The main difference between them is the amount of data the search can retrieve.

Standard Search

  • Twitter’s standard search API (search/tweets) allows simple queries against the indices of recent or popular Tweets and behaves similarly to, but not exactly like, the Search UI feature available in Twitter's mobile or web clients.
  • The standard search API searches against a sampling of recent Tweets published in the past 7 days.
  • It can only retrieve 100 tweets per query. It is possible to expand this number by varying the query term or the filtering/ordering scheme, but this will not yield full access to the whole period.
  • Only the standard search operators are available in this plan for filtering the results of a query.
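As a sketch of what a standard search request looks like, the helper below only builds the request URL and does not send it. It assumes the v1.1 URL layout and the `q`, `result_type`, `count` and `lang` parameters; `build_search_url` is a hypothetical helper and "acme" is a placeholder brand name.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
RESULT_TYPES = {"recent", "popular", "mixed"}


def build_search_url(query: str, result_type: str = "recent",
                     count: int = 100, lang: Optional[str] = None) -> str:
    """Build the URL of a standard search request."""
    if result_type not in RESULT_TYPES:
        raise ValueError(f"result_type must be one of {sorted(RESULT_TYPES)}")
    if not 1 <= count <= 100:
        raise ValueError("the standard search returns at most 100 tweets per query")
    params = {"q": query, "result_type": result_type, "count": count}
    if lang:
        params["lang"] = lang
    return f"{SEARCH_URL}?{urlencode(params)}"


url = build_search_url("acme OR @acme", lang="en")
```

The resulting URL would then be requested with either user or app authentication, which is what determines the rate-limit bucket the call is charged to.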

Premium Search

  • There are two premium search products based on the API: the Search Tweets: 30-day endpoint, which provides Tweets from the previous 30 days, and the Search Tweets: Full-archive endpoint, which provides complete and instant access to Tweets dating all the way back to the first Tweet in March 2006. These endpoints provide low-latency, full-fidelity, query-based access to the Tweet archive with one-minute average granularity.
  • Nearly all of the premium search operators for query-based filtering are available.
  • There are two endpoints associated with each premium search product:
    • The data endpoint is available to both sandbox and paid users, and returns the full tweet payloads of the tweets that match a query.
    • The counts endpoint is only available to paid users, and returns the data volume associated with a query.
  • Each query returns at most 500 tweets, and the whole history can be searched through pagination as long as the rate limit is not breached.
  • The premium search API supports two tiers of access:
    • Free Sandbox access, which enables initial testing and development. This tier provides access to full-fidelity Tweet data, but has lower limits and fewer capabilities than Premium access.
    • Paid Premium access, which provides increased access.
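Paging through the history can be sketched as a loop that stops when there are no more pages or when the request budget is spent. The sketch assumes each response page is a dict with a `results` list and an optional `next` token; `fetch_page` is a hypothetical callable that performs the actual HTTP request.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def collect_tweets(fetch_page: Callable[[Optional[str]], Page],
                   max_requests: int) -> List[dict]:
    """Follow `next` tokens until the data or the request budget runs out."""
    tweets: List[dict] = []
    next_token: Optional[str] = None
    for _ in range(max_requests):
        page = fetch_page(next_token)
        tweets.extend(page.get("results", []))
        next_token = page.get("next")
        if not next_token:
            break  # last page reached
    return tweets
```

Capping the loop with `max_requests` matters here because every page counts against the monthly request allowance of the plan.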

Below is a table that summarizes the differences between the Sandbox and Premium tiers. Enrichments are additional features provided by Twitter, including un-shortening URLs and retrieving their metadata, and near-exact geo-location even if the author does not specify a geo-location in the tweet.

| Feature type | Sandbox | Premium |
| --- | --- | --- |
| Time frame | Last 30 days or full-archive | Last 30 days or full-archive |
| Tweets per data request | 100 | 500 |
| Tweet Counts endpoint | No | Yes |
| Query length | 30-Day: 256 chars; Full Archive: 128 chars | 1024 chars |
| Operator availability | Standard | Premium |
| Rate limit | 30 RPM, 10 RPS | 60 RPM, 10 RPS |
| Enrichments | n/a | Expanded URLs, Profile Geo, Polls |
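The two rate limits in the table (requests per minute and requests per second) apply together. One conservative way to respect both is to space requests evenly; the sketch below computes that spacing, with `min_interval_seconds` being a hypothetical helper.

```python
def min_interval_seconds(rpm: int, rps: int) -> float:
    """Smallest even spacing between requests that satisfies both limits."""
    return max(60.0 / rpm, 1.0 / rps)


print(min_interval_seconds(30, 10))  # 2.0 seconds between Sandbox requests
print(min_interval_seconds(60, 10))  # 1.0 second between Premium requests
```

In both tiers the per-minute limit is the binding one; the per-second limit only matters for short bursts.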

The pricing of the API is based on the monthly request limit, see the following table

| Type | Monthly requests | Price | Monthly tweets cap |
| --- | --- | --- | --- |
| Sandbox | Up to 50 | Free | 5k |
| Paid | Up to 100 | $99.00 | 2.5M |
| Paid | Up to 250 | $224.00 | 2.5M |
| Paid | Up to 500 | $399.00 | 2.5M |
| Paid | Up to 1,000 | $774.00 | 2.5M |
| Paid | Up to 2,500 | $1,899.00 | 2.5M |
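To compare the paid tiers, it helps to pick the smallest tier that covers an expected monthly volume and to look at the price per request. The sketch below uses only the figures from the pricing table; the helper names are hypothetical.

```python
from typing import Optional, Tuple

# (monthly request cap, monthly price in USD), taken from the pricing table
PAID_TIERS = [(100, 99.00), (250, 224.00), (500, 399.00),
              (1000, 774.00), (2500, 1899.00)]


def cheapest_tier(monthly_requests: int) -> Optional[Tuple[int, float]]:
    """Smallest paid tier that covers the expected number of requests."""
    for cap, price in PAID_TIERS:
        if monthly_requests <= cap:
            return cap, price
    return None  # beyond the listed premium tiers


def price_per_request(cap: int, price: float) -> float:
    """Cost of one request when the tier is fully used."""
    return price / cap


print(cheapest_tier(300))  # (500, 399.0)
```

The price per request drops from about $0.99 in the smallest tier to about $0.76 in the largest one.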

Enterprise Search

This service has the same features as the premium search. However, it comes with an increased monthly limit and other features, such as improved search regexes and enrichment features. Enterprise account limits can reach up to 10k requests per month, based on an agreement.

The following table summarizes the differences between the 3 search services:

| Category | Product name | Supported history | Query capability | Counts endpoint | Data fidelity | Rate limits |
| --- | --- | --- | --- | --- | --- | --- |
| Standard | Standard API | 7 days | Standard operators | Not available | Incomplete | User auth: 180 requests per 15 mins; app auth: 450 requests per 15 mins |
| Premium | 30-day endpoint | 30 days | Premium operators | Available | Full | Sandbox: 30 RPM, 10 RPS; paid: 60 RPM, 10 RPS |
| Premium | Full-archive endpoint | Tweets from as early as 2006 | Premium operators | Available | Full | Sandbox: 30 RPM, 10 RPS; paid: 60 RPM, 10 RPS |
| Enterprise | 30-day endpoint | 30 days | Premium operators | Included | Full | Based on agreement |
| Enterprise | Full-archive endpoint | Tweets from as early as 2006 | Premium operators | Included | Full | Based on agreement |

Streaming Service

  • Streaming services are the most suitable tool for tracking brands and analyzing their social media engagement.
  • In this variant, instead of querying the API for a specific set of tweets, the streaming service opens a connection through which an endless stream of tweets can be accessed. This stream can be restricted by several filters, including user ids, keywords, …
  • There are only 2 versions of this service: standard and Enterprise (PowerTrack API). The plans differ not in their rate limits but in the ability to restrict the feed. The standard option is fairly strong, and you should only consider moving to enterprise if your user base is really large. However, apart from the filtering capabilities, the major drawback of the free streaming API is that it does not guarantee the completeness of the stream: users can expect to receive anywhere from 1% to over 40% of the tweets, whereas PowerTrack can guarantee 100% of the tweets. The following table summarizes the differences between the 2 end-points:
| API | Category | Number of filters | Filtering operators | Rule management |
| --- | --- | --- | --- | --- |
| statuses/filter | Standard | 400 keywords, 5,000 user ids and 25 location boxes | Standard operators | One filter rule on one allowed connection; disconnection required to adjust the rule |
| PowerTrack | Enterprise | Up to 250,000 filters per stream, up to 2,048 characters each | Premium operators | Thousands of rules on a single connection; no disconnection needed to add/remove rules using the Rules API |
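The standard limits in the table can be checked before opening a connection. The sketch below builds the filter parameters for statuses/filter, assuming the `track`, `follow` and `locations` parameter names with comma-separated values, and location boxes given as (west longitude, south latitude, east longitude, north latitude). `build_filter_params` is a hypothetical helper and "acme" is a placeholder brand name.

```python
from typing import Dict, Sequence, Tuple

MAX_KEYWORDS = 400
MAX_USER_IDS = 5000
MAX_LOCATION_BOXES = 25

Box = Tuple[float, float, float, float]  # west lon, south lat, east lon, north lat


def build_filter_params(keywords: Sequence[str] = (),
                        user_ids: Sequence[int] = (),
                        boxes: Sequence[Box] = ()) -> Dict[str, str]:
    """Validate the standard limits and build the filter parameters."""
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} keywords are allowed")
    if len(user_ids) > MAX_USER_IDS:
        raise ValueError(f"at most {MAX_USER_IDS} user ids are allowed")
    if len(boxes) > MAX_LOCATION_BOXES:
        raise ValueError(f"at most {MAX_LOCATION_BOXES} location boxes are allowed")
    params: Dict[str, str] = {}
    if keywords:
        params["track"] = ",".join(keywords)
    if user_ids:
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    if boxes:
        params["locations"] = ",".join(str(c) for box in boxes for c in box)
    if not params:
        raise ValueError("at least one filter is required")
    return params


params = build_filter_params(keywords=["acme", "#acme"],
                             boxes=[(-122.75, 36.8, -121.75, 37.8)])
```

Since the standard stream allows a single rule on a single connection, changing any of these values means disconnecting and reconnecting with a new parameter set.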

Conclusion

  • The statuses/filter end-point provides the initial tool to collect tweets and process them in near-real-time. It has no restrictions on size and can be useful for initial showcases.
  • If deeper features are needed to profile users or topics, the search API can be used, followed by the statuses/lookup API.

