Vicky's Blog

This post covers my initial adventures in exploring Twitter’s Streaming API. I’ll talk about:

  1. Getting started making a Twitter Streaming program using a template
  2. Problems arising from programming in Python with Windows 10 and Unicode
  3. Refining the search function

You can follow along with my template on Github.

1. Getting started

If you’re new to Twitter’s Streaming API like I was this morning, here it is from the top.

Twitter’s Streaming API lets you load a continuous stream of tweets, filtered by search parameters of your choosing, as they’re created in real time. With a small Python script run from the Windows 10 command line, you can print these tweets as they’re retrieved. The stream keeps updating until you stop the script, which you can do from the terminal with Ctrl+C.

I started with @adamdrake’s twitterstreamtemplate on Github, which makes use of queuing in Python (still a concept I’m just getting familiar with, but this article has a nice basic overview of it). The Twython 3.4.0 documentation was also very helpful.
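
To make the queuing idea concrete, here’s a minimal sketch using only the standard library. It is not the template’s actual code: the `producer` and `consumer` functions and the fake tweets are made up for illustration, with the producer standing in for the part of the script that receives tweets from the stream.

```python
import queue
import threading


def producer(tweet_queue, tweets):
    """Stand-in for the streaming callback: put each incoming tweet on the queue."""
    for tweet in tweets:
        tweet_queue.put(tweet)
    tweet_queue.put(None)  # sentinel: tells the consumer nothing more is coming


def consumer(tweet_queue, handled):
    """Take tweets off the queue one at a time, in arrival order."""
    while True:
        tweet = tweet_queue.get()
        if tweet is None:
            break
        handled.append(tweet['text'])


# Made-up data standing in for tweets arriving from the stream.
fake_tweets = [{'text': 'first'}, {'text': 'second'}, {'text': 'third'}]

tweet_queue = queue.Queue()
handled = []
worker = threading.Thread(target=consumer, args=(tweet_queue, handled))
worker.start()
producer(tweet_queue, fake_tweets)
worker.join()
print(handled)  # ['first', 'second', 'third']
```

The point of the queue is that receiving tweets and processing them are decoupled: the receiving side only has to hand each tweet over, and the processing side works through them at its own pace.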

The Twitter API requires authentication, for which you’ll need to register an application with Twitter. You can do this here: https://dev.twitter.com/apps. Don’t worry too much about its name or description if you’re just tinkering, like I am. Once you’ve registered, you can obtain your new credentials (consumer_key, consumer_secret, token, and token_secret) from the Keys and Access Tokens tab.

Input your credentials in the appropriate places:

# Input your credentials below  
consumer_key = 'xxx'  
consumer_secret = 'xxx'  
token = 'xxx'  
token_secret = 'xxx'  

You can set parameters to find specific tweets, or just show a sample stream. The relevant lines are:

# stream.statuses.filter(track='twitter', language='en')  
stream.statuses.sample(language='en')  

As written, it’ll show you the sample stream. Switch the commented lines if you want to set your own search parameters. (More on that below.)

2. Hiccups: Windows 10, Python, and Unicode

My first pass at this program consisted of two goals: 1) successfully pull up a stream of tweets based on my search parameters, and 2) print them. The relevant line for achieving the second goal is:

print(tweet)  

I opened up the terminal and typed python tweet_stream.py to run the program. After a brief pause it returned this error:

return codecs.charmap_encode(input,self.errors,encoding_map)[0]  
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1270-1271: character maps to <undefined>  

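The short explanation is that the Windows 10 terminal doesn’t use UTF-8 by default. When Python prints text, it encodes it with the terminal’s legacy code page (the 'charmap' codec named in the error), which has no mapping for many of the characters that turn up in tweets, such as emoji. (I’m making my way through Python’s documentation on Unicode, which is rather long but truly fascinating to me.)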
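
You can reproduce the problem without touching Twitter at all. The sketch below is only an illustration: `can_encode` is a hypothetical helper, and cp437 and cp1252 are assumptions about which code page a Windows terminal might be using (yours may differ, which is why the script also prints the encoding Python reports for its output).

```python
import sys


def can_encode(text, encoding):
    """Return True if every character in text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A made-up tweet text containing a snake emoji (U+1F40D).
sample = 'I love Python \U0001F40D'

# The encoding Python will use when printing; depends on your terminal.
print(getattr(sys.stdout, 'encoding', None))

for encoding in ('cp437', 'cp1252', 'utf-8'):
    print(encoding, can_encode(sample, encoding))
```

Only UTF-8 can represent the emoji; the two code pages fail, which is the same failure the traceback above reports.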

I explored installing a Linux shell with updated Python so I could use Bash. While that’s a good eventual idea for programming in general, it’s definitely a big diversion from the topic at hand. Instead, I implemented a program-specific fix using a much faster and less complicated method: convert the tweets to ASCII before they display in the Windows 10 terminal.

This is achieved by changing the encoding of the tweet text:

print(tweet['text'].encode('ascii','ignore'))

The above displays the text of the tweets using ASCII encoding, and ignores any characters it can’t encode. Instead of ignore, you could also use other handlers to indicate errors. Python documentation gives an example of what these different handlers do here.

>>> u = chr(40960) + 'abcd' + chr(1972)  
>>> u.encode('ascii', 'ignore')  
b'abcd'  
>>> u.encode('ascii', 'replace')  
b'?abcd?'  
>>> u.encode('ascii', 'xmlcharrefreplace') # inserts XML character reference  
b'&#40960;abcd&#1972;'  
>>> u.encode('ascii', 'backslashreplace') # inserts a \uNNNN escape sequence  
b'\\ua000abcd\\u07b4'  
>>> u.encode('ascii', 'namereplace') # inserts a \N{...} escape sequence  
b'\\N{YI SYLLABLE IT}abcd\\u07b4'  
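
One thing to be aware of: in Python 3, `encode` returns a bytes object, so printing it directly shows the `b'...'` wrapper seen in the examples above. A small sketch of one way around that is to decode the result back to a string before printing. The helper names `to_ascii` and `printable_text` are my own, and the check for a missing `text` key assumes that not every message from the stream is guaranteed to carry one.

```python
def to_ascii(text, errors='ignore'):
    """Encode to ASCII with the given error handler, then decode back to str."""
    return text.encode('ascii', errors).decode('ascii')


def printable_text(tweet):
    """Return an ASCII-safe version of a tweet's text, or None if it has no text."""
    text = tweet.get('text')
    if text is None:
        return None
    return to_ascii(text)


# A made-up tweet; the accented letter and the emoji are not ASCII.
tweet = {'text': 'caf\u00e9 time \u2615'}

print(printable_text(tweet))                # non-ASCII characters dropped
print(to_ascii(tweet['text'], 'replace'))   # non-ASCII characters become '?'
```

Using `'replace'` instead of `'ignore'` at least leaves a visible marker where a character was lost, which can make the output easier to interpret.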

Now running the program in the Windows 10 terminal should give you… a lot of tweets. (It’ll keep going until you stop the program using Ctrl+C. On a couple of occasions when my program wasn’t finding many tweets, Ctrl+C didn’t seem to work. I hit it a few more times in succession before it seemed to sync up with the program actually doing something, and stopped the operation.)

3. Refining the search function

To retrieve a stream of tweets filtered by your search parameters, you’ll want to alter this line of code:

stream.statuses.filter(track='twitter', language='en')

The track parameter is what lets you filter your tweets by keywords, hashtags, user mentions, and URLs. In the above example, it will search for tweets containing the keyword “twitter”. You can find its accepted phrases here. Note that the phrases don’t work like search engine phrases: characters in the tweet must exactly match the search phrase, and punctuation within the quotes will be considered part of the phrase. For example, “café” will not find “cafe” or vice versa, and “hello,” is a different phrase than “hello”.

The language parameter in the example above limits the search results to tweets written in English. There are further parameters you can use to filter your results, such as follow (which limits your stream to the timelines of the users you specify) and locations (which lets you filter by geolocated tweets). Find the full list of parameters and their values here.
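
As a rough sketch of what those values look like: the Streaming API expects follow as a comma-separated list of user IDs, and the location filter (named `locations` in the API) as a comma-separated bounding box of longitude,latitude pairs with the southwest corner first. The helper functions and the user IDs below are made up for illustration; the coordinates roughly cover San Francisco.

```python
def bounding_box(sw_lon, sw_lat, ne_lon, ne_lat):
    """Format a bounding box as 'sw_lon,sw_lat,ne_lon,ne_lat' (southwest corner first)."""
    return ','.join(str(value) for value in (sw_lon, sw_lat, ne_lon, ne_lat))


def follow_list(user_ids):
    """Format numeric user IDs as a comma-separated string."""
    return ','.join(str(user_id) for user_id in user_ids)


params = {
    'follow': follow_list([12345, 67890]),                    # hypothetical user IDs
    'locations': bounding_box(-122.75, 36.8, -121.75, 37.8),  # roughly San Francisco
    'language': 'en',
}
print(params)

# Assumption: these could then be passed as keyword arguments, e.g.
# stream.statuses.filter(**params)
```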

I intended to use the track parameter to find tweets containing any one of a list of hashtags. I started out with the code below:

stream.statuses.filter(track='#coding,#programming,#travel', language='en')  

Note that the comma separators will cause the program to find tweets that contain any of those search terms: #coding OR #programming, etc. If you want to have your search find tweets containing all your search terms, use spaces only.
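
If you keep your search terms in a list, a tiny helper can build the track string for you. This is just a sketch; `build_track` is a name I made up. Each inner list is a group of terms that must all appear (joined with spaces), and the groups themselves are alternatives (joined with commas).

```python
def build_track(groups):
    """Join terms within a group with spaces (AND) and groups with commas (OR)."""
    return ','.join(' '.join(terms) for terms in groups)


# Matches #coding, OR #programming, OR tweets containing both #travel AND #laptop.
track = build_track([['#coding'], ['#programming'], ['#travel', '#laptop']])
print(track)  # #coding,#programming,#travel #laptop
```

The resulting string can be passed as the track value in place of a hand-typed one.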

The track parameter is pretty literal, and thus not very intuitive. For example, my “#programming” phrase called up tweets that were related to computer programming as well as television programming. While you could set other parameters that omit certain words, it’s not long before such an approach turns into a very lengthy guessing game. A cleaner (and faster) approach is to choose more specific search phrases, for example:

stream.statuses.filter(track='#coding,#python,#digitalnomad,#laptoplifestyle', language='en')  

Your keywords will vary depending on what you want your stream to display. The Twitter Advanced Search page is a good resource for trying out different combinations before running your script.

I hope you find this overview of my first Twitter streaming script useful! Questions, or something I missed? Hit me up on Twitter @hivickylai.

Thanks for reading!
