
Damn You, API Limits!

Today I've been playing around (or rather copying from a book that's playing around) with the Twitter API. Below is a program that creates a SQL database for a given Twitter user, listing that user's friends, whether each friend has been searched, and the number of times that friend has appeared in our searches. I can't say I 100% understand how the machine works yet, but I have a running commentary that sheds a bit of light on something I certainly did not understand the first couple of times I looked at it. What the app is supposed to do is crawl through the friends of an account we select, then the friends of those friends, and so on. The end result would be a Twitter network that tells us the number of people from the network who have "friended" each of its members. It would also probably take years to create, since it can only be run 14 times a day...
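To get a feel for the crawl before touching the API, here is a rough sketch that runs the same idea on a made-up friend graph. Everything in it is an assumption for illustration: the `FAKE_FRIENDS` dictionary stands in for what Twitter would return, `crawl` and `max_lookups` are hypothetical names, and the breadth-first order is just one simple way to pick the next account (the real program takes whichever unretrieved account the database hands back first).

```python
from collections import Counter, deque

# Hypothetical friend lists standing in for what the Twitter API would return.
FAKE_FRIENDS = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["dave"],
    "dave": [],
}

def crawl(start, max_lookups):
    """Visit accounts breadth-first, counting how often each one shows up as a friend."""
    seen_count = Counter()   # account -> number of friend lists it appeared in
    retrieved = set()        # accounts whose friend lists we have already fetched
    queue = deque([start])
    lookups = 0
    while queue and lookups < max_lookups:
        acct = queue.popleft()
        if acct in retrieved:
            continue
        retrieved.add(acct)
        lookups += 1         # one pretend "API call" per account
        for friend in FAKE_FRIENDS.get(acct, []):
            seen_count[friend] += 1
            if friend not in retrieved:
                queue.append(friend)
    return seen_count, retrieved

counts, retrieved = crawl("alice", max_lookups=3)
print(dict(counts))
print(sorted(retrieved))
```

The `max_lookups` cap plays the role of the API limit: the crawl simply stops when the budget runs out, which is why a real network would take so long to map.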
#Import all the necessary libraries to open internet connections (urllib),
#access the Twitter API (twurl), use JSON encoding, write a SQL database,
#and ignore certificate errors.
from urllib.request import urlopen
import urllib.error
import twurl
import json
import sqlite3
import ssl 
#the URL for the Twitter API we want to use
TWITTER_URL = "https://api.twitter.com/1.1/friends/list.json"
#create a SQL database and a cursor to modify it with
conn = sqlite3.connect('spider.sqlite')
cur = conn.cursor() 
#Create a new table called "Twitter" in our database with three columns
#for the account name, whether the account has been retrieved (0 or 1),
#and the number of times the account has shown up in friend lists.
cur.execute('''
    CREATE TABLE IF NOT EXISTS Twitter
    (name TEXT, retrieved INTEGER, friends INTEGER)''') 
#Ignore certificate errors.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE 
#App main drag
while True:
#ask the user for an account to load into the database, with the option of
#quitting if the user is tired of loading users
    acct = input('Enter a Twitter account, or quit: ')
    if (acct == 'quit'): break 
#if the user enters nothing, select exactly one account from the database that
#has not yet been checked and prepare to check that. If no such accounts exist,
#user will be prompted for new input.
    if (len(acct) < 1):
        cur.execute('SELECT name FROM Twitter WHERE retrieved = 0 LIMIT 1')
        try:
            acct = cur.fetchone()[0]
        except TypeError:
            print('No unretrieved Twitter accounts found')
            continue 
#encode the proper URL to communicate with API (adding keys), ignore SSL
#certificate errors while connecting, load the data in UTF format, and
#create a dictionary called "headers" so we can track the number of times we
#can access the Twitter API.
    url = twurl.augment(TWITTER_URL, {'screen_name': acct, 'count': '5'})
    print('Retrieving', url)
    connection = urlopen(url, context = ctx)
    data = connection.read().decode()
    headers = dict(connection.getheaders()) 
#Print how many times we can still access the API, then structure the data.
    print('Remaining', headers['x-rate-limit-remaining'])
    js = json.loads(data)
    #Debug
    #print(json.dumps(js, indent=4))
#Indicate that we have now retrieved the account in our database.
    cur.execute('UPDATE Twitter SET retrieved=1 WHERE name = ?', (acct, ))
#Look through the friend list of the chosen Twitter account. For each friend
#in the list, print the friend's name, then look for that friend in the database,
#ignoring duplicate entries (?). If we find the friend in the database,
#we add one to their friend count and increase a tracker for the number
#of accounts we've revisited. If we don't find the friend in the database,
#we add the friend, indicating s/he has one friend (the user we are checking)
#and has never been retrieved. Then we increase a tracker for the number of friends
#we are adding for the first time. (Note: trackers are reset each time
#the app checks a new account.) When we're done, we "save our data" and "quit".
    countnew = 0
    countold = 0
    for u in js['users']:
        friend = u['screen_name']
        print(friend)
        cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1', (friend,))
        try:
            count = cur.fetchone()[0]
            cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?', (count+1, friend))
            countold = countold + 1
        except:
            cur.execute('''INSERT INTO Twitter (name, retrieved, friends)  VALUES (?,0,1)''', (friend,))
            countnew = countnew + 1
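#Report the totals and save once per account. The cursor is closed only
#after the main loop ends, so the next account can still use it.
    print('New accounts = ', countnew, 'revisited = ', countold)
    conn.commit()
cur.close()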
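The part I found hardest to follow is the "look up the friend, then either add one or insert a new row" step, so here is a minimal sketch of just that piece, with no Twitter involved. It assumes the same `Twitter` table as above; the helper name `record_friend`, the in-memory database, and the account names are my own stand-ins, not part of the book's program.

```python
import sqlite3

def record_friend(cur, friend):
    """Add 1 to the friend's count if the row exists, otherwise insert a new row."""
    cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1', (friend,))
    row = cur.fetchone()   # None when the name is not in the table yet
    if row is None:
        cur.execute('INSERT INTO Twitter (name, retrieved, friends) VALUES (?, 0, 1)',
                    (friend,))
        return 'new'
    cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?', (row[0] + 1, friend))
    return 'revisited'

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Twitter '
            '(name TEXT, retrieved INTEGER, friends INTEGER)')

# Made-up account names; 'alice' appears twice, so her count should reach 2.
results = [record_friend(cur, name) for name in ['alice', 'bob', 'alice']]
conn.commit()
rows = cur.execute('SELECT name, retrieved, friends FROM Twitter ORDER BY name').fetchall()
print(results)
print(rows)
conn.close()
```

Checking `row is None` does the same job as the try/except in the program: `fetchone()` returns `None` when nothing matches, and indexing `None` is what raises the error the `except` branch catches.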
