Damn You, API Limits!

Today I've been playing around (or rather, copying from a book that's playing around) with the Twitter API. Below is a program that creates a SQL database for a given Twitter user, listing that user's friends, whether each friend has been searched, and the number of times each friend has appeared in our searches. I can't say I 100% understand how the machine works yet, but I have a running commentary that sheds a bit of light on something I certainly did not understand the first couple of times I looked at it. What the app is supposed to do is crawl through the friends of an account we select, then the friends of those friends, and so on. The end result would be a Twitter network that tells us how many people in the network have "friended" each of its members. It would also probably take years to create, since it can only be run 14 times a day...
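A quick aside on that limit before the program itself. The program below reads the `x-rate-limit-remaining` header from each response. Here is a minimal sketch of how one might turn the headers into a "how long do I have to wait?" number. It assumes the API also sends an `x-rate-limit-reset` header holding a Unix timestamp in seconds (that header does not appear in the program below), and `seconds_until_reset` is a made-up helper name, not something from the book.

```python
import time

def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next API call.

    Returns 0 if calls are still left in the current window.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get('x-rate-limit-remaining', 0))
    if remaining > 0:
        return 0
    # Assumption: x-rate-limit-reset is a Unix timestamp in seconds.
    reset_at = int(headers.get('x-rate-limit-reset', now))
    return max(0, reset_at - now)
```

In the main loop, something like `time.sleep(seconds_until_reset(headers))` before the next request would probably do the job, though I haven't wired it into the program.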
#Import all the necessary libraries to open internet connections (urllib),
#access the Twitter API (twurl), use JSON encoding, write a SQL database,
#and ignore certificate errors.
from urllib.request import urlopen
import urllib.error
import twurl
import json
import sqlite3
import ssl 
#the URL for the Twitter API we want to use
TWITTER_URL = "https://api.twitter.com/1.1/friends/list.json"
#create a SQL database and a cursor to modify it with
conn = sqlite3.connect('spider.sqlite')
cur = conn.cursor() 
#Create a new table called "Twitter" in our database with three columns:
#the account name, whether that account has been retrieved yet (0 or 1),
#and the number of times the account has shown up in a friend list.
cur.execute('''
    CREATE TABLE IF NOT EXISTS Twitter
    (name TEXT, retrieved INTEGER, friends INTEGER)''') 
#Ignore certificate errors.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE 
#App main drag
while True:
#ask the user for an account to load into the database, with the option of
#quitting if the user is tired of loading users
    acct = input('Enter a Twitter account, or quit: ')
    if (acct == 'quit'): break 
#if the user enters nothing, select exactly one account from the database that
#has not yet been checked and prepare to check that. If no such accounts exist,
#user will be prompted for new input.
    if (len(acct) < 1):
        cur.execute('SELECT name FROM Twitter WHERE retrieved = 0 LIMIT 1')
        try:
            acct = cur.fetchone()[0]
        except:
            print('No unretrieved Twitter accounts found')
            continue 
#encode the proper URL to communicate with the API (adding keys), ignore SSL
#certificate errors while connecting, decode the response as UTF-8 text, and
#create a dictionary called "headers" so we can track the number of times we
#can still access the Twitter API.
    url = twurl.augment(TWITTER_URL, {'screen_name': acct, 'count': '5'})
    print('Retrieving', url)
    connection = urlopen(url, context = ctx)
    data = connection.read().decode()
    headers = dict(connection.getheaders()) 
#Print how many times we can still access the API, then structure the data.
    print('Remaining', headers['x-rate-limit-remaining'])
    js = json.loads(data)
    #Debug
    #print(json.dumps(js, indent=4))
#Indicate that we have now retrieved the account in our database.
    cur.execute('UPDATE Twitter SET retrieved=1 WHERE name = ?', (acct, ))
#Look through the friend list of the chosen Twitter account. For each friend
#in the list, print the friend's name, then look for that friend in the database
#(LIMIT 1 means we take at most one matching row). If we find the friend in the
#database, we add one to their friend count and increase a tracker for the number
#of accounts we've revisited. If we don't find the friend in the database,
#we add the friend, indicating s/he has been friended once (by the account we are
#checking) and has never been retrieved. Then we increase a tracker for the number
#of friends we are adding for the first time. (Note: trackers are reset each time
#the app gets a new account.) When we're done, we "save our data" and "quit".
    countnew = 0
    countold = 0
    for u in js['users']:
        friend = u['screen_name']
        print(friend)
        cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1', (friend,))
        try:
            count = cur.fetchone()[0]
            cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?', (count+1, friend))
            countold = countold + 1
        except:
            cur.execute('''INSERT INTO Twitter (name, retrieved, friends)  VALUES (?,0,1)''', (friend,))
            countnew = countnew + 1
    print('New accounts = ', countnew, 'revisited = ', countold)
    conn.commit()
cur.close()
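To convince myself of what the bookkeeping in that last loop does, here is a stripped-down sketch of it with no network involved: the same table in an in-memory SQLite database, with friend lists I made up by hand. The function name `record_friends` and the accounts `alice`, `bob`, `carol` and `dave` are mine, purely for illustration. I also swapped the try/except for an explicit check on whether `fetchone()` returned a row, which I believe does the same thing.

```python
import sqlite3

def record_friends(cur, acct, friends):
    """Mark acct as retrieved and count each friend; return (new, revisited)."""
    cur.execute('UPDATE Twitter SET retrieved = 1 WHERE name = ?', (acct,))
    countnew = countold = 0
    for friend in friends:
        cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1', (friend,))
        row = cur.fetchone()
        if row is None:
            # First time we see this account: friended once, not yet retrieved.
            cur.execute('INSERT INTO Twitter (name, retrieved, friends) VALUES (?, 0, 1)', (friend,))
            countnew += 1
        else:
            # Seen before: bump the number of times it has been friended.
            cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?', (row[0] + 1, friend))
            countold += 1
    return countnew, countold

# In-memory database so nothing is written to disk.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Twitter (name TEXT, retrieved INTEGER, friends INTEGER)')

# Made-up friend lists standing in for two API responses.
first = record_friends(cur, 'alice', ['bob', 'carol'])
second = record_friends(cur, 'bob', ['carol', 'dave'])
conn.commit()

cur.execute('SELECT name, retrieved, friends FROM Twitter ORDER BY name')
rows = cur.fetchall()
print(first, second)
print(rows)
```

One thing the sketch makes visible: the very first account you type in ('alice' here) never gets a row of its own, because the UPDATE only touches accounts that are already in the table.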
