Today I've been playing around (or rather copying from a book that's playing around) with the Twitter API. Below is a program that creates a SQL database for a given Twitter user, listing that user's friends, whether each friend has been searched yet, and the number of times that friend has appeared in our searches. I can't say I 100% understand how the machine works yet, but I have a running commentary that sheds a bit of light on something I certainly did not understand the first couple of times I looked at it. What the app is supposed to do is crawl through the friends of an account we select, then the friends of those friends, and so on. The end result would be a map of a Twitter network that tells us, for each of its members, how many people in the network have "friended" them. It would also probably take years to create, since it can only be run 14 times a day...
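The crawl is easier to follow offline, without the API or its rate limit. Here is a minimal sketch that assumes a made-up, hard-coded friend graph and an in-memory SQLite database with the same Twitter table; the names in FAKE_FRIENDS and the helper crawl_step are hypothetical, not part of the book's code or the Twitter API. Each call to crawl_step does one pass of the program's loop: pick an account, mark it retrieved, then bump or insert each of its friends. One deliberate difference: the script only marks a typed-in account as retrieved if it already has a row, so a seed account with no row can later show up as someone's friend and be crawled a second time, double-counting its friends. The sketch gives the seed a row first.

```python
import sqlite3

# Hypothetical friend lists standing in for Twitter's friends/list endpoint.
# These screen names are made up for illustration.
FAKE_FRIENDS = {
    'alice': ['bob', 'carol'],
    'bob': ['carol', 'dave'],
    'carol': ['alice', 'dave'],
    'dave': [],
}


def crawl_step(cur, acct=None):
    """Do one pass of the spider loop; return the account crawled, or None."""
    if acct is None:
        # Like pressing Enter at the prompt: pick any unretrieved account.
        cur.execute('SELECT name FROM Twitter WHERE retrieved = 0 LIMIT 1')
        row = cur.fetchone()
        if row is None:  # fetchone() returns None when no row matches
            return None
        acct = row[0]
    else:
        # Unlike the script, give a typed-in seed account a row so it can be
        # marked retrieved and is never crawled a second time.
        cur.execute('SELECT 1 FROM Twitter WHERE name = ?', (acct,))
        if cur.fetchone() is None:
            cur.execute('INSERT INTO Twitter (name, retrieved, friends) '
                        'VALUES (?, 0, 0)', (acct,))
    cur.execute('UPDATE Twitter SET retrieved = 1 WHERE name = ?', (acct,))

    for friend in FAKE_FRIENDS.get(acct, []):
        cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1', (friend,))
        row = cur.fetchone()
        if row is None:
            # First sighting: store the friend as seen once, not yet retrieved.
            cur.execute('INSERT INTO Twitter (name, retrieved, friends) '
                        'VALUES (?, 0, 1)', (friend,))
        else:
            # Seen before: one more crawled account lists this friend.
            cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?',
                        (row[0] + 1, friend))
    return acct


conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Twitter '
            '(name TEXT, retrieved INTEGER, friends INTEGER)')

crawl_step(cur, 'alice')            # seed account, as if typed at the prompt
while crawl_step(cur) is not None:  # keep crawling until nothing is unretrieved
    pass
conn.commit()

counts = dict(cur.execute('SELECT name, friends FROM Twitter ORDER BY name'))
print(counts)  # {'alice': 1, 'bob': 1, 'carol': 2, 'dave': 2}
```

After the crawl, each account's friends column is the number of accounts in this tiny network that list it as a friend: carol and dave each appear in two friend lists, alice and bob in one.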
#Import all the necessary libraries to open internet connections (urllib),
#access the Twitter API (twurl, the book's helper module), use JSON encoding,
#write a SQL database, and ignore certificate errors.
from urllib.request import urlopen
import urllib.error
import twurl
import json
import sqlite3
import ssl

#the URL for the Twitter API we want to use
TWITTER_URL = "https://api.twitter.com/1.1/friends/list.json"

#create a SQL database and a cursor to modify it with
conn = sqlite3.connect('spider.sqlite')
cur = conn.cursor()

#Create a new table called "Twitter" in our database with three columns:
#the account's screen name, whether we have already retrieved that account's
#friend list (0 or 1), and how many retrieved accounts list it as a friend.
cur.execute('''
CREATE TABLE IF NOT EXISTS Twitter
(name TEXT, retrieved INTEGER, friends INTEGER)''')

#Ignore certificate errors.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
#App main loop
while True:
    #ask the user for an account to load into the database, with the option of
    #quitting if the user is tired of loading users
    acct = input('Enter a Twitter account, or quit: ')
    if (acct == 'quit'): break
    #if the user enters nothing, select exactly one account from the database that
    #has not yet been checked and prepare to check that. If no such accounts exist,
    #fetchone() returns None, None[0] raises a TypeError, and the user is
    #prompted for new input.
    if (len(acct) < 1):
        cur.execute('SELECT name FROM Twitter WHERE retrieved = 0 LIMIT 1')
        try:
            acct = cur.fetchone()[0]
        except TypeError:
            print('No unretrieved Twitter accounts found')
            continue

    #encode the proper URL to communicate with the API (adding keys; count=5
    #limits each request to 5 friends), ignore SSL certificate errors while
    #connecting, decode the data as UTF-8, and create a dictionary called
    #"headers" so we can track how many more times we can access the Twitter API.
    url = twurl.augment(TWITTER_URL, {'screen_name': acct, 'count': '5'})
    print('Retrieving', url)
    connection = urlopen(url, context = ctx)
    data = connection.read().decode()
    headers = dict(connection.getheaders())

    #Print how many times we can still access the API, then parse the JSON
    #into Python dictionaries and lists.
    print('Remaining', headers['x-rate-limit-remaining'])
    js = json.loads(data)
    #Debug
    #print(json.dumps(js, indent=4))

    #Indicate that we have now retrieved the account in our database. (If the
    #account has no row yet, e.g. one we just typed in, this updates nothing.)
    cur.execute('UPDATE Twitter SET retrieved=1 WHERE name = ?', (acct, ))

    #Look through the friend list of the chosen Twitter account. For each friend
    #in the list, print the friend's name, then look for that friend in the
    #database (LIMIT 1: we only need the first matching row). If we find the
    #friend, we add one to their friend count and increase a tracker for the
    #number of accounts we've revisited. If we don't find the friend, fetchone()
    #returns None, indexing it raises a TypeError, and we add the friend instead,
    #with a count of one (the account we are checking) and marked as never
    #retrieved. Then we increase a tracker for the number of friends we are adding
    #for the first time. (Note: the trackers are reset each time the app retrieves
    #a new account.) When we're done, we save our data with a commit.
    countnew = 0
    countold = 0
    for u in js['users']:
        friend = u['screen_name']
        print(friend)
        cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1', (friend,))
        try:
            count = cur.fetchone()[0]
            cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?', (count+1, friend))
            countold = countold + 1
        except TypeError:
            cur.execute('''INSERT INTO Twitter (name, retrieved, friends) VALUES (?,0,1)''', (friend,))
            countnew = countnew + 1
    print('New accounts =', countnew, 'revisited =', countold)
    conn.commit()

#Once the user quits, close the cursor.
cur.close()
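The SELECT-then-UPDATE-or-INSERT pattern in the friend loop can also be collapsed into a single SQL "upsert". This is a swapped-in technique, not what the book does, and it's a minimal sketch under two assumptions: the name column gets a UNIQUE constraint (the script's table doesn't have one, and CREATE TABLE IF NOT EXISTS won't add it to an existing spider.sqlite, so this suits a fresh database), and the SQLite library behind Python's sqlite3 module is version 3.24.0 or newer, which is when ON CONFLICT ... DO UPDATE was added. The record_friend helper and the screen names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
# Same columns as the script's table, plus a UNIQUE constraint on name,
# which ON CONFLICT needs in order to detect "this friend already exists".
cur.execute('''CREATE TABLE IF NOT EXISTS Twitter
               (name TEXT UNIQUE, retrieved INTEGER, friends INTEGER)''')


def record_friend(cur, friend):
    """Insert a first-time friend with friends = 1, or add 1 to an existing row."""
    cur.execute('''INSERT INTO Twitter (name, retrieved, friends) VALUES (?, 0, 1)
                   ON CONFLICT(name) DO UPDATE SET friends = friends + 1''',
                (friend,))


# Hypothetical screen names, as if they came from a few friend lists.
for friend in ['bob', 'carol', 'carol', 'dave', 'bob', 'carol']:
    record_friend(cur, friend)
conn.commit()

print(cur.execute('SELECT name, friends FROM Twitter ORDER BY name').fetchall())
# [('bob', 2), ('carol', 3), ('dave', 1)]
```

One trade-off: a single statement no longer tells you whether the friend was new or revisited, so the script's countnew/countold tally would need a separate check.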