This isn't a map, or a WNY-focused thing, but Ryan said I should put it up for people to look at so that's what I'm gonna do.
The following is a script I wrote one day during Dr. Bittner's class, originally conceived to figure out which cities I could afford to live in. It is a Python script that defines a function: you give it a city, and it looks at the Craigslist apartment postings for that city and calculates a per-bedroom average rent from them. The function is called averageRent (clever) and has two parameters: the URL of the city's Craigslist apartment listing page and the number of pages of listings to count. For example, to get an average based on the first 5 pages of listings in Chicago, run the module and then enter into your Python shell:
averageRent("http://chicago.craigslist.org/apa/",5)
Note that the URL should be in quotes, as a string. The more pages you use, the more samples you get (about 100 per page) and, in theory, the more accurate the estimate. Just be careful not to enter a number that exceeds the actual number of pages of listings (a problem you might run into for a small town). To get the URL, go to the city's Craigslist page, click "apts/housing" under Housing at the top of the second column, and copy the address from your browser. This is an interesting method since it is dynamic and uses the most up-to-date rent values, but it is also problematic since it only includes apartments listed on the site, so user beware.
A little bit about the script. It uses the fact that the HTML source of one of these listing pages contains the entire text used in the link to each posting. For example, here is one line from Portland's listings:
<p><a href="http://portland.craigslist.org/mlt/apa/2564568555.html">$1395 / 3br - 1360ft² - Newer Construction w/Hdwds, Wash/Dry, Walk-In, Stainless, & Quiet</a> - <font size="-1"> (North Portland-Near Univ. of Portland)</font> <span class="p"> pic img</span></p>
The script simply reads the entire source code, looks for these lines, and takes the number directly after the dollar sign and the number directly before the "br" (for this listing 1395 and 3, respectively). It then sums all the rent numbers and divides by the sum of the bedrooms to get a per-room average. Only about 95% of the listings use this format, so a few listings are dropped, but if you use a good number of pages you can afford to lose a few and still get a high number of samples.
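To make the extraction step concrete, here is a minimal sketch of the same idea using a regular expression instead of character-by-character scanning. This is not the script itself: the names LISTING, parse_listing and per_room_average are made up for illustration, the pattern is an assumption based on the one Portland line shown above, and the sample string is a shortened copy of that line.

```python
import re

# Assumed listing format, taken from the sample line: ">$1395 / 3br -".
# Group 1 is the rent after ">$", group 2 the bedroom count before "br -".
LISTING = re.compile(r">\$(\d+)\s*/\s*(\d+)br -")


def parse_listing(line):
    """Return (rent, bedrooms) for one listing line, or None if it does not match."""
    match = LISTING.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def per_room_average(lines):
    """Return (sum of rents / sum of bedrooms, number of listings used)."""
    parsed = [p for p in map(parse_listing, lines) if p is not None]
    total_rent = sum(rent for rent, _ in parsed)
    total_rooms = sum(rooms for _, rooms in parsed)
    if total_rooms == 0:
        # Nothing usable was found, so there is no average to report.
        return None, len(parsed)
    return float(total_rent) / total_rooms, len(parsed)


# Shortened copy of the Portland line above.
sample = ('<p><a href="http://portland.craigslist.org/mlt/apa/2564568555.html">'
          '$1395 / 3br - 1360ft</a></p>')
```

Calling parse_listing(sample) gives the pair (1395, 3), and lines that do not fit the pattern are skipped, which is the "few listings are dropped" behaviour described above. Unlike a fixed four-character slice, the pattern also accepts rents with three or five digits.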
Without further ado, here is the script. Careful not to lose the indentations when you copy and paste, since this is Python. Feel free to leave any questions about what I did, or improvements.
-----------------------------------------------------
import urllib

def averageRent(page, n=0):
    # Read the first page of listings.
    url = urllib.urlopen(page)
    lines = url.readlines()
    # Pages after the first are index100.html, index200.html, ...
    # so n pages in total means n - 1 extra pages.
    if n > 1:
        for i in range(1, n):
            url2 = urllib.urlopen(page + "index" + str(i) + "00.html")
            lines2 = url2.readlines()
            lines = lines + lines2
    rooms = []
    rent = []
    brs = []
    # Keep only the lines that contain a price.
    for i in range(0, len(lines)):
        if ">$" in lines[i]:
            rooms.append(lines[i])
    # Pull out the rent after ">$" and the digit before "br -".
    for i in range(0, len(rooms)):
        for j in range(0, len(rooms[i])):
            if "br -" in rooms[i]:
                if rooms[i][j:j+2] == ">$":
                    rent.append(rooms[i][j+2:j+6])
                if rooms[i][j:j+4] == "br -":
                    brs.append(rooms[i][j-1])
    sumrent = 0
    sumrooms = 0
    for k in range(0, len(rent)):
        sumrent = sumrent + int(rent[k])
        sumrooms = sumrooms + int(brs[k])
    # Convert before dividing; in Python 2, int / int drops the fraction.
    avg = float(sumrent) / sumrooms
    print "Average rent is $", avg, " per room"
    print str(len(rent)) + " rooms found"
(changed source code font style to 'Computer Code' and hoping it helps copy-paste)
Thanks for the code and detailed write-up. What version of Python did you write this in? I know that some things have changed in 3.x and some of the code may not translate (I haven't tried Python 3.x because 2.x is still widely used and supported).
Just ran on python 2.6.6 and found out that rent is EXPENSIVE!
Chicago: $1175/room
Buffalo: $393/room
I wonder if wages translate as well?
If anyone has any questions about how to load this code, or would like to add to, modify, or improve it for themselves, please let John know!
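On the 3.x question: as far as I know, the main things that would need to change are that urlopen lives in urllib.request, that it returns bytes rather than text, that "/" between two integers keeps the fraction, and that print is a function. Here is a rough Python 3 sketch of just those pieces. The names fetch_lines and per_room and the opener parameter are made up for illustration, and none of this has been tried against the live site.

```python
from urllib.request import urlopen  # Python 3 location of urlopen


def fetch_lines(url, opener=urlopen):
    """Return the page at `url` as a list of text lines.

    `opener` is a hypothetical hook so the function can be tried without a
    network connection; by default it is the real urlopen.
    """
    response = opener(url)
    try:
        # Python 3 hands back bytes, so decode before searching for ">$".
        text = response.read().decode("utf-8", "replace")
    finally:
        response.close()
    return text.splitlines()


def per_room(sumrent, sumrooms):
    """Per-room average; "/" is true division in Python 3, so no float() is needed."""
    return sumrent / sumrooms
```

The final report would then be written as print("Average rent is $", avg, "per room") instead of the Python 2 print statement.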
John,
I would add this to the top of the code:
user_city = raw_input("Please enter a valid craigslist city: ")
user_pages = raw_input("Please enter the number of pages you would like to scan: ")
Then you could change the hard-coded bits to:
page = "http://" + str(user_city) + ".craigslist.org/apa/"
def averageRent(page, n=int(user_pages)):
btw: I think I need to install a new WYSIWYG editor; this one is not good for entering code.
-Ryan
I'm not sure what version I wrote it in, it was whatever version was running on the computers in the GIAL that semester. I want to say 2.7?
And yes, there are a few things that could be done to make it more user friendly such as the input lines you added. I was thinking something like a look-up table or dictionary, which would take a while to type out but could eliminate any user error. The method you suggested could work for most cities, but would have an error for a few cities, especially ones with two-word names (for example the San Francisco page has a URL sfbay.craigslist.org/apa). Maybe something like
citylist = {1:"http://buffalo.craigslist.org/apa/" , 2:"http://rochester.craigslist.org/apa/", ........ }
citycodes = ["1 : Buffalo", "2 : Rochester" ........ ]
and then when the function is called, print citycodes, ask the user to pick a city from the list, and read the number with raw_input. Then you could replace the line
url = urllib.urlopen(page)
with
url = urllib.urlopen(citylist[x])
where x is the user input. Like I said, it would take a while to prepare, but it would solve the problem I mentioned earlier.
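A minimal sketch of that numbered look-up table could look like the following. It only has the two cities from the example, and the names CITY_URLS, menu_lines and url_for_choice are made up for illustration. It folds citylist and citycodes into one dictionary so the menu text and the URLs cannot get out of step.

```python
# Hypothetical two-entry table using the URLs from the example above.
CITY_URLS = {
    1: ("Buffalo", "http://buffalo.craigslist.org/apa/"),
    2: ("Rochester", "http://rochester.craigslist.org/apa/"),
}


def menu_lines(city_urls=CITY_URLS):
    """Build the "1 : Buffalo" style lines to show the user."""
    return ["%d : %s" % (code, city_urls[code][0]) for code in sorted(city_urls)]


def url_for_choice(choice, city_urls=CITY_URLS):
    """Turn the text the user typed into a URL, or None if it is not on the menu."""
    try:
        code = int(choice)
    except ValueError:
        # The user typed something that is not a number.
        return None
    entry = city_urls.get(code)
    return entry[1] if entry else None
```

The string returned by url_for_choice would be what gets passed to urlopen, and a None result is the cue to print the menu and ask again.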
You could use your script to scrape http://www.craigslist.org/about/sites to create that dictionary, then use the city name as the key to the link. I don't know of the right library, but if someone enters a city that doesn't exist, perhaps you could find the closest match in the dictionary and offer a suggestion.
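For the closest-match part, one option that ships with Python is difflib.get_close_matches from the standard library. Here is a small sketch, assuming the dictionary has already been built and is keyed by lowercase city name. The three entries and the name lookup_city are made up for illustration, and the scraping of the sites page is not shown.

```python
import difflib

# Hypothetical hand-typed table; a real one would come from the sites page.
CITY_URLS = {
    "buffalo": "http://buffalo.craigslist.org/apa/",
    "rochester": "http://rochester.craigslist.org/apa/",
    "chicago": "http://chicago.craigslist.org/apa/",
}


def lookup_city(name, city_urls=CITY_URLS):
    """Return (url, suggestion).

    url is None when the city is not in the table; suggestion is then the
    closest known name, or None if nothing is similar enough.
    """
    key = name.strip().lower()
    if key in city_urls:
        return city_urls[key], None
    # n=1 asks difflib for only the single best candidate.
    close = difflib.get_close_matches(key, list(city_urls), n=1)
    return None, (close[0] if close else None)
```

With this table, lookup_city("bufalo") returns no URL but suggests "buffalo", which could be turned into a "did you mean buffalo?" prompt.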