Vancouver Sun Parking Tickets Website Screen Scraper

When the Vancouver Sun came out with its Vancouver parking tickets database, I immediately had some burning questions: Did the meter maids work on holidays? Did they work less in the evening than during the day? I found it difficult to answer these questions using the Sun's interface, so I decided to screen scrape all 1.6 million parking tickets into my own MySQL database.

This was a bit challenging, as the site made screen scraping difficult, but eventually it could be done simply: first get an AppKey, a hidden value inside the HTML source, and then run queries with that AppKey as a parameter. I did the scraping itself with a combination of BeautifulSoup, lxml, and mechanize, and it took about a week to download all 1.6 million tickets. Django made it easy to get them into a database and view the results. Initially I put all the data into one table; later I decided to normalize it a bit, which was interesting because I did it in pure SQL, something I hadn't done before.
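A minimal sketch of the AppKey approach, assuming the key sits in a hidden `<input>` field named `AppKey` and is sent back as a query-string parameter with the same name. It uses only Python's standard library (`html.parser`, `urllib.parse`) rather than the BeautifulSoup, lxml, and mechanize combination used for the real scrape, and it only builds the query URL without fetching anything. The base URL and the `street`/`page` parameters are hypothetical placeholders, not the Sun's actual interface.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collects the name/value pairs of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. <input hidden>), so guard with "or".
        if (attributes.get("type") or "").lower() == "hidden" and attributes.get("name"):
            self.hidden[attributes["name"]] = attributes.get("value") or ""


def extract_app_key(html, field_name="AppKey"):
    """Return the hidden AppKey value from a page's HTML, or None if absent.

    Assumption: the key is stored in a hidden input named "AppKey".
    """
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.hidden.get(field_name)


def build_query_url(base_url, app_key, **params):
    """Build a query URL that passes the AppKey alongside the search parameters.

    base_url and the keyword parameters are hypothetical placeholders.
    """
    query = {"AppKey": app_key, **params}
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    page = '<form><input type="hidden" name="AppKey" value="abc123"/></form>'
    key = extract_app_key(page)
    print(build_query_url("https://example.com/tickets", key, street="Robson", page=1))
    # https://example.com/tickets?AppKey=abc123&street=Robson&page=1
```

In a real scrape each results page would be fetched with the same key, so the key only needs to be extracted once per session.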

vancouver-parking-tickets project at GitHub

MySQL SQL dump (42 MB)

Here's some data:

Number of tickets by day of the week:

Day of week   Count
Monday        238482
Tuesday       277797
Wednesday     294478
Thursday      274529
Friday        243830
Saturday      181235
Sunday        121036
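The queries behind these counts weren't saved, so the following is only a sketch of one way to compute them. It uses Python's built-in `sqlite3` instead of MySQL so it runs on its own, and the `tickets` table and `issued_at` column are hypothetical, not the project's actual schema. In MySQL the equivalent grouping would use `DAYOFWEEK()`, which numbers days 1 (Sunday) through 7 (Saturday).

```python
import sqlite3

# SQLite's strftime('%w', ...) numbers the days 0 (Sunday) through 6 (Saturday).
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def tickets_by_weekday(conn):
    """Return {day name: ticket count}, ordered Sunday through Saturday.

    Assumes a hypothetical table tickets(issued_at TEXT) holding
    timestamps such as '2010-03-01 09:15:00'.
    """
    rows = conn.execute(
        """
        SELECT CAST(strftime('%w', issued_at) AS INTEGER) AS dow,
               COUNT(*) AS n
        FROM tickets
        GROUP BY dow
        ORDER BY dow
        """
    ).fetchall()
    # Timestamps SQLite cannot parse yield dow = NULL; skip those rows.
    return {DAY_NAMES[dow]: n for dow, n in rows if dow is not None}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, issued_at TEXT)")
    conn.executemany(
        "INSERT INTO tickets (issued_at) VALUES (?)",
        [("2010-03-01 09:15:00",),   # a Monday
         ("2010-03-01 14:30:00",),   # a Monday
         ("2010-03-07 11:00:00",)],  # a Sunday
    )
    print(tickets_by_weekday(conn))  # {'Sunday': 1, 'Monday': 2}
```

Doing the weekday extraction in SQL keeps the grouping inside the database, which matters with 1.6 million rows.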

Number of tickets by address (the 30 highest counts):

Address               Count
1050 Robson St.       17899
1150 Robson St.       15674
850 Hornby St.        10501
650 Broadway St. W.   10426
1050 Alberni St.      9729
1750 Broadway St. W.  9121
1050 Hornby St.       9100
650 Hornby St.        8863
1050 Mainland St      8328
1050 Homer St.        8269
1150 Hamilton St.     8094
850 Howe St.          7941
850 Broadway St. W.   7848
2250 4th Ave W.       7634
950 Hornby St.        7462
1950 4th Ave W.       7403
2650 Granville St.    7354
1650 Broadway St. W.  7062
1350 Broadway St. W.  6961
1150 Broadway St. W.  6903
1050 Hamilton St.     6889
950 Broadway St. W.   6821
150 Davie St.         6676
1150 Homer St.        6658
1150 Mainland St      6562
750 Broadway St. W.   6553
950 Homer St.         6504
1150 Davie St.        6461
1150 Alberni St.      6386
1050 Davie St.        6202
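The normalization mentioned earlier was done in pure SQL, but that SQL isn't published here. The following is a sketch of one way to do it, plus the top-address query, again using `sqlite3` so it is self-contained. The table and column names (`tickets_raw`, `addresses`, `tickets`, `address`, `issued_at`) are hypothetical, and the statements may need small changes for MySQL.

```python
import sqlite3

# Split a flat tickets_raw(id, address, issued_at) table into an addresses
# lookup table plus a tickets table that refers to it by id.
# All table and column names here are hypothetical.
NORMALIZE_SQL = """
CREATE TABLE addresses (
    id      INTEGER PRIMARY KEY,
    address TEXT NOT NULL UNIQUE
);
INSERT INTO addresses (address)
    SELECT DISTINCT address FROM tickets_raw WHERE address IS NOT NULL;

CREATE TABLE tickets (
    id         INTEGER PRIMARY KEY,
    address_id INTEGER NOT NULL REFERENCES addresses (id),
    issued_at  TEXT
);
-- The inner join drops raw rows with no address.
INSERT INTO tickets (id, address_id, issued_at)
    SELECT r.id, a.id, r.issued_at
    FROM tickets_raw AS r
    JOIN addresses AS a ON a.address = r.address;
"""


def normalize(conn):
    """Run the normalization statements as one script."""
    conn.executescript(NORMALIZE_SQL)


def top_addresses(conn, limit=30):
    """Return [(address, ticket count)] for the most-ticketed addresses."""
    return conn.execute(
        """
        SELECT a.address, COUNT(*) AS n
        FROM tickets AS t
        JOIN addresses AS a ON a.id = t.address_id
        GROUP BY a.id, a.address
        ORDER BY n DESC, a.address
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets_raw (id INTEGER PRIMARY KEY, address TEXT, issued_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO tickets_raw (address, issued_at) VALUES (?, ?)",
        [("1050 Robson St.", "2010-03-01 09:15:00"),
         ("1050 Robson St.", "2010-03-02 10:00:00"),
         ("850 Hornby St.", "2010-03-01 12:00:00")],
    )
    normalize(conn)
    print(top_addresses(conn))  # [('1050 Robson St.', 2), ('850 Hornby St.', 1)]
```

Storing each address once and referring to it by id keeps the tickets table smaller and makes per-address grouping a join on an integer key.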

Comments

I'm going to put the data up here; I need to gather it up again. I just ran some queries through Django at the console, so I didn't save the results. I had some trouble querying by day of the week and had to add a patch to Django to make it work; I'm not sure whether I still have that patch. I ended up using plain SQL, but I didn't save those queries anywhere either.

Seriously: scraping millions of parking tickets and cross-referencing dates, hours, locations, etc. is brilliant. Imagine an iPhone app for that in each major city: "What are the chances I'll get a ticket here?"!!!
