When the Vancouver Sun came out with their Vancouver parking tickets database, I immediately had some burning questions: did the meter maids work on holidays? Do they work less in the evening than during the day? I found it difficult to answer these questions using their interface, so I decided to screen scrape all 1.6 million parking tickets into my own MySQL database.

This was a bit challenging, as they made it difficult to screen scrape the data, but in the end it came down to two steps: first get an AppKey, a hidden value inside the HTML source, then run queries using that AppKey as a parameter. It took about a week to download all 1.6 million tickets. I did the scraping itself using a combination of BeautifulSoup, lxml, and mechanize.

Using Django, it was easy to get the tickets into a database and view the results. Initially I just put all the data into one table; later I decided to normalize the data a bit, which was interesting because I did it in pure SQL, something I hadn't done before.
vancouver-parking-tickets project at GitHub
MySQL SQL dump (42 MB)
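The AppKey step can be sketched roughly as follows. This is only an illustration, not the code from the project: it assumes the key sits in a hidden `<input>` named `AppKey`, the `page` parameter and the `example.com` URL are made up, and it uses the standard library's `html.parser` in place of BeautifulSoup so it runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def extract_app_key(html, field_name="AppKey"):
    """Return the hidden field's value, or None if the page has no such field.

    The field name is an assumption; the real page may store the key differently.
    """
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.hidden.get(field_name)


def build_query_url(base_url, app_key, page):
    """Attach the key and a page number to a search URL (parameter names are guesses)."""
    return base_url + "?" + urlencode({"AppKey": app_key, "page": page})


# Made-up page standing in for the real search form.
SAMPLE_HTML = """
<form action="/search">
  <input type="hidden" name="AppKey" value="abc123">
  <input type="text" name="street">
</form>
"""

app_key = extract_app_key(SAMPLE_HTML)
query_url = build_query_url("https://example.com/search", app_key, page=1)
print(app_key, query_url)
```

In a real scraper the HTML would come from fetching the search page first, and the key would probably need to be refreshed whenever the site stops accepting it.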
Here's some data:
Number of tickets by day of the week:
Day of week | Count |
---|---|
Monday | 238482 |
Tuesday | 277797 |
Wednesday | 294478 |
Thursday | 274529 |
Friday | 243830 |
Saturday | 181235 |
Sunday | 121036 |
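A day-of-week breakdown like the one above can be produced with a single `GROUP BY` query. The sketch below is not the original query, which was not saved. It uses an in-memory SQLite database instead of MySQL so it is self-contained, and the `ticket` table, the `issued_on` column, and the four sample rows are all made up for illustration.

```python
import sqlite3

# SQLite's strftime('%w') returns 0 for Sunday through 6 for Saturday.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, issued_on TEXT)")

# Made-up sample rows: one Monday, two Tuesdays, one Sunday.
conn.executemany(
    "INSERT INTO ticket (issued_on) VALUES (?)",
    [("2009-11-09",), ("2009-11-10",), ("2009-11-10",), ("2009-11-15",)],
)

# Group tickets by the weekday number of their issue date.
rows = conn.execute(
    "SELECT CAST(strftime('%w', issued_on) AS INTEGER) AS dow, COUNT(*) "
    "FROM ticket GROUP BY dow ORDER BY dow"
).fetchall()

counts_by_day = {DAY_NAMES[dow]: n for dow, n in rows}
print(counts_by_day)
```

On MySQL the same grouping would likely use `DAYOFWEEK()` or `DAYNAME()` in place of `strftime`, with the weekday numbering adjusted to match.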
Number of tickets at the most-ticketed addresses:
Address | Count |
---|---|
1050 Robson St. | 17899 |
1150 Robson St. | 15674 |
850 Hornby St. | 10501 |
650 Broadway St. W. | 10426 |
1050 Alberni St. | 9729 |
1750 Broadway St. W. | 9121 |
1050 Hornby St. | 9100 |
650 Hornby St. | 8863 |
1050 Mainland St | 8328 |
1050 Homer St. | 8269 |
1150 Hamilton St. | 8094 |
850 Howe St. | 7941 |
850 Broadway St. W. | 7848 |
2250 4th Ave W. | 7634 |
950 Hornby St. | 7462 |
1950 4th Ave W. | 7403 |
2650 Granville St. | 7354 |
1650 Broadway St. W. | 7062 |
1350 Broadway St. W. | 6961 |
1150 Broadway St. W. | 6903 |
1050 Hamilton St. | 6889 |
950 Broadway St. W. | 6821 |
150 Davie St. | 6676 |
1150 Homer St. | 6658 |
1150 Mainland St | 6562 |
750 Broadway St. W. | 6553 |
950 Homer St. | 6504 |
1150 Davie St. | 6461 |
1150 Alberni St. | 6386 |
1050 Davie St. | 6202 |
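The post mentions normalizing the single flat table in pure SQL. One way that might look is sketched below, again with SQLite standing in for MySQL. The table and column names (`ticket_flat`, `address`, `ticket`) are hypothetical rather than the project's actual schema, and the three sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat table as first scraped: the address string repeats on every ticket.
CREATE TABLE ticket_flat (id INTEGER PRIMARY KEY, address TEXT, issued_on TEXT);
INSERT INTO ticket_flat (address, issued_on) VALUES
  ('1050 Robson St.', '2009-11-09'),
  ('1050 Robson St.', '2009-11-10'),
  ('850 Hornby St.',  '2009-11-10');

-- One row per distinct address.
CREATE TABLE address (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
INSERT INTO address (label) SELECT DISTINCT address FROM ticket_flat;

-- Tickets now point at an address row instead of repeating the string.
CREATE TABLE ticket (
  id INTEGER PRIMARY KEY,
  address_id INTEGER REFERENCES address(id),
  issued_on TEXT
);
INSERT INTO ticket (id, address_id, issued_on)
SELECT f.id, a.id, f.issued_on
FROM ticket_flat AS f JOIN address AS a ON a.label = f.address;
""")

# Ticket counts per address, highest first, computed on the normalized tables.
top_addresses = conn.execute("""
SELECT a.label, COUNT(*) AS n
FROM ticket AS t JOIN address AS a ON a.id = t.address_id
GROUP BY a.id, a.label
ORDER BY n DESC, a.label
""").fetchall()
print(top_addresses)
```

The `INSERT ... SELECT` statements do all the work inside the database, which matters when there are 1.6 million rows to move.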
Comments
Harro van der Klauw (not verified)
Wed, 2009-11-11 05:47
Well.... do they work weekends?
David Grant
Thu, 2009-11-12 00:32
Going to get the data and put it here
I'm going to put the data up on here; I need to gather it up again. I just ran some queries from Django in the console, so I didn't save anything. I had some trouble querying for the day of the week: I had to add a patch to Django to make it work, and I'm not sure if I still have that patch. I ended up using plain SQL, but I didn't save the queries anywhere.
oggy (not verified)
Tue, 2010-01-05 17:48
This is a brilliant idea...
Seriously: scraping millions of parking tickets and cross-referencing dates, hours, locations, etc. is brilliant. Imagine an iPhone app for that in each major city: "what are the chances I'll get a ticket here?"!!!