[img_assist|nid=367|title=|desc=|link=none|align=right|width=150|height=146]
When the Vancouver Sun came out with their Vancouver parking tickets database I immediately had some burning questions, like, did the meter maids work on holidays? Do the work less in the evening than during the day? I found it difficult to answer these questions using their interface, so I decided to screen scrape all 1.6 million parking tickets in to my own MySQL database. This was a bit challenging as they made it difficult to screen scrape the data but eventually it could be done simply by first getting an AppKey, a hidden value inside the HTML source and then doing queries using that AppKey as a parameter. It took about a week to get all 1.6 million tickets downloaded. By using Django, it was easy to get them in to a database and view the results. Initially I just put all the data in to one table, then later I decided to normalize the data a bit which was interesting as I decided to do that in pure SQL which I hadn't done before. I did the scraping itself using a combination of BeautifulSoup, lxml, and mechanize.
vancouver-parking-tickets project at GitHub
MySQL SQL dump (42 MB)
Here's some data:
Recent comments