scraping

Vancouver Sun Parking Tickets Website Screen Scraper

[img_assist|nid=367|title=|desc=|link=none|align=right|width=150|height=146]
When the Vancouver Sun came out with their Vancouver parking tickets database I immediately had some burning questions, like, did the meter maids work on holidays? Do the work less in the evening than during the day? I found it difficult to answer these questions using their interface, so I decided to screen scrape all 1.6 million parking tickets in to my own MySQL database. This was a bit challenging as they made it difficult to screen scrape the data but eventually it could be done simply by first getting an AppKey, a hidden value inside the HTML source and then doing queries using that AppKey as a parameter. It took about a week to get all 1.6 million tickets downloaded. By using Django, it was easy to get them in to a database and view the results. Initially I just put all the data in to one table, then later I decided to normalize the data a bit which was interesting as I decided to do that in pure SQL which I hadn't done before. I did the scraping itself using a combination of BeautifulSoup, lxml, and mechanize.

vancouver-parking-tickets project at GitHub

MySQL SQL dump (42 MB)

Here's some data:

Topic: 

Vancouver Park Board Swimming Lessons Screen Scraper

I was waiting for the January 2010 swimming times to appear on the Vancouver Park Board website but I got tired of all the clicking and scrolling required to see when the lessons were available. The other problem was that once the lessons appeared for one pool, some of the other pools still hadn't posted their lesson schedule for January 2010. The times that came out for the first pool were not ideal so I wanted to wait and see what came out for the other pool, while making sure that the first pool didn't book up.

Topic: 
Subscribe to RSS - scraping