Vancouver Sun Parking Tickets Website Screen Scraper

When the Vancouver Sun came out with their Vancouver parking tickets database, I immediately had some burning questions: do the meter maids work on holidays? Do they work less in the evening than during the day? I found it difficult to answer these questions using their interface, so I decided to screen scrape all 1.6 million parking tickets into my own MySQL database. This was a bit challenging, as the site made the data difficult to scrape, but in the end it came down to two steps: first get an AppKey, a hidden value inside the HTML source, then run queries using that AppKey as a parameter. It took about a week to download all 1.6 million tickets. Using Django made it easy to get them into a database and view the results. Initially I put all the data into one table; later I decided to normalize the data a bit, which was interesting because I did it in pure SQL, something I hadn't done before. I did the scraping itself using a combination of BeautifulSoup, lxml, and mechanize.
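To make the two-step idea concrete, here is a minimal sketch of pulling a hidden value out of a page and reusing it as a query parameter. It is not the code from the project: it uses only Python's standard library rather than BeautifulSoup, lxml, or mechanize, and it assumes the AppKey sits in a hidden form input. The field layout, the `page` parameter, the helper names, and the example.invalid URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect the values of <input type="hidden"> fields, keyed by name."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def extract_app_key(html, field_name="AppKey"):
    """Return the hidden field's value, or None if the page has no such field."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.hidden.get(field_name)


def build_query_url(base_url, app_key, page):
    """Build a search URL that passes the key along (parameter names are assumed)."""
    return base_url + "?" + urlencode({"AppKey": app_key, "page": page})


# Stand-in for a downloaded page; the real markup is not shown in this post.
sample_html = '<form><input type="hidden" name="AppKey" value="abc123"></form>'
key = extract_app_key(sample_html)
url = build_query_url("https://example.invalid/search", key, 1)
```

In a real run the key would be fetched once per session and then reused for each page of results, with a pause between requests to stay polite.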

vancouver-parking-tickets project at GitHub

MySQL SQL dump (42 MB)

Here's some data:

Number of tickets by day of the week:

Day of week   Count
Monday        238482
Tuesday       277797
Wednesday     294478
Thursday      274529
Friday        243830
Saturday      181235
Sunday        121036
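A weekday breakdown like the one above boils down to one grouped query. The sketch below is an illustration only, not the query I actually ran: it uses an in-memory SQLite database so it can be run anywhere, and the `ticket` table, the `issued_on` column, and the four sample rows are made up. On MySQL the same grouping could likely be written with `DAYNAME()` or `DAYOFWEEK()` instead of `strftime`.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, issued_on TEXT)")
conn.executemany(
    "INSERT INTO ticket (issued_on) VALUES (?)",
    [("2009-03-02",), ("2009-03-03",), ("2009-03-03",), ("2009-03-08",)],
)

# SQLite's strftime('%w', ...) yields 0 for Sunday through 6 for Saturday.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

rows = conn.execute(
    "SELECT CAST(strftime('%w', issued_on) AS INTEGER) AS dow, COUNT(*) "
    "FROM ticket GROUP BY dow ORDER BY dow"
).fetchall()

# Map the numeric weekday to its name; days with no tickets are simply absent.
counts = {DAY_NAMES[dow]: n for dow, n in rows}
```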

Number of tickets at each address (the 30 addresses with the most tickets):

Address               Count
1050 Robson St.       17899
1150 Robson St.       15674
850 Hornby St.        10501
650 Broadway St. W.   10426
1050 Alberni St.      9729
1750 Broadway St. W.  9121
1050 Hornby St.       9100
650 Hornby St.        8863
1050 Mainland St      8328
1050 Homer St.        8269
1150 Hamilton St.     8094
850 Howe St.          7941
850 Broadway St. W.   7848
2250 4th Ave W.       7634
950 Hornby St.        7462
1950 4th Ave W.       7403
2650 Granville St.    7354
1650 Broadway St. W.  7062
1350 Broadway St. W.  6961
1150 Broadway St. W.  6903
1050 Hamilton St.     6889
950 Broadway St. W.   6821
150 Davie St.         6676
1150 Homer St.        6658
1150 Mainland St      6562
750 Broadway St. W.   6553
950 Homer St.         6504
1150 Davie St.        6461
1150 Alberni St.      6386
1050 Davie St.        6202
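For anyone curious what "normalizing in pure SQL" can look like, here is a small sketch under stated assumptions: it moves a repeated address string out of one flat table into its own table, then ranks addresses by ticket count. The table and column names are hypothetical, the three rows are sample data, and it runs on an in-memory SQLite database rather than MySQL, so it shows the general technique and not the actual schema in the dump.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat table as first loaded: the address text is repeated on every ticket.
CREATE TABLE ticket_flat (id INTEGER PRIMARY KEY, address TEXT NOT NULL);
INSERT INTO ticket_flat (address) VALUES
  ('1050 Robson St.'), ('1050 Robson St.'), ('850 Hornby St.');

-- One row per distinct address.
CREATE TABLE address (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
INSERT INTO address (name) SELECT DISTINCT address FROM ticket_flat;

-- Normalized tickets point at the address row instead of repeating the text.
CREATE TABLE ticket (
  id INTEGER PRIMARY KEY,
  address_id INTEGER NOT NULL REFERENCES address(id)
);
INSERT INTO ticket (id, address_id)
  SELECT f.id, a.id FROM ticket_flat f JOIN address a ON a.name = f.address;
""")

# Rank addresses by ticket count, most ticketed first.
top = conn.execute(
    "SELECT a.name, COUNT(*) AS n "
    "FROM ticket t JOIN address a ON a.id = t.address_id "
    "GROUP BY a.id, a.name ORDER BY n DESC, a.name LIMIT 30"
).fetchall()
```

The `INSERT ... SELECT` pair does all the work in the database, so no rows have to pass through Python or Django on the way.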


I'm going to put the data up here, but I need to gather it up again. I just ran some queries in Django at the console and didn't save the output. I had some trouble querying by day of the week: I had to patch Django to make it work, and I'm not sure whether I still have that patch. I ended up using plain SQL, but I didn't save the queries anywhere.

Seriously: scraping millions of parking tickets and cross-referencing dates, hours, locations, etc. is brilliant. Imagine an iPhone app for that in each major city: "What are the chances I'll get a ticket here?"
