python

Logging in to a Django site with a magic token

I have a simple video website for my kids and each kid has a separate login. This is so they can each have their own videos, but also so that some videos can be private (i.e. hidden from the outside world, or from other logged-in users). Typing in a username and password is impossible for my kids, as they are almost 5 and 2 years old, and they use this website on Google TV. So, with a magic token-style login, all they need to do is navigate to their bookmark on the Google TV homepage and press OK on the remote control.

(I don't need crazy security--it wouldn't be the end of the world if somehow someone guessed the magic token and saw some private videos, which are basically just home videos uploaded to YouTube. Videos that I really wouldn't want the public to see don't get uploaded to YouTube in the first place.)

I couldn't find how to do this easily, although one person on Stack Overflow suggested "logging in the user in the view by calling 'login'". The tricky part was figuring out that I had to set the User object's backend attribute to 'django.contrib.auth.backends.ModelBackend'. It's a bit of a hack, but it works, and it's simple.

models.py:

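from django.contrib.auth.models import User
from django.db import models


class MagicToken(models.Model):
    # One login token per user
    user = models.OneToOneField(User)
    magictoken = models.CharField(max_length=128, unique=True)

    def __unicode__(self):
        return unicode(self.user)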

views.py:

from django.contrib.auth import login
from django.core.urlresolvers import reverse  # django.urls in Django 1.10+
from django.http import Http404, HttpResponseRedirect
from django.views.generic import View

from .models import MagicToken  # assumes models.py lives in the same app


class MagicTokenLogin(View):
    def get(self, request, token):
        try:
            magic_token_obj = MagicToken.objects.get(magictoken=token)
        except MagicToken.DoesNotExist:
            raise Http404

        user = magic_token_obj.user
        # login() expects the user to carry the backend that authenticated it;
        # authenticate() is never called here, so set it by hand.
        user.backend = 'django.contrib.auth.backends.ModelBackend'
        login(request, user)
        if request.user.is_authenticated():
            # login successful
            return HttpResponseRedirect(reverse('some-view-for-logged-in-users'))
        else:
            # login failed
            return HttpResponseRedirect(reverse('some-view'))
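
The post doesn't show how the tokens themselves get created. A minimal sketch, assuming Python 3.6+ for the standard-library secrets module (the post's own code is Python 2 era); make_magic_token and TOKEN_MAX_LENGTH are hypothetical names, not part of the original code:

```python
import secrets

# Matches max_length on the MagicToken.magictoken field above.
TOKEN_MAX_LENGTH = 128


def make_magic_token(nbytes=32):
    """Return a random URL-safe token (hypothetical helper, not from the post)."""
    token = secrets.token_urlsafe(nbytes)
    if len(token) > TOKEN_MAX_LENGTH:
        raise ValueError("token would not fit in the magictoken column")
    return token
```

The result can be pasted into the magictoken field in the admin and then bookmarked as part of the login URL.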

eyefiserver2 - A standalone Eye-Fi Server in python, for linux

I recently forked the defunct eyefiserver project. The new project is called eyefiserver2. It is a server for Eye-Fi cards, written in Python, that runs on Linux. It should be possible to run it on any OS; I just haven't tested it on anything other than Linux.


Moving Data from One Database to Another with Django

So I have a django database I'm working on and I decided I wanted to do the development in sqlite3 instead of mysql. I decided to do this because it makes it easier, for example, to have someone else work on HTML/CSS if I can just give them a directory, tell them to run a bash script and go to http://localhost:8000, rather than them having to do all that AND set up a mysql server. Sure, that can also be done with a script, but with sqlite things are just a hell of a lot easier in some ways.

I decided I wanted to take the data out of my mysql database and load it into a sqlite db. It's not so simple: you can't just take an SQL dump from mysql and import it into sqlite3. I started Googling around for "mysql to sqlite" and didn't really get anywhere. I then realized that Django can already talk to sqlite and mysql transparently without me having to know any SQL. So I thought about writing my own python modules to do this, but it turns out someone has already done it.

The django-extensions project has a manage.py command called "dumpscript" that "Generates a Python script that will repopulate the database using objects. The advantage of this approach is that it is easy to understand, and more flexible than directly populating the database, or using XML." This was exactly what I wanted.


Concatenate/Combine PDF Files in Linux

I know there are lots of ways to do this. This is not a HOW-TO, but just sharing a script I made for doing this. It's a decent example of how to write a command-line utility in Python.

Since I write my cover letters and resumes in LaTeX I always need to concatenate the two together before sending them to an employer. This is very important because it means the person on the other end prints out both, and it makes their life easier by not having to worry about two files. For the longest time I just had a simple script that looked like this:

#!/bin/sh

Vancouver Sun Parking Tickets Website Screen Scraper

When the Vancouver Sun came out with their Vancouver parking tickets database I immediately had some burning questions: did the meter maids work on holidays? Do they work less in the evening than during the day? I found it difficult to answer these questions using their interface, so I decided to screen scrape all 1.6 million parking tickets into my own MySQL database. This was a bit challenging, as they made it difficult to screen scrape the data, but eventually it could be done simply by first getting an AppKey, a hidden value inside the HTML source, and then doing queries using that AppKey as a parameter. It took about a week to get all 1.6 million tickets downloaded. By using Django, it was easy to get them into a database and view the results. Initially I just put all the data into one table, then later I decided to normalize the data a bit, which was interesting because I did it in pure SQL, something I hadn't done before. I did the scraping itself using a combination of BeautifulSoup, lxml, and mechanize.
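
As a rough illustration of the AppKey step, here is a sketch that pulls a hidden form value out of a page. It uses the standard library's html.parser rather than the BeautifulSoup/lxml/mechanize combination the scraper actually used, and it assumes the key sits in a hidden input named "AppKey"; the sample page and the helper names are made up for the example:

```python
from html.parser import HTMLParser


class HiddenInputFinder(HTMLParser):
    """Collect the values of <input type="hidden"> fields, keyed by name."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.hidden[attrs["name"]] = attrs.get("value") or ""


def find_hidden_value(html, name):
    """Return the value of the hidden input called `name`, or None."""
    finder = HiddenInputFinder()
    finder.feed(html)
    return finder.hidden.get(name)


# Made-up page fragment; the real site's markup is not shown in the post.
SAMPLE_PAGE = '<form><input type="hidden" name="AppKey" value="abc123"></form>'
print(find_hidden_value(SAMPLE_PAGE, "AppKey"))  # abc123
```

The extracted value would then be sent along as a parameter with each search query.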

vancouver-parking-tickets project at GitHub

MySQL SQL dump (42 MB)

Here's some data:


Vancouver Park Board Swimming Lessons Screen Scraper

I was waiting for the January 2010 swimming times to appear on the Vancouver Park Board website but I got tired of all the clicking and scrolling required to see when the lessons were available. The other problem was that once the lessons appeared for one pool, some of the other pools still hadn't posted their lesson schedule for January 2010. The times that came out for the first pool were not ideal so I wanted to wait and see what came out for the other pool, while making sure that the first pool didn't book up.


Django Recipes Application


My mom was writing a family cookbook using Microsoft Word and I thought this was a bad idea for several reasons. At first I thought about using LaTeX to separate the style from the content a bit, then I thought about using XML, then I settled on a database as being the most generic to store recipe data. I quickly decided on using Django to create this cookbook framework because Python is probably my strongest language and it makes creating custom websites really easy.


Django Applications at Work

Today someone at my work just released a simple Django app, the second one now running at my company. Here are the two Django apps running so far:

Server scanner

This is the first Django application that we use. It started out as a script that would scan all the computers on our network and determine what version of our software they were running by inspecting a special file in one of the publicly available SMB shares. The script took about half an hour to run and generated static HTML that was then served up by Apache. This became cumbersome for a number of reasons. For one, it took about an hour to scan the entire network. There was no persistence whatsoever, so my script had no memory of which servers had our software installed on them. Those servers are quick to check. The ones that are slow to check are the ones that are no longer on the network, or that do not have the publicly available share we are expecting.

When the project manager emailed me one day and said "the servers on this page are often out of date" I sprang into action and converted it to a Django application. There is a script that scans the network and saves Server instances to an sqlite db. I also used a couple of generic views to display a list of servers and to edit a server (to provide a description).

Converting to Django from my static HTML generator gave me a few new features right away. Full scans were much quicker because I could scan the servers that actually have the software installed with a higher priority, while in another thread I can look for new servers on the network with a lower priority. There's also the "timesince" filter which is awesome, and the automatically filled DateTime fields for "created" and "updated" when I update a model instance. Not to mention ModelForms and generic views, which allowed me to get this up and running quickly. The best part is that each server entry has a description field, where people can put a note like "DO NOT TOUCH THIS SERVER" or "Dave's server, please feel free to play around on it".
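
The "known servers first" idea can be sketched in a few lines. This is a simplified single-queue version, not the two-thread setup described above, and scan_order and the host names are hypothetical:

```python
def scan_order(candidates, known_servers):
    """Return hosts to scan, already-known servers first.

    sorted() is stable, so hosts keep their original relative order
    within the "known" group and within the "unknown" group.
    """
    return sorted(candidates, key=lambda host: host not in known_servers)


hosts = ["alpha", "bravo", "charlie", "delta"]
known = {"charlie", "alpha"}  # e.g. hosts that already have a Server row
print(scan_order(hosts, known))  # ['alpha', 'charlie', 'bravo', 'delta']
```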

Task tracker

This web app is for the software developers to keep track of, and inform others of, what they are working on. Up until now we haven't really had anything of the kind, save for e-mail and a bug tracker. Typically software developers send weekly reports to team members. This involves typing one up or constantly opening up an email draft. Either way, other team members can't see what the person is working on in real time. So one of the members of our Python Club got interested in Django and wrote this up very quickly. He used YUI, which is awesome, and he just released it to a wider audience today.

Build website - in progress

This is a new one that is in progress. Currently, each build generates a folder structure containing the result of the build along with a bunch of log files and a bunch of other junk. A Perl script scans these folders periodically and generates static HTML (at least I think that's how it works; I don't work on it so I don't know the details). My boss is in the process of creating some Django models. The main table is called Build. His HTML will then be generated from Django templates. In my opinion the biggest advantage of moving to Django here is that he can use templates rather than HTML generation by a Perl script. Having the information in a database is nice but not required. He will get some advantages from having the database though. He'll have some other tables, such as BuildServer, Release, and Product, that the Build table will hold ForeignKeys to, which allows filtering the builds by those ForeignKeys. Also the build manager (someone other than my boss, who is designing the Django part) will be able to use the admin interface to add new Releases, BuildServers, etc.

Conclusion

These apps were implemented extremely quickly and are far better than all the crappy SAP and SharePoint-based web applications used at work and the other simpler web pages around that I think are mostly static HTML or ASP. It will be interesting to see if Django catches on. I had this master plan to re-implement our entire employee performance review site (which is a particularly crappy SAP app) in Django but there are too many hurdles. We'd need to hook into the existing SAP user tables somehow and also use the existing authentication mechanisms. But a demonstration app using the basic functionality wouldn't be that hard, and then we could worry about the hard bits later after it got approval. The existing employee performance review site is REALLY bad.

Anyways, I just thought it was cool how quickly Django has been adopted for a few simple projects at a company that is for the most part a Microsoft shop. People do use OSS here but there is a bit of a tendency to use Microsoft solutions and a bit of Not-Invented-Here syndrome as well.

Don't Rewrite Working Black-box Code

At work I am modifying an existing tool to work from the command line instead of a GUI. Currently everything is a bit coupled to the GUI. On Friday, the next problem I encountered was a global variable in Common.py that was not initialized.

""" Common.py """
def initHSCM():
    global hSCManager
    ...
    hSCManager = win32service.OpenSCManager(None, None, win32con.SERVICE_ALL_ACCESS)
    ...
 
...
def startService(service):
    """ function that uses hSCManager """
    # These functions don't work when hSCManager is set to None
 
def stopService(service):
    """ function that uses hSCManager """
    # These functions don't work when hSCManager is set to None
...

The only place it is getting initialized is when initHSCM() is called from GUI.py.

""" GUI.py """
Common.initHSCM()
...
startService("blah")
...
stopService("blah")

My new CLI.py uses Common.py as well as the other underlying libraries. Instead of just calling initHSCM() inside CLI.py so that calls into Common.py would work, I decided to rewrite all the functions that use the hSCManager global variable (i.e. startService, stopService, and many others). My plan was to rewrite them to not depend on this global variable, and make them a bit cleaner. Instead, each function would get its own hSCManager handle and close it at the end, so service manager objects would be shorter-lived things. Like most of the code in this project, it's a rat's nest and was written by people who knew C and Java better than they knew Python, so part of my motivation for rewriting was to clean things up a bit. After I had rewritten a few of the functions, I realized that I had unknowingly created new bugs (in the functions I was writing). Even though I was writing some unit tests (which was making the development process take longer on the whole) I knew that the code I was writing was going to have more bugs in it than the code that had essentially been working and stable for years (albeit ugly). The last thing I need is bugs in this code. When bugs happen I want to know that they are most likely in my new code.

In the end, I gave up and this is what I did:

""" CLI.py """
Common.initHSCM() # easy hack
...
startService("blah")
...
stopService("blah")

An even easier way is to move responsibility for initializing hSCManager into Common.py. Something like this?

""" Common.py """
hSCManager = win32service.OpenSCManager(None, None, win32con.SERVICE_ALL_ACCESS)
 
...
def startService(service):
    """ function that uses hSCManager """
    # These functions don't work when hSCManager is set to None
 
def stopService(service):
    """ function that uses hSCManager """
    # These functions don't work when hSCManager is set to None
...

Hmm, that was easy. Moral of the story: don't re-write code that is several years old and works unless you really, really have to. Who cares if it's ugly and a hack. You didn't write it.
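
One thing to keep in mind with the module-level version is that the handle gets opened as a side effect of importing Common.py. If that ever matters, a lazy variant is still a tiny change. The sketch below is only an illustration: _open_manager is a stand-in for the real OpenSCManager call (pywin32 is Windows-only), and get_manager is a hypothetical name that does not exist in the original code.

```python
_hSCManager = None  # opened on first use instead of at import time


def _open_manager():
    """Stand-in for the real service manager call in Common.py."""
    return object()


def get_manager():
    """Return the shared service manager handle, opening it if needed."""
    global _hSCManager
    if _hSCManager is None:
        _hSCManager = _open_manager()
    return _hSCManager
```

startService and stopService would then call get_manager() instead of reading the global directly, and neither GUI.py nor CLI.py would need to remember to initialize anything.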


Using XML for Code Documentation is Just Plain Wrong

I was just looking at some C# code at work today and it had XML Documentation (like javadoc or python docstrings, only with XML). Who was the idiot that came up with that idea? It's the most insane thing I've ever seen. Let's look at the predecessors to C#'s XML documentation:

Javadoc:

/**
 * Returns an Image object that can then be painted on the screen. 
 * The url argument must specify an absolute {@link URL}. The name
 * argument is a specifier that is relative to the url argument. 
 * <p>
 * This method always returns immediately, whether or not the 
 * image exists. When this applet attempts to draw the image on
 * the screen, the data will be loaded. The graphics primitives 
 * that draw the image will incrementally paint on the screen. 
 *
 * @param  url  an absolute URL giving the base location of the image
 * @param  name the location of the image, relative to the url argument
 * @return      the image at the specified URL
 * @see         Image
 */
 public Image getImage(URL url, String name) {
	try {
	    return getImage(new URL(url, name));
	} catch (MalformedURLException e) {
	    return null;
	}
 }

Then, doxygen, which looks a lot like javadoc:

      /**
       * a normal member taking two arguments and returning an integer value.
       * @param a an integer argument.
       * @param s a constant character pointer.
       * @see Test()
       * @see ~Test()
       * @see testMeToo()
       * @see publicVar()
       * @return The test results
       */
       int testMe(int a,const char *s);

Unfortunately GeSHi doesn't syntax highlight the javadoc comments. But it looks fairly readable. Let's try a python docstring example. There is no one standard. One of the documentation generators for Python, Epydoc, understands plaintext, javadoc, epydoc, and reStructuredText.

Python code with epydoc style docstrings:

def x_intercept(m, b):
    """
    Return the x intercept of the line M{y=m*x+b}.  The X{x intercept}
    of a line is the point at which it crosses the x axis (M{y=0}).
 
    This function can be used in conjunction with L{z_transform} to
    find an arbitrary function's zeros.
 
    @type  m: number
    @param m: The slope of the line.
    @type  b: number
    @param b: The y intercept of the line.  The X{y intercept} of a
              line is the point at which it crosses the y axis (M{x=0}).
    @rtype:   number
    @return:  the x intercept of the line M{y=m*x+b}.
    """
    return -b/m

Python code with one example of reStructuredText docstrings (this one includes the types of the parameters but they aren't necessary):

def fox_speed(size, weight, age):
    """
    Return the maximum speed for a fox.
 
    :Parameters:
      size
        The size of the fox (in meters)
      weight : float
        The weight of the fox (in stones)
      age : int
        The age of the fox (in years)
    """
    #[...]

I couldn't find any nice examples for C# XML Documentation. The C# XML Documentation Tutorial has some examples, but conveniently, none that include all the tags that I would need to replicate the javadoc example I showed above. So I'll convert the Java example to C#:

   /// <summary>
   /// Returns an Image object that can then be painted on the screen. 
   /// The url argument must specify an absolute <see cref="URL"/>. The name
   /// argument is a specifier that is relative to the url argument. 
   /// 
   /// This method always returns immediately, whether or not the 
   /// image exists. When this applet attempts to draw the image on
   /// the screen, the data will be loaded. The graphics primitives 
   /// that draw the image will incrementally paint on the screen.</summary>
   /// 
   /// <param name="url">an absolute URL giving the base location of the image</param>
   /// <param name="name">the location of the image, relative to the url argument</param>
   /// <returns>
   /// the image at the specified URL</returns>
   /// <seealso cref="Image">
   /// Read more about the Image class</seealso>
 public Image getImage(URL url, String name) {
	try {
	    return getImage(new URL(url, name));
	} catch (MalformedURLException e) {
	    return null;
	}
 }

I followed Microsoft's convention (because they know best) of putting the opening tags on a line on their own.

The javadoc sucks because you have to put a <p> (or <br />?) to make a new line which is stupid. Otherwise it's pretty readable, and same goes for doxygen. Especially the @param and @return tags. The Epydoc-style python docstrings suck. You have to specify the type using a @type tag and the return type using an @rtype tag. The reStructuredText example looks the best to me. No tags at all, except for the :Parameters: heading which should be there anyways. The C# comments are an eyesore. Even if Visual Studio had syntax highlighting for the comments it would suck. Did Microsoft look at the two major previous implementations (doxygen and javadoc) and decide that XML was a better way to document code?

I recently saw an interesting comment in scipy's source about one of scipy's guiding principles in designing the docstring standard for their codebase:

A guiding principle is that human readers of the text are given precedence over contorting docstrings so our tools produce nice output. Rather than sacrificing the readability of the docstrings, we have written pre-processors to assist tools like epydoc_ and sphinx_ in their task.

Microsoft clearly took the opposite route and decided to make code documentation readability by human readers a low priority.

Drupal Module Updater

This script will automatically update a drupal module if your drupal source code is stored in a Subversion repository. It first removes all files except for the .svn directories, then extracts the tarball for the new version of the module. Then it runs an svn status command to see which files are new, which files have been removed in the new version, and which files have changed.
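
The last step can be sketched as follows. This is not the script's actual code, just an illustration of reading svn status output: "?" marks unversioned files (new in the tarball), "!" marks missing files (gone in the new version), and "M" marks modified files. The function name and the sample output are made up.

```python
def classify_svn_status(output):
    """Group paths from `svn status` output by what the update did to them."""
    kinds = {"?": "added", "!": "removed", "M": "changed"}
    groups = {"added": [], "removed": [], "changed": []}
    for line in output.splitlines():
        if not line.strip():
            continue
        kind = kinds.get(line[0])  # first column is the item status
        if kind:
            # The remaining status columns are blank in the simple case;
            # the path follows them.
            groups[kind].append(line[7:].strip())
    return groups


SAMPLE = (
    "?       views/new_handler.inc\n"
    "!       views/old_handler.inc\n"
    "M       views.module\n"
)
print(classify_svn_status(SAMPLE))
```

From there, the "added" paths would need an svn add and the "removed" paths an svn rm before committing.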


Batch JPEG Photo Renamer

I used this script all the time before Picasa finally added this functionality. This script renames a whole bunch of photos, in a directory for example, appending numbers to the end of a base filename according to the EXIF dates stored inside the JPEGs. For example, a directory full of files that looks like this:

IMG_0123.jpg
IMG_0124.jpg
IMG_0127.jpg
IMG_0128.jpg
...
IMG_0248.jpg
IMG_0250.jpg

could be renamed to this:

Camping Photos_001.jpg
Camping Photos_002.jpg
Camping Photos_003.jpg
Camping Photos_004.jpg
...
Camping Photos_112.jpg
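
The numbering step might look something like this sketch. Reading the EXIF date out of each JPEG is left out, so the function takes (filename, datetime) pairs as input; plan_renames is a hypothetical name and the dates are invented for the example.

```python
from datetime import datetime


def plan_renames(photos, base_name, start=1):
    """Map old filenames to numbered names, ordered by capture time.

    `photos` is a list of (filename, datetime) pairs.
    """
    ordered = sorted(photos, key=lambda item: item[1])
    # At least three digits, more if there are 1000+ photos.
    width = max(3, len(str(start + len(ordered) - 1)))
    return [
        (old, "{}_{:0{}d}.jpg".format(base_name, number, width))
        for number, (old, _taken) in enumerate(ordered, start)
    ]


photos = [
    ("IMG_0124.jpg", datetime(2009, 7, 1, 10, 5)),
    ("IMG_0123.jpg", datetime(2009, 7, 1, 10, 0)),
]
for old, new in plan_renames(photos, "Camping Photos"):
    print(old, "->", new)
```

Building the full list of renames before touching any file makes it easy to check for name collisions first.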

Low Disk Space Warning Script

Have you ever experienced a full disk on a server or a desktop? Not fun. This script would normally be run as a cron job and would notify you by email if any drive's free disk space has dropped below a certain threshold. The code could be better; I wrote this one a long time ago when I was a bit of a n00b and I was in a rush as well. I might take another look at it and see if I can make some improvements.
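
For comparison, here is how the core check might look today with shutil.disk_usage (Python 3.3+). This is a sketch, not the original script: the 10% threshold and the function name are assumptions, and sending the email is left out.

```python
import shutil


def low_space_warnings(paths, min_free_fraction=0.10, usage=shutil.disk_usage):
    """Return one warning string per path whose free space is below the threshold.

    `usage` is injectable so the check can be exercised without a real disk.
    """
    warnings = []
    for path in paths:
        total, _used, free = usage(path)
        if total and free / total < min_free_fraction:
            warnings.append(
                "%s: only %.1f%% free (%d of %d bytes)"
                % (path, 100.0 * free / total, free, total)
            )
    return warnings
```

A cron job would call low_space_warnings with the mount points to watch and mail the result only when the list is non-empty.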

