python

Why I Chose Python Over C#

I was recently tasked with writing a simple UI, and the choice of language/framework quickly boiled down to two options: C# or Python. I eventually chose Python and here's why:

XML

When I started sketching out a design for this project in C#, I ended up envisioning lots of XML configuration files, in place of compiled-in, hard-coded configuration. In Python, however, XML is rarely required. Instead you can use Python modules and data structures to specify your configuration.

You can notice this difference clearly if you look at C# and Java web frameworks and compare them to Python web frameworks. Django (a Python web framework) has a few configuration files. One is settings.py, which is basically just a list of key-value pairs. Some of the values, however, are Python tuples (immutable lists). It looks like a hybrid between a .ini file and an XML file. The beauty of this is that you can actually run a program like pychecker or pylint on your program and if you are trying to access a key in your settings file that doesn't exist, it will complain! Try getting that kind of check on an XML config file in a compiled language. So Django and other programs just take advantage of the fact that since code isn't compiled, you can put all your config in code, and you can easily tweak it later without needing to re-compile anything.
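As a rough sketch of the idea (the setting names below are made up for illustration and are not taken from a real Django project), configuration written as Python is just assignments, and reading it is plain attribute access. A SimpleNamespace stands in here for a separate settings.py module so that the example fits in one file:

```python
from types import SimpleNamespace

# Hypothetical settings. In a real project these assignments would
# live in their own file, such as settings.py.
settings = SimpleNamespace(
    DEBUG=True,
    DATABASE_NAME="photos.db",
    # a tuple (immutable list) used as a value
    INSTALLED_APPS=("gallery", "comments"),
)


def describe(cfg):
    """Build a one-line summary from the settings object."""
    mode = "debug" if cfg.DEBUG else "production"
    return "%s mode, db=%s, %d apps" % (
        mode, cfg.DATABASE_NAME, len(cfg.INSTALLED_APPS))


print(describe(settings))
```

A misspelled name such as settings.DATABSE_NAME fails with an AttributeError instead of silently returning nothing, and it is the kind of mistake a static checker can flag when the settings live in a real module.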

In web frameworks and other projects for C#/.NET and Java, XML is a huge part of configuration. Spring, Hibernate, Ant, log4j/log4net all rely on either XML configuration or annotations (which just couple settings to your code and bake them into the build). So if you write your own applications in Java/C# you will find yourself also using XML for configuration and writing code to parse and possibly write XML as well. The only time I ever enjoyed using XML was when I was using the ElementTree library for Python, which is now part of Python's standard library.
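For the cases where XML can't be avoided, here is a minimal sketch of reading a config file with ElementTree. The XML document and its element names are invented for this example and do not follow any real log4j or Spring schema:

```python
import xml.etree.ElementTree as ET

# Made-up XML configuration, only for illustration.
XML = """<config>
  <logger name="app" level="DEBUG"/>
  <logger name="db" level="WARN"/>
</config>"""

root = ET.fromstring(XML)

# Map each logger name to its level by reading the element attributes.
levels = {el.get("name"): el.get("level") for el in root.findall("logger")}
print(levels)
```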

Less code

Python code is generally a lot shorter. This means it's quicker to write, quicker to read, quicker to debug, quicker to modify later. No braces, no semi-colons, terse data structures, list comprehensions.
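A small illustration of what "terse" means here. The word list is arbitrary sample data:

```python
words = ["python", "c#", "java", "perl"]

# List comprehension: filter and transform in a single expression.
long_upper = [w.upper() for w in words if len(w) > 2]

# A dictionary built from the same list in one line.
lengths = {w: len(w) for w in words}

print(long_upper)
print(lengths["java"])
```

The equivalent in a language without comprehensions typically needs an explicit loop, a temporary collection, and a type declaration for each.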

No compiling

Being able to modify code in the field is huge. Many times we've hit some obscure error in the field, fixed the Python code in a text editor, and then simply re-run the script. No re-compilation necessary.

Libraries

I tried porting a simple Python script to C# just to see how easy/hard it would be. The first step was porting the command-line options parsing. I used GNU getopt style parsing, which is included in Python's getopt library. No such thing is included in .NET. There is a third-party library, CSharpOptParse. Having to download this was a bit of a turn-off. Then I looked for an example of CSharpOptParse usage and I found one. Ugh. The Python getopt example is much nicer. If you don't like getopt there is also optparse (apparently, "optparse is a more convenient, flexible, and powerful library for parsing command-line options than getopt". I'll have to give it a try!). It looks even simpler than getopt!
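For readers who have not seen it, this is roughly what getopt-style parsing looks like in Python. The option names (-v/--verbose and -o/--output) are hypothetical and are not the ones from the script that was ported:

```python
import getopt


def parse(argv):
    """Parse -v/--verbose and -o/--output FILE.

    Returns (options dict, list of positional arguments).
    """
    opts, args = getopt.getopt(argv, "vo:", ["verbose", "output="])
    options = {"verbose": False, "output": None}
    for flag, value in opts:
        if flag in ("-v", "--verbose"):
            options["verbose"] = True
        elif flag in ("-o", "--output"):
            options["output"] = value
    return options, args


# The explicit list stands in for sys.argv[1:].
print(parse(["-v", "--output", "out.txt", "input.txt"]))
```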

The next thing I looked at was how to call an external executable in C#/.NET and capture stdout and/or stderr. Talk about annoying. Python's new-ish subprocess library is awesome.
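A minimal sketch of the subprocess side of that comparison. The helper name run_and_capture is made up for this example, and a child Python process stands in for the external executable so that the snippet runs anywhere:

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments); return (exit code, stdout, stderr)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # return text instead of bytes
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


# A child Python process plays the role of the external program.
child = [sys.executable, "-c",
         "import sys; print('to stdout'); sys.stderr.write('to stderr')"]
code, out, err = run_and_capture(child)
print(code, out.strip(), err.strip())
```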

Finally my Python script does some path splitting using os.path.split and os.path.splitext. I did find .NET's Path class to be pretty convenient, although no better than python's os.path.
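For reference, the two os.path calls mentioned above behave like this. The path is a made-up example:

```python
import os.path

path = "/mnt/cdrom/2007/02/04/IMG_0001.JPG"  # made-up path

# split() separates the directory from the last path component.
directory, filename = os.path.split(path)

# splitext() separates the extension from the rest of the name.
stem, ext = os.path.splitext(filename)

print(directory)
print(filename)
print(stem, ext)
```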

Documentation

The Python documentation is far better than anything I have seen in the .NET/C# world. Maybe it's because smart people use Python.

Conclusion

I've been a long-time Python user, and I do like languages like C# and Java as well. But when I put Python and C# side by side for this simple little project, nothing competed with Python.

Python Tips

This will be a dumping ground for Python tips that I would like to share with others.

  • Command-line parsing: Use the optparse library (pre-Python 2.7) or the argparse library (Python 2.7 and greater). I would stay away from getopt. It's too old, and the code required to use it is harder to write and hard to read.
  • print statements: Don't use them. Use the logging module. This doesn't strictly apply to Python 3.x, where print is a function that you can easily replace later with a call to a logging command. Even in Python 3.x, though, I recommend not using print but setting up logging early on.
  • Always make your module importable: Enclose all the logic in functions and the main entry point inside a "def main():" function block. Use a block like this at the bottom of your file to call main():
    def main():
        doWork()
     
    if __name__ == "__main__":
        main()

    If your python script is called like this:
    python script.py
    then __name__ will be set to "__main__". If you are importing your module, it won't be. So if you import your module, main() will not be run, which is usually what you want.

  • Use tools like pylint, pyflakes, and pychecker: These tools are awesome, as they will find simple errors and check that you are using proper naming conventions and documenting your code.
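Putting the first three tips together, a small sketch might look like the following. The logger name, the option names, and the greeting logic are all invented for the example:

```python
import argparse
import logging

log = logging.getLogger("example")  # hypothetical logger name


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Tips demo")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable debug logging")
    parser.add_argument("names", nargs="*", help="names to greet")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Set up logging once, early, instead of scattering print calls.
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO)
    for name in args.names:
        log.info("hello %s", name)
    return len(args.names)


if __name__ == "__main__":
    main()
```

Because main() accepts an optional argument list, the module can be imported and exercised from a test or an interactive session, for example main(["-v", "world"]), without touching the real command line.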

Fast Python Script to Get Photos off an iPhoto CD

My uncle came over with a CD of photos from his trip to Peru. I am pretty sure the CD was created with iPhoto as there was an iphoto.xml file in the root of it. Anyways the first problem I had was viewing the pictures after mounting the CD. The file sizes were all weird (too small considering the camera that was used to take them) and some file names were duplicated. Trying to open them up with kuickshow or imagemagick did not work. Then I remembered that this happened before when we got our photos from our wedding photographer (also a Mac user). The solution is to not use the "auto" filesystem type, but to use "iso9660" instead. I forgot to check what the "mount" command actually said it had mounted it as, but mounting it manually from the command line with the "-t iso9660" option fixed it (my fstab has "auto" for the cdrom filesystem type). After fixing that, the filenames of all the photos were unique and the file sizes seemed more reasonable (~2.5 MB).

The second problem was that the photos were not stored flat in one single directory; they were stored in a year/month/day directory hierarchy. kuickshow, an excellent image viewer/slideshow program, did not want to handle these directories recursively. So I had to whip up a quick Python script. I probably could have done it in bash too, but I don't know bash well and it's not as readable. It relies too much on special punctuation to do things, kind of like Perl. I hate the [ -e and [ -n kind of stuff, and the dollar signs. So ugly. Here's the quick script:

#!/usr/bin/env python
import os
from os.path import getsize
import shutil

src = '/mnt/cdrom/'
# Python does not expand '~' by itself, so do it explicitly
dest = os.path.expanduser('~/tempphotos')
for root, dirs, files in os.walk(src):
    for name in files:
        fullpath = os.path.join(root, name)
        destpath = os.path.join(dest, name)
        # only copy the large files
        if getsize(fullpath) > 200000:
            shutil.copyfile(fullpath, destpath)

It worked great. Pythonistas will not be too impressed by this 10-liner. I'm mainly just showing it for those who have never used Python or those who have only used Python a little. It makes a great bash replacement, better than Perl because you can read it again later and actually understand it. I have used it for many other such tasks, most of them more complicated than this one.

The getsize(fullpath) > 200000 check was to filter out the thumbnails which iPhoto also included on the CD. The fact that it included thumbnails is the dumbest thing in the world, by the way. If you are exporting files to a CD, one would expect that the person who exported them is going to give the CD to someone else, and that someone else might not have iPhoto, so the thumbnails become totally useless. In my opinion, thumbnails are something that should be filed away in a hidden cache directory on the user's hard drive, and not seen anywhere else. GIMP used to put thumbs in a .xvpics folder, something else uses Thumbs.db (Windows?). Picasa (correctly) puts them somewhere where I can't see them.

The only other problem was that stupid iPhoto had duplicated some photos so they appeared twice. I realized afterwards that some folders had a thumbs directory and an originals directory, and some had originals in the directory itself. This might have been because there were modifications to the original, so the originals were copied automatically to a sub-folder before being modified.

Awesome Talk by Adrian Holovaty at Vancouver Python User Group

Adrian Holovaty gave a talk to the Vancouver Python User's Group tonight. Really nice, funny presentation. His plane was late so while we waited for him we watched his talk at Snakes and Rubies. He showed us some of the new features coming in Django. The neatest thing for me was this new thing he's been working on called "databrowse". It's really awesome. Also the forms stuff is a lot nicer now in the newforms library. Someone asked about migration with Django and how to modify the schema once your app is up and running. It was neat to watch him add a column to his database and watch how he did it. It's unbelievable how many apps this guy has written and with Django it is so fast. Some of the stuff he has been doing for the Washington Post is pretty cool.

Adrian Holovaty Talk and Django Jam in Vancouver

This Sunday, the Vancouver Python/Zope User's Group (maybe the Zope part should be removed? sorry guys) is having a "Django Jam", a hands-on session where you can see how to create some simple applications or perhaps hear some people talk about things they have developed in Django. Unfortunately the two applications I am working on are immature right now and I don't have a laptop, but I'll be there checking out what other people have done.

Tuesday is even more exciting as Adrian Holovaty, the lead Django developer/founder is going to give a talk to the Python User's group while he is in town for another conference. If you are interested in web frameworks (especially simple ones done in a cool language) come and check it out.

Vancouver DjangoJam.

Full invite text follows:

Django is the Python-based web framework used at companies like
Google, the Washington Post and St. Joseph Media (publishers of
Canadian Life magazine).

"Vandjangojam" is a great opportunity to learn Django or learn more
about it. In addition to a quick introduction to Django, the jam will
feature a Q&A session with the lead programmer of it.

----

Introduction to The Django Web Framework : Sunday, February 4, 2007, 1-4

We'll discuss the basics of creating applications in Django, walking
through some simple applications hands-on.

Location: Most likely Sophos Vancouver or Uniserve. RSVP to
paul@prescod.net if you intend to come and we will inform you of the
location when it is confirmed.

-----

Adrian Holovaty: The Django Web Framework: Tuesday, February 6, 2007, 7-9

Adrian will offer some thoughts about its unique features and answer
questions from the audience.

Adrian Holovaty is the lead developer of the Django Web Framework.
Adrian and his peers invented Django while working at World Online, a
highly-renowned news Web operation in Lawrence, Kansas. His team's
pioneering work on interactive journalism won numerous awards and was
described in The New York Times, NPR and IT Conversations. Currently,
Adrian is editor of editorial innovations at Washingtonpost.Newsweek
Interactive (washingtonpost.com). His job involves coming up with
ideas for site improvements and special projects, and implementing
them.

Vancouver Python User Group Talk on Python Web Frameworks (Django, Turbogears) - October 3, 2006

Vancouver's Python and Zope User Group will be having a talk on Python web frameworks, ie. Django and Turbogears at their upcoming meeting on October 3rd. I'm looking forward to learning about web frameworks in general a bit more and perhaps what differentiates them from each other and from Ruby on Rails.

Code Golf: Pascal's Triangle

I just found out about Code Golf, similar to Perl Golf (which I never cared about, because Perl is evil). So for my first stab at Code Golf I tried the easiest problem of them all, generating a depth-34 Pascal triangle in as few characters (yes, lines are no longer a good enough measure here) as possible.
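For anyone unfamiliar with the problem, here is a readable, deliberately un-golfed sketch of generating the triangle. It is not the contest entry, and the exact output format that Code Golf requires is not reproduced here; rows are simply printed space-separated:

```python
def pascal_rows(depth):
    """Yield the first `depth` rows of Pascal's triangle as lists of ints."""
    row = [1]
    for _ in range(depth):
        yield row
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]


for r in pascal_rows(5):
    print(" ".join(str(n) for n in r))
```

Calling pascal_rows(34) gives the depth-34 triangle from the challenge. Golfing is then a matter of squeezing the same recurrence into as few characters as possible.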

My code is now incredibly obfuscated, not quite as obfuscated as say, this but enough that it's pretty much impossible to tell what is going on. So far I am pretty proud of myself because I am in 6th place. 1st place is untouchable at 75, but if I can just shave off 7 characters I'll be in second place.

Review of the Vancouver Python Workshop

I attended the Vancouver Python Workshop this weekend and had a great time. What follows is a brief chronological discussion about the talks I attended:

Friday:

Guido van Rossum's Key Note presentation: Guido talked about Python 3000 and the various things he has in mind for it. He has a clear idea of what kinds of changes he wants to make, but a lot is still up in the air. Some of this is documented in PEP 3100. Here are some of the things I wrote down (that seemed important to me):

  • range won't return lists anymore, xrange will be killed. I use xrange all the time so I will have to stop! range will return an iterator so I assume one will have to do something like list(range(n)) to create a list now.
  • All strings will become unicode. Whereas now there are string objects and a unicode string is a different beast, in Python 3000 what we now call strings will just be like byte arrays and what we now call unicode strings will just be strings (because unicode will be default).
  • <> will be dropped as a not-equal-to operator.
  • exec and print will become functions. This is great because now you can easily replace occurrences of the print command in your code with a custom function like debug or logger...
  • apply, filter, map, reduce will all be removed.
  • list comprehensions will no longer leak into the surrounding scope. I've noticed this recently and it will be nice when this is gone.
  • lambda will stay. Guido has been convinced to keep it in the language.
  • Replacing raise E, arg: in favour of  raise E(arg)
  • The ability to use `x` as an alternative to repr(x) will be removed
  • Guido said something about perhaps allowing functions to be overloaded by defining the types of the expected arguments, but I didn't catch everything he said on this and I don't see it written in PEP 3100.
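A short sketch of how several of these items look when run under a released Python 3 interpreter. It is not code from the talk, and the final release differed from these notes in places: map and filter were kept (they return iterators), and reduce was moved into functools.

```python
import functools

# range is lazy; wrap it in list() when a real list is needed.
as_list = list(range(3))

# Text strings are unicode; bytes are a separate type.
text = "caf\u00e9"
data = text.encode("utf-8")

# print is a function, so it can be passed around or swapped out.
emit = print
emit("hello from a function")

# The loop variable of a list comprehension no longer leaks.
x = "outer"
squares = [x * x for x in range(4)]

# raise E(arg) is the only spelling.
try:
    raise ValueError("bad value")
except ValueError as exc:
    message = str(exc)

# reduce now lives in functools.
total = functools.reduce(lambda a, b: a + b, [1, 2, 3, 4])

print(as_list, len(text), len(data), x, squares, message, total)
```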

Code migration from < Python-2.4 will be achieved through tools like PyChecker or PyLint for 80% of the work, and through versions of Python 2.x that warn about "doomed code."

Overall I really enjoyed Guido's talk and instead of talking about it anymore, I'll point out that you can basically watch the entire speech here:

It is pretty much the same slides with maybe a few changes. He admitted that it was the same talk he gave at OSCON 2006 but that he improvises a lot.

Jim Hugunin's Key Note: Jim (creator of Numeric, Jython, and now IronPython (.NET)) gave us all an overview of IronPython and what it can do. It is pretty amazing. Imagine accessing all of .NET's libraries and framework without having to code in C#. Or coding up a bottleneck in C# and calling it from Python code. IronPython has released a 1.0 release candidate and it is released under a BSD-style license. I was very interested in the possibility of using IronPython as a language for numerics with some C/C++/C# running the time-consuming stuff. Jim loves the CLR and actually mentioned that he would love to port NumPy to the CLR if he could get 6 months' time off to work on it. It was an awesome presentation and Jim went through a lot in 45 minutes.

Saturday:

Q&A with Guido van Rossum:

The most interesting question was something along the lines of "what is your favourite GUI toolkit/wrapper for Python" and Guido's answer was Tkinter. The interesting part was when Guido said he thought GUI toolkits were on the way out and that web apps are instead the way to go. He said using the web model where half the work can be done on the server and half on the client and things can be shared over the network is the way of the future. He also said that we have good GUI frameworks for the web already.

The other thing I found out from the Q&A was that Python has an education special interest group, the Python EDU-SIG. Apparently some schools are already starting to teach Python in introductory computer science courses. I think Python would be a great course for high school students as well as for university students in many courses, in fact any course that doesn't absolutely require another language for some reason.

Jim Hugunin: more about .NET, great presentation

Bradley Lawrence (from Rapid): Great information on how to convince your boss that Python is the right tool for the job (rapid development time, easy to bring new hires up to speed, mature and stable, support (ActiveState), license not viral). He talked about their RapidData application.

Tom Weir: Good presentation about SWIG; however, it had the side-effect of convincing me that I should avoid SWIG at all costs unless I really need to use it.

For the rest of the afternoon I attended two panel discussions where I learned a little more about Zope, Plone, general web services using Python, and embedding Python in C, C++, Objective C, Java, and other languages.

Sunday

Jim O'Leary's Object-Oriented Basics with Python: I expected this to be an introduction to Object-Oriented Python for programmers, instead it turned out to be an Introduction to Object-Oriented Basics. I already know OO in Python but there are some niggly details about inheritance that I am still a bit fuzzy on. Nothing against Jim though. It did seem like there were a lot of programming newbies in the room.

Wilson Fowlie: A great talk all about pyparsing. If you need to parse anything, check out this module. Sounds awesome.

James Thiele: A good talk about embedding domain-specific languages in Python. It was a great tutorial; he built up his code slowly, slide by slide, explaining exactly how to go about adding syntax to Python for defining an edge of a weighted graph (such as n1->n2). His slides should be on his website soon.

Anthony Howe (from Voice Mobility): Interesting application called WebFeeds that sends RSS feeds to your voice mail. The ASCII-to-voice translation was done using NeoSpeech and it sounded so good that I didn't realize it was not a real voice at first.

Lightning talks: A quick talk about Django (which I am really excited about) and many, many other talks (5 minutes each). The work going on at rPath sounds interesting.

Ian Caven: Great talk showing off the movies his company has restored over the years. The bulk of all the algorithms that he coded for restoring movies is done in Python with Numeric, most likely with an old version of Numeric. They have something like 50000 lines of Python code and about the same amount of legacy C++ sharp code which I assume is called from Python.

Overall I really enjoyed the conference and I made a couple contacts. The slides of all the presenters should be on the web soon and I look forward to having a second look at some of them and a first look at others that I wasn't able to attend.

Registered for Vancouver Python Workshop

I just officially registered for the Vancouver Python Conference. I am really looking forward to it even though it will take up my entire Saturday and Sunday. It is too bad I don't have a laptop that isn't ancient, otherwise I might give a lightning talk on numpy since it is completely absent from the talk schedule. Had I thought about it earlier I would have prepared and offered to give a talk on how to use NumPy for the beginner's track of the schedule. Maybe next year. Besides with Numpy having a 1.0 release by next year, maybe they will be able to get Travis Oliphant to visit. :-) or at least someone more knowledgeable in numpy than me.

Numpy 1.0b1 released

My favourite numerics package, Numpy, is close to a 1.0 release. 1.0 (beta1) was just released last Friday and a branch for 1.1 has been created. Things are really getting interesting and it is nice to finally have an almost-standard numerics package for Python (compared to before, when there were numarray and Numeric, neither of which had heavy development activity).
