Using XML for Code Documentation is Just Plain Wrong

I was just looking at some C# code at work today and it had XML Documentation (like javadoc or python docstrings, only with XML). Who was the idiot that came up with that idea? It's the most insane thing I've ever seen. Let's look at the predecessors to C#'s XML documentation:

Javadoc:

/**
 * Returns an Image object that can then be painted on the screen. 
 * The url argument must specify an absolute {@link URL}. The name
 * argument is a specifier that is relative to the url argument. 
 * <p>
 * This method always returns immediately, whether or not the 
 * image exists. When this applet attempts to draw the image on
 * the screen, the data will be loaded. The graphics primitives 
 * that draw the image will incrementally paint on the screen. 
 *
 * @param  url  an absolute URL giving the base location of the image
 * @param  name the location of the image, relative to the url argument
 * @return      the image at the specified URL
 * @see         Image
 */
 public Image getImage(URL url, String name) {
	try {
	    return getImage(new URL(url, name));
	} catch (MalformedURLException e) {
	    return null;
	}
 }

Then, doxygen, which looks a lot like javadoc:

      /**
       * a normal member taking two arguments and returning an integer value.
       * @param a an integer argument.
       * @param s a constant character pointer.
       * @see Test()
       * @see ~Test()
       * @see testMeToo()
       * @see publicVar()
       * @return The test results
       */
       int testMe(int a,const char *s);

Unfortunately Genshi doesn't syntax highlight the javadoc comments, but they look fairly readable. Let's try a Python docstring example. There is no one standard: Epydoc, one of the documentation generators for Python, understands plaintext, javadoc, epydoc, and reStructuredText.

Python code with epydoc style docstrings:

def x_intercept(m, b):
    """
    Return the x intercept of the line M{y=m*x+b}.  The X{x intercept}
    of a line is the point at which it crosses the x axis (M{y=0}).
 
    This function can be used in conjunction with L{z_transform} to
    find an arbitrary function's zeros.
 
    @type  m: number
    @param m: The slope of the line.
    @type  b: number
    @param b: The y intercept of the line.  The X{y intercept} of a
              line is the point at which it crosses the y axis (M{x=0}).
    @rtype:   number
    @return:  the x intercept of the line M{y=m*x+b}.
    """
    return -b/m

Python code with one example of reStructuredText docstrings (this one includes the types of the parameters but they aren't necessary):

def fox_speed(size, weight, age):
    """
    Return the maximum speed for a fox.
 
    :Parameters:
      size
        The size of the fox (in meters)
      weight : float
        The weight of the fox (in stones)
      age : int
        The age of the fox (in years)
    """
    #[...]

I couldn't find any nice examples for C# XML Documentation. The C# XML Documentation Tutorial has some examples, but conveniently, none that include all the tags that I would need to replicate the javadoc example I showed above. So I'll convert the Java example to C#:

   /// <summary>
   /// Returns an Image object that can then be painted on the screen. 
   /// The url argument must specify an absolute <see cref="URL"/>. The name
   /// argument is a specifier that is relative to the url argument. 
   /// 
   /// This method always returns immediately, whether or not the 
   /// image exists. When this applet attempts to draw the image on
   /// the screen, the data will be loaded. The graphics primitives 
   /// that draw the image will incrementally paint on the screen.</summary>
   /// 
   /// <param name="url">an absolute URL giving the base location of the image</param>
   /// <param name="name">the location of the image, relative to the url argument</param>
   /// <returns>
   /// the image at the specified URL</returns>
   /// <seealso cref="Image">
   /// Read more about the Image class</seealso>
 public Image getImage(URL url, String name) {
	try {
	    return getImage(new URL(url, name));
	} catch (MalformedURLException e) {
	    return null;
	}
 }

I followed Microsoft's convention (because they know best) of putting the opening tags on a line on their own.

The javadoc sucks because you have to put a <p> (or <br />?) to start a new paragraph, which is stupid. Otherwise it's pretty readable, especially the @param and @return tags, and the same goes for doxygen. The Epydoc-style Python docstrings suck: you have to specify each parameter's type using a @type tag and the return type using an @rtype tag. The reStructuredText example looks the best to me: no tags at all, except for the :Parameters: heading, which should be there anyway. The C# comments are an eyesore. Even if Visual Studio had syntax highlighting for the comments, it would suck. Did Microsoft look at the two major previous implementations (doxygen and javadoc) and decide that XML was a better way to document code?

I recently saw an interesting comment in scipy's source about one of scipy's guiding principles in designing the docstring standard for their codebase:

A guiding principle is that human readers of the text are given precedence over contorting docstrings so our tools produce nice output. Rather than sacrificing the readability of the docstrings, we have written pre-processors to assist tools like epydoc_ and sphinx_ in their task.

Microsoft clearly took the opposite route and decided to make code documentation readability by human readers a low priority.

Comments

I totally agree with you. Hope they will change this in the near future. Maybe there is a plugin out there which can make the xml crap readable in the editor ... then the workdays would be less painful ...

BTW: Captchas suck as much as xml-comments do ... (computers can bypass them and humans are annoyed)

I've made an editor (Inventor IDE) that lets you just fill in the blanks and generates the doc comments behind the scenes. It currently spits out and reads in stuff in a Natural Docs comment format, since it's so much less ugly than Javadoc etc, but it could be easily changed to help hide the ugliness of other doc comment formats.

It'll eventually need to store something like XML to represent documentation that's more elaborate than raw text, but that's farther down the road. It currently only supports assembly language, so getting C/C++ support is more important. :D

But SPAM sucks more than Captcha.

I write C# by day and Java by night. I'm no Microsoft fanboy (in fact I won't touch a Windows machine unless I'm paid), but I'm a C# developer by profession. We follow Microsoft conventions at work, so all our documentation is completely up to standard. I write a lot of C# XML documentation.

That said, I have to say that this post is absolute nonsense. Return types, parameters, exceptions, remarks, etc. all have to be marked up one way or another. I can't think of a more unobtrusive way of making a remark than simple xml tags. Yes, I found it a little annoying for the first week. But it's a habit you have to learn, like writing unit tests or giving methods correct access.

The approach to documentation is to make it as general as possible. It's not designed to go into HTML MSDN-style documentation any more than it is to go into the metadata for the .NET assembly or into IntelliSense in the Visual Studio editor to give you hints when coding. It's general-purpose, as un-coupled and reusable as possible, which is a sensible software principle.

Documentation is as much software as code. It should follow the same formal language, it should compile with the same lack of ambiguity and it should be verified as correct in the same way.

I dislike Microsoft as much as the next person, but for reasons such as illegal and aggressive behaviour by certain parts of the organisation. Their language design is actually pretty good.

How much of this post was reason and how much was it anti-MS hate?

I can't think of a more unobtrusive way of making a remark than simple xml tags.

I can. So when you were thinking of "unobtrusive ways of making a remark" did you consider "@param foo remark"? I guess not, which is a bit surprising since you say you code in Java "by night." Even after reading my post, you are going to argue that <param name="foo">Description of foo goes here</param> is less obtrusive than @param foo description of foo goes here? I'd like to know what your definition of obtrusive is.
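One crude way to put a number on "obtrusive" is to count the characters each format adds around the description itself. The sketch below does only that; character count is an assumed proxy for obtrusiveness, not a real readability measure, and the `overhead` helper is a name made up for this illustration.

```python
# Assumption: "obtrusiveness" is approximated by the number of characters
# a format adds around the description text. This is a rough proxy only.
description = "Description of foo goes here"

xml_form = '<param name="foo">' + description + "</param>"
tag_form = "@param foo " + description


def overhead(line, payload):
    """Return how many characters of `line` are not part of `payload`."""
    return len(line) - len(payload)


print("XML doc comment:", overhead(xml_form, description), "extra characters")
print("Javadoc-style:  ", overhead(tag_form, description), "extra characters")
```

Both counts include the three characters of the parameter name, so the difference between them is pure markup.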

The approach to documentation is to make it as general as possible. It's not designed to go into HTML MSDN-style documentation any more than it is to go into the metadata for the .NET assembly or into IntelliSense in the Visual Studio editor to give you hints when coding. It's general-purpose, as un-coupled and reusable as possible, which is a sensible software principle.

XML is no more general or usable for different things than anything else. The docs at a minimum need to be parseable, and good docs are also readable by humans. XML is parseable but not as easy for humans to read. So why did Microsoft use XML? a) they were lazy and didn't want to write their own parser and b) Microsoft has not-invented-here syndrome and didn't want to use/endorse any existing solutions like doxygen or leave it up to the community and 3rd party tools as in Python's case.
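To illustrate that @-style tags are parseable too, here is a minimal sketch of a tag parser in Python. It is not how javadoc or doxygen actually parse comments; `parse_tags` and `TAG_LINE` are names invented for this example, and it assumes every tag fits on one line and is followed by some text.

```python
import re

# Matches one Javadoc-style block tag line, e.g. "@param url an absolute URL".
TAG_LINE = re.compile(r"^@(?P<tag>\w+)\s+(?P<rest>.*)$")


def parse_tags(comment):
    """Split a Javadoc-style comment body into a description and its tags."""
    description, params, others = [], {}, {}
    for raw in comment.splitlines():
        # Drop the leading "*" decoration and surrounding whitespace.
        line = raw.strip().lstrip("*").strip()
        match = TAG_LINE.match(line)
        if match is None:
            # Not a tag line: treat any non-empty text as description.
            if line:
                description.append(line)
            continue
        tag, rest = match.group("tag"), match.group("rest")
        if tag == "param":
            # The first word is the parameter name, the remainder its text.
            name, _, text = rest.partition(" ")
            params[name] = text.strip()
        else:
            others[tag] = rest.strip()
    return {"description": " ".join(description), "params": params, "tags": others}


doc = parse_tags("""
 * Returns the image at the given location.
 *
 * @param  url  an absolute URL giving the base location of the image
 * @param  name the location of the image, relative to the url argument
 * @return      the image at the specified URL
""")
print(doc["params"]["url"])
```

A real tool would also have to handle tag text that wraps onto the next line and inline tags such as {@link ...}, which this sketch ignores.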

Documentation is as much software as code. It should follow the same formal language

I don't even know how to address this as it makes absolutely no sense.

How much of this post was reason and how much was it anti-MS hate?

Read my post again. I covered 4 different documentation styles and commented on all of them. I tried to be as fair as possible. When you look at the XML comments next to the other styles, there just aren't any positive things I can say about the XML comments.

I don't even know how to address this as it makes absolutely no sense.

Couldn't have said it any better myself! :)

Hi,
first of all, I'd like to thank you for the nice presentation of several code documentation styles in different languages. As for the C# Xml comments, I feel like disagreeing with you about legibility, which is why I suppose this is a matter of personal preference.

To begin with, let me state that I consider it most important that certain languages define their own proper way of documentation - Java with its standard tool Javadoc, in C# the documentation syntax being more or less a part of the language definition - because this ensures that classes taken over from separate projects will not feature clashing documentation markup styles.

When I started using Java, I liked the fact that Javadoc defined a general documentation syntax, yet I never liked the syntax itself. It seemed to me somewhat sloppy, lacking a consistent concept (e.g.: Why do I have to put curly braces around inline tags such as {@link ...}, but not around @param ...? If @param a b c d is the tag @param with two parameters: 'a' and 'b c d', how would an optional second parameter be expressed?). Also, I was bugged by poor visibility of the @ characters amongst loads of other text, which makes the whole Javadoc comments look like an unstructured pile of words, IMHO. (Eclipse's highlighting of Javadoc tags can help that a bit, though.)

When I later got in touch with C#, I was actually very relieved about the Xml based documentation syntax - it was a syntax I already knew and was used to reading, as opposed to the custom-built JavaDoc markup. I somehow find myself to be able to see the characteristic angled brackets around the Xml tags much more easily than the @ characters; also, those Xml elements are structured in a comprehensible way throughout the whole comment due to featuring either an opening and a closing tag or a single tag with the slash at the end, and with the parameters to the elements clearly discernible from normal documentation text.

That notwithstanding, I do have some qualms about the actually available set of documentation tags in C#, as they don't provide support for information like the author of a class. While it is true that anyone can add their own elements to the documentation comments, as long as there is no way to simply tell Sandcastle or the respective standard documentation-Xml to pretty documentation converter to interpret and output a few additional elements (without creating your own fully-fledged XSLT, that is), this seems somewhat worthless to me.
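For what it's worth, pulling a custom element out of the comment text takes very little code with a generic XML parser; the hard part described above is getting the standard toolchain to render it. A minimal sketch in Python, assuming the /// prefixes have already been stripped and using a hypothetical <author> element and a made-up `extract` helper:

```python
import xml.etree.ElementTree as ET

# Assumption: the "///" prefixes are already stripped and the elements are
# wrapped in a single root. <author> is a hypothetical custom element, not
# one of the tags the C# tooling defines.
fragment = """
<member>
  <summary>Returns an Image object that can then be painted on the screen.</summary>
  <param name="url">an absolute URL giving the base location of the image</param>
  <author>Jane Doe</author>
</member>
"""


def extract(xml_text):
    """Collect the summary, the parameters, and the custom author element."""
    root = ET.fromstring(xml_text.strip())
    params = {p.get("name"): (p.text or "").strip() for p in root.findall("param")}
    return {
        "summary": (root.findtext("summary") or "").strip(),
        "params": params,
        # findtext returns None when the element is absent.
        "author": root.findtext("author"),
    }


info = extract(fragment)
print(info["author"])
```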
I can see that the elements which are there were maybe inspired by DocBook, but I seriously hope that list will be extended in the future.

One thing I didn't see mentioned is that new programmers who start programming in .NET/C# will quickly learn XML one way or another. Introducing them to XML Code Documentation is very easy.

@param url an absolute URL giving the base location of the image

may be more readable to you and me, but for people that aren't used to it, the format of:

<param name="url">an absolute URL giving the base location of the image</param>

is much easier to grasp. Because of XML being used so widely throughout .NET, it makes a good candidate for auto-documentation.

I will admit that, at first, writing XML Documentation-style comments in program source is very trying. I started writing (whatever)Doc years ago when I started writing reusable PHP components. The standard there is JavaDoc. Most PHP IDEs allow for syntax highlighting of tags and pre-formatting of JavaDoc comments (as well as pre-formatted templates for them).

There is one caveat in most JavaDoc-like formats: code coloring. It is true that the code coloring in (most) IDEs colors the tags (such as @see and @type), but that's it. There's no code coloring or syntax highlighting to distinguish type literals (such as String or Int) from plain, ol' description text. In the case of HTML/XML tags in such documentation standards - such as <code>, <strong>, and <em> - the text is rendered the same as description text and types. In other words, other than the "@" keywords, all text is rendered in the same color (usually the syntax highlighting option for "comments"). That, in my mind, demotes readability.

Taking that into consideration, Microsoft Visual Studio highlights not only the C# XML Documentation tags, but the delimiters (///), attribute values (such as cref="SomeReference.Method()"), and description text in different colors. In that sense, the more widely searched for parts (such as return types) stand out from the code. It is also easier to scan through the code and figure out what is documentation and what is simple, one-line descriptions. The more important, usable information (as I'm certain you'll agree) is found in the documentation, not the one-liners.

I program in a vast variety of different languages. PHP, C#, and JavaScript are my most often used languages. (Don't ask how.) I've written rather extensive code documentation in all three languages and I have to say that although I am most familiar with PHP and documenting it (via JavaDoc standards), I find it much simpler to review documentation in the C# XML Documentation standard, especially when programming in Microsoft Visual Studio.

In the end, however, this topic is seriously more a matter of personal preference. If you are more accustomed to other forms of code documentation ... USE IT.

Your code should be readable by itself, and documentation should be hidden in the code, but the docs should be easy to show and edit in a docking IDE panel.

In C#, a tool could use a partial class to save a .xml doc file and still keep IntelliSense when you write code using the class.

No need to put thousands of stupid characters ///*** when reading/writing code...

Even if you can hide it on one line it's too much...

and too many comments in code is actually a code smell saying your code is unreadable...

Also, in a way we write object-oriented code with procedural-oriented IDEs.

Agreed.
Comments and Code should not be mixed together. Code should be readable and clear by itself.
It's time for IDEs to evolve and let us write documentation in a *related* file. And by "documentation" I mean useful stuff, not just an echoing of what the code says.
Given that basis, XML, javadoc or doxygen, all of them suck. In the meanwhile I think it is preferable not to use any of them.

Just put comments when you can't make your code more readable. If I'm the next programmer in your team I'll read your code, not your comments.

It kills me how many counter-arguments are held up by MS Studio. Sure if you have the IDE designed for the wack-o commenting requirements, it'll be great. But as soon as you move to something cross platform like Eclipse, you have to read XML as inline documentation. Ridiculous.

I totally agree with your sentiment in this post. If I'm trying to understand code, don't make me read MORE code just to get to the description. I want to read English commentary, not more syntax. There's a reason they're called "comments".

And English is easier to read than markup!!!! If it wasn't, then we'd all be reading HTML instead of web pages!

And please give me a break with the "you get used to it" crap. I prefer to drive a standard transmission, but that doesn't mean it's easier and that everyone should do it. The vast majority of cars are auto, just like the vast majority of comments are readable by humans.

Doxygen is better than Javadoc, because you DON'T have to litter it with HTML tags all over the place, and the markup you DO use is generally very terse, the complete OPPOSITE of XML. It even does smart things like automatically translate successive lines starting with '-' into bullets. Doxygen should be the standard the others aspire to.
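The dash-to-bullet idea is simple enough to sketch in a few lines of Python. This is not Doxygen's implementation or its exact rules (which also care about indentation and nesting); `dashes_to_html` is a name invented for this illustration, and the sketch does no HTML escaping.

```python
def dashes_to_html(lines):
    """Turn runs of lines starting with '-' into <ul> lists; pass other lines through."""
    out, in_list = [], False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("-"):
            # Open a list on the first dash line of a run.
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>" + stripped[1:].strip() + "</li>")
        else:
            # Any other line ends the current run of bullets.
            if in_list:
                out.append("</ul>")
                in_list = False
            out.append(line)
    if in_list:
        out.append("</ul>")
    return out


comment = [
    "Loads the image in three steps:",
    "- resolve the URL",
    "- fetch the data",
    "- decode it lazily",
    "Returns immediately in all cases.",
]
print("\n".join(dashes_to_html(comment)))
```

The point is that the writer types plain dashes and the tool does the markup, rather than the other way round.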

Thanks again.

I totally agree.

XML is kind of abused today because people consider it a catch-all format which is supposed to be both human-readable and human-writable. It's not! It contains redundancy, it's cumbersome to read and to type.

The sheer existence of Javadoc, Doxygen or even Markdown or WikiText is proof that XML-based formats are for machines, not for humans.
