Friday, October 26, 2007

Why .NET Programmers Should Care About Python

OK, so you're a hotshot C# programmer (or VB, JScript, etc). Your skills are in demand, and you have a metric ton of tools at your disposal. XML, database, GUI, web, networking, threading, graphics, it's all there, and the documentation is pretty decent too, so you don't have to spend an eternity figuring out how to use it. Even more esoteric stuff isn't so hard for you. Well, maybe you have to do some extra reading but you're pretty sure you can tackle game programming, message-passing concurrency, handwriting recognition, OCR, speech recognition, and even robotics. So life is good.

But you have this one coworker who just won't shut up about this weird little language he's always using. He says that when he programs, he doesn't use an IDE, he doesn't declare types, he doesn't use curly braces to delimit blocks of code, and he never compiles anything. "WTF?" you say. "Who would want to use a language like that?" Your coworker gets a sneaky little smile on his face as if that's just what he was waiting to hear. He pulls out a crinkly piece of paper with a list of names written on it and hands it to you. You recognize every name on that paper, and it's not because they're people you know. They're the names of companies that everyone knows.

Google. Microsoft. VMWare. Nokia. HP. Cisco. Sony Imageworks. Canonical. Philips. Honeywell. And the list just goes on and on and on. "What language did you say you're using?" you ask. Your coworker stands up like a bolt, throws off his sweater vest, rips off his Simpsons T-shirt in a bad imitation of Hulk Hogan, and proceeds to point triumphantly with both index fingers at a single word tattooed on his pasty, hairless chest. It says: Python. Under that there's a teeny little cartoon snake and under that some English guy with a mustache. And on his navel you see...

OK, enough with the story. It's getting really weird anyway. The point is, companies around the world are using Python every day to make their products and deliver value to their customers. That means that smart people, people like you, are using Python and doing amazing things with it. And they are doing these amazing things much more efficiently than you might suspect...

OK, stop with the marketing talk and just show some code already. The following code is supposed to get customer information from an XML file and print the names and emails of customers who joined in 2006 or later. A snippet of that XML file might look like this:

<customer>
    <givenName>Greg</givenName>
    <familyName>Rucka</familyName>
    <contact email="agent001@queenandcountry.com"
             phone="986.445.1200" />
    <memberSince>1994-01-03</memberSince>
</customer>

We want some output that looks like this:

Ang Lee <director@lustcaution.net>
Hayao Miyazaki <porco@ghibli.co.jp>
Joe Armstrong <joe@erlang.org>

The C# code to handle this task would look something like:

using System;
using System.Xml.XPath;

public class Task
{
    public static void Main()
    {
        XPathDocument xpd = new XPathDocument("customers.xml");
        XPathNavigator nav = xpd.CreateNavigator();

        foreach (XPathNavigator customer in nav.Select("//customer")) {
            DateTime memberSince = DateTime.Parse(
              customer.SelectSingleNode("memberSince").Value);

            string givenName = customer.SelectSingleNode("givenName").Value;
            string familyName = customer.SelectSingleNode("familyName").Value;
            string email = customer.SelectSingleNode("contact/@email").Value;

            if (memberSince.Year >= 2006) {
                Console.WriteLine("{0} {1} <{2}>", givenName, familyName,
                    email);
            }
        }
    }
}

The equivalent Python code would be:

import clr
from System import DateTime
import amara

doc = amara.parse('customers.xml')

for customer in doc.xml_xpath('//customer'):
    memberSince = DateTime.Parse(str(customer.memberSince))

    if memberSince.Year >= 2006:
        print "%s %s <%s>" % (customer.givenName, customer.familyName,
                              customer.contact['email'])

Clearly, the Python code is shorter and more understandable ;-) If you don't believe me, take a look at the two code samples side by side. We'll go over the code line by line so you understand what's going on, but first I want to mention that the Python code doesn't need to be compiled. If you are using a Python IDE (like IDLE), you can execute the script and get your result right away!

The first three lines import the libraries and classes that we need. You can tell that Python's import statement is roughly equivalent to .NET's using statement, except that in Python we import modules, not namespaces (the difference is explained below).

import clr

Import the clr module, which gives us access to the .NET classes.

from System import DateTime

Import the DateTime class from the System module. This is the same .NET DateTime class you know and love.

import amara

Import the amara module, which contains classes and functions for handling XML. Note that this statement does NOT import everything under the amara module; it just imports the amara module itself. Also, the amara module is not included with Python. It is a third-party module that you can download here. Amara is similar to other DOM-based XML libraries, except much easier to use than most.

doc = amara.parse('customers.xml')

This line creates an object named doc by calling the parse() function in the amara module. The big difference between modules in Python and namespaces in C# is that modules are objects, and can contain attributes and functions just like any other object.
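To make the "modules are objects" point concrete, here's a minimal sketch. It uses the standard math module as a stand-in (amara isn't needed for this), and it's written with Python 3's print() function so it runs on a current interpreter:

```python
import math
import types

# A module is an ordinary object whose type is types.ModuleType.
print(isinstance(math, types.ModuleType))

# Because it is an object, it can be bound to another name...
m = math
print(m.sqrt(16.0))

# ...and its attributes can be looked up dynamically by name.
sqrt = getattr(math, "sqrt")
print(sqrt is math.sqrt)
```

You can't do any of that with a C# namespace, which only exists at compile time as a way of qualifying names.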

for customer in doc.xml_xpath('//customer'):

This is a loop using the for keyword, which is similar to C#'s foreach keyword. The doc.xml_xpath('//customer') expression returns all the customer nodes inside doc. Each customer node will be bound to the variable customer.
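If foreach is second nature to you, the for loop should feel familiar. Here's a tiny sketch that loops over a plain list instead of XML nodes. The list is just sample data borrowed from the output shown earlier, and the code uses Python 3's print() function so it runs on a current interpreter:

```python
# Sample data only; any iterable (list, tuple, file, query result) works here.
names = ["Ang Lee", "Hayao Miyazaki", "Joe Armstrong"]

initials = []
for name in names:
    # Each item is bound to the loop variable in turn, just like foreach.
    first, last = name.split(" ")
    initials.append(first[0] + last[0])
    print(name)

print(initials)
```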

memberSince = DateTime.Parse(str(customer.memberSince))

The str() function converts any object to a string. The expression customer.memberSince refers to the customers/customer/memberSince node in the DOM. Finally, we call the DateTime.Parse() method we know and love from .NET.

if memberSince.Year >= 2006:

This is just a simple conditional that checks if the Year property of memberSince is greater than or equal to 2006.

print "%s %s <%s>" % (customer.givenName,
                      customer.familyName,
                      customer.contact['email'])

The print statement, combined with the % formatting operator, works much like C's printf() function. The interpolation syntax is the same; the only difference in usage is that the values to be interpolated are listed after the % symbol instead of being passed as extra arguments. The customer.contact['email'] expression refers to the customers/customer/contact/@email node in the DOM.
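It's worth seeing that the % operator does the formatting on its own, before anything gets printed. A small sketch, with hard-coded values standing in for one customer record (the variable names are mine, and it uses Python 3's print() function so it runs on a current interpreter):

```python
# Stand-in values for a single customer record.
given_name = "Joe"
family_name = "Armstrong"
email = "joe@erlang.org"

# The % operator builds the string; the tuple on the right supplies
# one value per %s placeholder, in order.
line = "%s %s <%s>" % (given_name, family_name, email)

# Printing is a separate step, so the formatted string can be reused.
print(line)
```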

From that one example, you know almost all you possibly need to know to make effective use of amara. Python has great libraries of its own, which are worth learning because of their simplicity and flexibility. But even if you're too busy to learn them, it doesn't matter because Python allows you to leverage all of your .NET API knowledge. Here's the same example in Python, but using only .NET libraries:

import clr
from System import DateTime
from System.Xml.XPath import XPathDocument

nav = XPathDocument('customers.xml').CreateNavigator()

for customer in nav.Select('//customer'):
    memberSince = DateTime.Parse(
        customer.SelectSingleNode('memberSince').Value)

    if memberSince.Year >= 2006:
        print "%s %s <%s>" % (
            customer.SelectSingleNode('givenName').Value,
            customer.SelectSingleNode('familyName').Value,
            customer.SelectSingleNode('contact/@email').Value)
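And if you'd rather not touch .NET at all, roughly the same job can be done with Python's standard library. This is only a sketch, not the amara or .NET version above: it swaps in xml.etree.ElementTree and datetime, inlines a small sample document instead of reading customers.xml, and assumes memberSince is always in YYYY-MM-DD form. The second record's phone number and join date are made up, recent_members() is a name I invented, and the code uses Python 3's print() function.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Inline stand-in for customers.xml. The second record's phone number
# and join date are invented for illustration.
SAMPLE_XML = """
<customers>
    <customer>
        <givenName>Greg</givenName>
        <familyName>Rucka</familyName>
        <contact email="agent001@queenandcountry.com" phone="986.445.1200" />
        <memberSince>1994-01-03</memberSince>
    </customer>
    <customer>
        <givenName>Joe</givenName>
        <familyName>Armstrong</familyName>
        <contact email="joe@erlang.org" phone="555.0100" />
        <memberSince>2006-05-17</memberSince>
    </customer>
</customers>
"""


def recent_members(xml_text, since_year=2006):
    """Return 'Given Family <email>' lines for customers who joined in since_year or later."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for customer in root.iter("customer"):
        # Assumes an ISO-style date such as 1994-01-03.
        member_since = datetime.strptime(
            customer.findtext("memberSince"), "%Y-%m-%d")
        if member_since.year >= since_year:
            lines.append("%s %s <%s>" % (
                customer.findtext("givenName"),
                customer.findtext("familyName"),
                customer.find("contact").get("email")))
    return lines


for line in recent_members(SAMPLE_XML):
    print(line)
```

To run it against a real file, ET.parse('customers.xml').getroot() would take the place of ET.fromstring().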

So if you've stayed with me all the way then you might have these questions in mind:

  • Python looks interesting, but will it really make my programming easier?
  • How do I get started learning Python syntax?
  • How do I get started learning how to use Python with .NET?

All these questions will be answered in time, my friend. But this blog post has gotten rather long, and I need to go to bed.

.. entry_id:: tag:blogger.com,1999:blog-5424252364534723300.post-3233371896959427476

Why .NET Programmers Should Care About Python
=============================================
.. labels:: python, dotnet

OK, so you're a hotshot C# programmer (or VB, JScript, etc). Your skills are in demand, and you have a metric ton of tools at your disposal. XML, database, GUI, web, networking, threading, graphics, it's all there, and the documentation is pretty decent too, so you don't have to spend an eternity figuring out how to use it. Even more esoteric stuff isn't so hard for you. Well, maybe you have to do some extra reading but you're pretty sure you can tackle `game programming`_, `message-passing concurrency`_, `handwriting recognition`_, OCR_, `speech recognition`_, and even robotics_. So life is good.

.. _game programming: http://en.wikipedia.org/wiki/Microsoft_XNA
.. _message-passing concurrency: http://en.wikipedia.org/wiki/Concurrency_and_Coordination_Runtime
.. _handwriting recognition: http://www.codeproject.com/mobilepc/StrokeViewer.asp
.. _OCR: http://www.codeproject.com/office/modi.asp
.. _speech recognition: http://en.wikipedia.org/wiki/Speech_Application_Programming_Interface
.. _robotics: http://en.wikipedia.org/wiki/Robotics_Studio

But you have this one coworker who just won't shut up about this weird little language he's always using. He says that when he programs, he doesn't use an IDE, he doesn't declare types, he doesn't use curly braces to delimit blocks of code, and he never compiles anything. "WTF?" you say. "Who would want to use a language like that?" Your coworker gets a sneaky little smile on his face as if that's just what he was waiting to hear. He pulls out a crinkly piece of paper with a list of names written on it and hands it to you. You recognize every name on that paper, and it's not because they're people you know. They're the names of companies that everyone knows.

Google. Microsoft. VMWare. Nokia. HP. Cisco. Sony Imageworks. Canonical. Philips. Honeywell. And the list just goes on and on and on. "What language did you say you're using?" you ask. Your coworker stands up like a bolt, throws off his sweater vest, rips off his Simpsons T-shirt in a bad imitation of Hulk Hogan, and proceeds to point triumphantly with both index fingers at a single word tattooed on his pasty, hairless chest. It says: Python. Under that there's a teeny little cartoon snake and under that some English guy with a mustache. And on his navel you see...

OK, enough with the story. It's getting really weird anyway. The point is, companies around the world are using Python every day to make their products and deliver value to their customers. That means that smart people, people like you, are using Python and doing amazing things with it. And they are doing these amazing things much more efficiently than you might suspect...

OK, stop with the marketing talk and just show some code already. The following code is supposed to get customer information from an XML file and print the names and emails of customers who joined in 2006 or later. A snippet of that XML file might look like this:

.. code:: xml

    <customer>
        <givenName>Greg</givenName>
        <familyName>Rucka</familyName>
        <contact email="agent001@queenandcountry.com"
                 phone="986.445.1200" />
        <memberSince>1994-01-03</memberSince>
    </customer>

We want some output that looks like this::

    Ang Lee <director@lustcaution.net>
    Hayao Miyazaki <porco@ghibli.co.jp>
    Joe Armstrong <joe@erlang.org>

The C# code to handle this task would look something like:

.. code:: c#

    using System;
    using System.Xml.XPath;

    public class Task
    {
        public static void Main()
        {
            XPathDocument xpd = new XPathDocument("customers.xml");
            XPathNavigator nav = xpd.CreateNavigator();

            foreach (XPathNavigator customer in nav.Select("//customer")) {
                DateTime memberSince = DateTime.Parse(
                  customer.SelectSingleNode("memberSince").Value);

                string givenName = customer.SelectSingleNode("givenName").Value;
                string familyName = customer.SelectSingleNode("familyName").Value;
                string email = customer.SelectSingleNode("contact/@email").Value;

                if (memberSince.Year >= 2006) {
                    Console.WriteLine("{0} {1} <{2}>", givenName, familyName,
                        email);
                }
            }
        }
    }

The equivalent Python code would be:

.. code:: python

    import clr
    from System import DateTime
    import amara

    doc = amara.parse('customers.xml')

    for customer in doc.xml_xpath('//customer'):
        memberSince = DateTime.Parse(str(customer.memberSince))

        if memberSince.Year >= 2006:
            print "%s %s <%s>" % (customer.givenName, customer.familyName,
                                  customer.contact['email'])


Clearly, the Python code is shorter and more understandable ;-) If you don't believe me, take a look at the two code samples `side by side`_. We'll go over the code line by line so you understand what's going on, but first I want to mention that the Python code doesn't need to be compiled. If you are using a Python IDE (like IDLE_), you can execute the script and get your result right away!

.. _side by side: http://feihong.hsu.googlepages.com/CSharpVsPython.html
.. _IDLE: http://en.wikipedia.org/wiki/IDLE_%28Python%29

The first three lines import the libraries and classes that we need. You can tell that Python's ``import`` statement is roughly equivalent to .NET's ``using`` statement, except that in Python we import modules, not namespaces (the difference is explained below).

.. code:: python

    import clr
    
Import the clr module, which gives us access to the .NET classes.

.. code:: python 
    
    from System import DateTime
    
Import the DateTime class from the System module. This is the same .NET DateTime class you know and love.

.. code:: python
    
    import amara

Import the amara module, which contains classes and functions for handling XML. Note that this statement does NOT import everything under the amara module; it just imports the amara module itself. Also, the amara module is not included with Python. It is a third-party module that you can download here_. Amara is similar to other DOM-based XML libraries, except much easier to use than most.

.. _here: http://uche.ogbuji.net/tech/4suite/amara/

.. code:: python

    doc = amara.parse('customers.xml')

This line creates an object named ``doc`` by calling the ``parse()`` function in the amara module. The big difference between modules in Python and namespaces in C# is that modules are objects, and can contain attributes and functions just like any other object.

.. code:: python

    for customer in doc.xml_xpath('//customer'):

This is a loop using the ``for`` keyword, which is similar to C#'s ``foreach`` keyword. The ``doc.xml_xpath('//customer')`` expression returns all the customer nodes inside ``doc``. Each customer node will be bound to the variable ``customer``.

.. code:: python

    memberSince = DateTime.Parse(str(customer.memberSince))
    
The ``str()`` function converts any object to a string. The expression ``customer.memberSince`` refers to the ``customers/customer/memberSince`` node in the DOM. Finally, we call the ``DateTime.Parse()`` method we know and love from .NET.

.. code:: python

    if memberSince.Year >= 2006:

This is just a simple conditional that checks if the ``Year`` property of ``memberSince`` is greater than or equal to 2006.

.. code:: python

    print "%s %s <%s>" % (customer.givenName, 
                          customer.familyName,
                          customer.contact['email'])
                          
The ``print`` statement, combined with the ``%`` formatting operator, works much like C's ``printf()`` function. The interpolation syntax is the same; the only difference in usage is that the values to be interpolated are listed after the ``%`` symbol instead of being passed as extra arguments. The ``customer.contact['email']`` expression refers to the ``customers/customer/contact/@email`` node in the DOM.


From that one example, you know almost all you possibly need to know to make effective use of amara.  Python has great libraries of its own, which are worth learning because of their simplicity and flexibility. But even if you're too busy to learn them, it doesn't matter because Python allows you to leverage all of your .NET API knowledge. Here's the same example in Python, but using only .NET libraries:

.. code:: python

    import clr
    from System import DateTime
    from System.Xml.XPath import XPathDocument

    nav = XPathDocument('customers.xml').CreateNavigator()

    for customer in nav.Select('//customer'):
        memberSince = DateTime.Parse(
            customer.SelectSingleNode('memberSince').Value)

        if memberSince.Year >= 2006:
            print "%s %s <%s>" % (
                customer.SelectSingleNode('givenName').Value,
                customer.SelectSingleNode('familyName').Value,
                customer.SelectSingleNode('contact/@email').Value)

So if you've stayed with me all the way then you might have these questions in mind:

- Python looks interesting, but will it really make my programming easier?
- How do I get started learning Python syntax?
- How do I get started learning how to use Python with .NET?

All these questions will be answered in time, my friend. But this blog post has gotten rather long, and I need to go to bed.

Monday, March 12, 2007

Review of OpenOffice Impress

The first talk I ever gave at work was done in S5. I like S5 quite a bit; it produces very nice HTML slides. However, when showing the slides on a projector, the fonts sometimes look too small, and they run off the edge of the screen if you try to make them bigger. When I attended PyCon earlier this year I noticed that a number of presenters had similar problems with S5. So while I still think S5 is nice, I'd like to find a better tool because I care about how the slides look on the projector.

When I decided to do a Unicode talk at ChiPy, I went looking for alternatives. I ruled out PowerPoint pretty early on because it doesn't generate HTML slides that look good in Firefox (this may have changed in newer versions, but I refuse to upgrade just for that one feature). I also tried to use Bruce, but I couldn't get up and running in a short amount of time.

So I ended up going with OpenOffice Impress. Unfortunately, I came away pretty disappointed with my decision. First of all, there is a debilitating bug in Impress that disables the Text Formatting toolbar when a lot of text is entered into a text area. There are workarounds that you can use to get the Text Formatting toolbar enabled again, but the bug still caused me to waste a lot of time.

When it came time to publish my slides on the web, I was underwhelmed by the export features:

  • Exporting to HTML causes every slide to be turned into an image, and fails to render the embedded tables (created using OpenOffice Calc).
  • Exporting to PDF was mostly OK, except that if you choose the "Export notes" option you'll essentially generate two pages for each slide -- one without notes, one with notes.
  • The Flash (SWF) version looks fine but has no navigation buttons. Also, I don't think you can resize the text in it.

Although my overall experience was not good, I may use Impress again in the future. After all, the slides did look decent on the projector, and it was easy to embed Calc tables into the presentation. But if I can find something better I'll definitely be using that instead.

Unicode Talk

On March 8, 2007 I gave a presentation on Unicode to the Chicago Python Users Group. Unlike most talks on Unicode, mine was geared for small children.

Anyway, here are the downloads for the talk in various formats:

  • OpenOffice Impress (this is the best version to look at, if you have OpenOffice installed)
  • PDF (my notes are embedded into the PDF, but you have to scroll to the end to see them)
  • HTML (warning: the "horse vs unicode" and "ISO8859 vs unicode" tables don't show up)

Also, here are the demos associated with the talk. I didn't have time to show any of them, but hopefully the comments inside the source files are pretty understandable.