Good practices in Python
========================

:date: 2013-01-20 17:34

This post is a collection of various facts about Python:

* common mistakes that I encounter frequently when reading code written by
  myself or other people.

* specific features of the Python language that are not very well-known and that
  I think should be used more.

* general recommendations regarding Python.

Please note that I do not consider myself a Python expert, so it is possible
that the following text contains some inaccurate statements.

Also, due to its very nature, this post is rather unstructured. The table of
contents should help you jump directly to the part you are interested in.

.. contents:: :local:

List comprehensions
-------------------

*List comprehensions* give Python users a very concise and powerful syntax to
build a list from another list (or any iterable object). The syntax is the
following:

.. code-block:: python

    result_list = [expression(item) for item in original_list if condition(item)]

which means that ``result_list`` will be a list containing ``expression(item)``
(an expression computed from ``item``) for each ``item`` element of
``original_list`` for which ``condition(item)`` (a boolean expression involving
``item``) is ``True``. The boolean condition, which allows you to filter the
list, is optional.

For example, to compute the list of squares of elements in a list, instead of:

.. code-block:: python

    l = [1, 2, 3]
    result = []
    for i in l:
        result.append(i*i)

which works, but looks up and calls the ``append`` method at every iteration,
one could use a *list comprehension*:

.. code-block:: python

    l = [1, 2, 3]
    result = [i*i for i in l] # result = [1, 4, 9]

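In addition to being shorter, the above code is usually also faster, because
the list is built by a single expression rather than by repeated ``append``
calls. The gain depends on the Python version and on the expression (around 3x
in favorable cases).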
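
Since the size of the speedup varies between machines and interpreters, it is
worth measuring it yourself. The following is a minimal sketch using the
standard ``timeit`` module; the names ``squares_loop`` and
``squares_comprehension`` are made up for the illustration and no particular
timing is implied:

.. code-block:: python

    import timeit

    def squares_loop(values):
        # Build the list step by step with repeated calls to append.
        result = []
        for i in values:
            result.append(i*i)
        return result

    def squares_comprehension(values):
        # Build the same list with a single list comprehension.
        return [i*i for i in values]

    data = list(range(1000))

    # Both versions must produce the same list.
    same_result = squares_loop(data) == squares_comprehension(data)

    # Time 200 runs of each version; absolute numbers vary between machines.
    loop_time = timeit.timeit(lambda: squares_loop(data), number=200)
    comp_time = timeit.timeit(lambda: squares_comprehension(data), number=200)
    print("loop: %.4fs, comprehension: %.4fs" % (loop_time, comp_time))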

Another example: computing the list of square roots of all non-negative
elements of a list (``sqrt`` raises a ``ValueError`` when given a negative
argument):

.. code-block:: python

    from math import sqrt

    l = [4, -3, 9]
    result = [sqrt(i) for i in l if i >= 0] # result = [2.0, 3.0]

The same syntax also exists for dictionaries: these are called *dict
comprehensions* (very original, isn't it?). For example, to transform a list of
(name, phone number) pairs into a dictionary, for faster lookup:

.. code-block:: python

    l = [("Barthes", "+33 6 29 64 91 12"), ("Dumbo", "+001 650 472 4243")]
    d = {name: phone for (name, phone) in l}

You can get more details about *list comprehensions* on the `dedicated section <http://docs.python.org/tutorial/datastructures.html#list-comprehensions>`__ of the official documentation.

The multiple faces of ``in``
----------------------------

The ``in`` keyword has many different meanings and makes Python code so easy to
write that people often forget to use it.

* ``in`` gives a universal syntax to iterate over iterable objects. For
  example, to iterate over a list, instead of:

  .. code-block:: python

    l = [1, 2, 3]
    for i in range(len(l)):
        print l[i]

  you could simply write:

  .. code-block:: python

        l = [1, 2, 3]
        for i in l:
            print i

  similarly, to iterate over a dictionary, instead of:

  .. code-block:: python

        d = { ... }
        for key in d.keys():
            print d[key]

  you could write:

  .. code-block:: python

        d = { ... }
        for key in d:
            print d[key]

* ``in`` also allows you to test whether an element belongs to some structure:
  list, dictionary (or any iterable object), occurrence of a substring inside
  a string. For example:

  .. code-block:: python

        l = [line for line in open("server.log") if "Connected" in line]

  will return the list of lines from the file ``server.log`` containing
  ``Connected`` as a substring.
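
The different membership tests can be sketched as follows. The sample log line
is made up for the illustration; note in particular that on a dictionary,
``in`` looks at the keys and not at the values:

.. code-block:: python

    l = [1, 2, 3]
    d = {"Barthes": "+33 6 29 64 91 12"}
    line = "2013-01-20 Connected to host"  # hypothetical log line

    in_list = 2 in l                           # True: 2 is an element of l
    in_dict = "Barthes" in d                   # True: "Barthes" is a key of d
    value_in_dict = "+33 6 29 64 91 12" in d   # False: values are not searched
    in_string = "Connected" in line            # True: substring test
    not_in_list = 5 not in l                   # True: "not in" is the negation

A membership test on a list scans the elements one by one, whereas on
a dictionary it relies on hashing, which is why the dictionary of phone numbers
built earlier gives faster lookups than the list of pairs.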

Manipulating lists with atomic instructions
-------------------------------------------

More generally, it is often worth avoiding an explicit ``for`` loop over a list
when a built-in operation does the job. Each iteration of a ``for`` loop is
executed by the interpreter, which is relatively slow, whereas writing an
operation over a list as a single instruction allows Python to run the loop
internally in optimized code.

*List comprehensions* often help in replacing an iteration by a single
instruction. Here are a few other functions which can be helpful in this
regard:

* ``join`` can be useful to format a list. For example, to print the words
  whose first letter is ``a`` in a list of words, instead of:

  .. code-block:: python

    l = [ ... ]
    result = ""
    for word in l:
        if word[0] == 'a':
            result += word + " "
    print result

  you could do:

  .. code-block:: python

    l = [ ... ]
    print " ".join([word for word in l if word[0] == 'a'])

* ``sum``, to sum the elements of a list.

* ``map``, to apply a given function to all elements in a list. For example to
  reverse all the words in a list:

  .. code-block:: python

    l = ["Dumbo", "Polochon"]

    def reverse(word):
        return word[::-1]

    m = map(reverse, l) # m = ['obmuD', 'nohcoloP']
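
As a small sketch of ``sum``, which has no example above, and of ``map`` used
with a built-in function instead of a user-defined one (the call to ``list`` is
only there so that the snippet gives the same result on Python 3, where ``map``
no longer returns a list):

.. code-block:: python

    l = [1, 2, 3]

    total = sum(l)                         # 1 + 2 + 3 = 6
    total_squares = sum([i*i for i in l])  # 1 + 4 + 9 = 14

    # map also accepts built-in functions such as len.
    lengths = list(map(len, ["Dumbo", "Polochon"]))  # [5, 8]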

*Slices* are also very useful when it comes to manipulating lists (or sublists)
in blocks. Remember that if ``l`` is a list (or any other sequence, such as
a string or a tuple), ``l[begin:end:step]`` will extract all the elements from
index ``begin`` (included) to index ``end`` (excluded) with a step of ``step``
(this last parameter being optional).

If the ``begin`` parameter is omitted, it is given 0 as default value.
Similarly, the default value of ``end`` when unspecified is ``len(l)`` (the
number of elements in ``l``). These defaults hold for a positive ``step``; with
a negative ``step`` the slice starts from the last element instead. A negative
value for ``begin`` or ``end`` is counted from the end of the list. For
example, to extract all the elements but the last one:

.. code-block:: python

    l = [1, 2, 3]
    m = l[:-1] # m = [1, 2]

Using a negative value for the ``step`` parameter can be useful to walk through
a sequence in reverse order, as shown in the example given above to take the
mirror image of a word:

.. code-block:: python
    
    word = "dumbo"
    drow = word[::-1] # drow = "obmud"

which compensates for the scandalous lack of a ``reverse`` function for strings
in Python.
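
To summarize the slicing rules described in this section, here is a short
sketch on a list whose elements are equal to their indices, which makes the
results easy to check:

.. code-block:: python

    l = [0, 1, 2, 3, 4, 5]

    first_three = l[:3]      # [0, 1, 2]: begin defaults to 0
    last_two = l[-2:]        # [4, 5]: negative begin counts from the end
    middle = l[1:-1]         # [1, 2, 3, 4]: drop the first and last elements
    even_positions = l[::2]  # [0, 2, 4]: every second element
    backwards = l[::-1]      # [5, 4, 3, 2, 1, 0]: negative step
    copy = l[:]              # a new list containing the same elements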

Exceptions
----------

Exceptions provide a powerful tool found in many high-level programming
languages which is often under-used. They allow for a less defensive
programming style by handling errors *as they appear* instead of making tests
*beforehand* to prevent them from happening.

In Python, every time you are trying to execute an illegal operation (*e.g.*
trying to access an element outside a list's boundaries, dividing by zero,
etc.) instead of simply crashing the program, Python raises an exception which
can be caught, giving the programmer a last chance to fix the problem before the
program ultimately crashes.

The syntax to catch exceptions in Python is the following:

.. code-block:: python

    try:
        .... # piece of code potentially raising the exception named Kaboum
    except Kaboum:
        .... # piece of code to be executed if the above code raises Kaboum

For example, if a line of code contains a division by a number which is only
rarely equal to zero, instead of systematically checking that the number is
non-zero, it is more efficient to wrap the line in a ``try ... except
ZeroDivisionError:`` block to handle specifically the rare cases when the
number is zero. This is the well-known principle: *it is easier to ask for
forgiveness than permission*.
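
A minimal sketch of this idea follows. The function name ``ratio`` and the
choice of returning ``None`` when the denominator is zero are arbitrary
decisions made for the illustration:

.. code-block:: python

    def ratio(numerator, denominator):
        try:
            # Common case: no test is performed, the division just happens.
            return numerator / float(denominator)
        except ZeroDivisionError:
            # Rare case: the denominator was zero.
            return None

    half = ratio(1, 2)       # 0.5
    undefined = ratio(1, 0)  # None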

Another example, when trying to access an unbound key in a dictionary, Python
raises the ``KeyError`` exception. This exception can be used to initialize the
value associated with the unbound key. For example, to compute a dictionary of
word counts in a text, you can often find:

.. code-block:: python

    text = "..."
    result = {}
    for word in text.split():
        if word in result:
            result[word] += 1
        else:
            result[word] = 1

You could instead use the ``KeyError`` exception to your advantage to avoid the
systematic ``if`` test:

.. code-block:: python

    text = "..."
    result = {}
    for word in text.split():
        try:
            result[word] += 1
        except KeyError:
            result[word] = 1

The difference with the previous code is that *most of the time*, this code
will behave exactly as if the body of the ``for`` loop only contained the
instruction ``result[word] += 1``. When most words have already been seen, this
avoids the membership test computed at each iteration in the first version.
Raising and catching an exception has a cost of its own, however, so the gain
disappears if most of the words are new.

See the `dedicated page <http://docs.python.org/tutorial/errors.html#handling-exceptions>`__ in the official documentation.

Values equivalent to ``True`` or ``False``
------------------------------------------

If ``test`` is a boolean variable (equal to ``True`` or ``False``), we know
that it is redundant to write:

.. code-block:: python

    if test == True:
        ...

instead of:

.. code-block:: python

    if test:
        ...

More generally, Python has automatic conversion rules from standard types to
booleans. This can be used to shorten the syntax in conditional tests:

* as in the vast majority of programming languages, any non-zero integer
  (positive or negative) is converted to ``True`` and zero is converted to
  ``False``.
* a string is converted to ``False`` if and only if it is empty. For example,
  to test whether a string ``title`` is non-empty, you can simply write:

  .. code-block:: python

    if title:
        ...

  instead of:

  .. code-block:: python

    if len(title) > 0:
        ...

* the ``None`` value, a constant commonly used for variables which have no
  value yet, is converted to ``False``. To test whether a variable ``var`` is
  ``None``, you can write:

  .. code-block:: python

    if not var:
        ...

  **Beware**, the above code will not allow you to distinguish the case where
  ``var`` is ``None`` from the case where ``var`` has a value which is
  converted to ``False`` by Python (for example, zero, an empty string or an
  empty list). You need to be careful that this is really what you are trying
  to test; to test for ``None`` and nothing else, write ``if var is None:``.
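
The conversion rules listed above can be checked directly with the built-in
``bool`` function. The helper ``describe`` is a made-up name, used here only to
show how ``is None`` separates a missing value from an empty one:

.. code-block:: python

    values = [0, 1, -1, "", "Dumbo", [], [0], {}, None]
    truthiness = [bool(v) for v in values]
    # [False, True, True, False, True, False, True, False, False]

    def describe(var):
        if var is None:
            # Only None reaches this branch.
            return "missing"
        if not var:
            # Zero, empty strings, empty lists, etc.
            return "empty"
        return "non-empty"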

Generators
----------

Generators provide an easy way to create iterator objects (objects over which
you can iterate) and can be created in several ways.

Generator expressions
~~~~~~~~~~~~~~~~~~~~~

*Generator expressions* are very similar to *list comprehensions* except
that the brackets are replaced by parentheses. Thus, the following code:

.. code-block:: python

  l = [1, 2, 3]
  m = (str(i*i) for i in l)  # join only accepts strings, hence str()
  print '\n'.join(m)

would produce the exact same result had the second line been replaced by:

.. code-block:: python

  m = [str(i*i) for i in l]

The difference between the two versions is that in the case where ``m`` is
defined by a *list comprehension* the list is entirely computed and stored
in memory when the variable ``m`` is defined. On the contrary, when ``m`` is
defined by a *generator expression*, the elements in ``m`` are generated on
the fly *when needed*: only when iterating over the variable ``m`` (as done by
the call to the ``join`` function in the above example) are the elements
generated. Note that a generator can only be iterated over once.

From the speed of execution point of view, both solutions are equivalent: in
the end, each element in ``m`` will be computed once and only once. From the
memory usage point of view however, generators present a clear advantage:
because the elements are generated dynamically, one at a time, never more than
one element is stored in memory at the same time. In cases when the list is too
big to fit into memory, *generators* could be the solution.
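
The following sketch illustrates the memory difference with the standard
``sys.getsizeof`` function. Keep in mind that ``getsizeof`` only measures the
container itself and not the elements it refers to, so the figures are merely
indicative; the variable names are chosen for the illustration:

.. code-block:: python

    import sys

    l = list(range(100000))

    squares_list = [i*i for i in l]  # all the squares are computed right away
    squares_gen = (i*i for i in l)   # nothing is computed yet

    list_size = sys.getsizeof(squares_list)  # grows with the number of elements
    gen_size = sys.getsizeof(squares_gen)    # small, whatever the length of l

    first_pass = sum(squares_gen)   # the elements are generated one at a time
    second_pass = sum(squares_gen)  # 0: the generator is now exhausted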

When a *generator expression* is the only argument of a function call, Python
allows you to drop one pair of parentheses to make the code more readable. For
example, in the following code:

.. code-block:: python

  l = [1, 2, 3]
  total = sum((i*i for i in l))

the second line can be replaced by:

.. code-block:: python

  total = sum(i*i for i in l)

Generator functions
~~~~~~~~~~~~~~~~~~~

A second way to define a *generator* is by writing a function using the special
keyword ``yield``. When called, this function will return an iterable object
whose behavior is the following: on each iteration step, the function is
executed until a ``yield`` instruction is hit. The value following the
``yield`` keyword is returned and can be used during the iteration step. The
execution of the function is frozen until the next iteration step.

For example, let us define the following function:

.. code-block:: python

    def min_max(filename):
        with open(filename) as f:
            for line in f:
                l = map(int, line.split())
                yield min(l), max(l)

When called, this function will produce an iterable object. When iterating
over this object, at each iteration, one line of ``filename`` will be read,
and the minimum and maximum values of this line will be returned when the
``yield`` keyword is reached, freezing the execution of the function until
the next iteration.

Hence, the following code:

.. code-block:: python

    for (inf, sup) in min_max(filename):
        print (inf + sup)/2.

is exactly equivalent to:

.. code-block:: python

    with open(filename) as f:
        for line in f:
            l = map(int, line.split())
            inf, sup = min(l), max(l)
            print (inf + sup)/2.

but allows you to define separately the code which will generate the list of
minimum and maximum values, and the code which makes use of the generated
elements.
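
The way the execution is frozen and resumed can be observed by stepping through
a generator by hand with the built-in ``next`` function. The generator
``countdown`` below is a toy example made up for this purpose:

.. code-block:: python

    def countdown(n):
        while n > 0:
            yield n  # execution stops here until the next value is requested
            n -= 1

    gen = countdown(3)   # nothing is executed yet
    first = next(gen)    # 3: runs the body until the first yield
    second = next(gen)   # 2: resumes right after the yield
    rest = list(gen)     # [1]: drains the remaining values

    try:
        next(gen)
        exhausted = False
    except StopIteration:
        # An exhausted generator signals its end with StopIteration.
        exhausted = True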

Built-in functions
~~~~~~~~~~~~~~~~~~

Finally, some built-in functions in Python return objects which, like
generators, produce their elements lazily. This is the case of the ``xrange``
function, which can be used in a ``for`` loop exactly like the ``range``
function. The difference is that ``range`` computes a list of integers whereas
``xrange`` returns an object generating the elements on the fly, one at
a time. A call to ``range(1000000000)`` might induce a memory error on your
machine (depending on your memory capacity), but you will be fine using
``xrange``, both calls being equivalent for iteration purposes. In Python 2, it
is almost always more suitable to use ``xrange`` instead of ``range``; in
Python 3.x, ``range`` itself behaves this way and ``xrange`` no longer exists.

Read more about generators in the `official documentation <http://docs.python.org/tutorial/classes.html#generators>`__.

Decorators
----------

*Decorators* provide a very powerful way to alter the behavior of a function
without redefining it. The syntax is the following:

.. code-block:: python

    @logging
    def f(x):
        return x + 1

In the above example, we say that ``f`` has been *decorated* with ``logging``.
``logging`` must be a function taking another function as an argument and
returning a function. The result of this decoration is equivalent to this piece
of code:

.. code-block:: python

    def f(x):
        return x + 1

    f = logging(f)

which means that by decorating ``f`` with ``logging``, ``f`` now behaves as
the composite function ``logging(f)``.

A simple decorator
~~~~~~~~~~~~~~~~~~

Imagine that we want the ``logging`` decorator to *log* the calls made to the
function it decorates, by printing them to the standard output. Such
a decorator could be written like this:

.. code-block:: python

    def logging(fun):
        def aux(*args, **kwargs):
            print "Calling", fun.__name__
            return fun(*args, **kwargs)  # pass the result of fun back to the caller
        return aux

Because ``logging`` could be used to decorate any function, with an arbitrary
number of arguments and keyword arguments, it is necessary to use the generic
syntax ``aux(*args, **kwargs)``. This syntax stores all the positional
arguments passed to ``aux`` in a tuple named ``args`` and all the keyword
arguments in a dictionary named ``kwargs``. Note that the exact same arguments
are passed to ``fun``, meaning that from the argument passing perspective,
``aux`` and ``fun`` will behave similarly. The difference is that ``aux`` logs
the call to the standard output prior to doing the computation made in ``fun``:
this is how we expected the decorator to behave.

To be perfectly rigorous, the previous decorator should have been written like
this:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        @wraps(fun)
        def aux(*args, **kwargs):
            print "Calling", fun.__name__
            return fun(*args, **kwargs)  # pass the result of fun back to the caller
        return aux

``aux`` is now itself decorated by the ``wraps`` decorator provided by the
``functools`` module. This decorator copies metadata such as the name and the
docstring of ``fun`` onto ``aux``, so that ``aux`` looks as much as possible
like ``fun`` from the outside. Without this decorator, the following code:

.. code-block:: python

    @logging
    def f(x):
        return x + 1

    print f.__name__

would print ``aux`` to the standard output, instead of the expected ``f``. The
``wraps`` decorator ensures among other things that the ``__name__`` attribute
is preserved throughout a decoration.
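
The effect of ``wraps`` can be checked with the following sketch, which
compares two otherwise identical decorators. The names ``plain`` and
``wrapped`` are made up for the comparison, and the logging part is left out to
keep the example short:

.. code-block:: python

    from functools import wraps

    def plain(fun):
        # Decorator written without wraps.
        def aux(*args, **kwargs):
            return fun(*args, **kwargs)
        return aux

    def wrapped(fun):
        # Same decorator, with wraps applied to the inner function.
        @wraps(fun)
        def aux(*args, **kwargs):
            return fun(*args, **kwargs)
        return aux

    @plain
    def f(x):
        """Add one to x."""
        return x + 1

    @wrapped
    def g(x):
        """Add one to x."""
        return x + 1

    names = (f.__name__, g.__name__)  # ('aux', 'g')
    docs = (f.__doc__, g.__doc__)     # (None, 'Add one to x.')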

Let us further assume that you want to extend the ``logging`` decorator to not
only log the calls, but also keep track of how many times the function has been
called. 

You could be tempted to write something like:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        a = 0
        @wraps(fun)
        def aux(*args, **kwargs):
            a = a + 1
            print "{0} has been called {1} times".format(fun.__name__, a)
            return fun(*args, **kwargs)  # pass the result of fun back to the caller
        return aux

However, if you apply this decorator to some function and then call it, Python
raises an ``UnboundLocalError`` complaining that the local variable ``a`` is
referenced before assignment. The problem comes from this line:

.. code-block:: python

    a = a + 1

Here, because ``a`` is assigned inside ``aux``, Python treats it as a new
variable local to ``aux`` and ignores the definition made in ``logging``. As
a consequence, when evaluating the ``a + 1`` part, this local ``a`` has no
value yet, causing the error. This is a limitation of Python 2: variables
defined in an enclosing function can be read from the inner function, but not
rebound (Python 3 adds the ``nonlocal`` keyword for this purpose).

A standard way to circumvent this limitation is to use a mutable structure for
``a``: ``a`` itself cannot be rebound, but the structure it points to can be
modified. Using this, the previous example can be rewritten as:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        a = [0]
        @wraps(fun)
        def aux(*args, **kwargs):
            a[0] = a[0] + 1
            print "{0} has been called {1} times".format(fun.__name__, a[0])
            return fun(*args, **kwargs)  # pass the result of fun back to the caller
        return aux

where ``a`` points to a list of length 1 storing the number of calls at its
first position.
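
For readers using Python 3, the same counter can be written without the list
trick thanks to the ``nonlocal`` keyword. This is only a sketch, and it does
not run on Python 2, unlike the other examples of this post; the decorator is
named ``counting`` here to avoid confusion with the versions above:

.. code-block:: python

    from functools import wraps

    def counting(fun):
        calls = 0
        @wraps(fun)
        def aux(*args, **kwargs):
            nonlocal calls  # Python 3 only: rebind the variable of counting
            calls += 1
            print("{0} has been called {1} times".format(fun.__name__, calls))
            return fun(*args, **kwargs)
        return aux

    @counting
    def f(x):
        return x + 1

    results = [f(1), f(2), f(3)]  # the counter is printed at each call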

Another example
~~~~~~~~~~~~~~~

A common example which is often used to illustrate decorators in Python is
`memoization <http://en.wikipedia.org/wiki/Memoization>`__: when a function is
computation-heavy but often called using the same arguments, you can save
a lot of time by caching past results returned by the function.

This idea can be nicely implemented in Python using a decorator. The decorator
stores past results in a dictionary: when the decorated function is called, the
decorator performs a lookup in its dictionary to check whether the function has
already been called with the same argument. If the dictionary already contains
an entry for this argument, the associated value is returned; otherwise the
function is called and its result is stored.

Here is how you could write such a decorator for a single-argument function
(the argument must be hashable to be used as a dictionary key):

.. code-block:: python

    from functools import wraps

    def memoize(fun):
        cache = {}
        @wraps(fun)
        def aux(x):
            if x in cache:
                return cache[x]
            else:
                a = fun(x)
                cache[x] = a
                return a
        return aux

Then if ``f`` is defined like this:

.. code-block:: python

    @memoize
    def f(x):
        ... # very long and heavy computation

when calling ``f`` twice with the same argument, you will incur the computation
cost only during the first call, the second call being almost instantaneous.
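
As an illustration, here is a sketch applying memoization to the recursive
computation of Fibonacci numbers, a classic case where the same arguments come
back again and again. The decorator is repeated (in a slightly more compact
form) so that the block is self-contained, and the ``calls`` list is only there
to count how many times the function body is really executed:

.. code-block:: python

    from functools import wraps

    def memoize(fun):
        cache = {}
        @wraps(fun)
        def aux(x):
            if x not in cache:
                cache[x] = fun(x)
            return cache[x]
        return aux

    calls = []

    @memoize
    def fib(n):
        calls.append(n)  # record each real computation
        if n < 2:
            return n
        # The recursive calls go through the memoized version of fib.
        return fib(n - 1) + fib(n - 2)

    value = fib(30)               # 832040
    computations = len(calls)     # 31: one computation per value from 0 to 30

Without the decorator, the same call would execute the function body more than
a million times.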

Classes with two methods
------------------------

Let us briefly recall how classes work in Python. A class is defined like this:

.. code-block:: python

    class Cipher:

        def __init__(self, key):
            self.key = key

        def decrypt(self, message):
            return (message & self.key)

All the methods of a class take as their first argument the instance on which
the method is being called. By convention, this first argument is always named
``self``. If ``a`` is an instance of ``Cipher``, the instruction
``a.decrypt(message)`` is equivalent to ``Cipher.decrypt(a, message)``.

The special function ``__init__`` is the class constructor and is called every
time an instance of the class is created. Its typical use is to initialize some
attributes of the instance. An instance of the ``Cipher`` class can be created
like this:

.. code-block:: python

    d = Cipher(key)

A flaw commonly found in code written by people coming from other
object-oriented programming languages is to create classes for everything. This
often leads to classes containing only two methods, one being the ``__init__``
function. This is the case in the class written above as an example. Looking at
this example a bit more closely, you can see that it is possible to completely
get rid of the class definition: a ``decrypt`` function taking the key as an
additional argument is sufficient:

.. code-block:: python

    def decrypt(key, message):
        return (message & key)

Some people could object that it still makes sense to use a class in the
example above, if we plan to extend the ``Cipher`` class in the future, for
example by adding an ``encrypt`` function. In my opinion, it is better to start
by writing your code as simply as possible. If you really need to extend the
code, then you can start restructuring it and group several related functions
in a class.
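
If what you liked about the class was being able to fix the key once and reuse
it, the standard ``functools.partial`` function offers the same convenience
with the plain function. This is one possible approach among others; the key
and message values below are arbitrary:

.. code-block:: python

    from functools import partial

    def decrypt(key, message):
        return (message & key)

    # Bind the first argument (the key) once and for all.
    decrypt_with_key = partial(decrypt, 10)

    plain = decrypt_with_key(6)  # same as decrypt(10, 6), that is 10 & 6 = 2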

PEP 8
-----

When writing about good practices in Python, it is impossible not to mention
PEP 8. It is a set of recommendations on coding style in Python. These
recommendations are of course not absolute rules and should be taken as advice.
However, I noticed that following these recommendations generally leads to
greater code readability. Moreover, as many people who code in Python also
follow these recommendations, adopting them reduces the gap between your code
and code written by others: this will save you some time when reading code.

Here are a few points extracted from PEP 8:

* you should follow English typographic rules: no space before a colon, no
  space before a comma, but a space after, etc.

* you should put spaces around operators like the equal sign, plus sign, etc.

* you should try to limit the length of the lines of code to a maximum of 79
  characters.

More details on the `PEP 8 page <http://www.python.org/dev/peps/pep-0008/>`__.