Add first article

author: Thibaut Horel <thibaut.horel@gmail.com> 2013-01-21 12:21:38 +0100
committer: Thibaut Horel <thibaut.horel@gmail.com> 2013-01-21 12:21:38 +0100
commit: 473c850e417d00e66872d923dc8718c533ff1bf8 (patch)
tree: 034132f1ed8cb662773f822755bef45e1f61a410 /content/python-best-practices.rst
parent: 5c4e85fe93ec3a713cf173db34e422072254d1c7 (diff)
download: blog-473c850e417d00e66872d923dc8718c533ff1bf8.tar.gz
1 files changed, 700 insertions, 0 deletions
diff --git a/content/python-best-practices.rst b/content/python-best-practices.rst
new file mode 100644
index 0000000..fd8b7c3
--- /dev/null
+++ b/content/python-best-practices.rst
@@ -0,0 +1,700 @@
+Good practices in Python
+========================
+
+:date: 2013-01-20 17:34
+
+This post is a collection of various facts about Python:
+
+* common mistakes that I encounter frequently when reading code written by
+  myself or other people.
+
+* specific aspects of the Python language that are not very well-known and that
+  I think should be used more.
+
+* general recommendations regarding Python.
+
+Please note that I do not consider myself a Python expert, and it is possible
+that the following text contains inaccurate statements.
+
+Also, due to its very nature, this post is rather unstructured. The table of
+contents should help you jump directly to the part you are interested in.
+
+.. contents:: :local:
+
+List comprehensions
+-------------------
+
+*List comprehensions* give Python users a very concise and powerful syntax to
+build a list from another list (or any iterable object). The syntax is the
+following:
+
+.. code-block:: python
+
+    result_list = [expression(item) for item in original_list if condition(item)]
+
+which means that ``result_list`` will be a list containing ``expression(item)``
+(any expression that you can compute from ``item``) where ``item`` is an
+element of ``original_list`` for which ``condition(item)`` (a boolean
+expression involving ``item``) is ``True``. The boolean condition, which allows
+you to filter the list, is optional.
+
+For example, to compute the list of squares of elements in a list, instead of:
+
+.. code-block:: python
+
+    l = [1, 2, 3]
+    result = []
+    for i in l:
+        result.append(i*i)
+
+which is inefficient because ``append`` is looked up and called at every
+iteration, one could use a *list comprehension*:
+
+.. code-block:: python
+
+    l = [1, 2, 3]
+    result = [i*i for i in l] # result = [1, 4, 9]
+
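+In addition to being shorter, the above code is also usually faster (around 3x
+in some cases; the exact gain depends on the Python version and on the data)
+because the list is built by a single expression instead of repeated calls to
+``append``.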
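A rough way to check the speed claim on your own machine is the standard ``timeit`` module. The sketch below is not part of the original post: the function names ``with_loop`` and ``with_comprehension`` and the input size are arbitrary choices, and the ratio you observe will depend on your interpreter and data.

```python
import timeit

data = list(range(1000))  # arbitrary input size, chosen for illustration


def with_loop():
    # Explicit loop with repeated calls to append.
    result = []
    for i in data:
        result.append(i * i)
    return result


def with_comprehension():
    # Same computation written as a list comprehension.
    return [i * i for i in data]


# Both approaches must build the same list.
assert with_loop() == with_comprehension()

# Run each function 200 times and report the total time in seconds.
loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)
print("loop: %.4fs, comprehension: %.4fs" % (loop_time, comp_time))
```

Treat the numbers as indicative only: they vary from run to run and from one Python version to the next.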
+
+Another example: to compute the list of square roots of all non-negative
+elements of a list (``sqrt`` raises a ``ValueError`` for a negative
+argument):
+
+.. code-block:: python
+
+    from math import sqrt
+
+    l = [4, -3, 9]
+    result = [sqrt(i) for i in l if i >= 0] # result = [2.0, 3.0]
+
+The same syntax also exists for dictionaries, where it is called a *dict
+comprehension* (very original, isn't it?). For example, to transform a list of
+(name, phone number) pairs into a dictionary, for faster lookup:
+
+.. code-block:: python
+
+    l = [("Barthes", "+33 6 29 64 91 12"), ("Dumbo", "+001 650 472 4243")]
+    d = {name: phone for (name, phone) in l}
+
+You can get more details about *list comprehensions* in the `dedicated section <http://docs.python.org/tutorial/datastructures.html#list-comprehensions>`__ of the official documentation.
+
+The multiple faces of ``in``
+----------------------------
+
+The ``in`` keyword has many different meanings and makes Python code so easy to
+write that people often forget to use it.
+
+* ``in`` gives a universal syntax to iterate over iterable objects. For
+  example, to iterate over a list, instead of:
+
+  .. code-block:: python
+
+    l = [1, 2, 3]
+    for i in range(len(l)):
+        print l[i]
+
+  you could simply write:
+
+  .. code-block:: python
+
+        l = [1, 2, 3]
+        for i in l:
+            print i
+
+  similarly, to iterate over a dictionary, instead of:
+
+  .. code-block:: python
+
+        d = { ... }
+        for key in d.keys():
+            print d[key]
+
+  you could write:
+
+  .. code-block:: python
+
+        d = { ... }
+        for key in d:
+            print d[key]
+
+* ``in`` also allows you to test whether an element belongs to some structure:
+  list, dictionary (or any iterable object), occurrence of a substring inside
+  a string. For example:
+
+  .. code-block:: python
+
+        l = [line for line in open("server.log") if "Connected" in line]
+
+  builds the list of lines from the file ``server.log`` containing
+  ``Connected`` as a substring.
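The membership form of ``in`` behaves slightly differently depending on the container. The following sketch uses made-up data (``words`` and ``ages`` are illustrative names, not from the original post) to show the list, dictionary and substring cases side by side.

```python
words = ["alpha", "beta", "gamma"]
ages = {"alice": 30, "bob": 25}

in_list = "beta" in words        # True: searches the elements of the list
in_keys = "alice" in ages        # True: for a dictionary, `in` looks at the keys
in_values = 30 in ages           # False: the values are not searched
in_string = "mm" in "gamma"      # True: substring test

print("%s %s %s %s" % (in_list, in_keys, in_values, in_string))
```

Note that for a dictionary only the keys are considered, which is why the third test is ``False``.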
+
+Manipulating lists with atomic instructions
+-------------------------------------------
+
+More generally, it is advised to avoid iterating over a list with a ``for``
+loop. ``for`` loops are slow in Python and writing an operation over a list as
+a single instruction allows Python to optimize the execution of the code
+internally.
+
+*List comprehensions* often help in replacing an iteration by a single
+instruction. Here are a few other functions which can be helpful in this
+regard:
+
+* ``join`` can be useful to format a list. For example, to print the words of
+  a list whose first letter is ``a``, instead of:
+
+  .. code-block:: python
+
+    l = [ ... ]
+    result = ""
+    for word in l:
+        if word[0] == 'a':
+            result += word + " "
+    print result
+
+  you could do:
+
+  .. code-block:: python
+
+    l = [ ... ]
+    print " ".join([word for word in l if word[0] == 'a'])
+
+* ``sum`` to sum the elements of a list.
+
+* ``map`` to apply a given function to all elements in a list. For example to
+  reverse all the words in a list:
+
+  .. code-block:: python
+
+    l = ["Dumbo", "Polochon"]
+
+    def reverse(word):
+        return word[::-1]
+
+    m = map(reverse, l) # m = ['obmuD', 'nohcoloP']
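As a small combined illustration of ``sum``, ``map`` and ``join``, here is a sketch on the same two words. The variable names are arbitrary. One assumption worth stating: the call to ``list()`` is there because ``map`` returns a list in Python 2 but a lazy iterator in Python 3, and wrapping it gives a list in both.

```python
words = ["Dumbo", "Polochon"]

# Total number of letters, computed in a single instruction.
total_length = sum([len(word) for word in words])

# Mirror image of every word; list() forces an actual list on any version.
mirrored = list(map(lambda word: word[::-1], words))

# Format the result as one space-separated string.
sentence = " ".join(mirrored)

print(total_length)
print(sentence)
```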
+
+*Slices* are also very useful when it comes to manipulating lists (or sublists)
+in blocks. Remember that if ``l`` is a list (or any sequence, such as a string
+or a tuple), ``l[begin:end:step]`` will extract all the elements from the index
+``begin`` (included) to the index ``end`` (excluded) with a step of ``step``
+(this last parameter being optional).
+
+If the ``begin`` parameter is omitted, it is given 0 as a default value.
+Similarly, the default value of ``end`` when unspecified is ``len(l)`` (the
+number of elements in ``l``). A negative value for ``begin`` or ``end`` is
+counted from the end of the list. For example, to extract all the elements
+from a list but the last one:
+
+.. code-block:: python
+
+    l = [1, 2, 3]
+    m = l[:-1] # m = [1, 2]
+
+Using a negative value for the ``step`` parameter can be useful to walk through
+a sequence in reverse order as shown in the example given above to take
+the mirror image of a word:
+
+.. code-block:: python
+
+    word = "dumbo"
+    drow = word[::-1] # drow = "obmud"
+
+which compensates for the scandalous lack of a ``reverse`` function for strings
+in Python.
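To make the slicing rules concrete, here is a short sketch on a six-element list. The variable names are only illustrative.

```python
l = [0, 1, 2, 3, 4, 5]

middle = l[1:4]        # begin included, end excluded
evens = l[::2]         # every second element, default begin and end
last_two = l[-2:]      # negative begin is counted from the end
backwards = l[::-1]    # negative step walks the list in reverse
copy = l[:]            # all defaults: a new list with the same elements

print(middle, evens, last_two, backwards)
```

A slice always builds a new list, so ``copy`` can be modified without touching ``l``.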
+
+Exceptions
+----------
+
+Exceptions are a powerful tool found in many high-level programming
+languages, and they are often under-used. They allow for a less defensive
+programming style by handling errors *as they appear* instead of making tests
+*beforehand* to prevent them from happening.
+
+In Python, every time you are trying to execute an illegal operation (*e.g.*
+trying to access an element outside a list's boundaries, dividing by zero,
+etc.) instead of simply crashing the program, Python raises an exception which
+can be caught, which gives the programmer a last chance of fixing the problem
+before the program ultimately crashes.
+
+The syntax to catch exceptions in Python is the following:
+
+.. code-block:: python
+
+    try:
+        ...  # piece of code potentially raising the exception named Kaboum
+    except Kaboum:
+        ...  # piece of code executed if the above code raises Kaboum
+
+For example, if a line of code contains a division by a number which could
+(rarely) be equal to zero, instead of systematically checking that the number
+is nonzero, it is often more efficient to wrap the line in a
+``try ... except ZeroDivisionError:`` block that specifically handles the rare
+cases where the number is zero. This is the well-known principle: *it is easier
+to ask for forgiveness than permission*.
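A minimal sketch of the division case described above. The helper name ``safe_ratio`` and the choice of returning ``None`` on failure are assumptions made for the example, not something prescribed by the post.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    try:
        # Common case: no test is performed before dividing.
        return numerator / denominator
    except ZeroDivisionError:
        # Rare case: handled only when it actually happens.
        return None


print(safe_ratio(6, 3))
print(safe_ratio(1, 0))
```

Catching only ``ZeroDivisionError`` (rather than using a bare ``except``) keeps unrelated errors visible.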
+
+As another example, when trying to access a key which does not exist in
+a dictionary, Python raises the ``KeyError`` exception. This exception can be
+used to initialize the value associated with a key which does not exist yet in
+the dictionary. For example, to compute a dictionary of word counts in a text,
+you will often find:
+
+.. code-block:: python
+
+    text = "..."
+    result = {}
+    for word in text.split():
+        if word in result:
+            result[word] += 1
+        else:
+            result[word] = 1
+
+You could instead use the ``KeyError`` exception to your advantage to avoid the
+systematic ``if`` test:
+
+.. code-block:: python
+
+    text = "..."
+    result = {}
+    for word in text.split():
+        try:
+            result[word] += 1
+        except KeyError:
+            result[word] = 1
+
+The difference with the previous code is that *most of the time*, this code
+will behave exactly as if the body of the ``for`` loop only contained the
+instruction ``result[word] += 1`` which is a significant speedup compared to
+the first code where a test was computed for each iteration of the loop.
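Here is the ``KeyError`` version run on a concrete sentence (the sentence is made up for the example). The cross-check against ``collections.Counter`` is an addition of this sketch: ``Counter`` is a standard library class that the post does not mention, used here only to confirm the counts.

```python
from collections import Counter

text = "the cat and the hat and the bat"

counts = {}
for word in text.split():
    try:
        counts[word] += 1      # common case: the word was already seen
    except KeyError:
        counts[word] = 1       # first occurrence of the word

# Counter is not part of the article; it only serves as a cross-check.
assert counts == dict(Counter(text.split()))
print(counts["the"])
```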
+
+See the `dedicated page <http://docs.python.org/tutorial/errors.html#handling-exceptions>`__ in the official documentation.
+
+Values equivalent to ``True`` or ``False``
+------------------------------------------
+
+If ``test`` is a boolean variable (equal to ``True`` or ``False``), we know
+that it is redundant to write:
+
+.. code-block:: python
+
+    if test == True:
+        ...
+
+instead of:
+
+.. code-block:: python
+
+    if test:
+        ...
+
+More generally, Python has automatic conversion rules from standard types to
+booleans to allow a shorter syntax in conditional tests:
+
+* as in the vast majority of programming languages, a nonzero integer
+  (positive or negative) is converted to ``True`` and zero is converted to
+  ``False``
+* a string is converted to ``False`` if and only if it is empty. For example,
+  to test whether or not a string ``title`` is empty, you can simply write:
+
+  .. code-block:: python
+
+    if title:
+        ...
+
+  instead of:
+
+  .. code-block:: python
+
+    if len(title) > 0:
+        ...
+
+* the ``None`` value, which is a constant used when a variable has not been
+  specified, is converted to ``False``. To test that a variable ``var`` is not
+  equal to ``None``, you could write:
+
+  .. code-block:: python
+
+    if var:
+        ...
+
+  **Beware**, the above code will not allow you to distinguish between the case
+  where ``var`` is ``None`` and the case where ``var`` has a value which is
+  converted to ``False`` by Python (like an empty string or list for example).
+  You need to be careful that this is really what you are trying to test; when
+  the distinction matters, test ``var is not None`` explicitly.
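The difference between "is ``None``" and "converts to ``False``" can be seen with a small helper. ``describe`` is a hypothetical function written for this sketch, not something from the post.

```python
def describe(var):
    """Hypothetical helper: classify a value the way the two tests see it."""
    if var is None:
        return "None"
    if not var:
        return "falsy but not None"
    return "truthy"


for value in [None, 0, "", [], -1, "a", [0]]:
    print("%r -> %s" % (value, describe(value)))
```

Note that ``-1`` and ``[0]`` are both truthy: only zero and empty containers convert to ``False``.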
+
+Generators
+----------
+
+Generators provide an easy way to create iterator objects (objects over which
+you can iterate). They can be created by using different methods.
+
+Generator expressions
+~~~~~~~~~~~~~~~~~~~~~
+
+*Generator expressions* are written exactly like *list comprehensions* except
+that the brackets are replaced by parentheses. Thus, the following code:
+
+.. code-block:: python
+
+  l = [1, 2, 3]
+  m = (str(i*i) for i in l)  # join expects strings, hence str()
+  print '\n'.join(m)
+
+would produce the exact same result had the second line been replaced by:
+
+.. code-block:: python
+
+  m = [str(i*i) for i in l]
+
+The difference between the two codes is that in the case where ``m`` is
+defined by a *list comprehension* the list is integrally computed (and placed
+in memory) when the variable ``m`` is defined. On the contrary, when ``m`` is
+defined by a *generator expression*, the elements in ``m`` are generated on
+the go *when needed*: only when trying to iterate over the variable ``m`` (as
+induced by the call to the ``join`` function in the above example) are the
+elements generated.
+
+From the speed of execution point of view, both solutions are equivalent: in
+the end, each element in ``m`` will be computed once and only once. From the
+memory usage point of view however, generators give a clear advantage:
+because the elements are generated dynamically (when needed), one at a time,
+never more than one element is stored in memory at the same time. In cases
+when the list is too big to fit into memory, then *generators* could be the
+solution.
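The memory argument can be observed with ``sys.getsizeof``, under the caveat that it reports the size of the container object itself and not of the integers it refers to. The sketch also shows a consequence the text does not spell out: a generator can only be iterated once. The names and the size ``n`` are arbitrary.

```python
import sys

n = 100000
squares_list = [i * i for i in range(n)]   # all the elements exist now
squares_gen = (i * i for i in range(n))    # nothing has been computed yet

# The list object grows with n; the generator object stays small.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))

first_pass = sum(squares_gen)    # consumes the generator
second_pass = sum(squares_gen)   # already exhausted: nothing left to add
print(first_pass == sum(squares_list))
print(second_pass)
```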
+
+When using a *generator expression* as the only argument of a function, Python
+allows you to drop one pair of parentheses to make the code more readable. For
+example, in the following code:
+
+.. code-block:: python
+
+  l = [1, 2, 3]
+  total = sum((i*i for i in l))
+
+the second line could be replaced by:
+
+.. code-block:: python
+
+  total = sum(i*i for i in l)
+
+Generator functions
+~~~~~~~~~~~~~~~~~~~
+
+A second way to define a *generator* is by writing a function using the special
+keyword ``yield``. When called, this function will return an iterable object
+whose behavior is the following: for each iteration step over the object, the
+function which defined it is executed until a ``yield`` instruction is hit. The
+value following the ``yield`` keyword is returned and can be used during the
+iteration step. The execution of the function is frozen until the next
+iteration step.
+
+For example, let us define the following function:
+
+.. code-block:: python
+
+    def min_max(filename):
+        with open(filename) as f:
+            for line in f:
+                l = map(int, line.split())
+                yield min(l), max(l)
+
+When called, this function will produce an iterable object. When iterating
+over this object, at each iteration one line of ``filename`` will be read,
+and the minimum and maximum value of this line will be returned when the
+``yield`` keyword is reached, freezing the execution of the function until
+the next iteration.
+
+Hence, the following code:
+
+.. code-block:: python
+
+    for (inf, sup) in min_max(filename):
+        print (inf + sup)/2.
+
+is exactly equivalent to:
+
+.. code-block:: python
+
+    with open(filename) as f:
+        for line in f:
+            l = map(int, line.split())
+            inf, sup = min(l), max(l)
+            print (inf + sup)/2.
+
+but allows you to define separately the code which computes the minimum and
+maximum value of the lines, and the code which computes their arithmetic
+mean.
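To watch the "frozen until the next iteration" behavior without needing a file, here is a variant of ``min_max`` that takes any iterable of lines instead of a filename. That change of signature is an assumption made for this sketch, as is the sample data.

```python
def min_max(lines):
    # Variant of the article's function: iterates over lines given directly.
    for line in lines:
        numbers = [int(token) for token in line.split()]
        yield min(numbers), max(numbers)


lines = ["1 5 3", "10 2", "7"]
pairs = min_max(lines)       # no line has been processed yet

first = next(pairs)          # runs the body up to the first yield
rest = list(pairs)           # resumes after the yield and drains the rest

print(first)
print(rest)
```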
+
+Built-in functions
+~~~~~~~~~~~~~~~~~~
+
+Finally, some built-in functions in Python return objects which, like
+generators, produce their elements lazily. This is the case of the ``xrange``
+function, which can be used exactly like the ``range`` function in a ``for``
+loop. The difference is that ``range`` computes a list of integers whereas
+``xrange`` returns an object which generates the elements on the go, one at
+a time. For example a call to ``range(1000000000)`` might induce a memory
+error on your machine (if you do not have enough memory to store this list),
+whereas the same call using ``xrange`` will not have this issue and will
+behave similarly for purposes of iteration. In Python 2, it is almost always
+more suitable to use ``xrange`` instead of ``range``; in Python 3, ``xrange``
+is gone and ``range`` itself behaves this way.
+
+See more details in the `official documentation <http://docs.python.org/tutorial/classes.html#generators>`__.
+
+Decorators
+----------
+
+*Decorators* provide a very powerful way to alter the behavior of a function
+without redefining it. The syntax is the following:
+
+.. code-block:: python
+
+    @logging
+    def f(x):
+        return x + 1
+
+In the above example, we say that the ``f`` function has been *decorated* with
+the ``logging`` function. ``logging`` must be a function taking another
+function as an argument and the result of decorating the ``f`` function with
+``logging`` is equivalent to:
+
+.. code-block:: python
+
+    def f(x):
+        return x + 1
+
+    f = logging(f)
+
+which means that by decorating ``f``  with ``logging``, ``f`` now behaves as
+the composite function ``logging(f)``.
+
+A simple decorator
+~~~~~~~~~~~~~~~~~~
+
+Imagine that we want the ``logging`` decorator to *log* the calls to the
+function it decorates, by printing them to the standard output. Such
+a decorator could be written like this:
+
+.. code-block:: python
+
+    def logging(fun):
+        def aux(*args, **kwargs):
+            print "Calling", fun.__name__
+            return fun(*args, **kwargs)
+        return aux
+
+Because ``logging`` could be used to decorate any function, with an arbitrary
+number of arguments and keyword arguments, it is necessary to use the generic
+syntax ``aux(*args, **kwargs)`` which stores all the positional arguments
+passed to ``aux`` in a tuple named ``args`` and all the keyword arguments in
+a dictionary named ``kwargs``. Note that the exact same arguments are passed to
+``fun``, meaning that from the argument passing perspective, ``aux`` and
+``fun`` behave exactly the same. The only difference between ``aux`` and
+``fun`` is that ``aux`` logs the call to the standard output before doing the
+computation made in ``fun``. This is the expected behavior of the ``logging``
+decorator.
+
+To be perfectly rigorous, the previous decorator should have been written like
+this:
+
+.. code-block:: python
+
+    from functools import wraps
+
+    def logging(fun):
+        @wraps(fun)
+        def aux(*args, **kwargs):
+            print "Calling", fun.__name__
+            return fun(*args, **kwargs)
+        return aux
+
+Note that ``aux`` is itself decorated by the ``wraps`` decorator provided by
+the ``functools`` module of the standard library. This decorator copies
+metadata such as the name and the docstring of ``fun`` over to ``aux``, so that
+``aux`` looks as much as possible like ``fun``. For example, without this
+decorator, the following code:
+
+.. code-block:: python
+
+    @logging
+    def f(x):
+        return x + 1
+
+    print f.__name__
+
+would print ``aux`` to the standard output, instead of the expected ``f``. The
+``wraps`` decorator ensures that the ``__name__`` attribute is preserved
+throughout a decoration.
+
+Let us further assume that you want to extend the ``logging`` decorator to not
+only log the calls, but also keep track of how many times the function has been
+called. 
+
+You could be tempted to write something like:
+
+.. code-block:: python
+
+    from functools import wraps
+
+    def logging(fun):
+        a = 0
+        @wraps(fun)
+        def aux(*args, **kwargs):
+            a = a + 1
+            print "{0} has been called {1} times".format(fun.__name__, a)
+            return fun(*args, **kwargs)
+        return aux
+
+However, if you apply this decorator to some function and then call this
+function, you will get an angry ``UnboundLocalError`` from Python complaining
+that the variable ``a`` is referenced before assignment. The reason for this is
+that in the line:
+
+.. code-block:: python
+
+    a = a + 1
+
+Python treats ``a`` as a new variable local to ``aux`` (because ``aux`` assigns
+to it) and ignores the fact that a variable of the same name has already been
+initialized to 0 in the enclosing function. So when reaching the ``a + 1``
+part, the local ``a`` has no value yet, which causes the error. This is
+a limitation of Python 2: variables defined in an enclosing function scope can
+be read but not rebound (Python 3 lifts it with the ``nonlocal`` statement).
+
+A standard way to circumvent this limitation is to use a mutable structure for
+``a``: ``a`` itself cannot be rebound, but the structure it points to can
+be modified. In the previous example, this could lead to the following code:
+
+.. code-block:: python
+
+    from functools import wraps
+
+    def logging(fun):
+        a = [0]
+        @wraps(fun)
+        def aux(*args, **kwargs):
+            a[0] = a[0] + 1
+            print "{0} has been called {1} times".format(fun.__name__, a[0])
+            return fun(*args, **kwargs)
+        return aux
+
+where ``a`` points to a list of length 1 where the number of calls is stored at
+the first position.
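Putting the pieces of this section together, here is a runnable sketch of a call-counting decorator. It is renamed ``count_calls`` here to avoid clashing with the standard ``logging`` module, the decorated function ``add_one`` is made up, and the ``calls`` attribute attached to the wrapper is a hypothetical convenience for inspecting the counter. The wrapper returns the result of ``fun`` so that the decorated function keeps its return value.

```python
from functools import wraps


def count_calls(fun):
    calls = [0]  # mutable cell, as in the article's workaround

    @wraps(fun)
    def aux(*args, **kwargs):
        calls[0] += 1
        print("{0} has been called {1} times".format(fun.__name__, calls[0]))
        # Forward the result so the decorated function behaves like fun.
        return fun(*args, **kwargs)

    aux.calls = calls  # hypothetical attribute, exposed only for inspection
    return aux


@count_calls
def add_one(x):
    return x + 1


first = add_one(1)
second = add_one(41)
print(add_one.__name__)
```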
+
+Another example
+~~~~~~~~~~~~~~~
+
+A common example which is often used to illustrate decorators in Python is
+`memoization <http://en.wikipedia.org/wiki/Memoization>`__: when a function is
+computation-heavy, but is often called using the same parameters, you can save
+a lot of time by storing past results returned by the function.
+
+This idea can be nicely implemented in Python by using a decorator. This
+decorator stores past results in a dictionary: when the decorated function
+is called, the decorator looks up the dictionary to check whether the function
+has already been called with the same parameter, and returns the stored value
+in this case.
+
+Here is how you could write such a decorator for a single argument function:
+
+.. code-block:: python
+
+    from functools import wraps
+
+    def memoize(fun):
+        cache = {}
+        @wraps(fun)
+        def aux(x):
+            if x in cache:
+                return cache[x]
+            else:
+                a = fun(x)
+                cache[x] = a
+                return a
+        return aux
+
+Then if ``f`` is defined like this:
+
+.. code-block:: python
+
+    @memoize
+    def f(x):
+        ... # very long and heavy computation
+
+when calling ``f`` twice with the same parameter, the first call will take
+a lot of time to be computed, whereas the second call will be almost
+instantaneous.
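A classic way to see the effect is a recursive Fibonacci function. This is an illustration chosen for this sketch (the post only speaks of a generic heavy computation), and the ``calls`` list exists solely to count how many times the body really runs.

```python
from functools import wraps


def memoize(fun):
    cache = {}

    @wraps(fun)
    def aux(x):
        if x not in cache:
            cache[x] = fun(x)   # computed once per distinct argument
        return cache[x]
    return aux


calls = []  # records every real execution of the function body


@memoize
def fib(n):
    calls.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
print(result)
print(len(calls))
```

Because the recursive calls go through the decorated ``fib``, each value from 0 to 30 is computed only once. As in the article, the argument must be hashable for the dictionary lookup to work.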
+
+Classes with two methods
+------------------------
+
+Let us briefly recall how classes work in Python. A class is defined like this:
+
+.. code-block:: python
+
+    class Cipher:
+
+        def __init__(self, key):
+            self.key = key
+
+        def decrypt(self, message):
+            return (message & self.key)
+
+All the methods of a class take as their first argument, which is always named
+``self`` by convention, the instance of the class on which the method is
+called. Thus, if ``a`` is an instance of the ``Cipher`` class, the call
+``a.decrypt(message)`` is equivalent to ``Cipher.decrypt(a, message)``.
+
+The special function ``__init__`` is the class constructor and is called
+every time an instance of the class is created. It is mainly used to initialize
+some attributes of the instance. An instance of the ``Cipher`` class can be
+created like this:
+
+.. code-block:: python
+
+    d = Cipher(key)
+
+A flaw commonly found in code written by people coming from object-oriented
+programming languages is to create classes for everything. This often leads to
+classes containing only two methods, one being the ``__init__`` function. This
+is the case in the class written above as an example. Looking at this example
+more closely, you can see that you could completely get rid of the class
+definition: you only need a ``decrypt`` function taking the key as an
+additional argument:
+
+.. code-block:: python
+
+    def decrypt(key, message):
+        return (message & key)
+
+Some people could object that it still makes sense to use a class in the
+example above, if we plan to extend the ``Cipher`` class in the future, for
+example by adding an ``encrypt`` function. In my opinion, it is better to start
+by writing your code as simply as possible. If you really need to extend the
+code, then you can start restructuring it and group several related functions
+in a class.
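If the only reason for the class was to avoid passing the key at every call, one possible alternative (not mentioned in the post, offered here as a suggestion) is ``functools.partial``, which fixes the first argument of the plain function. The key value below is arbitrary.

```python
from functools import partial


def decrypt(key, message):
    return message & key


key = 0b1100  # arbitrary example key
decrypt_with_key = partial(decrypt, key)  # the key is now bound once

print(decrypt_with_key(0b1010))
```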
+
+PEP 8
+-----
+
+When writing about good practices in Python, it is impossible not to mention
+PEP 8. It is a set of recommendations regarding coding style in Python.
+These recommendations are of course not absolute rules and should be taken as
+advice. However, I noticed that following these recommendations generally
+leads to greater code readability. Furthermore, as many people who code in
+Python also follow these recommendations, adopting them reduces the gap between
+your code and code written by others: this will save you some time when trying
+to understand code.
+
+Here are a few points extracted from PEP 8:
+
+* you should follow English typographic rules: no space before a colon, no
+  space before a comma, but a space after, etc.
+
+* you should put spaces around operators like the equal sign, plus sign, etc.
+
+* you should try to limit the length of the lines of code to a maximum of 79
+  characters.
+
+More details on the `PEP 8 page <http://www.python.org/dev/peps/pep-0008/>`__.
+
+
+    
+
+