| author | Thibaut Horel <thibaut.horel@gmail.com> | 2013-01-21 12:21:38 +0100 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2013-01-21 12:21:38 +0100 |
| commit | 473c850e417d00e66872d923dc8718c533ff1bf8 (patch) | |
| tree | 034132f1ed8cb662773f822755bef45e1f61a410 /content/python-best-practices.rst | |
| parent | 5c4e85fe93ec3a713cf173db34e422072254d1c7 (diff) | |
Add first article
Diffstat (limited to 'content/python-best-practices.rst')
| -rw-r--r-- | content/python-best-practices.rst | 700 |
1 files changed, 700 insertions, 0 deletions
diff --git a/content/python-best-practices.rst b/content/python-best-practices.rst
new file mode 100644
index 0000000..fd8b7c3
--- /dev/null
+++ b/content/python-best-practices.rst
@@ -0,0 +1,700 @@

Good practices in Python
========================

:date: 2013-01-20 17:34

This post is a collection of various facts about Python:

* common mistakes that I encounter frequently when reading code written by
  myself or other people.

* specific aspects of the Python language that are not very well-known and that
  I think should be used more.

* general recommendations regarding Python.

Please note that I do not consider myself a Python expert, and it is possible
that the following text contains inaccurate statements.

Also, due to its very nature, this post is rather unstructured. The table of
contents should help you jump directly to the part you are interested in.

.. contents:: :local:

List comprehensions
-------------------

*List comprehensions* give Python users a very concise and powerful syntax to
build a list from another list (or any iterable object). The syntax is the
following:

.. code-block:: python

    result_list = [expression(item) for item in original_list if condition(item)]

which means that ``result_list`` will be a list containing ``expression(item)``
(any expression that you can compute from ``item``) where ``item`` is an
element of ``original_list`` for which ``condition(item)`` (a boolean
expression involving ``item``) is ``True``. The boolean condition, which allows
you to filter the list, is optional.

For example, to compute the list of squares of elements in a list, instead of:

.. code-block:: python

    l = [1, 2, 3]
    result = []
    for i in l:
        result.append(i*i)

which is particularly inefficient because of the repeated use of the ``append``
function, one could use a *list comprehension*:

.. code-block:: python

    l = [1, 2, 3]
    result = [i*i for i in l] # result = [1, 4, 9]

In addition to being shorter, the above code is also faster (around 3x
improvement) because you build the list in one instruction.

Another example, to compute the list of square roots of all non-negative
elements of a list (you could get into big trouble computing the square root of
a negative element):

.. code-block:: python

    from math import sqrt

    l = [4, -3, 9]
    result = [sqrt(i) for i in l if i >= 0] # result = [2.0, 3.0]

The same syntax also exists for dictionaries; this is called *dict
comprehensions* (very original, isn't it?). For example, to transform a list of
(name, phone number) pairs into a dictionary, for faster lookup:

.. code-block:: python

    l = [("Barthes", "+33 6 29 64 91 12"), ("Dumbo", "+001 650 472 4243")]
    d = {name: phone for (name, phone) in l}

You can get more details about *list comprehensions* in the `dedicated section <http://docs.python.org/tutorial/datastructures.html#list-comprehensions>`__ of the official documentation.

The multiple faces of ``in``
----------------------------

The ``in`` keyword has many different meanings and makes Python code so easy to
write that people often forget to use it.

* ``in`` gives a universal syntax to iterate over iterable objects. For
  example, to iterate over a list, instead of:

  .. code-block:: python

      l = [1, 2, 3]
      for i in range(len(l)):
          print l[i]

  you could simply write:

  .. code-block:: python

      l = [1, 2, 3]
      for i in l:
          print i

  similarly, to iterate over a dictionary, instead of:

  .. code-block:: python

      d = { ... }
      for key in d.keys():
          print d[key]

  you could write:

  .. code-block:: python

      d = { ... }
      for key in d:
          print d[key]

* ``in`` also allows you to test whether an element belongs to some structure:
  list, dictionary (or any iterable object), occurrence of a substring inside
  a string. For example:

  .. code-block:: python

      l = [line for line in open("server.log") if "Connected" in line]

  will return the list of lines from the file ``server.log`` containing
  ``Connected`` as a substring.

Manipulating lists with atomic instructions
-------------------------------------------

More generally, it is advised to avoid iterating over a list with a ``for``
loop. ``for`` loops are slow in Python and writing an operation over a list as
a single instruction allows Python to optimize the execution of the code
internally.

*List comprehensions* often help in replacing an iteration by a single
instruction. Here are a few other functions which can be helpful in this
regard:

* ``join`` can be useful to format a list. For example, to print the list of
  words whose first letter is ``a`` in a list of words, instead of:

  .. code-block:: python

      l = [ ... ]
      result = ""
      for word in l:
          if word[0] == 'a':
              result += word + " "
      print result

  you could do:

  .. code-block:: python

      l = [ ... ]
      print " ".join([word for word in l if word[0] == 'a'])

* ``sum`` to sum the elements of a list.

* ``map`` to apply a given function to all elements in a list. For example, to
  reverse all the words in a list:

  .. code-block:: python

      l = ["Dumbo", "Polochon"]

      def reverse(word):
          return word[::-1]

      m = map(reverse, l) # m = ['obmuD', 'nohcoloP']

*Slices* are also very useful when it comes to manipulating lists (or sublists)
in blocks. Remember that if ``l`` is a list (or any sequence),
``l[begin:end:step]`` will extract all the elements from the index ``begin``
(included) to the index ``end`` (excluded) with a step of ``step`` (this last
parameter being optional).

If the ``begin`` parameter is omitted, it is given 0 as a default value.
Similarly, the default value of ``end`` when unspecified is ``len(l)`` (the
number of elements in ``l``).
A negative value for ``begin`` or ``end`` is counted
from the end of the list (``-1`` refers to the last element). For example, to
extract all the elements from a list but the last one:

.. code-block:: python

    l = [1, 2, 3]
    m = l[:-1] # m = [1, 2]

Using a negative value for the ``step`` parameter can be useful to walk through
a sequence in reverse order, as shown in the example given above to take
the mirror image of a word:

.. code-block:: python

    word = "dumbo"
    drow = word[::-1] # drow = "obmud"

which compensates for the scandalous lack of a ``reverse`` function for strings
in Python.

Exceptions
----------

Exceptions are a powerful tool found in many high-level programming
languages, and they are often under-used. They allow for a less defensive
programming style by handling errors *as they appear* instead of making tests
*beforehand* to prevent them from happening.

In Python, every time you try to execute an illegal operation (*e.g.*
trying to access an element outside a list's boundaries, dividing by zero,
etc.), instead of simply crashing the program, Python raises an exception which
can be caught, which gives the programmer a last chance to fix the problem
before the program ultimately crashes.

The syntax to catch exceptions in Python is the following:

.. code-block:: python

    try:
        ... # piece of code potentially raising the exception named Kaboum
    except Kaboum:
        ... # piece of code to be executed if the above code raises the Kaboum exception

For example, if a line of code contains a division by a number which could
(rarely) be equal to zero, instead of systematically checking that the number
is non-zero, it is much more efficient to encapsulate the line within a
``try ... except ZeroDivisionError:`` to handle specifically the rare cases
where the number will be zero. This is the well-known principle: *better ask
for absolution than permission*.
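
The division case described above can be sketched as follows. This is only an
illustration: the helper name ``safe_ratio`` and the choice of returning
``None`` when the divisor is zero are assumptions made for the example, not
something prescribed by Python.

.. code-block:: python

    def safe_ratio(numerator, denominator):
        """Return numerator / denominator, or None when denominator is zero."""
        try:
            # Common case: the division is attempted without any prior test.
            return numerator / denominator
        except ZeroDivisionError:
            # Rare case: only reached when denominator is equal to zero.
            return None


    # The second divisor is zero, so the second entry of ratios is None.
    ratios = [safe_ratio(10, d) for d in (2, 0, 5)]

Only the rare failing call pays the cost of the exception handling; the other
calls go straight through the division.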

Another example: when trying to access a key which does not exist in
a dictionary, Python raises the ``KeyError`` exception. This exception can be
used to initialize the value associated with a key which does not exist yet in
the dictionary. For example, to compute a dictionary of word counts in a text,
you can often find:

.. code-block:: python

    text = "..."
    result = {}
    for word in text.split():
        if word in result:
            result[word] += 1
        else:
            result[word] = 1

You could instead use the ``KeyError`` exception to your advantage to avoid the
systematic ``if`` test:

.. code-block:: python

    text = "..."
    result = {}
    for word in text.split():
        try:
            result[word] += 1
        except KeyError:
            result[word] = 1

The difference with the previous code is that *most of the time*, this code
will behave exactly as if the body of the ``for`` loop only contained the
instruction ``result[word] += 1``, which is a significant speedup compared to
the first code where a test was computed for each iteration of the loop.

See the `dedicated page <http://docs.python.org/tutorial/errors.html#handling-exceptions>`__ in the official documentation.

Values equivalent to ``True`` or ``False``
------------------------------------------

If ``test`` is a boolean variable (equal to ``True`` or ``False``), we know
that it is redundant to write:

.. code-block:: python

    if test == True:
        ...

instead of:

.. code-block:: python

    if test:
        ...

More generally, Python has automatic conversion rules from standard types to
booleans to allow a shorter syntax in conditional tests:

* as in the vast majority of programming languages, a non-zero integer is
  converted to ``True`` and zero is converted to ``False``
* a string is converted to ``False`` if and only if it is empty. For example,
  to test whether or not a string ``title`` is non-empty, you can simply write:

  .. code-block:: python

      if title:
          ...

  instead of:

  .. code-block:: python

      if len(title) > 0:
          ...

* the ``None`` value, which is a constant used when a variable has not been
  specified, is converted to ``False``. To test that a variable ``var`` is not
  equal to ``None``, you could write:

  .. code-block:: python

      if var:
          ...

  **Beware**, the above code will not allow you to distinguish between the case
  where ``var`` is ``None`` and the case where ``var`` has a value which is
  converted to ``False`` by Python (like an empty string or list for example).
  You need to be careful that this is really what you are trying to test; when
  only ``None`` should be excluded, write ``if var is not None:`` instead.

Generators
----------

Generators provide an easy way to create iterator objects (objects over which
you can iterate). They can be created by using different methods.

Generator expressions
~~~~~~~~~~~~~~~~~~~~~

*Generator expressions* are written exactly like *list comprehensions* except
that the brackets are replaced by parentheses. Thus, the following code:

.. code-block:: python

    l = [1, 2, 3]
    m = (str(i*i) for i in l) # join expects strings, hence the call to str
    print '\n'.join(m)

would produce the exact same result had the second line been replaced by:

.. code-block:: python

    m = [str(i*i) for i in l]

The difference between the two codes is that in the case where ``m`` is
defined by a *list comprehension* the list is entirely computed (and placed
in memory) when the variable ``m`` is defined. On the contrary, when ``m`` is
defined by a *generator expression*, the elements in ``m`` are generated on
the go *when needed*: only when trying to iterate over the variable ``m`` (as
induced by the call to the ``join`` function in the above example) are the
elements generated.

From the speed of execution point of view, both solutions are equivalent: in
the end, each element in ``m`` will be computed once and only once.
From the
memory usage point of view however, generators give a clear advantage:
because the elements are generated dynamically (when needed), one at a time,
never more than one element is stored in memory at the same time. In cases
where the list is too big to fit into memory, *generators* could be the
solution.

When using a *generator expression* as the argument of a function, Python
allows you to drop one pair of parentheses to make the code more readable. For
example, in the following code:

.. code-block:: python

    l = [1, 2, 3]
    total = sum((i*i for i in l))

the second line could be replaced by:

.. code-block:: python

    total = sum(i*i for i in l)

Generator functions
~~~~~~~~~~~~~~~~~~~

A second way to define a *generator* is by writing a function using the special
keyword ``yield``. When called, this function will return an iterable object
whose behavior is the following: for each iteration step over the object, the
function which defined it is executed until a ``yield`` instruction is hit. The
value following the ``yield`` keyword is returned and can be used during the
iteration step. The execution of the function is frozen until the next
iteration step.

For example, let us define the following function:

.. code-block:: python

    def min_max(filename):
        with open(filename) as f:
            for line in f:
                l = map(int, line.split())
                yield min(l), max(l)

When called, this function will produce an iterable object. When iterating
over this object, at each iteration one line of ``filename`` will be read,
and the minimum and maximum value of this line will be returned when the
``yield`` keyword is reached, freezing the execution of the function until
the next iteration.

Hence, the following code:

.. code-block:: python

    for (inf, sup) in min_max(filename):
        print (inf + sup)/2.

is exactly equivalent to:

.. code-block:: python

    with open(filename) as f:
        for line in f:
            l = map(int, line.split())
            inf, sup = min(l), max(l)
            print (inf + sup)/2.

but allows you to define separately the code which computes the minimum and
maximum value of the lines, and the code which computes their arithmetic
mean.

Built-in functions
~~~~~~~~~~~~~~~~~~

Finally, some built-in functions in Python return objects which behave like
generators. This is the case of the ``xrange`` function, which can be used
exactly as the ``range`` function. The difference is that ``range`` computes
a list of integers whereas ``xrange`` defines a generator-like object, which
will generate the elements on the go, one at a time. For example, a call to
``range(1000000000)`` might induce a memory error on your machine (if you do
not have enough memory to store this list), whereas the same call using
``xrange`` will not have this issue and will behave similarly for purposes of
iteration. It is almost always more suitable to use ``xrange`` instead of
``range``: in Python 3.x for example, ``range`` now behaves like ``xrange``.

See more details in the `official documentation <http://docs.python.org/tutorial/classes.html#generators>`__.

Decorators
----------

*Decorators* provide a very powerful way to alter the behavior of a function
without redefining it. The syntax is the following:

.. code-block:: python

    @logging
    def f(x):
        return x + 1

In the above example, we say that the ``f`` function has been *decorated* with
the ``logging`` function. ``logging`` must be a function taking another
function as an argument, and the result of decorating the ``f`` function with
``logging`` is equivalent to:

.. code-block:: python

    def f(x):
        return x + 1

    f = logging(f)

which means that by decorating ``f`` with ``logging``, ``f`` now behaves as
the composite function ``logging(f)``.
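
To make this equivalence concrete, here is a small sketch using a hypothetical
decorator named ``twice`` (it is not part of the examples above and only serves
as an illustration): the function it returns applies the decorated function to
its own result. Both ways of applying it should give functions with the same
behavior.

.. code-block:: python

    def twice(fun):
        """Hypothetical decorator: the result applies fun two times in a row."""
        def aux(x):
            return fun(fun(x))
        return aux


    @twice
    def f(x):
        return x + 1


    def g(x):
        return x + 1

    g = twice(g)  # manual equivalent of the @twice syntax used for f

    same = (f(1) == g(1))  # both calls add one twice to their argument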

A simple decorator
~~~~~~~~~~~~~~~~~~

Imagine that we want the ``logging`` decorator to *log* the calls to the
function it decorates, by printing them to the standard output. Such
a decorator could be written like this:

.. code-block:: python

    def logging(fun):
        def aux(*args, **kwargs):
            print "Calling", fun.__name__
            # return the result so that the decorated function keeps its value
            return fun(*args, **kwargs)
        return aux

Because ``logging`` could be used to decorate any function, with an arbitrary
number of arguments and keyword arguments, it is necessary to use the generic
syntax ``aux(*args, **kwargs)`` which stores all the positional arguments
passed to ``aux`` in a tuple named ``args`` and all the keyword arguments in
a dictionary named ``kwargs``. Note that the exact same arguments are passed to
``fun``, meaning that from the argument passing perspective, ``aux`` and
``fun`` behave exactly the same. The only difference between ``aux`` and
``fun`` is that ``aux`` logs the call to the standard output before doing the
computation made in ``fun``. This is the expected behavior of the ``logging``
decorator.

To be perfectly rigorous, the previous decorator should have been written like
this:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        @wraps(fun)
        def aux(*args, **kwargs):
            print "Calling", fun.__name__
            return fun(*args, **kwargs)
        return aux

Note that ``aux`` is itself decorated by the ``wraps`` decorator provided by
the ``functools`` module of the standard library. This decorator does some
magic to ensure that ``aux`` behaves as closely as possible to ``fun``. For
example, without this decorator, the following code:

.. code-block:: python

    @logging
    def f(x):
        return x + 1

    print f.__name__

would print ``aux`` to the standard output, instead of the expected ``f``. The
``wraps`` decorator ensures that the ``__name__`` attribute is preserved
throughout a decoration.
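
The effect of ``wraps`` can be checked with a short side-by-side sketch. The
two decorators below, ``plain`` and ``wrapped``, are hypothetical names chosen
for the illustration; they do nothing except forward the call, so the only
difference between them is the use of ``wraps``.

.. code-block:: python

    from functools import wraps


    def plain(fun):
        # Hypothetical decorator written without wraps.
        def aux(*args, **kwargs):
            return fun(*args, **kwargs)
        return aux


    def wrapped(fun):
        # Same decorator, with wraps applied to the inner function.
        @wraps(fun)
        def aux(*args, **kwargs):
            return fun(*args, **kwargs)
        return aux


    @plain
    def f(x):
        """Add one to x."""
        return x + 1


    @wrapped
    def g(x):
        """Add one to x."""
        return x + 1


    names = (f.__name__, g.__name__)  # ('aux', 'g')
    docs = (f.__doc__, g.__doc__)     # (None, 'Add one to x.')

Besides ``__name__``, the docstring is also carried over by ``wraps``, which is
why ``g.__doc__`` is preserved while ``f.__doc__`` is lost.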

Let us further assume that you want to extend the ``logging`` decorator to not
only log the calls, but also keep track of how many times the function has been
called.

You could be tempted to write something like:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        a = 0
        @wraps(fun)
        def aux(*args, **kwargs):
            a = a + 1
            print "{0} has been called {1} times".format(fun.__name__, a)
            return fun(*args, **kwargs)
        return aux

However, if you apply this decorator to some function and then call this
function, you will get an angry face from Python complaining that the local
variable ``a`` is referenced before assignment. The reason for this is that in
the line:

.. code-block:: python

    a = a + 1

Python thinks you are defining a new local variable ``a`` and forgets about the
fact that this variable has already been initialized to 0 in the enclosing
function. So when reaching the ``a + 1`` part, ``a`` is not defined yet, which
causes the error. This is a current limitation of Python 2: variables that have
been defined in an enclosing scope are read-only.

A standard way to circumvent this limitation is to use a mutable structure for
``a``: ``a`` itself cannot be rebound, but the structure it points to can
be modified. In the previous example, this could lead to the following code:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        a = [0]
        @wraps(fun)
        def aux(*args, **kwargs):
            a[0] = a[0] + 1
            print "{0} has been called {1} times".format(fun.__name__, a[0])
            return fun(*args, **kwargs)
        return aux

where ``a`` points to a list of length 1 where the number of calls is stored at
the first position.
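
For readers on Python 3, the ``nonlocal`` statement removes the need for the
mutable structure. The sketch below assumes Python 3 (it is a syntax error in
Python 2) and uses a hypothetical decorator name, ``counting``; storing the
counter on the ``calls`` attribute of the returned function is also an
assumption made here so that the value can be inspected without printing.

.. code-block:: python

    from functools import wraps


    def counting(fun):
        # Hypothetical Python 3 variant of the call-counting decorator.
        calls = 0

        @wraps(fun)
        def aux(*args, **kwargs):
            nonlocal calls     # rebind the variable of the enclosing scope
            calls += 1
            aux.calls = calls  # expose the counter on the decorated function
            return fun(*args, **kwargs)
        return aux


    @counting
    def f(x):
        return x + 1


    f(1)
    f(2)
    total_calls = f.calls  # number of calls made so far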

Another example
~~~~~~~~~~~~~~~

A common example which is often used to illustrate decorators in Python is
`memoization <http://en.wikipedia.org/wiki/Memoization>`__: when a function is
computation-heavy, but is often called using the same parameters, you can save
a lot of time by storing past results returned by the function.

This idea can be nicely implemented in Python by using a decorator. This
decorator will store past results in a dictionary: when the decorated function
is called, the decorator looks in the dictionary to check whether the function
has already been called with the same parameter, and returns the stored value
in this case.

Here is how you could write such a decorator for a single argument function:

.. code-block:: python

    from functools import wraps

    def memoize(fun):
        cache = {}
        @wraps(fun)
        def aux(x):
            if x in cache:
                return cache[x]
            else:
                a = fun(x)
                cache[x] = a
                return a
        return aux

Then if ``f`` is defined like this:

.. code-block:: python

    @memoize
    def f(x):
        ... # very long and heavy computation

when calling ``f`` twice with the same parameter, the first call will take
a lot of time to be computed, whereas the second call will be almost
instantaneous.

Classes with two methods
------------------------

Let us briefly recall how classes work in Python. A class is defined like this:

.. code-block:: python

    class Cipher:

        def __init__(self, key):
            self.key = key

        def decrypt(self, message):
            return (message & self.key)

All the methods of a class take as their first argument, which is always named
``self`` by convention, the instance of the class on which the method is
called. Thus, if ``a`` is an instance of the ``Cipher`` class, the call
``a.decrypt(message)`` is equivalent to ``Cipher.decrypt(a, message)``.

The special function ``__init__`` is the class constructor and is called
every time an instance of the class is created.
It is mainly used to initialize
some attributes of the instance. An instance of the ``Cipher`` class can be
created like this:

.. code-block:: python

    d = Cipher(key)

A flaw commonly found in code written by people coming from object-oriented
programming languages is to create classes for everything. This often leads to
classes containing only two methods, one being the ``__init__`` function. This
is the case in the class written above as an example. By looking at this
example a bit more closely, you can see that you could completely get rid of
the class definition: you only need a ``decrypt`` function taking the key as an
additional argument:

.. code-block:: python

    def decrypt(key, message):
        return (message & key)

Some people could object that it still makes sense to use a class in the
example above, if we plan to extend the ``Cipher`` class in the future, for
example by adding an ``encrypt`` function. In my opinion, it is better to start
by writing your code as simply as possible. If you really need to extend the
code, then you can start restructuring it and group several related functions
in a class.

PEP 8
-----

When writing about good practices in Python, it is impossible not to mention
PEP 8. It is a set of recommendations regarding coding style in Python.
These recommendations are of course not absolute rules and should be taken as
advice. However, I noticed that following these recommendations generally
leads to greater code readability. Furthermore, as many people who code in
Python also follow these recommendations, adopting them reduces the gap between
your code and code written by others: this will save you some time when trying
to understand code.

Here are a few points extracted from PEP 8:

* you should follow English typographic rules: no space before a colon, no
  space before a comma, but a space after, etc.

* you should put spaces around operators like the equal sign, plus sign, etc.

* you should try to limit the length of the lines of code to a maximum of 79
  characters.

More details on the `PEP 8 page <http://www.python.org/dev/peps/pep-0008/>`__.
