Good practices in Python
========================

:date: 2013-01-20 17:34

This post is a collection of various facts about Python:

* common mistakes that I encounter frequently when reading code written by
  myself or other people.

* specific aspects of the Python language that are not very well-known and that
  I think should be used more.

* general recommendations regarding Python.

Please note that I do not consider myself a Python expert, and it is possible
that the following text contains inaccurate statements.

Also, due to its very nature, this post is rather unstructured. The table of
contents should help you jump directly to the part you are interested in.

.. contents:: :local:

List comprehensions
-------------------

*List comprehensions* give Python users a very concise and powerful syntax to
build a list from another list (or any iterable object). The syntax is the
following:

.. code-block:: python

    result_list = [expression(item) for item in original_list if condition(item)]

which means that ``result_list`` will be a list containing ``expression(item)``
(any expression that you can compute from ``item``) where ``item`` is an
element of ``original_list`` for which ``condition(item)`` (a boolean
expression involving ``item``) is ``True``. The boolean condition, which allows
you to filter the list, is optional.

For example, to compute the list of squares of elements in a list, instead of:

.. code-block:: python

    l = [1, 2, 3]
    result = []
    for i in l:
        result.append(i*i)

which looks up and calls the ``append`` method once per element, one could use
a *list comprehension*:

.. code-block:: python

    l = [1, 2, 3]
    result = [i*i for i in l] # result = [1, 4, 9]

In addition to being shorter, the above code is also typically faster (the
exact factor depends on the interpreter and on the data) because the list is
built by a single expression instead of one ``append`` call per element.
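
One way to check the speed claim on your own machine is the ``timeit`` module from the standard library. The following is only a sketch: the function names and the input size are chosen for illustration, and the timings it prints will differ from one machine and one Python version to another.

```python
import timeit


def squares_with_loop(values):
    """Build the list of squares with an explicit loop and append."""
    result = []
    for i in values:
        result.append(i * i)
    return result


def squares_with_comprehension(values):
    """Build the same list with a list comprehension."""
    return [i * i for i in values]


values = list(range(1000))

# Both approaches must produce the same list.
assert squares_with_loop(values) == squares_with_comprehension(values)

# Time 200 runs of each version; the numbers vary between machines.
loop_time = timeit.timeit(lambda: squares_with_loop(values), number=200)
comp_time = timeit.timeit(lambda: squares_with_comprehension(values), number=200)
print("loop: %.4fs, comprehension: %.4fs" % (loop_time, comp_time))
```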

Another example: computing the list of square roots of all non-negative
elements of a list (``sqrt`` raises a ``ValueError`` when given a negative
number, so these must be filtered out):

.. code-block:: python

    from math import sqrt

    l = [4, -3, 9]
    result = [sqrt(i) for i in l if i >= 0] # result = [2.0, 3.0]

The same syntax also exists for dictionaries, where it is called a *dict
comprehension* (very original, isn't it?). For example, to transform a list of
(name, phone number) pairs into a dictionary, for faster lookup:

.. code-block:: python

    l = [("Barthes", "+33 6 29 64 91 12"), ("Dumbo", "+001 650 472 4243")]
    d = {name: phone for (name, phone) in l}

You can get more details about *list comprehensions* in the `dedicated section <http://docs.python.org/tutorial/datastructures.html#list-comprehensions>`__ of the official documentation.

The multiple faces of ``in``
----------------------------

The ``in`` keyword has many different meanings and makes Python code so easy to
write that people often forget to use it.

* ``in`` gives a universal syntax to iterate over iterable objects. For
  example, to iterate over a list, instead of:

  .. code-block:: python

    l = [1, 2, 3]
    for i in range(len(l)):
        print(l[i])

  you could simply write:

  .. code-block:: python

        l = [1, 2, 3]
        for i in l:
            print(i)

  similarly, to iterate over a dictionary, instead of:

  .. code-block:: python

        d = { ... }
        for key in d.keys():
            print(d[key])

  you could write:

  .. code-block:: python

        d = { ... }
        for key in d:
            print(d[key])

* ``in`` also allows you to test whether an element belongs to some structure:
  an item in a list (or any iterable object), a key in a dictionary, or
  a substring inside a string. For example:

  .. code-block:: python

        l = [line for line in open("server.log") if "Connected" in line]

  will return the list of lines from the file ``server.log`` containing
  ``Connected`` as a substring.
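
To make the different membership tests concrete, here is a small sketch; the sample values are made up for the example.

```python
# Membership in a list compares the element with each item in turn.
primes = [2, 3, 5, 7]
print(5 in primes)        # True
print(4 not in primes)    # True

# Membership in a dictionary looks at the keys, not the values.
phones = {"Barthes": "+33 6 29 64 91 12"}
print("Barthes" in phones)             # True
print("+33 6 29 64 91 12" in phones)   # False

# Membership in a string looks for a substring.
line = "2013-01-20 Connected to host"
print("Connected" in line)   # True
```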

Manipulating lists with atomic instructions
-------------------------------------------

More generally, it is often better to avoid iterating over a list with an
explicit ``for`` loop. Each iteration of a ``for`` loop is executed step by
step by the interpreter, whereas writing an operation over a list as a single
instruction lets Python run the loop in its optimized internal code.

*List comprehensions* often help in replacing an iteration by a single
instruction. Here are a few other functions which can be helpful in this
regard:

* ``join`` can be useful to format a list. For example, to print the words
  whose first letter is ``a`` in a list of words, instead of:

  .. code-block:: python

    l = [ ... ]
    result = ""
    for word in l:
        if word[0] == 'a':
            result += word + " "
    print(result)

  you could do:

  .. code-block:: python

    l = [ ... ]
    print(" ".join([word for word in l if word[0] == 'a']))

* ``sum`` to sum the elements of a list.

* ``map`` to apply a given function to all elements in a list. For example to
  reverse all the words in a list:

  .. code-block:: python

    l = ["Dumbo", "Polochon"]

    def reverse(word):
        return word[::-1]

    m = list(map(reverse, l)) # m = ['obmuD', 'nohcoloP']
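
The three functions above can be tried together on a small list. This is only a sketch with made-up sample data; ``list()`` is wrapped around ``map`` so that the result is a real list under Python 3 as well.

```python
words = ["apple", "banana", "avocado"]   # hypothetical sample data

# join: words starting with "a", separated by spaces
a_words = " ".join([word for word in words if word.startswith("a")])

# sum: total number of letters in the list
total_letters = sum([len(word) for word in words])

# map: apply a function to every element (list() forces the result
# into a list, which matters under Python 3 where map is lazy)
lengths = list(map(len, words))

print(a_words)        # apple avocado
print(total_letters)  # 18
print(lengths)        # [5, 6, 7]
```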

*Slices* are also very useful when it comes to manipulating lists (or sublists)
in blocks. Remember that if ``l`` is a list (or any sequence, such as a string
or a tuple), ``l[begin:end:step]`` will extract all the elements from the index
``begin`` (included) to the index ``end`` (excluded) with a step of ``step``
(this last parameter being optional).

If the ``begin`` parameter is omitted, it is given 0 as a default value.
Similarly, the default value of ``end`` when unspecified is ``len(l)`` (the
number of elements in ``l``). These defaults apply when ``step`` is positive.
A negative value for ``begin`` or ``end`` is counted from the end of the list.
For example, to extract all the elements from a list but the last one:

.. code-block:: python

    l = [1, 2, 3]
    m = l[:-1] # m = [1, 2]

Using a negative value for the ``step`` parameter can be useful to walk through
a sequence in reverse order, as shown in the example given above to take the
mirror image of a word:

.. code-block:: python
    
    word = "dumbo"
    drow = word[::-1] # drow = "obmud"

which compensates for the scandalous lack of a ``reverse`` function for strings
in Python.
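
As a quick reference, here is a sketch of the slicing rules described in this section, applied to a made-up list of letters.

```python
letters = ["a", "b", "c", "d", "e", "f"]

print(letters[1:4])    # ['b', 'c', 'd']   from index 1 to index 3
print(letters[:2])     # ['a', 'b']        begin defaults to 0
print(letters[4:])     # ['e', 'f']        end defaults to len(letters)
print(letters[-2:])    # ['e', 'f']        negative index counts from the end
print(letters[::2])    # ['a', 'c', 'e']   every second element
print(letters[::-1])   # ['f', 'e', 'd', 'c', 'b', 'a']   reversed copy
```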

Exceptions
----------

Exceptions are a powerful tool found in many high-level programming languages,
and they are often under-used. They allow for a less defensive programming
style by handling errors *as they appear* instead of making tests *beforehand*
to prevent them from happening.

In Python, every time you try to execute an illegal operation (*e.g.*
accessing an element outside a list's boundaries, dividing by zero, etc.),
instead of simply crashing the program, Python raises an exception which can be
caught. This gives the programmer a last chance to fix the problem before the
program ultimately crashes.

The syntax to catch exceptions in Python is the following:

.. code-block:: python

    try:
        ... # piece of code potentially raising the exception named Kaboum
    except Kaboum:
        ... # piece of code to be executed if the above code raises the Kaboum exception

For example, if a line of code contains a division by a number which could
(rarely) be equal to zero, instead of systematically checking that the number
is non-zero, it is often more efficient to wrap the line in a
``try: ... except ZeroDivisionError:`` block to handle specifically the rare
cases where the number is zero. This is the well-known principle: *it is easier
to ask for forgiveness than permission*.
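
A minimal sketch of the division case could look like this. The ``mean_speed`` helper is hypothetical, and returning ``None`` is just one possible way of handling the rare failing case.

```python
def mean_speed(distance, duration):
    """Return distance / duration, or None when duration is zero."""
    try:
        return distance / duration
    except ZeroDivisionError:
        # Only reached in the rare case where duration == 0.
        return None


print(mean_speed(100.0, 8.0))   # 12.5
print(mean_speed(100.0, 0.0))   # None
```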

Another example, when trying to access a key which does not exist in
a dictionary, Python raises the ``KeyError`` exception. This exception can be
used to initialize the value associated with a key which does not exist yet in
the dictionary. For example, to compute a dictionary of word counts in a text,
you can often find:

.. code-block:: python

    text = "..."
    result = {}
    for word in text.split():
        if word in result:
            result[word] += 1
        else:
            result[word] = 1

You could instead use the ``KeyError`` exception to your advantage to avoid the
systematic ``if`` test:

.. code-block:: python

    text = "..."
    result = {}
    for word in text.split():
        try:
            result[word] += 1
        except KeyError:
            result[word] = 1

The difference with the previous code is that *most of the time*, this code
will behave exactly as if the body of the ``for`` loop only contained the
instruction ``result[word] += 1``. When most words have already been seen, this
can be a noticeable speedup compared to the first code, where a test was
computed for each iteration of the loop.
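
To see how rarely the ``except`` branch actually runs, one can count it. The sketch below wraps the loop in a hypothetical ``count_words`` function and uses a made-up sample text.

```python
def count_words(text):
    """Return (word counts, number of times the KeyError branch ran)."""
    result = {}
    misses = 0
    for word in text.split():
        try:
            result[word] += 1
        except KeyError:
            # Reached only the first time each distinct word is seen.
            result[word] = 1
            misses += 1
    return result, misses


text = "the cat and the hat and the bat " * 100   # hypothetical sample text
counts, misses = count_words(text)
print(len(text.split()))   # 800 words processed
print(misses)              # 5 distinct words, so only 5 exceptions
print(counts["the"])       # 300
```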

See the `dedicated page <http://docs.python.org/tutorial/errors.html#handling-exceptions>`__ in the official documentation.

Values equivalent to ``True`` or ``False``
------------------------------------------

If ``test`` is a boolean variable (equal to ``True`` or ``False``), we know
that it is redundant to write:

.. code-block:: python

    if test == True:
        ...

instead of:

.. code-block:: python

    if test:
        ...

More generally, Python has automatic conversion rules from standard types to
booleans to allow a shorter syntax in conditional tests:

* as in the vast majority of programming languages, any non-zero integer
  (positive or negative) is converted to ``True`` and zero is converted to
  ``False``
* a string is converted to ``False`` if and only if it is empty. For example,
  to test whether or not a string ``title`` is empty, you can simply write:

  .. code-block:: python

    if title:
        ...

  instead of:

  .. code-block:: python

    if len(title) > 0:
        ...

* the ``None`` value, which is a constant used when a variable has not been
  specified, is converted to ``False``. To test that a variable ``var`` is
  equal to ``None``, you could write:

  .. code-block:: python

    if not var:
        ...

  **Beware**, the above code will not allow you to distinguish between the case
  where ``var`` is ``None`` and the case where ``var`` has a value which is
  converted to ``False`` by Python (like an empty string or list for example).
  You need to be careful that this is really what you are trying to test; when
  the distinction matters, write ``if var is None:`` instead.
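
The difference between the two tests can be seen in the following sketch, where ``describe`` is a hypothetical helper written only for this illustration.

```python
def describe(var):
    """Show how the two tests classify the same value."""
    if var is None:
        return "None"
    if not var:
        return "falsy but not None"
    return "truthy"


print(describe(None))    # None
print(describe(""))      # falsy but not None
print(describe([]))      # falsy but not None
print(describe(0))       # falsy but not None
print(describe("abc"))   # truthy
print(describe(-1))      # truthy
```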

Generators
----------

Generators provide an easy way to create iterator objects (objects over which
you can iterate). They can be created by using different methods.

Generator expressions
~~~~~~~~~~~~~~~~~~~~~

*Generator expressions* are written exactly like *list comprehensions* except
that the brackets are replaced by parentheses. Thus, the following code:

.. code-block:: python

  l = [1, 2, 3]
  m = (str(i*i) for i in l)
  print('\n'.join(m))

would produce the exact same result had the second line been replaced by:

.. code-block:: python

  m = [str(i*i) for i in l]

The difference between the two codes is that in the case where ``m`` is
defined by a *list comprehension* the list is entirely computed (and placed
in memory) when the variable ``m`` is defined. On the contrary, when ``m`` is
defined by a *generator expression*, the elements in ``m`` are generated on
the go *when needed*: only when trying to iterate over the variable ``m`` (as
induced by the call to the ``join`` function in the above example) are the
elements generated.

From the speed of execution point of view, both solutions are roughly
equivalent: in the end, each element in ``m`` will be computed once and only
once. From the memory usage point of view however, generators give a clear
advantage: because the elements are generated dynamically (when needed), one at
a time, the generator itself never holds more than one element in memory at the
same time. In cases when the list is too big to fit into memory, *generators*
could be the solution.
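
The "computed only when needed" behavior can be observed by recording each computation. In this sketch, ``square`` and the ``calls`` list are hypothetical helpers added purely for the demonstration.

```python
calls = []


def square(i):
    """Square i and record the call, so we can see when it happens."""
    calls.append(i)
    return i * i


l = [1, 2, 3]

as_list = [square(i) for i in l]
print(calls)   # [1, 2, 3]: the list comprehension computed everything at once

del calls[:]   # empty the record
as_generator = (square(i) for i in l)
print(calls)   # []: nothing has been computed yet

first = next(as_generator)
print(calls)   # [1]: only the first element has been computed

rest = list(as_generator)
print(calls)   # [1, 2, 3]: the remaining elements were computed on demand
```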

When a *generator expression* is the only argument of a function, Python
allows you to drop one pair of parentheses to make the code more readable. For
example, in the following code:

.. code-block:: python

  l = [1, 2, 3]
  total = sum((i*i for i in l))

the second line could be replaced by:

.. code-block:: python

  total = sum(i*i for i in l)

Generator functions
~~~~~~~~~~~~~~~~~~~

A second way to define a *generator* is by writing a function using the special
keyword ``yield``. When called, this function will return an iterable object
whose behavior is the following: for each iteration step over the object, the
function which defined it is executed until a ``yield`` instruction is hit. The
value following the ``yield`` keyword is returned and can be used during the
iteration step. The execution of the function is frozen until the next
iteration step.

For example, let us define the following function:

.. code-block:: python

    def min_max(filename):
        with open(filename) as f:
            for line in f:
                l = [int(x) for x in line.split()]
                yield min(l), max(l)

When called, this function will produce an iterable object. When iterating
over this object, at each iteration one line of ``filename`` will be read,
and the minimum and maximum value of this line will be returned when the
``yield`` keyword is reached, freezing the execution of the function until
the next iteration.

Hence, the following code:

.. code-block:: python

    for (inf, sup) in min_max(filename):
        print((inf + sup) / 2.)

is exactly equivalent to:

.. code-block:: python

    with open(filename) as f:
        for line in f:
            l = [int(x) for x in line.split()]
            inf, sup = min(l), max(l)
            print((inf + sup) / 2.)

but allows you to define separately the code which computes the minimum and
maximum value of the lines, and the code which computes their arithmetic
mean.
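
The way a generator function is frozen at each ``yield`` and resumed at the next iteration step can be observed by stepping through it by hand with ``next``. The ``countdown`` function below is a hypothetical example written for this purpose, not part of the code above.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, printing a trace of what the function does."""
    while n > 0:
        print("about to yield %d" % n)
        yield n
        print("resumed after yielding %d" % n)
        n -= 1


gen = countdown(2)       # nothing is printed: the body has not started yet
first = next(gen)        # runs up to the first yield
second = next(gen)       # resumes after the first yield, runs up to the second
remaining = list(gen)    # resumes again, the loop ends without yielding

print(first)       # 2
print(second)      # 1
print(remaining)   # []
```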

Built-in functions
~~~~~~~~~~~~~~~~~~

Finally, some built-in functions in Python return objects which produce their
elements lazily, like generators. This is the case of the ``xrange`` function,
which can be used exactly like the ``range`` function in a ``for`` loop. The
difference is that ``range`` computes a list of integers whereas ``xrange``
defines an object which will generate the elements on the go, one at a time.
For example a call to ``range(1000000000)`` might induce a memory error on your
machine (if you do not have enough memory to store this list), whereas the same
call using ``xrange`` will not have this issue and will behave similarly for
purposes of iteration. It is almost always more suitable to use ``xrange``
instead of ``range``; in Python 3.x, ``xrange`` is gone and ``range`` itself
behaves like ``xrange``.
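
For readers on Python 3, the lazy behavior of ``range`` can be checked with the sketch below. It assumes Python 3 and CPython: under Python 2 the first line would try to build the whole list, and the size reported by ``sys.getsizeof`` is an implementation detail.

```python
import sys

# Python 3 only: range is lazy, like xrange in Python 2.
big = range(1000000000)

print(len(big))                    # 1000000000, known without building a list
print(big[-1])                     # 999999999, computed on demand
print(sys.getsizeof(big) < 1000)   # True on CPython: the object itself is tiny

# Iteration still produces the elements one at a time.
total = 0
for i in range(5):
    total += i
print(total)                       # 10
```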

See more details in the `official documentation <http://docs.python.org/tutorial/classes.html#generators>`__.

Decorators
----------

*Decorators* provide a very powerful way to alter the behavior of a function
without redefining it. The syntax is the following:

.. code-block:: python

    @logging
    def f(x):
        return x + 1

In the above example, we say that the ``f`` function has been *decorated* with
the ``logging`` function. ``logging`` must be a function taking another
function as an argument and the result of decorating the ``f`` function with
``logging`` is equivalent to:

.. code-block:: python

    def f(x):
        return x + 1

    f = logging(f)

which means that by decorating ``f`` with ``logging``, ``f`` now behaves as
the composite function ``logging(f)``.

A simple decorator
~~~~~~~~~~~~~~~~~~

Imagine that we want the ``logging`` decorator to *log* the calls to the
function it decorates, by printing them to the standard output. Such
a decorator could be written like this:

.. code-block:: python

    def logging(fun):
        def aux(*args, **kwargs):
            print("Calling " + fun.__name__)
            return fun(*args, **kwargs)  # pass the result of fun back to the caller
        return aux

Because ``logging`` could be used to decorate any function, with an arbitrary
number of arguments and keyword arguments, it is necessary to use the generic
syntax ``aux(*args, **kwargs)`` which stores all the positional arguments
passed to ``aux`` in a tuple named ``args`` and all the keyword arguments in
a dictionary named ``kwargs``. Note that the exact same arguments are passed to
``fun``, meaning that from the argument passing perspective, ``aux`` and
``fun`` behave exactly the same. The only difference between ``aux`` and
``fun`` is that ``aux`` logs the call to the standard output before doing the
computation made in ``fun``. This is the expected behavior of the ``logging``
decorator.

To be perfectly rigorous, the previous decorator should have been written like
this:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        @wraps(fun)
        def aux(*args, **kwargs):
            print("Calling " + fun.__name__)
            return fun(*args, **kwargs)
        return aux

Note that ``aux`` is itself decorated by the ``wraps`` decorator provided by
the ``functools`` standard library module. This decorator copies the metadata
of ``fun`` (its name, its docstring, etc.) onto ``aux`` so that ``aux`` behaves
as closely as possible to ``fun``. For example, without this decorator, the
following code:

.. code-block:: python

    @logging
    def f(x):
        return x + 1

    print(f.__name__)

would print ``aux`` to the standard output, instead of the expected ``f``. The
``wraps`` decorator ensures that the ``__name__`` attribute is preserved
throughout a decoration.
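
The effect of ``wraps`` can be checked side by side. In this sketch, ``plain`` and ``wrapped`` are two hypothetical decorators which do nothing but forward the call; the only difference between them is the use of ``wraps``.

```python
from functools import wraps


def plain(fun):
    """Decorator that does not use wraps."""
    def aux(*args, **kwargs):
        return fun(*args, **kwargs)
    return aux


def wrapped(fun):
    """Same decorator, with wraps copying the metadata of fun onto aux."""
    @wraps(fun)
    def aux(*args, **kwargs):
        return fun(*args, **kwargs)
    return aux


@plain
def f(x):
    """Add one to x."""
    return x + 1


@wrapped
def g(x):
    """Add one to x."""
    return x + 1


print(f.__name__)   # aux
print(f.__doc__)    # None
print(g.__name__)   # g
print(g.__doc__)    # Add one to x.
```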

Let us further assume that you want to extend the ``logging`` decorator to not
only log the calls, but also keep track of how many times the function has been
called. 

You could be tempted to write something like:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        a = 0
        @wraps(fun)
        def aux(*args, **kwargs):
            a = a + 1
            print("{0} has been called {1} times".format(fun.__name__, a))
            return fun(*args, **kwargs)
        return aux

However, if you apply this decorator to some function and then call this
function, you will get an angry face from Python: an ``UnboundLocalError``
complaining that the variable ``a`` is referenced before assignment. The reason
for this is that in the line:

.. code-block:: python

    a = a + 1

Python sees an assignment to ``a`` inside ``aux`` and therefore treats ``a`` as
a variable local to ``aux``, forgetting about the fact that a variable of this
name has already been initialized to 0 in ``logging``. So when reaching the
``a + 1`` part, the local ``a`` is not defined yet, which causes the error.
This is a limitation of Python 2: variables defined in an enclosing function
can be read but not reassigned from the inner function (Python 3 lifts this
limitation with the ``nonlocal`` statement).

A standard way to circumvent this limitation is to use a mutable structure for
``a``: ``a`` itself cannot be redefined, but the structure it points to can be
modified. In the previous example, this could lead to the following code:

.. code-block:: python

    from functools import wraps

    def logging(fun):
        a = [0]
        @wraps(fun)
        def aux(*args, **kwargs):
            a[0] = a[0] + 1
            print("{0} has been called {1} times".format(fun.__name__, a[0]))
            return fun(*args, **kwargs)
        return aux

where ``a`` points to a list of length 1 where the number of calls is stored at
the first position.
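
Under Python 3, the same counter can be written without the list trick by declaring the variable ``nonlocal``. The following sketch only runs on Python 3; the variable is renamed ``count`` here for readability, and the decorated function ``f`` is the toy example used earlier in this section.

```python
from functools import wraps


def logging(fun):
    count = 0

    @wraps(fun)
    def aux(*args, **kwargs):
        nonlocal count   # Python 3 only: rebind the variable of logging()
        count += 1
        print("{0} has been called {1} times".format(fun.__name__, count))
        return fun(*args, **kwargs)
    return aux


@logging
def f(x):
    return x + 1


print(f(1))   # prints "f has been called 1 times", then 2
print(f(5))   # prints "f has been called 2 times", then 6
```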

Another example
~~~~~~~~~~~~~~~

A common example which is often used to illustrate decorators in Python is
`memoization <http://en.wikipedia.org/wiki/Memoization>`__: when a function is
computation-heavy, but is often called using the same parameters, you can save
a lot of time by storing past results returned by the function.

This idea can be nicely implemented in Python by using a decorator. This
decorator will store past results in a dictionary: when the decorated function
is called, the decorator looks up the dictionary to check whether the function
has already been called with the same parameter, and returns the stored value
in this case.

Here is how you could write such a decorator for a single argument function:

.. code-block:: python

    from functools import wraps

    def memoize(fun):
        cache = {}
        @wraps(fun)
        def aux(x):
            if x in cache:
                return cache[x]
            else:
                a = fun(x)
                cache[x] = a
                return a
        return aux

Then if ``f`` is defined like this:

.. code-block:: python

    @memoize
    def f(x):
        ... # very long and heavy computation

when calling ``f`` twice with the same parameter, the first call will take
a lot of time to be computed, whereas the second call will be almost
instantaneous.
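
To check that the cache is really used, one can record each time the body of the function actually runs. In this sketch, ``slow_square`` and the ``computed`` list are hypothetical stand-ins for a heavy computation, and the decorator is a slightly condensed version of the one above.

```python
from functools import wraps


def memoize(fun):
    cache = {}

    @wraps(fun)
    def aux(x):
        if x not in cache:
            cache[x] = fun(x)
        return cache[x]
    return aux


computed = []   # records every real computation, for illustration only


@memoize
def slow_square(x):
    computed.append(x)   # stands in for a long and heavy computation
    return x * x


print(slow_square(4))   # 16, computed
print(slow_square(4))   # 16, read from the cache
print(slow_square(5))   # 25, computed
print(computed)         # [4, 5]: the second call with 4 never ran the body
```

Note that this only works when the argument can be used as a dictionary key, that is when it is hashable.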

Classes with two methods
------------------------

Let us briefly recall how classes work in Python. A class is defined like this:

.. code-block:: python

    class Cipher:

        def __init__(self, key):
            self.key = key

        def decrypt(self, message):
            return (message & self.key)

All the methods of a class take as their first argument, which is always named
``self`` by convention, the instance of the class on which the method is
called. Thus, if ``a`` is an instance of the ``Cipher`` class, the call
``a.decrypt(message)`` is equivalent to ``Cipher.decrypt(a, message)``.

The special function ``__init__`` is the class constructor and is called
every time an instance of the class is created. It is mainly used to initialize
some attributes of the instance. An instance of ``Cipher`` class can be created
like this:

.. code-block:: python

    d = Cipher(key)

A flaw commonly found in code written by people coming from object-oriented
programming languages is to create classes for everything. This often leads to
classes containing only two methods, one being the ``__init__`` function. This
is the case in the class written above as an example. Looking at this example
a bit more closely, you can see that you could completely get rid of the class
definition: you only need a ``decrypt`` function taking the key as an
additional argument:

.. code-block:: python

    def decrypt(key, message):
        return (message & key)

Some people could object that it still makes sense to use a class in the
example above, if we plan to extend the ``Cipher`` class in the future, for
example by adding an ``encrypt`` function. In my opinion, it is better to start
by writing your code as simply as possible. If you really need to extend the
code, then you can start restructuring it and group several related functions
in a class.
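
If the only thing you miss from the class is not having to pass the key at every call, ``functools.partial`` from the standard library can bind it once. This is only a sketch of one possible approach; the key value and the name ``decrypt_with_key`` are arbitrary choices made for the example.

```python
from functools import partial


def decrypt(key, message):
    return message & key


# Bind the key once; 0b1111 is an arbitrary key chosen for the example.
decrypt_with_key = partial(decrypt, 0b1111)

print(decrypt_with_key(0b101010))   # 10, same as decrypt(0b1111, 0b101010)
```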

PEP 8
-----

When writing about good practices in Python, it is impossible not to mention
PEP 8. It is a set of recommendations regarding coding style in Python.
These recommendations are of course not absolute rules and should be taken as
advice. However, I noticed that following these recommendations generally
leads to greater code readability. Furthermore, as many people who code in
Python also follow these recommendations, adopting them reduces the gap between
your code and code written by others: this will save you some time when trying
to understand code.

Here are a few points extracted from PEP 8:

* you should follow English typographic rules: no space before a colon, no
  space before a comma, but a space after, etc.

* you should put spaces around operators like the equal sign, plus sign, etc.
  (except around the ``=`` of a keyword argument or default value).

* you should try to limit the length of the lines of code to a maximum of 79
  characters.

More details on the `PEP 8 page <http://www.python.org/dev/peps/pep-0008/>`__.