From fbca4270d5d7a2ce59fafd36000058501e85c3d6 Mon Sep 17 00:00:00 2001 From: Thibaut Horel Date: Mon, 21 Jan 2013 17:26:47 +0100 Subject: Small corrections to the previous article --- content/python-best-practices.rst | 252 ++++++++++++++++++-------------------- 1 file changed, 122 insertions(+), 130 deletions(-) diff --git a/content/python-best-practices.rst b/content/python-best-practices.rst index fd8b7c3..68664db 100644 --- a/content/python-best-practices.rst +++ b/content/python-best-practices.rst @@ -8,13 +8,13 @@ This post is a collection of various facts about Python: * common mistakes that I encounter frequently when reading code written by myself or other people. -* specific aspects of the Python language that are not very well-known and that +* specific features of the Python language that are not very well-known and that I think should be used more. * general recommendations regarding Python. -Please note that I do not consider myself a Python expert, and it is possible -that the following text contains inaccurate statements. +Please note that I do not consider myself a Python expert, so it is possible +that the following text contains some inaccurate statements. Also, due to its very nature, this post is rather unstructured. The table of contents should help you jumping directly to the part you are interested in. @@ -33,10 +33,10 @@ following: result_list = [expression(item) for item in original_list if condition(item)] which means that ``result_list`` will be a list containing ``expression(item)`` -(any expression that you can computed from ``item``) where ``item`` is an -element of ``original_list`` for which ``condition(item)`` (a boolean -expression involving ``item``) is ``True``. The boolean condition which allows -you to filer the list is optional. 
+(an expression computed from ``item``) for each ``item`` element of +``original_list`` for which ``condition(item)`` (a boolean expression involving +``item``) is ``True``. The boolean condition which allows you to filter the list +is optional. For example, to compute the list of squares of elements in a list, instead of: @@ -67,7 +67,7 @@ a negative element): from math import sqrt l = [4, -3, 9] - result = [sqrt(i) for i in l if i>=0] # result = [2, 3] + result = [sqrt(i) for i in l if i >= 0] # result = [2.0, 3.0] The same syntax also exists for dictionaries, this is called *dict comprehensions* (very original, isn't it?). For example, to transform a list of @@ -139,7 +139,7 @@ a single instruction allows Python to optimize the execution of the code internally. *List comprehensions* often help in replacing an iteration by a single -instructions. Here are a few other functions which can be helpful in this +instruction. Here are a few other functions which can be helpful in this regard: * ``join`` can be useful to format a list. For example, to print the list of @@ -161,9 +161,9 @@ regard: l = [ ... ] print " ".join([word for word in l if word[0] == 'a']) -* ``sum``. To sum the elements of a list. +* ``sum``, to sum the elements of a list. -* ``map`` to apply a given function to all elements in a list. For example to +* ``map``, to apply a given function to all elements in a list. For example to reverse all the words in a list: .. code-block:: python @@ -177,15 +177,15 @@ regard: *slices* are also very useful when it comes to manipulating lists (or sublists) in blocks. Remember that if ``l`` is a list (or any other sequence) -``l[begin:end:step]`` will extract all the elements from the index ``begin`` -(included) to the index ``end`` (excluded) with a step of ``step`` (this last +``l[begin:end:step]`` will extract all the elements from index ``begin`` +(included) to index ``end`` (excluded) with a step of ``step`` (this last parameter being optional). 
-If the ``begin`` parameter is omitted, it is given 0 as a default value. +If the ``begin`` parameter is omitted, it is given 0 as default value. Similarly, the default value of ``end`` when unspecified is ``len(l)`` (the number of elements in ``l``). A negative value for ``begin`` or ``end`` will be subtracted from the end of the list. For example, to extract all the elements -from a list but the last one: +but the last one: .. code-block:: python @@ -208,15 +208,15 @@ Exceptions ---------- Exceptions provide a powerful tool found in many high-level programming -languages, and which are often under-used. They allow for a less defensive -programming style by handling errors *as they appears* instead of making test +languages which is often under-used. They allow for a less defensive +programming style by handling errors *as they appear* instead of making tests *beforehand* to prevent them from happening. In Python, every time you are trying to execute an illegal operation (*e.g.* trying to access an element outside a list's boundaries, dividing by zero, etc.) instead of simply crashing the program, Python raises an exception which -can be caught, which gives the programmer a last chance of fixing the problem -before the program ultimately crashes. +can be caught, giving the programmer a last chance to fix the problem before the +program ultimately crashes. The syntax to catch exceptions in Python is the following: @@ -224,21 +224,20 @@ The syntax to catch exceptions in Python is the following: try: .... # piece of code potentially raising the exception named Kaboum - except: - .... # piece of code to be executed is the above code raises the Kaboum exception + except Kaboum: + .... 
# piece of code to be executed if the above code raises Kaboum For example, if a line of code contains a division by a number which could -(rarely) be equal to zero, instead of systematically checking that the number +seldom be equal to zero, instead of systematically checking that the number is non zero, it is much more efficient to encapsulate the line within a ``try -... except ZeroDivisionErro:`` to handle specifically the rare cases where the -number will be zero. This is the well known principle: *better ask for +... except ZeroDivisionError:`` to handle specifically the rare cases when the +number will be zero. This is the well-known principle: *better ask for absolution than permission*. -Another example, when trying to access a key which does not exist in -a dictionary, Python raises the ``KeyError`` exception. This exception can be -used to initialize the value associated with a key which does not exist yet in -the dictionary. For example, to compute a dictionary of word counts in a text, -you can often find: +Another example, when trying to access an unbound key in a dictionary, Python +raises the ``KeyError`` exception. This exception can be used to initialize the +value associated with the unbound key. For example, to compute a dictionary of +word counts in a text, you can often find: .. code-block:: python @@ -265,7 +264,7 @@ systematic ``if`` test: The difference with the previous code is that *most of the time*, this code will behave exactly as if the body of the ``for`` loop only contained the -instruction ``result[word] += 1`` which is a significant speedup compared to +instruction ``result[word] += 1``. This gives a significant speedup compared to the first code where a test was computed for each iteration of the loop. See the `dedicated page `__ in the official documentation. @@ -273,7 +272,7 @@ See the `dedicated page 0: ... -* the ``None`` value, which is a constant used when a variable has not been - specified is converted to ``False``. 
To test that a variable ``var`` is not - equal to ``None``, you could write: +* the ``None`` value, a constant used to initialize unspecified variables, is + converted to ``False``. To test that a variable ``var`` is not equal to + ``None``, you can write: .. code-block:: python if var: ... - **Beware**, the above code will not allow you to distinguish between the case - where ``var`` is ``None`` and the case where ``var`` has a value which is - converted to ``False`` by Python (like an empty string or list for example). + **Beware**, the above code will not allow you to distinguish the case where + ``var`` is ``None`` from the case where ``var`` has a value which is + converted to ``False`` by Python (for example, an empty string or list). You need to be careful that this is really what you are trying to test. Generators ---------- Generators provide an easy way to create iterator objects (objects over which -you can iterate). They can be created by using different methods. +you can iterate) and can be created in several ways. Generator expressions ~~~~~~~~~~~~~~~~~~~~~ @@ -347,20 +346,19 @@ would produce the exact same result had the second line been replaced by: m = [i*i for i in l] The difference between the two codes is that in the case where ``m`` is -defined by a *list comprehension* the list is integrally computed (and placed -in memory) when the variable ``m`` is defined. On the contrary, when ``m`` is -defined by a *gemerator expression*, the elements in ``m`` are generated on +defined by a *list comprehension* the list is integrally computed and stored +in memory when the variable ``m`` is defined. On the contrary, when ``m`` is +defined by a *generator expression*, the elements in ``m`` are generated on the go *when needed*: only when trying to iterate over the variable ``m`` (as induced by the call to the ``join`` function in the above example) are the elements generated. 
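The laziness described above can be observed directly. The following is a minimal sketch, written for Python 3 (the snippets in the article itself target Python 2); the ``square`` helper and the ``calls`` list are illustrative names that do not appear in the article:

```python
calls = []  # records every argument square() has been called with


def square(i):
    """Hypothetical helper: square i and remember that it was called."""
    calls.append(i)
    return i * i


l = [1, 2, 3]

# List comprehension: all three squares are computed immediately.
eager = [square(i) for i in l]
calls_after_list = len(calls)

# Generator expression: square() is not called when m is defined.
m = (square(i) for i in l)
calls_after_definition = len(calls)

# Iterating over m (here through join) is what triggers the computation.
joined = " ".join(str(x) for x in m)
calls_after_iteration = len(calls)
```

Note that a generator can only be iterated over once: after the call to ``join``, ``m`` is exhausted and a second iteration would yield nothing.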
From the speed of execution point of view, both solutions are equivalent: in the end, each element in ``m`` will be computed once and only once. From the -memory usage point of view however, generators give a clear advantage: -because the elements are generated dynamically (when needed), one at a time, -never more than one elememt is stored in memory at the same time. In cases -when the list is too big to fit into memory, then *generators* could be the -solution. +memory usage point of view however, generators present a clear advantage: +because the elements are generated dynamically, one at a time, never more than +one element is stored in memory at the same time. In cases when the list is too +big to fit into memory, *generators* could be the solution. When using a ``generator expression`` as the argument of a function, Python allows to drop one pair of parenthesis to make the code more readable. For @@ -371,7 +369,7 @@ example, in the following code: l = [1, 2, 3] total = sum((i*i for i in l)) -the second line could be replaced by: +the second line can be replaced by: .. code-block:: python @@ -382,11 +380,10 @@ Generator functions A second way to define a *generator* is by writing a function using the special keyword ``yield``. When called, this function will return an iterable object -whose behavior is the following: for each iteration step over the object, the -function which defined it is executed until a ``yield`` instruction is hit. The -value following the ``yield`` keyword is returned and can be used during the -iteration step. The execution of the function is frozen until the next -iteration step. +whose behavior is the following: on each iteration step, the function is +executed until a ``yield`` instruction is hit. The value following the +``yield`` keyword is returned and can be used during the iteration step. The +execution of the function is frozen until the next iteration step. 
For example, let us define the following function: @@ -399,7 +396,7 @@ For example, let us define the following function: yield min(l), max(l) When called, this function will produce an iterable object. When iterating -over this object, at each iteration one line of ``filename`` will be read, +over this object, at each iteration, one line of ``filename`` will be read, and the minimum and maximum value of this line will be returned when the ``yield`` keyword is reached, freezing the execution of the function until the next iteration. @@ -421,25 +418,24 @@ is exactly equivalent to: inf, sup = min(l), max(l) print (inf + sup)/2. -but allows you to define separately the code which computes the minimum and -maximum value of the lines, and the code which computes their arithmetic -mean. +but allows you to define separately the code which will generate the list of +minimum and maximum values, and the code which makes use of the generated +elements. Built-in functions ~~~~~~~~~~~~~~~~~~ Finally, some built-in functions in Python return generator objects. This is -the case of the ``xrange`` function which can be used exactly as the ``range`` +the case of the ``xrange`` function which behaves exactly as the ``range`` function. The difference is that ``range`` computes a list of integers whereas -``xrange`` defines a generator object, which will generate the elements on the -go, one at a time. For example a call to ``range(1000000000)`` might induce -a memory error on your machine (if you do not have enough memory to store this -list), whereas the same call using the ``xrange`` will not have this issue and -will behave similarly for purposes of iteration. It is almost always more -suitable to use ``xrange`` instead of ``range``: in Python 3.x for example, -``range`` now behaves like ``xrange``. +``xrange`` defines a generator object generating the elements on the go, one at +a time. 
A call to ``range(1000000000)`` might induce a memory error on your +machine (depending on your memory capacity), but you will be fine using +``xrange``, both calls being equivalent for iteration purposes. It is almost +always more suitable to use ``xrange`` instead of ``range`` and in +Python 3.x ``xrange`` has even been renamed to ``range``. -See more details on the `official documentation `__. +Read more about generators on the `official documentation `__. Decorators ---------- @@ -453,10 +449,9 @@ without redifining it. The syntax is the following: def f(x): return x + 1 -In the above example, we say that the ``f`` function has been *decorated* with -the ``logging`` function. ``logging`` must be a function taking another -function as an argument and the result of decorating the ``f`` function with -``logging`` is equivalent to: +In the above example, we say that ``f`` has been *decorated* with ``logging``. +``logging`` must be a function taking another function as an argument. The +result of this decoration is equivalent to this piece of code: .. code-block:: python @@ -471,7 +466,7 @@ the composite function ``logging(f)``. A simple decorator ~~~~~~~~~~~~~~~~~~ -Imagine that we want the ``logging`` decorator to *log* the calls to the +Imagine that we want the ``logging`` decorator to *log* the calls made to the function it decorates, by printing them to the standard output. Such a decorator could be written like this @@ -485,13 +480,13 @@ a decorator could be written like this Because ``logging`` could be used to decorate any function, with an arbitrary number of arguments and keyword arguments, it is necessary to use the generic -syntax ``aux(*args, **kwargs)`` which stores all the arguments passed to ``aux`` in -a list named ``args`` and all the keyword arguments in a dictionary named -``kwargs``. Note that the exact same arguments are passed to ``fun``, meaning -that from the argument passing perspective, ``aux`` and ``fun`` behaves exactly -the same. 
The only difference between ``aux`` and ``fun`` is that ``aux`` logs -the call to the standard output before doing the computation made in ``fun``. -This is the expected behavior of the ``logging`` decorator. +syntax ``aux(*args, **kwargs)``. This syntax stores all the arguments passed to +``aux`` in a tuple named ``args`` and all the keyword arguments in a dictionary +named ``kwargs``. Note that the exact same arguments are passed to ``fun``, +meaning that from the argument passing perspective, ``aux`` and ``fun`` will +behave similarly. The difference is that ``aux`` logs the call to the +standard output prior to doing the computation made in ``fun``: this is how we +expected the decorator to behave. To be perfectly rigorous, the previous decorator should have been written like this: @@ -507,10 +502,10 @@ this: fun(*args, **kwargs) return aux -Note that ``aux`` is itself decorated by the ``wraps`` decorator provided by -the ``functools`` official module. This decorators does some magic to ensure -that ``aux`` behaves as closely as possible to ``fun``. For example, whithout -this decorator, the following code: +``aux`` is now itself decorated by the ``wraps`` decorator provided by the +``functools`` module. This decorator does some magic to ensure that ``aux`` +behaves as closely as possible to ``fun``. Without this decorator, the +following code: .. code-block:: python @@ -521,8 +516,8 @@ this decorator, the following code: print f.__name__ would print ``aux`` to the standard output, instead of the expected ``f``. The -``wraps`` decorator ensures that the ``__name__`` attribute is preserved -throughout a decoration. +``wraps`` decorator ensures among other things that the ``__name__`` attribute +is preserved throughout a decoration. 
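The effect of ``wraps`` can be checked side by side. This is a minimal sketch for Python 3, not the article's exact decorator: the names ``logging_plain`` and ``logging_wrapped`` and the logged message are assumptions made for illustration, and, unlike the snippets above, ``aux`` here returns the result of ``fun`` so that the decorated function stays usable:

```python
from functools import wraps


def logging_plain(fun):
    """Hypothetical variant of the decorator without functools.wraps."""
    def aux(*args, **kwargs):
        print("calling", fun.__name__, args, kwargs)
        return fun(*args, **kwargs)
    return aux


def logging_wrapped(fun):
    """Same decorator, with aux itself decorated by wraps(fun)."""
    @wraps(fun)
    def aux(*args, **kwargs):
        print("calling", fun.__name__, args, kwargs)
        return fun(*args, **kwargs)
    return aux


@logging_plain
def f(x):
    """Add one to x."""
    return x + 1


@logging_wrapped
def g(x):
    """Add one to x."""
    return x + 1


# Without wraps the metadata of aux leaks through; with wraps the name
# and docstring of the decorated function are preserved.
names = (f.__name__, g.__name__)
docs = (f.__doc__, g.__doc__)
```

Besides ``__name__``, ``wraps`` also copies the docstring, which is why ``g.__doc__`` is still the original one while ``f.__doc__`` is lost.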
Let us further assume that you want to extend the ``logging`` decorator to not only log the calls, but also keep track of how many times the function has been @@ -543,30 +538,29 @@ You could be tempted to write something like: fun(*args, **kwargs) return aux -However, if you apply this decorator to some function and then call this -function, you will get an angry face from Python complaining that the variable -``a`` is unbound. The reason for this is that in the line: +However, if you apply this decorator to some function and then call it, you +will get an angry face from Python complaining that the variable ``a`` is +unbound. The problem comes from this line: .. code-block:: python a = a + 1 -Python thinks you are redefining the variable ``a`` and forgets about the fact -that this variable has already been initialized to 0. So when reaching the ``a -+ 1`` part, ``a`` is no more defined, which causes the error. This is a current -limitation of Python 2: local variables that have been defined outside the -current scope are read-only variables. +Here, Python thinks you are redefining the variable ``a`` and forgets about its +previous definition. As a consequence, when reaching the ``a + 1`` part, ``a`` +is no longer defined, causing the error. This is a current limitation of Python +2: local variables that have been defined outside the current scope are read-only. A standard way to circumvent this limitation is to use a mutable structure for -``a``: ``a`` itself cannot be redefined, but the structure it is pointed to can -be modified. In the previous example, this could lead to the following code: +``a``: ``a`` itself cannot be redefined, but the structure it is pointed to +can. Using this, the previous example can be rewritten as: .. code-block:: python from functools import wraps def logging(fun): - a = [0] + a = [0] @wraps(fun) def aux(*args, **kwargs): a[0] = a[0] + 1 @@ -574,22 +568,22 @@ be modified. 
In the previous example, this could lead to the following code: fun(*args, **kwargs) return aux -where ``a`` points to a list a length 1 where the number of calls is stored at -the first position. +where ``a`` points to a list of length 1 storing the number of calls at its +first position. Another example ~~~~~~~~~~~~~~~ A common example which is often used to illustrate decorators in Python is -`memoization __`: when a function is -computation-heavy, but is often called using the same parameters, you can save -a lot of time by storing past results returned by the function. +`memoization `__: when a function is +computation-heavy but often called using the same arguments, you can save +a lot of time by caching past results returned by the function. -This idea can be nicely implemented in Python by using a decorator. This -decorator will store past results in a dictionary: when the decorated function -will be called, the decorator will make a lookup in the dictionary to check -whether the function has already been called with the same parameter, and -return the stored value in this case. +This idea can be nicely implemented in Python using a decorator. The decorator +will store past results in a dictionary: when the decorated function will be +called, the decorator will perform a lookup in its dictionary to check whether +the function has already been called with the same argument. If the dictionary +already contains an entry for this argument, the associated value is returned. Here is how you could write such a decorator for a single argument function: @@ -617,14 +611,13 @@ Then if ``f`` is defined like this: def f(x): ... # very long and heavy computation -when calling ``f`` twice with the same parameter, the first call will take -a lot of time to be computed, whereas the secod call will be almost -instantaneous. 
+when calling ``f`` twice with the same argument, you will incur the computation +cost only during the first call, the second call being almost instantaneous. Classes with two methods ------------------------ -Let us briefly recall how class work in Python. A class is defined like this: +Let us briefly recall how classes work in Python. A class is defined like this: .. code-block:: python @@ -636,15 +629,15 @@ Let us briefly recall how class work in Python. A class is defined like this: def decrypt(self, message): return (message & self.key) -all the methods of a class take as their first argument, which is always named -``self`` by convention, the instance of the class on which the method is -called. Thus, if ``a`` is an instance of the ``Cipher`` class, the call +all the methods of a class take as their first argument the instance on which +the method is being called. By convention, this first argument is always named +``self``. If ``a`` is an instance of ``Cipher``, the instruction ``a.decrypt(message)`` is equivalent to ``Cipher.decrypt(a, message)``. -The special function ``__init__`` is the class constructor and is called -every time an instance of the class is created. It is mainly used to initialize -some attributes of the instance. An instance of ``Cipher`` class can be created -like this: +The special function ``__init__`` is the class constructor and is called every +time an instance of the class is created. Its typical use is to initialize some +attributes of the instance. An instance of the ``Cipher`` class can be created like +this: .. code-block:: python @@ -654,9 +647,9 @@ A flaw commonly found in code written by people coming from object-oriented programming languages is to create classes for everything. This often leads to classes containing only two methods, one being the ``__init__`` function. This is the case in the class written above as an example. 
By looking to this -example a bit closer, you can see that you could completely get rid of the class -definition: you only need a ``decrypt`` function taking the key as an -additional argument: +example a bit closer, you can see that it is possible to completely get rid of +the class definition: a ``decrypt`` function taking the key as an additional +argument is sufficient: .. code-block:: python @@ -665,7 +658,7 @@ additional argument: Some people could object that it still makes sense to use a class in the example above, if we plan to extend the ``Cypher`` class in the future, for -example by adding an ``encryp`` function. In my opinion, it is better to start +example by adding an ``encrypt`` function. In my opinion, it is better to start by writing your code as simply as possible. If you really need to extend the code, then you can start restructuring it and group several related functions in a class. @@ -674,13 +667,12 @@ PEP 8 ----- When writing about good practices in Python, it is impossible not to mention -the PEP8. It is a set of recommendations regarding coding style in Python. -These recommendations are of course not absolute rules and should be taken as -advices. However, I noticed that following these recommendations generally -leads to greater code readability. Furthermore, as many people who code in -Python also follow these recommendations, adopting them reduces the gap between -your code and code written by others: this will save you some time when trying -to understand code. +the PEP8. It is a set of recommendations on coding style in Python. These +recommendations are of course not absolute rules and should be taken as advice. +However, I noticed that following these recommendations generally leads to +greater code readability. Moreover, as many people who code in Python also +follow these recommendations, adopting them reduces the gap between your code +and code written by others: this will save you some time when reading code. 
Here are a few points extracted from the PEP8: