Age | Commit message | Author
2014-02-28 | pass the book name in the settings dict | Guillaume Horel
2014-02-28 | fix html coordinates | Guillaume Horel
2014-02-28 | remove non needed files | Guillaume Horel
2014-02-28 | serve jpeg images | Guillaume Horel
2014-02-28 | add a handler for images, everything is dynamic! | Guillaume Horel
2014-02-28 | change defaults (otherwise image is upside down) | Guillaume Horel
2014-02-27 | add function to return image from a book | Guillaume Horel
2014-02-27 | fix small bug | Guillaume Horel
2014-02-27 | generate more useful html coordinatates | Guillaume Horel
2014-02-27 | cleanup | Guillaume Horel
2014-02-27 | Add the possibility to specify list of pages to parse_book | Thibaut Horel
2014-02-27 | Last simplification | Thibaut Horel
2014-02-27 | Simplify parse_book even more | Thibaut Horel
2014-02-27 | Simplify parse_book a bit, also making it more natural to use | Thibaut Horel
2014-02-27 | Remove useless int casting | Thibaut Horel
2014-02-27 | Adapting the web client code to the new behavior of parsedjvutext | Thibaut Horel
2014-02-27 | Cython implementation of SW algorithm | Thibaut Horel
2014-02-27 | PEP8 | Thibaut Horel
2014-02-27 | Merge branch 'master' of horel.org:thibaut/ocr-layer-curation | Thibaut Horel
2014-02-27 | Basic tornado app displaying a page image and associated text side by side | Thibaut Horel
2014-02-26 | fix the djvu parsing and add html coordinates | Guillaume Horel
2014-02-25 | using the djvu library for parsing djvu documents | Guillaume Horel
2014-02-24 | remove stray code | Guillaume Horel
2013-12-28 | clean up get_pages function | Guillaume Horel
2013-12-28 | fix download from wikisource | Guillaume Horel
2013-12-28 | Merge branch 'master' of horel.org:thibaut/ocr-layer-curation | Guillaume Horel
2013-12-28 | add gitignore file | Guillaume Horel
2013-08-21 | small simplifaction | Guillaume Horel
2013-08-21 | Simplify datastructure | Guillaume Horel
    An alignment is now a list of list. Empty list means word maps to nothing, and len(list) greater than one means a word maps to multiple words. This removes the artificial distinction between index and tuple.
2013-08-18 | try to fix the alignment_to_sexp function | Guillaume Horel
2013-08-17 | simplify Thibaut's code | Guillaume Horel
2013-08-17 | add function for converting alignment to sexp | Guillaume Horel
2013-08-17 | Take line jumps into accounts when grouping words | Thibaut Horel
2013-08-17 | Some tweaks | Thibaut Horel
2013-08-06 | Split words which map to two words | Thibaut Horel
2013-08-06 | Adding some comments | Thibaut Horel
2013-08-05 | Use C implementation of the Levenshtein distance | Thibaut Horel
    Requires the python-Levenshtein package on PyPI
2013-08-05 | Use a Needleman-Wunsch type algorithm for text alignment | Thibaut Horel
2013-08-05 | use new functions in compare.py | Guillaume Horel
2013-08-05 | improve function to parse djvu files | Guillaume Horel
2013-08-04 | script to extract djvutext from a document | Guillaume Horel
2013-08-04 | Add some string utils functions | Thibaut Horel
    Levenshtein distance and word hyphenation
2013-08-03 | Fix html stripping | Thibaut Horel
2013-08-03 | preliminary version of compare | Guillaume Horel
2013-08-03 | remove unneeded enumerate | Guillaume Horel
2013-08-03 | srcript to parse djvu xml | Guillaume Horel
2013-08-03 | add test djvu xml file | Guillaume Horel
2013-08-03 | Improve wikisource.py script | Thibaut Horel
2013-08-03 | improve code logic | Guillaume Horel
2013-08-03 | Add simple script to download text from Wikisource | Thibaut Horel
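The 2013-08-21 "Simplify datastructure" commit describes an alignment as a list of lists. Below is a minimal sketch of how such a structure could be read. It assumes each inner list holds indices into the target word sequence. The mention of the former "index and tuple" distinction suggests this, but the commit does not state it outright. The helper `describe_alignment` and the sample words are hypothetical:

```python
from typing import List

# Assumed interpretation: alignment[i] lists the indices of the words in
# `target` that word i of `source` maps to.
Alignment = List[List[int]]


def describe_alignment(source: List[str], target: List[str],
                       alignment: Alignment) -> List[str]:
    """Render each source word next to the target words it maps to (hypothetical helper)."""
    lines = []
    for word, indices in zip(source, alignment):
        if not indices:
            # Empty list: the word maps to nothing.
            lines.append(f"{word} -> (nothing)")
        else:
            # One index: one-to-one mapping; several indices: one word maps to multiple words.
            lines.append(f"{word} -> {' '.join(target[j] for j in indices)}")
    return lines


if __name__ == "__main__":
    # Hypothetical OCR output with a merged word and a noise token.
    for line in describe_alignment(["thecat", "sat", "~"],
                                   ["the", "cat", "sat"],
                                   [[0, 1], [2], []]):
        print(line)
```

With these inputs the function returns `["thecat -> the cat", "sat -> sat", "~ -> (nothing)"]`. A single list type covers all three cases (no match, one match, several matches), so callers never need to check whether an entry is an index or a tuple.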
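The string utilities added on 2013-08-04 include a Levenshtein distance. On 2013-08-05 it was replaced with the C implementation from the python-Levenshtein package. For reference, here is a textbook dynamic-programming version in pure Python. This is not the project's code, and the name `levenshtein` is illustrative:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (textbook DP, not the project's code)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the DP rows short
    # prev[j] holds the distance between the prefix of `a` processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when the characters match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Keeping only the previous row makes memory proportional to the shorter string. With python-Levenshtein installed, `Levenshtein.distance(a, b)` should return the same values considerably faster, which is presumably why the project switched.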
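The 2013-08-05 commit switched text alignment to a "Needleman-Wunsch type" algorithm. The sketch below is a generic global alignment of two word lists under stated assumptions, not the project's implementation. The gap cost `GAP`, the word dissimilarity, and the function names are all hypothetical. The dissimilarity uses `difflib.SequenceMatcher` as a stand-in for a Levenshtein-based cost. The result uses the list-of-lists shape from the 2013-08-21 commit, with at most one target index per source word:

```python
from difflib import SequenceMatcher
from typing import List

GAP = 1.0  # hypothetical cost of leaving a word unmatched (assumption)


def word_cost(a: str, b: str) -> float:
    """Dissimilarity in [0, 1]; difflib stands in for a Levenshtein-based cost."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def align(source: List[str], target: List[str]) -> List[List[int]]:
    """Global (Needleman-Wunsch style) alignment minimising total cost."""
    n, m = len(source), len(target)
    # cost[i][j]: minimal cost of aligning source[:i] with target[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP
    for j in range(1, m + 1):
        cost[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + word_cost(source[i - 1], target[j - 1]),  # match
                cost[i - 1][j] + GAP,  # source word maps to nothing
                cost[i][j - 1] + GAP,  # target word is skipped
            )
    # Trace back from the bottom-right corner, recomputing which move was taken.
    alignment: List[List[int]] = [[] for _ in range(n)]
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + word_cost(source[i - 1], target[j - 1]):
            alignment[i - 1] = [j - 1]
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return alignment


if __name__ == "__main__":
    print(align(["the", "~", "cat"], ["the", "cat"]))
```

Here the noise token `~` ends up with an empty list. Each traceback step consumes at most one word on each side, so this sketch never maps one word to several. Handling split words (as in the 2013-08-06 commit) would need additional moves in the recurrence.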