Age         Commit message (Author)
2013-08-21  Simplify data structure (Guillaume Horel)
            An alignment is now a list of lists. An empty list means the word maps to nothing, and a list of length greater than one means the word maps to multiple words. This removes the artificial distinction between an index and a tuple.
2013-08-18  try to fix the alignment_to_sexp function (Guillaume Horel)
2013-08-17  simplify Thibaut's code (Guillaume Horel)
2013-08-17  add function for converting alignment to sexp (Guillaume Horel)
2013-08-17  Take line breaks into account when grouping words (Thibaut Horel)
2013-08-17  Some tweaks (Thibaut Horel)
2013-08-06  Split words which map to two words (Thibaut Horel)
2013-08-06  Adding some comments (Thibaut Horel)
2013-08-05  Use C implementation of the Levenshtein distance (Thibaut Horel)
            Requires the python-Levenshtein package on PyPI
2013-08-05  Use a Needleman-Wunsch type algorithm for text alignment (Thibaut Horel)
2013-08-05  use new functions in compare.py (Guillaume Horel)
2013-08-05  improve function to parse djvu files (Guillaume Horel)
2013-08-04  script to extract djvutext from a document (Guillaume Horel)
2013-08-04  Add some string utils functions (Thibaut Horel)
            Levenshtein distance and word hyphenation
2013-08-03  Fix html stripping (Thibaut Horel)
2013-08-03  preliminary version of compare (Guillaume Horel)
2013-08-03  remove unneeded enumerate (Guillaume Horel)
2013-08-03  script to parse djvu xml (Guillaume Horel)
2013-08-03  add test djvu xml file (Guillaume Horel)
2013-08-03  Improve wikisource.py script (Thibaut Horel)
2013-08-03  improve code logic (Guillaume Horel)
2013-08-03  Add simple script to download text from Wikisource (Thibaut Horel)
2013-08-03  working version of the parser (Guillaume Horel)
2013-08-03  add initial text file and parser (Guillaume Horel)
2013-08-03  Initial commit (Thibaut Horel)
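The 2013-08-21 commit describes the alignment representation: one list per word, holding the positions of the words it maps to, so that "maps to nothing", "maps to one word" and "maps to several words" all share one type. The sketch below illustrates that shape; it assumes the outer list is indexed by source-word position, and the helpers `from_pairs` and `render` and the example words are hypothetical, not code from the repository.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch of the list-of-lists alignment:
#   alignment[i] == []         -> source word i maps to nothing
#   alignment[i] == [j]        -> source word i maps to target word j
#   alignment[i] == [j, k, ...] -> source word i maps to several target words
Alignment = List[List[int]]


def from_pairs(n_source: int, pairs: Sequence[Tuple[int, int]]) -> Alignment:
    """Build an alignment from (source_index, target_index) pairs (hypothetical helper)."""
    alignment: Alignment = [[] for _ in range(n_source)]
    for i, j in pairs:
        alignment[i].append(j)
    return alignment


def render(alignment: Alignment, source: Sequence[str], target: Sequence[str]) -> List[str]:
    """Describe each source word's mapping as a readable string."""
    lines = []
    for word, targets in zip(source, alignment):
        if not targets:
            lines.append(f"{word} -> (nothing)")
        else:
            lines.append(f"{word} -> " + " ".join(target[j] for j in targets))
    return lines


source = ["Hello", "newworld", "!"]
target = ["Hello", "new", "world"]
alignment = from_pairs(len(source), [(0, 0), (1, 1), (1, 2)])
print(alignment)  # [[0], [1, 2], []]
print(render(alignment, source, target))  # ['Hello -> Hello', 'newworld -> new world', '! -> (nothing)']
```

Here "newworld" maps to two target words and "!" maps to nothing; under the earlier index-or-tuple encoding these cases would have needed different types, whereas here every entry is a plain list.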
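The 2013-08-05 commit switches text alignment to "a Needleman-Wunsch type algorithm". The log does not say which costs the repository uses, so the sketch below assumes a gap cost of 1.0 and a substitution cost of `1 - difflib.SequenceMatcher(...).ratio()` between two words; both are illustrative choices, and `align` and `word_cost` are hypothetical names. It shows the technique itself: a dynamic-programming table of minimal alignment costs, followed by a traceback that recovers one optimal global alignment of two word sequences.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple

GAP = 1.0  # assumed cost of leaving one word unaligned

Pair = Tuple[Optional[int], Optional[int]]


def word_cost(a: str, b: str) -> float:
    """Assumed substitution cost in [0, 1]; 0.0 for identical words."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def align(src: Sequence[str], tgt: Sequence[str]) -> List[Pair]:
    """Needleman-Wunsch style global alignment of two word sequences.

    Returns (i, j) index pairs in order; None marks a word aligned to a gap.
    """
    n, m = len(src), len(tgt)
    # cost[i][j] = minimal cost of aligning src[:i] with tgt[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + GAP  # cumulative, so the traceback comparison is exact
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + word_cost(src[i - 1], tgt[j - 1]),  # pair the two words
                cost[i - 1][j] + GAP,  # src[i - 1] maps to nothing
                cost[i][j - 1] + GAP,  # tgt[j - 1] is matched by nothing
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + word_cost(src[i - 1], tgt[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + GAP:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


print(align(["a", "b", "c"], ["a", "c"]))  # [(0, 0), (1, None), (2, 1)]
```

Because the substitution cost is graded rather than 0/1, an OCR variant such as "brwn" still pairs with "brown" instead of being split into two gaps. A pair like (1, None) corresponds to an empty list in the list-of-lists representation described in the 2013-08-21 commit.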
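The 2013-08-04 commit adds a Levenshtein distance to the string utilities, and the 2013-08-05 commit later replaces it with the C implementation from the python-Levenshtein package, whose `Levenshtein.distance(a, b)` computes the same quantity. The pure-Python sketch below shows what that distance is; it is the standard two-row dynamic program, not the repository's own code, and it does not cover the word-hyphenation helper mentioned in the same commit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,  # delete ca
                current[j - 1] + 1,  # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free when the characters match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept, so memory is proportional to the shorter string; the running time is still proportional to the product of the two lengths, which is why a C implementation is attractive when this is called for every candidate word pair.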
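The 2013-08-03 commits add a script to parse DjVu XML together with a test file, which is not reproduced here. The sketch below assumes the element layout produced by DjVuLibre's `djvutoxml` (`HIDDENTEXT`, `PAGECOLUMN`, `REGION`, `PARAGRAPH`, `LINE`, `WORD`) and uses only the standard library's `xml.etree.ElementTree`; the function `words_by_line` and the sample document are hypothetical. Keeping the words grouped by `LINE` preserves the line-break information that word grouping can later take into account.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample in the assumed djvutoxml layout (coordinates omitted).
SAMPLE = """<OBJECT>
  <HIDDENTEXT>
    <PAGECOLUMN><REGION><PARAGRAPH>
      <LINE><WORD>Hello</WORD><WORD>new-</WORD></LINE>
      <LINE><WORD>world</WORD></LINE>
    </PARAGRAPH></REGION></PAGECOLUMN>
  </HIDDENTEXT>
</OBJECT>"""


def words_by_line(xml_text: str) -> List[List[str]]:
    """Return the text of every WORD element, grouped by enclosing LINE, in document order."""
    root = ET.fromstring(xml_text)
    return [
        [(word.text or "").strip() for word in line.iter("WORD")]
        for line in root.iter("LINE")
    ]


print(words_by_line(SAMPLE))  # [['Hello', 'new-'], ['world']]
```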