| Age | Commit message | Author | |
|---|---|---|---|
| 2014-02-24 | remove stray code | Guillaume Horel | |
| 2013-12-28 | clean up get_pages function | Guillaume Horel | |
| 2013-12-28 | fix download from wikisource | Guillaume Horel | |
| 2013-12-28 | Merge branch 'master' of horel.org:thibaut/ocr-layer-curation | Guillaume Horel | |
| 2013-12-28 | add gitignore file | Guillaume Horel | |
| 2013-08-21 | small simplification | Guillaume Horel | |
| 2013-08-21 | Simplify datastructure | Guillaume Horel | |
| | An alignment is now a list of lists. An empty list means the word maps to nothing, and a list of length greater than one means the word maps to multiple words. This removes the artificial distinction between index and tuple. | | |
| 2013-08-18 | try to fix the alignment_to_sexp function | Guillaume Horel | |
| 2013-08-17 | simplify Thibaut's code | Guillaume Horel | |
| 2013-08-17 | add function for converting alignment to sexp | Guillaume Horel | |
| 2013-08-17 | Take line jumps into account when grouping words | Thibaut Horel | |
| 2013-08-17 | Some tweaks | Thibaut Horel | |
| 2013-08-06 | Split words which map to two words | Thibaut Horel | |
| 2013-08-06 | Adding some comments | Thibaut Horel | |
| 2013-08-05 | Use C implementation of the Levenshtein distance | Thibaut Horel | |
| | Requires the python-Levenshtein package on PyPI | | |
| 2013-08-05 | Use a Needleman-Wunsch type algorithm for text alignment | Thibaut Horel | |
| 2013-08-05 | use new functions in compare.py | Guillaume Horel | |
| 2013-08-05 | improve function to parse djvu files | Guillaume Horel | |
| 2013-08-04 | script to extract djvutext from a document | Guillaume Horel | |
| 2013-08-04 | Add some string utils functions | Thibaut Horel | |
| | Levenshtein distance and word hyphenation | | |
| 2013-08-03 | Fix html stripping | Thibaut Horel | |
| 2013-08-03 | preliminary version of compare | Guillaume Horel | |
| 2013-08-03 | remove unneeded enumerate | Guillaume Horel | |
| 2013-08-03 | script to parse djvu xml | Guillaume Horel | |
| 2013-08-03 | add test djvu xml file | Guillaume Horel | |
| 2013-08-03 | Improve wikisource.py script | Thibaut Horel | |
| 2013-08-03 | improve code logic | Guillaume Horel | |
| 2013-08-03 | Add simple script to download text from Wikisource | Thibaut Horel | |
| 2013-08-03 | working version of the parser | Guillaume Horel | |
| 2013-08-03 | add initial text file and parser | Guillaume Horel | |
| 2013-08-03 | Initial commit | Thibaut Horel | |
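
The 2013-08-21 "Simplify datastructure" commit describes an alignment as a list of lists. As a rough illustration of that representation (not the repository's actual code), the sketch below assumes each inner list holds indices into a second word list. The sample words, the variable names, and the `describe` helper are all hypothetical.

```python
# Hypothetical sample data: OCR output on one side, reference text on the other.
ocr_words = ["the", "quick", "brownfox", "~", "jumps"]
ref_words = ["the", "quick", "brown", "fox", "jumps"]

# alignment[i] lists the indices in ref_words matched by ocr_words[i].
#   []      -> the word maps to nothing
#   [k]     -> the word maps to exactly one word
#   [k, l]  -> the word maps to several words
alignment = [[0], [1], [2, 3], [], [4]]


def describe(alignment, source, target):
    """Return one readable line per source word (illustrative helper)."""
    lines = []
    for word, targets in zip(source, alignment):
        if not targets:
            lines.append(f"{word!r} -> (nothing)")
        else:
            mapped = " ".join(target[j] for j in targets)
            lines.append(f"{word!r} -> {mapped!r}")
    return lines


for line in describe(alignment, ocr_words, ref_words):
    print(line)
```

Because every entry is a list, code that walks the alignment can treat the three cases uniformly, which appears to be the simplification the commit message has in mind.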

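The 2013-08-05 commits mention a Needleman-Wunsch type algorithm for text alignment and a Levenshtein distance between words. The following is a minimal sketch of how the two could fit together, under several assumptions: it minimizes cost rather than maximizing a score, uses a normalized edit distance as the substitution cost, and uses a fixed gap penalty of 1.0. The function names `levenshtein` and `align` are hypothetical, and the pure-Python distance is a stand-in for the C implementation the log refers to.

```python
def levenshtein(a, b):
    """Edit distance between two strings (pure-Python stand-in)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def align(source, target, gap=1.0):
    """Globally align two word lists.

    Returns a list of lists: entry i holds the index of the target word
    matched to source[i], or is empty if source[i] is left unmatched.
    """
    n, m = len(source), len(target)

    def sub(i, j):
        # Substitution cost in [0, 1]: edit distance scaled by word length.
        a, b = source[i - 1], target[j - 1]
        return levenshtein(a, b) / max(len(a), len(b), 1)

    # cost[i][j]: minimal cost of aligning source[:i] with target[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + sub(i, j),  # match words
                             cost[i - 1][j] + gap,            # skip source word
                             cost[i][j - 1] + gap)            # skip target word

    # Trace back from the bottom-right corner to recover the matches.
    alignment = [[] for _ in source]
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + sub(i, j):
            alignment[i - 1].append(j - 1)
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return alignment


print(align(["teh", "quick", "fox"], ["the", "quick", "brown", "fox"]))
```

This sketch only ever produces empty or single-element lists. Mapping one word to several words would need an additional step that is not shown here.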