ocr-layer-curation
Branches: master, refactor_align
Age
Commit message
Author
2013-08-17
Take line jumps into account when grouping words
Thibaut Horel
2013-08-17
Some tweaks
Thibaut Horel
2013-08-06
Split words which map to two words
Thibaut Horel
2013-08-06
Adding some comments
Thibaut Horel
2013-08-05
Use C implementation of the Levenshtein distance
Thibaut Horel
Requires the python-Levenshtein package on PyPI
2013-08-05
Use a Needleman-Wunsch type algorithm for text alignment
Thibaut Horel
2013-08-05
use new functions in compare.py
Guillaume Horel
2013-08-05
improve function to parse djvu files
Guillaume Horel
2013-08-04
script to extract djvutext from a document
Guillaume Horel
2013-08-04
Add some string utils functions
Thibaut Horel
Levenshtein distance and word hyphenation
2013-08-03
Fix html stripping
Thibaut Horel
2013-08-03
preliminary version of compare
Guillaume Horel
2013-08-03
remove unneeded enumerate
Guillaume Horel
2013-08-03
script to parse djvu xml
Guillaume Horel
2013-08-03
add test djvu xml file
Guillaume Horel
2013-08-03
Improve wikisource.py script
Thibaut Horel
2013-08-03
improve code logic
Guillaume Horel
2013-08-03
Add simple script to download text from Wikisource
Thibaut Horel
2013-08-03
working version of the parser
Guillaume Horel
2013-08-03
add initial text file and parser
Guillaume Horel
2013-08-03
Initial commit
Thibaut Horel