index : ocr-layer-curation
Age
Commit message
Author
2014-02-27
cleanup
Guillaume Horel
2014-02-27
Add the possibility to specify list of pages to parse_book
Thibaut Horel
2014-02-27
Last simplification
Thibaut Horel
2014-02-27
Simplify parse_book even more
Thibaut Horel
2014-02-27
Simplify parse_book a bit, also making it more natural to use
Thibaut Horel
2014-02-27
Remove useless int casting
Thibaut Horel
2014-02-27
Adapting the web client code to the new behavior of parsedjvutext
Thibaut Horel
2014-02-27
Cython implementation of SW algorithm
Thibaut Horel
2014-02-27
PEP8
Thibaut Horel
2014-02-27
Merge branch 'master' of horel.org:thibaut/ocr-layer-curation
Thibaut Horel
2014-02-27
Basic tornado app displaying a page image and associated text side by side
Thibaut Horel
2014-02-26
fix the djvu parsing and add html coordinates
Guillaume Horel
2014-02-25
using the djvu library for parsing djvu documents
Guillaume Horel
2014-02-24
remove stray code
Guillaume Horel
2013-12-28
clean up get_pages function
Guillaume Horel
2013-12-28
fix download from wikisource
Guillaume Horel
2013-12-28
Merge branch 'master' of horel.org:thibaut/ocr-layer-curation
Guillaume Horel
2013-12-28
add gitignore file
Guillaume Horel
2013-08-21
small simplification
Guillaume Horel
2013-08-21
Simplify datastructure
Guillaume Horel
An alignment is now a list of lists. An empty list means a word maps to nothing, and len(list) greater than one means a word maps to multiple words. This removes the artificial distinction between index and tuple.
2013-08-18
try to fix the alignment_to_sexp function
Guillaume Horel
2013-08-17
simplify Thibaut's code
Guillaume Horel
2013-08-17
add function for converting alignment to sexp
Guillaume Horel
2013-08-17
Take line jumps into account when grouping words
Thibaut Horel
2013-08-17
Some tweaks
Thibaut Horel
2013-08-06
Split words which map to two words
Thibaut Horel
2013-08-06
Adding some comments
Thibaut Horel
2013-08-05
Use C implementation of the Levenshtein distance
Thibaut Horel
Requires the python-Levenshtein package on PyPI
2013-08-05
Use a Needleman-Wunsch type algorithm for text alignment
Thibaut Horel
2013-08-05
use new functions in compare.py
Guillaume Horel
2013-08-05
improve function to parse djvu files
Guillaume Horel
2013-08-04
script to extract djvutext from a document
Guillaume Horel
2013-08-04
Add some string utils functions
Thibaut Horel
Levenshtein distance and word hyphenation
2013-08-03
Fix html stripping
Thibaut Horel
2013-08-03
preliminary version of compare
Guillaume Horel
2013-08-03
remove unneeded enumerate
Guillaume Horel
2013-08-03
script to parse djvu xml
Guillaume Horel
2013-08-03
add test djvu xml file
Guillaume Horel
2013-08-03
Improve wikisource.py script
Thibaut Horel
2013-08-03
improve code logic
Guillaume Horel
2013-08-03
Add simple script to download text from Wikisource
Thibaut Horel
2013-08-03
working version of the parser
Guillaume Horel
2013-08-03
add initial text file and parser
Guillaume Horel
2013-08-03
Initial commit
Thibaut Horel
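
The "Simplify datastructure" commit (2013-08-21) describes an alignment as a list of lists: one inner list per source word, empty when the word maps to nothing, and longer than one when the word maps to several words. The sketch below illustrates that shape only. The word lists, the variable names, and the describe helper are hypothetical and are not taken from the repository.

```python
# Hypothetical example data, invented for illustration.
ocr_words = ["the", "quick", "brownfox", "jumps", "~"]
corrected_words = ["the", "quick", "brown", "fox", "jumps"]

# alignment[i] holds the indices in corrected_words that ocr_words[i] maps to.
#   []      -> the word maps to nothing
#   [j]     -> the word maps to exactly one word
#   [j, k]  -> the word maps to multiple words
alignment = [[0], [1], [2, 3], [4], []]


def describe(alignment, source, target):
    """Return one (source_word, [target_words]) pair per source word."""
    return [
        (source[i], [target[j] for j in targets])
        for i, targets in enumerate(alignment)
    ]


pairs = describe(alignment, ocr_words, corrected_words)

# Because every entry is a list, both cases are handled with len() alone,
# with no separate code path for a bare index versus a tuple.
unmatched = [word for word, targets in pairs if not targets]
split = [word for word, targets in pairs if len(targets) > 1]
```

With this data, unmatched contains only "~" and split contains only "brownfox".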