Age | Commit message | Author
2014-08-03 | better 2-way highlighting, still not perfect | Guillaume Horel
2014-08-03 | fix invert_align | Guillaume Horel
2014-08-01 | add function to invert an alignment | Guillaume Horel
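The inversion added in these two commits can be sketched as follows. It assumes the list-of-lists representation described in the 2013-08-21 "Simplify datastructure" entry, where `alignment[i]` lists the indices of the words that word `i` maps to. This is a minimal illustration, not the repository's `invert_align`, and the `target_len` parameter is an assumption.

```python
from typing import List

# alignment[i] holds the indices of the target words that source word i maps to.
Alignment = List[List[int]]


def invert_align(alignment: Alignment, target_len: int) -> Alignment:
    """Map each target word back to the source words pointing at it.

    Illustrative sketch; target_len (number of target words) is an assumed
    parameter, needed so unmatched target words still get an empty list.
    """
    inverse: Alignment = [[] for _ in range(target_len)]
    for src, targets in enumerate(alignment):
        for tgt in targets:
            inverse[tgt].append(src)
    return inverse


# OCR word 1 was split into corrected words 1 and 2; OCR word 2 maps to nothing.
ocr_to_corrected = [[0], [1, 2], []]
corrected_to_ocr = invert_align(ocr_to_corrected, 3)
print(corrected_to_ocr)  # [[0], [1], [1]]
```

Calling `invert_align(corrected_to_ocr, 3)` gives back `[[0], [1, 2], []]`, because each target list in the original is already sorted.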
2014-07-29 | Webapp now shows three columns | Guillaume Horel
    Image, original text and corrected text. The highlighting is functional as well.
2014-03-12 | trying to build a djvu decoder | Guillaume Horel
2014-03-01 | Preliminary support for corrected text | Guillaume Horel
    * It's slow; need to figure out how to load it in the background, maybe
    * The bounding boxes could be improved
2014-03-01 | preliminary alignment_to_sexp | Guillaume Horel
    * right now just outputs a list of pairs (corrected_word, coords)
    * need to generate a sexp file if we want to reinsert into the djvu
    * bounding boxes are not smart at all right now (no merges or splits)
2014-03-01 | reorganize djvu_utils a bit | Guillaume Horel
2014-02-28 | update with the new functions | Guillaume Horel
2014-02-28 | pass the book name in the settings dict | Guillaume Horel
2014-02-28 | fix html coordinates | Guillaume Horel
2014-02-28 | remove unneeded files | Guillaume Horel
2014-02-28 | serve jpeg images | Guillaume Horel
2014-02-28 | add a handler for images, everything is dynamic! | Guillaume Horel
2014-02-28 | change defaults (otherwise image is upside down) | Guillaume Horel
2014-02-27 | add function to return image from a book | Guillaume Horel
2014-02-27 | fix small bug | Guillaume Horel
2014-02-27 | generate more useful html coordinates | Guillaume Horel
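DjVu text zones measure y upward from the bottom-left corner of the page, while HTML/CSS positions elements from the top-left, which is likely what these coordinate changes deal with. A minimal sketch of the conversion, assuming boxes are `(xmin, ymin, xmax, ymax)` and the page image may be displayed at a `scale` factor; `djvu_to_html_box` is a hypothetical name, not a function from the repository.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), DjVu bottom-left origin


def djvu_to_html_box(box: Box, page_height: int, scale: float = 1.0) -> Dict[str, float]:
    """Convert a DjVu zone box to CSS-style left/top/width/height (top-left origin)."""
    xmin, ymin, xmax, ymax = box
    return {
        "left": xmin * scale,
        # Flip the y axis: the zone's top edge (ymax) is page_height - ymax from the top.
        "top": (page_height - ymax) * scale,
        "width": (xmax - xmin) * scale,
        "height": (ymax - ymin) * scale,
    }


print(djvu_to_html_box((100, 900, 300, 950), page_height=1000))
# {'left': 100.0, 'top': 50.0, 'width': 200.0, 'height': 50.0}
```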
2014-02-27 | cleanup | Guillaume Horel
2014-02-27 | Add the possibility to specify a list of pages to parse_book | Thibaut Horel
2014-02-27 | Last simplification | Thibaut Horel
2014-02-27 | Simplify parse_book even more | Thibaut Horel
2014-02-27 | Simplify parse_book a bit, also making it more natural to use | Thibaut Horel
2014-02-27 | Remove useless int casting | Thibaut Horel
2014-02-27 | Adapting the web client code to the new behavior of parsedjvutext | Thibaut Horel
2014-02-27 | Cython implementation of SW algorithm | Thibaut Horel
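Assuming "SW" here stands for Smith-Waterman (the local counterpart of the Needleman-Wunsch global alignment adopted on 2013-08-05), its recurrence can be sketched in plain Python; a Cython version would add static types to the same loops. The scoring values (match +2, mismatch -1, gap -1) are illustrative assumptions, and only the best score is computed, without traceback.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b, with a linear gap penalty."""
    # prev[j] holds H[i-1][j]; cells are clamped at 0 so an alignment can restart anywhere.
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


print(smith_waterman("xxABCDyy", "zzABCDww"))  # 8: the shared "ABCD"
```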
2014-02-27 | PEP8 | Thibaut Horel
2014-02-27 | Merge branch 'master' of horel.org:thibaut/ocr-layer-curation | Thibaut Horel
2014-02-27 | Basic tornado app displaying a page image and associated text side by side | Thibaut Horel
2014-02-26 | fix the djvu parsing and add html coordinates | Guillaume Horel
2014-02-25 | using the djvu library for parsing djvu documents | Guillaume Horel
2014-02-24 | remove stray code | Guillaume Horel
2013-12-28 | clean up get_pages function | Guillaume Horel
2013-12-28 | fix download from wikisource | Guillaume Horel
2013-12-28 | Merge branch 'master' of horel.org:thibaut/ocr-layer-curation | Guillaume Horel
2013-12-28 | add gitignore file | Guillaume Horel
2013-08-21 | small simplification | Guillaume Horel
2013-08-21 | Simplify datastructure | Guillaume Horel
    An alignment is now a list of lists. An empty list means a word maps to nothing, and a list of length greater than one means a word maps to multiple words. This removes the artificial distinction between index and tuple.
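With this representation, every case (no match, one-to-one, one-to-many) goes through the same code path. For instance, a sketch that recovers the corrected text for each OCR word; `project` and the sample words are hypothetical.

```python
from typing import List


def project(alignment: List[List[int]], corrected: List[str]) -> List[str]:
    """Return, for each OCR word, the corrected words it maps to ('' if none)."""
    return [" ".join(corrected[j] for j in targets) for targets in alignment]


ocr = ["Tbe", "quickbrown", "fox", "~"]       # raw OCR words (sample data)
corrected = ["The", "quick", "brown", "fox"]  # proofread words (sample data)
alignment = [[0], [1, 2], [3], []]            # one entry per OCR word

for word, fixed in zip(ocr, project(alignment, corrected)):
    print(f"{word!r} -> {fixed!r}")
```

The empty list needs no special case: joining nothing yields the empty string.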
2013-08-18 | try to fix the alignment_to_sexp function | Guillaume Horel
2013-08-17 | simplify Thibaut's code | Guillaume Horel
2013-08-17 | add function for converting alignment to sexp | Guillaume Horel
2013-08-17 | Take line jumps into account when grouping words | Thibaut Horel
2013-08-17 | Some tweaks | Thibaut Horel
2013-08-06 | Split words which map to two words | Thibaut Horel
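One plausible way to split a word that maps to two corrected words (an assumption, not necessarily what this commit does) is to divide its bounding box horizontally in proportion to the length of each part. `split_box` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def split_box(box: Box, parts: List[str]) -> List[Box]:
    """Split box left to right, giving each part a width proportional to its length."""
    xmin, ymin, xmax, ymax = box
    weights = [max(len(p), 1) for p in parts]  # avoid zero-width parts
    total = sum(weights)
    boxes: List[Box] = []
    left, consumed = xmin, 0
    for w in weights:
        consumed += w
        # Round cumulative edges so adjacent boxes share an edge and the last ends at xmax.
        right = xmin + round((xmax - xmin) * consumed / total)
        boxes.append((left, ymin, right, ymax))
        left = right
    return boxes


print(split_box((0, 0, 100, 10), ["quick", "brown"]))
# [(0, 0, 50, 10), (50, 0, 100, 10)]
```

Character count is a rough proxy for glyph width; it ignores proportional fonts and the space between the two words.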
2013-08-06 | Adding some comments | Thibaut Horel
2013-08-05 | Use C implementation of the Levenshtein distance | Thibaut Horel
    Requires the python-Levenshtein package on PyPI
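The python-Levenshtein package exposes the distance as `Levenshtein.distance(a, b)`. For reference, the quantity it computes (the minimum number of single-character insertions, deletions and substitutions) can be written in pure Python as below; this version is much slower and only illustrates the recurrence the C code speeds up.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, keeping only one row of the DP table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```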
2013-08-05 | Use a Needleman-Wunsch type algorithm for text alignment | Thibaut Horel
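A Needleman-Wunsch style global alignment over words can be sketched as follows. To stay dependency-free, it scores substitutions with `difflib.SequenceMatcher.ratio()` from the standard library rather than the Levenshtein distance mentioned in the neighboring commit, and the gap penalty of 0.6 is an arbitrary assumption. It returns `(ocr_index, corrected_index)` pairs, with `None` marking a gap.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def word_cost(a: str, b: str) -> float:
    """Substitution cost in [0, 1]: 0 for identical words, 1 for nothing in common."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def needleman_wunsch(src: List[str], tgt: List[str], gap: float = 0.6) -> List[Pair]:
    """Minimum-cost global alignment of two word lists."""
    n, m = len(src), len(tgt)
    # cost[i][j]: best cost of aligning src[:i] with tgt[:j]; move[i][j]: last step taken.
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Ties are broken alphabetically, so "diag" wins over "left" and "up".
            cost[i][j], move[i][j] = min(
                (cost[i - 1][j - 1] + word_cost(src[i - 1], tgt[j - 1]), "diag"),
                (cost[i - 1][j] + gap, "up"),    # src word maps to nothing
                (cost[i][j - 1] + gap, "left"),  # tgt word has no OCR counterpart
            )
    # Trace back from the bottom-right corner, then reverse.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


print(needleman_wunsch(["Tbe", "qu1ck", "fox"], ["The", "quick", "brown", "fox"]))
# [(0, 0), (1, 1), (None, 2), (2, 3)]
```

Keeping an explicit move table avoids re-deriving the path from floating-point comparisons during the traceback.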
2013-08-05 | use new functions in compare.py | Guillaume Horel
2013-08-05 | improve function to parse djvu files | Guillaume Horel
2013-08-04 | script to extract djvutext from a document | Guillaume Horel