Primary sources: Difference between revisions

From FDHwiki
(Created page with "== Bibliography and state of the art == == Methods == == Performances ==")
 
Line 1:
== Bibliography and state of the art ==
When working with digitized documents, Optical Character Recognition (OCR) is
traditionally used to recognize words character by character. However, OCR
performs poorly on offline handwritten text. The current best technique for
this case is therefore Word Spotting <ref>Rath, T. M., & Manmatha, R. (2007).
Word spotting for historical documents. International Journal on Document
Analysis and Recognition, 9(2), 139-152.</ref>. Word Spotting does not recognize
the words themselves; instead, it characterizes them by their shape. A user can
then query the system with either a word (a string) or an example (an image),
and the system returns all matching occurrences. The current best results for
Word Spotting using strings as queries are obtained with neural networks;
however, these require a training set and a word-level segmentation, which can
itself be a source of errors <ref>Giotis,
A. P., Sfikas, G., Gatos, B., & Nikou, C. (2017). A survey of document image
word spotting techniques. Pattern Recognition, 68, 310-332.</ref>. Finally, new
segmentation-free Word Spotting methods have appeared and look promising
<ref>Wilkinson, T., Lindström, J., & Brun, A. (2017). Neural Ctrl-F:
Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript
Collections. arXiv preprint arXiv:1703.07645.</ref>.
<references />
== Methods ==
== Performances ==

Revision as of 17:41, 2 November 2017

Bibliography and state of the art

When working with digitized documents, Optical Character Recognition (OCR) is traditionally used to recognize words character by character. However, OCR performs poorly on offline handwritten text. The current best technique for this case is therefore Word Spotting [1]. Word Spotting does not recognize the words themselves; instead, it characterizes them by their shape. A user can then query the system with either a word (a string) or an example (an image), and the system returns all matching occurrences. The current best results for Word Spotting using strings as queries are obtained with neural networks; however, these require a training set and a word-level segmentation, which can itself be a source of errors [2]. Finally, new segmentation-free Word Spotting methods have appeared and look promising [3].
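
To make the query-by-example idea concrete, here is a minimal sketch in Python. It assumes the word images are already segmented and binarized, describes each image by simple per-column shape features, and compares two images with dynamic time warping (DTW), loosely in the spirit of the profile-based matching associated with [1]. All names (`column_features`, `dtw_distance`, `spot`, `parse`) and the toy images are hypothetical illustrations, not taken from the cited works or from any library.

```python
from math import inf


def parse(rows):
    """Turn rows of '#' (ink) and '.' (background) into a binary image."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def column_features(img):
    """One (ink ratio, upper profile, lower profile) tuple per image column."""
    height, width = len(img), len(img[0])
    feats = []
    for x in range(width):
        ink_rows = [y for y in range(height) if img[y][x]]
        if ink_rows:
            feats.append((len(ink_rows) / height,
                          ink_rows[0] / height,
                          ink_rows[-1] / height))
        else:
            # Assumption: empty columns get a neutral mid-height profile.
            feats.append((0.0, 0.5, 0.5))
    return feats


def dtw_distance(a, b):
    """DTW cost between two feature sequences, normalized by their lengths."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance between the two column descriptors.
            d = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j],      # repeat column of b
                                 cost[i][j - 1],      # repeat column of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def spot(query_img, candidates, top_k=3):
    """Rank candidate word images by shape distance to the query image."""
    query = column_features(query_img)
    scored = [(dtw_distance(query, column_features(img)), name)
              for name, img in candidates.items()]
    return sorted(scored)[:top_k]


# Toy data: a vertical stroke, a horizontally stretched copy, and a dash.
stroke = parse([".#.", ".#.", ".#."])
stretched = parse(["..##..", "..##..", "..##.."])
dash = parse(["...", "###", "..."])

results = spot(stroke, {"stretched": stretched, "dash": dash})
for distance, name in results:
    print(name, distance)  # the stretched stroke ranks before the dash
```

DTW is used here because it tolerates horizontal stretching, which is common in handwriting: the stretched stroke matches the query exactly, while the dash does not. Note that nothing is recognized as text; the ranking relies on shape alone.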

  1. Rath, T. M., & Manmatha, R. (2007). Word spotting for historical documents. International Journal on Document Analysis and Recognition, 9(2), 139-152.
  2. Giotis, A. P., Sfikas, G., Gatos, B., & Nikou, C. (2017). A survey of document image word spotting techniques. Pattern Recognition, 68, 310-332.
  3. Wilkinson, T., Lindström, J., & Brun, A. (2017). Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections. arXiv preprint arXiv:1703.07645.

Methods

Performances