Primary sources: Difference between revisions

From FDHwiki
No edit summary
Line 2 (older revision):
When working with digitized documents, Optical Character Recognition (OCR) is
traditionally used to recognized words character-by-character. However, in the
case of offline handwritten text recognition, it does perform poorly. Thus the
current best technique is Word Spotting <ref>Rath, T. M., & Manmatha, R. (2007).
Word spotting for historical documents. International Journal on Document
Analysis and Recognition, 9(2), 139-152.</ref>. Word Spotting does not recognize
the words, but it characterizes them by their shape. Then a user can query
either a word either an example (an image) and the system will return all
matching occurrences. The current best results for Word Spotting using strings
as queries are obtained using Neural Networks, however they require a learning
set and a segmentation by word, which can also be source of errors <ref>Giotis,
A. P., Sfikas, G., Gatos, B., & Nikou, C. (2017). A survey of document image
word spotting techniques. Pattern Recognition, 68, 310-332.</ref>. Finally, new
segmentation-free Word Spotting methods have appeared and look promising
<ref>Wilkinson, T., Lindström, J., & Brun, A. (2017). Neural Ctrl-F:
Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript
Collections. arXiv preprint arXiv:1703.07645.</ref>.

<references />

Line 2 (newer revision):
When working with digitized documents, Optical Character Recognition (OCR) is
traditionally used to recognized words character-by-character. However, in the
case of offline handwritten text recognition, it does perform poorly. A more
adapted technology to this kind of document is Word Spotting <ref>Rath, T. M., &
Manmatha, R. (2007). Word spotting for historical documents. International
Journal on Document Analysis and Recognition, 9(2), 139-152.</ref>. This
technique does not try to directly recognize the words, but it characterizes
them by their shape. Then a user can query either a word either an example (an
image) and the system will return all matching occurrences.

The current best results for Word Spotting using strings as queries are obtained
using Neural Networks, however in order to obtain good results, they require a
learning set and a segmentation by word, which can also be source of errors
<ref>Giotis, A. P., Sfikas, G., Gatos, B., & Nikou, C. (2017). A survey of
document image word spotting techniques. Pattern Recognition, 68,
310-332.</ref>. Finally, new segmentation-free Word Spotting methods have
appeared and seem to show good results. <ref>Wilkinson, T., Lindström, J., &
Brun, A. (2017). Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting
in Handwritten Manuscript Collections. arXiv preprint arXiv:1703.07645.</ref>.

<references />

Revision as of 20:16, 2 November 2017

Bibliography and state of the art

When working with digitized documents, Optical Character Recognition (OCR) is traditionally used to recognize words character by character. However, it performs poorly on offline handwritten text. A technique better suited to this kind of document is Word Spotting [1]. It does not try to recognize the words directly; instead, it characterizes them by their shape. A user can then query the system with either a word (a string) or an example (an image), and the system returns all matching occurrences.
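
As a rough illustration of the query-by-example case, the sketch below ranks candidate word images by how closely their shape matches a query image. It is a toy under stated assumptions, not the method of any cited paper: word images are assumed to be already segmented and binarised (lists of 0/1 rows, 1 = ink), the shape descriptor is a simple column-wise ink profile, and profiles are compared with dynamic time warping so that the same word written wider or narrower can still match. The names `projection_profile`, `dtw_distance` and `spot` are hypothetical.

```python
from typing import List, Sequence

# Assumption: a word image is a non-empty list of equal-length rows of 0/1 pixels.
Image = List[List[int]]


def projection_profile(image: Image) -> List[float]:
    """Describe a word image by the fraction of ink pixels in each column."""
    height = len(image)
    width = len(image[0])
    return [sum(row[x] for row in image) / height for x in range(width)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping cost between two profiles, normalised by length."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i values of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m] / (n + m)


def spot(query: Image, candidates: List[Image]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    q = projection_profile(query)
    distances = [dtw_distance(q, projection_profile(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: distances[i])
```

Calling `spot(query_image, word_images)` returns the indices of `word_images` sorted by increasing distance; a real system would use richer shape features and a threshold or top-k cut-off to decide which occurrences count as matches.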

The current best results for Word Spotting with strings as queries are obtained using Neural Networks. To perform well, however, they require a training set and a word-level segmentation, which can itself be a source of errors [2]. Finally, new segmentation-free Word Spotting methods have appeared and seem to show good results [3].

  1. Rath, T. M., & Manmatha, R. (2007). Word spotting for historical documents. International Journal on Document Analysis and Recognition, 9(2), 139-152.
  2. Giotis, A. P., Sfikas, G., Gatos, B., & Nikou, C. (2017). A survey of document image word spotting techniques. Pattern Recognition, 68, 310-332.
  3. Wilkinson, T., Lindström, J., & Brun, A. (2017). Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections. arXiv preprint arXiv:1703.07645.

Methods

Performances