Deciphering Venetian handwriting
Introduction
The goal of this project is to create a pipeline that re-establishes the mapping between the original Venetian cadastre and its digital version, an Excel spreadsheet. When the transcription was produced, the link to the original pages was lost. The purpose of this project is to take the spreadsheet and re-establish the link to the source document. To do this, we use high-quality scans of the Venetian cadastre known as the "Sommarioni".
Planning
Week | Task |
---|---|
09 | Segment patches of text in the Sommarioni: (page id, patch) |
10 | Mapping transcription (Excel file) -> page id (proof of concept) |
11 | Mapping transcription (Excel file) -> page id (on the whole dataset) |
12 | Depending on the quality of the results: improve the page id mapping, more precise matching, web viewer |
13 | Final results, final evaluation & final report writing |
14 | Final project presentation |
Week 09
- Input : Sommarioni images
- Output : Patches of pixels containing text, with the coordinates of each patch in the Sommarioni page
- Step 1 : Segment handwritten text regions in the Sommarioni images
- Step 2 : Extraction of the patches
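The page does not say which segmentation method is used. As a minimal sketch of steps 1 and 2, the code below swaps in a horizontal projection profile, a classic baseline for finding lines of text: rows containing dark pixels are grouped into bands, and each band is cropped to its inked columns to give a patch with its coordinates. The image representation (rows of 0 to 255 grayscale values), the threshold, the function names and the (x, y, w, h) box convention are all hypothetical; a page with several text columns would need the same idea applied per column, and a learned segmentation model may work far better on real scans.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale page as rows of pixel values
# (0 = black ink, 255 = white paper).
Image = List[List[int]]


def text_bands(img: Image, threshold: int = 128, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, whose rows contain ink."""
    bands = []
    start = None
    for y, row in enumerate(img):
        ink = sum(1 for px in row if px < threshold)  # dark pixels in this row
        if ink >= min_ink and start is None:
            start = y  # entering a text band
        elif ink < min_ink and start is not None:
            bands.append((start, y))  # leaving a text band
            start = None
    if start is not None:
        bands.append((start, len(img)))  # band reaches the bottom edge
    return bands


def extract_patches(page_id: str, img: Image, threshold: int = 128):
    """Crop each band to its inked columns; return (page_id, (x, y, w, h), patch) tuples."""
    patches = []
    for top, bottom in text_bands(img, threshold):
        band = img[top:bottom]
        # Columns that contain at least one dark pixel inside this band.
        cols = [x for x in range(len(band[0])) if any(r[x] < threshold for r in band)]
        left, right = min(cols), max(cols) + 1
        patch = [r[left:right] for r in band]
        patches.append((page_id, (left, top, right - left, bottom - top), patch))
    return patches
```

Each returned tuple matches the (page id, patch) pairs that week 10 takes as input, with the box kept so the patch can later be located on the page.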
Week 10
- Input : transcription (Excel file), tuples (page id, patch) extracted in week 9
- Output : line in the transcription -> page id
- Step 1 : Handwritten text recognition (HTR) on each patch, then cleaning of the output: (patch, text)
- Step 2 : Find matching pairs between the recognized text and the transcription
- Step 3 : New Excel file with an added page id column
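The HTR engine and the matching criterion are not specified here. A minimal sketch of the cleaning and matching steps, assuming the HTR output is already available as one (page id, recognized text) pair per patch and each transcription row is a dict with a hypothetical "text" column, could normalize both sides and pick the most similar patch using Python's standard `difflib.SequenceMatcher`. The similarity measure, the 0.6 threshold and all function names are assumptions chosen for illustration, not the project's method.

```python
import difflib
import re
import unicodedata
from typing import Dict, List, Optional, Tuple


def clean(text: str) -> str:
    """Normalize text for comparison: strip accents, lowercase, keep letters and digits."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())  # collapse repeated whitespace


def best_page(row_text: str, recognized: List[Tuple[str, str]],
              min_ratio: float = 0.6) -> Optional[str]:
    """Return the page id of the most similar recognized patch, or None below min_ratio."""
    target = clean(row_text)
    best_id, best_score = None, -1.0
    for page_id, htr_text in recognized:
        score = difflib.SequenceMatcher(None, target, clean(htr_text)).ratio()
        if score > best_score:
            best_id, best_score = page_id, score
    return best_id if best_score >= min_ratio else None


def add_page_ids(rows: List[Dict[str, str]], recognized: List[Tuple[str, str]],
                 key: str = "text") -> List[Dict[str, Optional[str]]]:
    """Return copies of the transcription rows with an added 'page_id' column."""
    return [dict(row, page_id=best_page(row[key], recognized)) for row in rows]
```

The resulting rows can then be written back to a spreadsheet for step 3, for example with the standard `csv.DictWriter` or with pandas. Comparing every row with every patch is quadratic, so on the full dataset some blocking (for instance by known page ranges) would likely be needed.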
Week 11
- Step 1 : Apply the pipeline validated in week 10 to the whole dataset
- Step 2 : Evaluate the quality of the results and, based on that, decide on the tasks for the following weeks
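The page does not define how quality is measured. One simple option, sketched here under the assumption that a sample of transcription rows has been checked by hand, is to report coverage (the share of rows that received any page id) and accuracy on the checked sample. The metric names and the input format are hypothetical.

```python
from typing import Dict, Optional, Sequence


def evaluate(predicted: Sequence[Optional[str]], gold: Dict[int, str]) -> Dict[str, float]:
    """Summarize a matching run.

    predicted: page id assigned to each transcription row (None if unmatched).
    gold: hand-checked page ids, keyed by row index, for a sample of rows.
    """
    if not predicted or not gold:
        raise ValueError("need at least one prediction and one checked row")
    matched = sum(p is not None for p in predicted)
    correct = sum(predicted[i] == page for i, page in gold.items())
    return {
        "coverage": matched / len(predicted),  # share of rows given any page id
        "accuracy": correct / len(gold),       # share of checked rows that are correct
    }
```

Low coverage would point to segmentation or HTR problems, while low accuracy on matched rows would point to the matching step, which helps choose among the week 12 options.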
Week 12
- Depending on the quality of the matching:
  - Improve the image segmentation
  - More precise matching: (Excel cell) -> (page id, patch), to obtain the exact bounding box of each piece of written text
  - Use an IIIF image viewer to present the results of the project in a more engaging way
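With the IIIF Image API, a viewer or a plain link can display just the matched patch by requesting a region of the page image, using URLs of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}` where the region is `x,y,w,h` in pixels. A minimal sketch, assuming the Sommarioni scans are served by some IIIF image server; the base URL and identifier below are hypothetical, and the `max` size keyword requires Image API 2.1 or later.

```python
from typing import Tuple
from urllib.parse import quote


def iiif_region_url(base: str, identifier: str, box: Tuple[int, int, int, int],
                    size: str = "max", fmt: str = "jpg") -> str:
    """Build an IIIF Image API URL that crops one patch (x, y, w, h) from a page image."""
    x, y, w, h = box
    region = f"{x},{y},{w},{h}"            # pixel region of the patch
    ident = quote(identifier, safe="")     # identifiers must be URL-encoded, including "/"
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/0/default.{fmt}"
```

For example, a patch box from week 9 such as (10, 20, 300, 40) on a hypothetical image `sommarioni/p1` yields a URL ending in `sommarioni%2Fp1/10,20,300,40/max/0/default.jpg`, which any IIIF-compatible viewer or browser can open directly.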