Introduction
Motivation
Project Plan and Milestones
Date
|
Task
|
Completion
|
By Week 3
|
- Brainstorm project ideas.
- Prepare slides for the initial project idea presentation.
|
✓
|
By Week 5
|
- Discuss the differences between image analysis and text analysis in terms of related algorithms, processing toolkits, implementation difficulties and display methods.
- Decide to focus on text processing.
- Select a subset of Europeana's "Newspaper collection" for our project.
- Check the content of "La clef du cabinet des princes de l'Europe" and tentatively select three topics to focus on.
|
✓
|
By Week 6
|
- Each of us reads some pages of the journal to get an overall understanding of it.
- Find that the accuracy of the existing OCR results is unsatisfactory, and decide to improve them before text analysis.
- Request the data.
|
✓
|
By Week 7
|
- Research OCR methods and find some methods for Italian italics.
- Get the text by web analysis.
- Use DeepL to translate the text from French to English and then back to French, and check the results.
- Reproduce the OCR method from the literature and find that recognition improves.
|
✓
|
By Week 8
|
- Apply ocropus to a small set of images.
|
✓
|
By Week 9
|
- Preprocess the data (re-OCR the images).
- Prototype design.
- Database design.
|
✓
|
By Week 10
|
|
|
By Week 11
|
|
|
By Week 12
|
|
|
By Week 13
|
|
|
By Week 14
|
|
|
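The Week 7 round-trip check (French to English and back to French) can be sketched roughly as below. This is only an illustrative sketch, not the code we actually ran: the `translate` callable and its `(text, source_lang, target_lang)` signature are assumptions standing in for whatever translation client is used, and the word-level similarity score is one possible way to "check the results". No DeepL API call is made here.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical signature: translate(text, source_lang, target_lang) -> str.
# Any translation client can be wrapped to match it.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, translate: Translator) -> str:
    """Translate French text to English and back to French."""
    english = translate(text, "FR", "EN")
    return translate(english, "EN", "FR")


def round_trip_similarity(text: str, translate: Translator) -> float:
    """Word-level similarity (0.0 to 1.0) between the text and its round trip."""
    back = round_trip(text, translate)
    original_words = text.lower().split()
    round_trip_words = back.lower().split()
    return SequenceMatcher(None, original_words, round_trip_words).ratio()


def identity_translator(text: str, source_lang: str, target_lang: str) -> str:
    """Stand-in translator for trying the sketch offline; returns the text unchanged."""
    return text


if __name__ == "__main__":
    sample = "La clef du cabinet des princes de l'Europe"
    print(round_trip_similarity(sample, identity_translator))
```

A low score only flags a passage for manual inspection: it may point to OCR errors, but it may equally reflect ordinary rephrasing by the translator.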
GitHub Repository
https://github.com/XinyiDyee/Europeana-Search-Engine
Reference