Europeana: A New Spatiotemporal Search Engine

From FDHwiki

Introduction

Motivation

Project Plan and Milestones

Date Task Completion
By Week 3
  • Brainstorm project ideas.
  • Prepare slides for initial project idea presentation.
By Week 5
  • Discuss the differences between image analysis and text analysis in terms of related algorithms, processing toolkits, implementation difficulties and display methods.
  • Decide to focus on text processing.
  • Select a subset collection from the "Newspaper collection" of Europeana for our project.
  • Check the content of "La clef du cabinet des princes de l'Europe" and roughly select 3 topics we may focus on.
By Week 6
  • Each of us reads some pages of the journal to get an overall understanding of it.
  • Find that the accuracy of the OCR results is unsatisfactory and decide to improve them before text analysis.
  • Request the data.
By Week 7
  • Research OCR methods and find some suited to Italian italics.
  • Get the text by web analysis.
  • Use DeepL to translate French to English and then back to French, and check the results.
  • Reproduce the OCR method from the literature and find that recognition has improved.
By Week 8
  • Apply OCRopus to a small set of images.
By Week 9
  • Preprocess the data (re-OCR the images).
  • Prototype design.
  • Database design.
By Week 10
  • To be filled
By Week 11
  • Content analysis.
By Week 12
  • To be filled
By Week 13
  • Build the website.
By Week 14
  • Final report.
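
The Week 7 check (translate French to English, translate back, then compare) can be partly automated. The following is a minimal sketch, not the project's actual code: it assumes the original OCR text and the round-tripped text are already available as strings (the DeepL calls themselves are left out), and the function names and sample sentences are hypothetical. It uses Python's standard `difflib` to score how much the text changed and to list the words that differ.

```python
from difflib import SequenceMatcher


def round_trip_similarity(original: str, round_tripped: str) -> float:
    """Word-level similarity between two texts, from 0.0 to 1.0 (case-insensitive)."""
    a = original.lower().split()
    b = round_tripped.lower().split()
    return SequenceMatcher(None, a, b).ratio()


def changed_words(original: str, round_tripped: str) -> list[tuple[str, str]]:
    """List (before, after) word spans that differ between the two texts."""
    a = original.split()
    b = round_tripped.split()
    matcher = SequenceMatcher(None, a, b)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical sample: OCR output vs. the same line after FR -> EN -> FR.
    ocr_text = "le roy de France"
    back_translated = "le roi de France"
    print(round_trip_similarity(ocr_text, back_translated))
    print(changed_words(ocr_text, back_translated))
```

A low similarity score only flags a page for manual review. It does not say whether the difference comes from an OCR error, old spelling, or the translation itself.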

GitHub Repository

https://github.com/XinyiDyee/Europeana-Search-Engine

Reference