Introduction

Motivation

Project Plan and Milestones

Date	Task	Completion
By Week 3	Brainstorm project ideas. Prepare slides for the initial project idea presentation.	✓
By Week 5	Discuss the differences between image analysis and text analysis in terms of related algorithms, processing toolkits, implementation difficulties and display methods. Decide to focus on text processing. Select a subset collection from the "Newspaper collection" of Europeana for our project. Check the content of "La clef du cabinet des princes de l'Europe" and learn its structure and time span.	✓
By Week 6	Each of us reads some pages of the journal to get an overall understanding of it. Find that the accuracy of the OCR results is unsatisfactory and decide to improve them before text analysis. Request the data.	✓
By Week 7	Research OCR methods and find some for Italian italics. Get text by web analysis. Use DeepL to translate FR to ENG and then ENG back to FR, and check the results. Reproduce the OCR method from the literature and find that recognition has improved.	✓
By Week 8	Apply OCRopus to a small set of images. Use a grammar checker to analyze the result of OCRopus.	✓
By Week 9	Prototype design. Database design.	✓
By Week 10	Get access to Europeana's API. Use the API to extract the URL of each page of our specific newspaper. Download each page as an image using the extracted URLs.	✓
By Week 11	Run OCR using the better model and the Kraken engine. Store the resulting text in the database. Search for a grammar checker to optimize the text.	✓
By Week 12	Use the newly selected grammar checker API to optimize the text. Use entropy to analyze the final text.	✓
By Week 13	Build the website from our prototype. Use different text analysis methods (LDA, n-gram, and named entity recognition) to analyze the text.	✓
By Week 14	Final report and presentation.	✓
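
The plan for Weeks 12 and 13 mentions entropy and n-gram analysis without giving details. As a minimal sketch, assuming "entropy" means character-level Shannon entropy and that n-grams are counted over whitespace-separated tokens (both are assumptions, and the function names below are hypothetical, not taken from the project repository), the two measures could look like this:

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of `text`, in bits per character.

    Assumption: the plan does not say which entropy is used; this is
    the plain character-frequency version.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_counts(tokens, n=2):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    # Hypothetical sample; real input would be the corrected OCR text.
    sample = "la clef du cabinet des princes de l'Europe"
    print(shannon_entropy(sample))
    print(ngram_counts(sample.split(), n=2).most_common(3))
```

Comparing the entropy of the raw OCR output with that of the corrected text gives one rough signal of how much the character distribution changed. It does not measure correctness on its own.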

GitHub Repository

https://github.com/XinyiDyee/Europeana-Search-Engine

Europeana: A New Spatiotemporal Search Engine
