Europeana: A New Spatiotemporal Search Engine: Difference between revisions
Xingchen.li (talk | contribs) No edit summary |
Line 21: | Line 21: | ||
* Decide to focus on text processing. | * Decide to focus on text processing. | ||
* Select a subset collection from the "Newspaper collection" of Europeana for our project. | * Select a subset collection from the "Newspaper collection" of Europeana for our project. | ||
* Check the content of "La clef du cabinet des princes de l'Europe" and | * Check the content of "La clef du cabinet des princes de l'Europe" and learn its structure and time span. | ||
| align="center" | ✓ | | align="center" | ✓ | ||
|- | |- | ||
Line 44: | Line 44: | ||
|By Week 8 | |By Week 8 | ||
| | | | ||
* Apply | * Apply OCRopus to a small set of images. | ||
* Use a grammar checker to analyze the result of OCRopus. | |||
| align="center" | ✓ | | align="center" | ✓ | ||
|- | |- | ||
Line 50: | Line 51: | ||
|By Week 9 | |By Week 9 | ||
| | | | ||
* Prototype design. | * Prototype design. | ||
* Database design. | * Database design. | ||
Line 58: | Line 58: | ||
|By Week 10 | |By Week 10 | ||
| | | | ||
* | * Get Europeana's API | ||
* Use the API to extract the URL for each page of our specific newspaper. | |||
| align="center" | | * Download each page of our specific newspaper as an image using the extracted URLs. | ||
| align="center" | ✓ | |||
|- | |- | ||
|By Week 11 | |By Week 11 | ||
| | | | ||
* | * Run OCR using the better model and the Kraken engine. | ||
| align="center" | | * Store the text we get in the database. | ||
* Search for a grammar checker to optimize the text we get. |||
| align="center" | ✓ | |||
|- | |- | ||
|By Week 12 | |By Week 12 | ||
| | | | ||
* | * Use the newly selected grammar checker API to optimize the text. | ||
* Use entropy to analyze the result of the final text. | |||
| align="center" | | | align="center" | ✓ | ||
|- | |- | ||
|By Week 13 | |By Week 13 | ||
| | | | ||
* Build the web. | * Build the web from our prototype. | ||
| align="center" | | * Use different text analysis methods (LDA, n-grams, and named-entity recognition) to analyze the text. | ||
| align="center" | ✓ | |||
|- | |- | ||
|By Week 14 | |By Week 14 | ||
| | | | ||
* Final report. | * Final report and presentation. | ||
| align="center" | | | align="center" | ✓ | ||
|- | |- | ||
Revision as of 19:45, 20 December 2022
Introduction
Motivation
Project Plan and Milestones
Date | Task | Completion |
---|---|---|
By Week 3 | | ✓ |
By Week 5 | | ✓ |
By Week 6 | | ✓ |
By Week 7 | | ✓ |
By Week 8 | | ✓ |
By Week 9 | | ✓ |
By Week 10 | | ✓ |
By Week 11 | | ✓ |
By Week 12 | | ✓ |
By Week 13 | | ✓ |
By Week 14 | | ✓ |
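The Week 12 milestone mentions using entropy to analyze the final text. The page does not say how that entropy was computed, so the following is only a minimal sketch under one possible assumption: character-level Shannon entropy, in bits per character, compared between the raw OCR output and the corrected text. The function name `shannon_entropy` and the sample strings are hypothetical and do not come from the project repository.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of `text`, in bits per character.

    Assumption: entropy is measured over single characters; the project
    may have used words or another unit instead.
    """
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # H = sum over characters of -p * log2(p), with p the relative frequency.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical samples: noisy OCR output versus a corrected version.
    raw_ocr = "La c1ef du cab!net des pr1nces de l'Eur0pe"
    corrected = "La clef du cabinet des princes de l'Europe"
    print(f"raw OCR:   {shannon_entropy(raw_ocr):.3f} bits/char")
    print(f"corrected: {shannon_entropy(corrected):.3f} bits/char")
```

OCR noise often introduces unexpected characters, which tends to raise character-level entropy, so a drop after correction can be one rough signal that the text became more regular. It is not a measure of correctness by itself.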
Github Repository
https://github.com/XinyiDyee/Europeana-Search-Engine