Introduction & Motivation
Deliverables
- 39,587 records related to postcards with image copyrights, along with their metadata, from the Europeana website.
- OCR results of a sample set of 350 images containing text.
- GPT-3.5 prediction results for a sample set of 350 images containing text, based on OCR results.
- A high-quality, manually annotated Ground Truth for a sample set of 309 images.
- GPT-3.5 prediction results for the Ground Truth set.
- GPT-4 prediction results for the Ground Truth set.
- An interactive webpage displaying the mapping of the postcards.
- A GitHub repository containing all the code for the project.
Methodologies
Data collection
Result Assessment
Limitations & Future work
Project plan & milestones
| Timeframe | Task | Completion |
| Week 4 | Explore postcard search results on Europeana's website; study the Europeana API documentation and get an access key; extract postcard data using the Europeana API | ✅ |
| Week 5 | Clean the data using metadata; analyze the Europeana postcard data; prepare sample image sets and explore prediction methods | ✅ |
| Week 6 | Decide to focus on postcards with text; test and evaluate the effectiveness of multiple OCR models | ✅ |
| Week 7 | Use OCR and NER for prediction; test and evaluate the effectiveness of multiple NER tools; explore alternative prediction methods | ✅ |
| Week 8 | Introduce ChatGPT for the prediction (OCR + GPT-3.5 + NER); try making predictions directly with GPT-4 | ✅ |
| Week 9 | Optimize the GPT-3.5 prompt for better results; compare the results of OCR + GPT-3.5 (optimized prompts) with those of GPT-4 | ✅ |
| Week 10 | Complete the pipeline for the entire prediction process; prepare a sample set to evaluate its performance | ✅ |
| Week 11 | Explore visualization methods; refine and analyze the test set | ✅ |
| Week 12 | Use the TA's annotation tool to build a ground truth; build the visualization platform | ✅ |
| Week 13 | Test and refine the web application; analyze the results of the test set evaluation | ✅ |
| Week 14 | Prepare the final report and presentation | ✅ |
GitHub Repository
Europeana-mapping-postcards
References