Introduction
Motivation
Deliverables
Project Timeline & Milestones
Timeframe | Task | Completion
Week 4 | Explore the dataset; explore in-context learning models for text summarization | ✓
Week 5 | Identify patterns and edge cases in the dataset (e.g. missing fields, "odd" values); define summarization formats accordingly, to be used for in-context learning; explore the connection between the Catastici and Sommarioni datasets | ✓
Week 6 | Refine summarization formats; construct a pipeline connecting translation generation, summarization and validation | ✓
Week 7 | Evaluate summarization results |
Week 8 | |
Week 9 | |
Week 10 | |
Week 11 | |
Week 12 | |
Week 13 | |
Week 14 | |
Methodology
Generating Summarization Formats for In-context Learning
Choosing NLP Model for Translation
We tried different state-of-the-art (SOTA) models for translation. The results were often unsatisfactory because the source text is written in a historical form of the language and is specific to the Venetian dialect. Preliminary results with 10 examples:
Original text | mBART | Google Translate | GPT-4
casa e bottega da barbier | and a barber pole | House and Bottega da Barbier | House and barber shop
casa | home | house | House
bottega da strazariol | a strawberry bottle | Bottega da Strazariol | Rag dealer's shop
casa e bottega da tentor | home and pushbutton | House and Bottega da Tentor | House and dyer’s shop
magazen | warehouse | magazen | Warehouse
mezà | Eight | mezà | Halfway house or mezzanine level
cas vuota | empty house | Cas empty | Empty house
casa a pepian | the pepian house | House in Pepian | House on the ground floor
bottega da confetti | Packaging bottle | Bottega da sugaredi | Confectioner’s shop
casa e bottega | home and doorbell | House and Bottega | House and shop
Based on these examples, we choose GPT-4 for translation.
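One failure mode visible in the comparison above is that a model sometimes returns the Venetian words unchanged (for example "Bottega da Strazariol"). The following is a minimal sketch of how such cases could be flagged automatically. It is not part of the project pipeline: the helper names `tokens` and `copied_fraction` are hypothetical, and the heuristic only detects untranslated words, not wrong translations such as "a strawberry bottle".

```python
import re


def tokens(text: str) -> list[str]:
    """Split text into lowercase word tokens (accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


def copied_fraction(source: str, translation: str) -> float:
    """Fraction of source tokens that reappear verbatim in the translation.

    Hypothetical heuristic: a value near 1.0 suggests the text was left
    untranslated; a value of 0.0 says nothing about correctness.
    """
    src = tokens(source)
    if not src:
        return 0.0
    out = set(tokens(translation))
    return sum(tok in out for tok in src) / len(src)


# Three rows taken from the comparison above:
# (original, mBART, Google Translate, GPT-4)
rows = [
    ("casa e bottega da barbier", "and a barber pole",
     "House and Bottega da Barbier", "House and barber shop"),
    ("bottega da strazariol", "a strawberry bottle",
     "Bottega da Strazariol", "Rag dealer's shop"),
    ("magazen", "warehouse", "magazen", "Warehouse"),
]

for original, mbart, google, gpt4 in rows:
    scores = [copied_fraction(original, t) for t in (mbart, google, gpt4)]
    print(original, scores)
```

On these three rows only the Google Translate outputs score above zero (0.6, 1.0 and 1.0), which matches the untranslated words seen in that column. A check like this could serve as one inexpensive signal in a validation step, alongside manual review.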
Results
Limitations and further work
Conclusion
Credits