Generation of Textual Description: Difference between revisions

From FDHwiki
 
(14 intermediate revisions by the same user not shown)
Line 1: Line 1:
==Introduction==
=Introduction=
==Motivation==
 
==Deliverables==
=Motivation=
 
=Deliverables=
 
= Project Timeline & Milestones =
= Project Timeline & Milestones =


Line 12: Line 15:
|
|
* Exploring the dataset
* Exploring the dataset
* Finding textual data of the book
* Exploring in-context learning models for text summarization
| align="center" | ✓
| align="center" | ✓
|-
|-
| align="center" |Week 5
| align="center" |Week 5
|
|
* Pre-processing text
* Identify patterns and edge cases from the dataset (e.g. missing fields, "odd" values)
* Quality assessment of the data
* Define corresponding summarization formats to be used for in-context learning
* Explore the connection between the Catastici and Sommarioni datasets
| align="center" | ✓
| align="center" | ✓
|-
|-
| align="center" |Week 6
| align="center" |Week 6
|
|
* Applying NER using Spacy
* Refine summarization formats
* Construct a pipeline connecting translation generation, summarization and validation
| align="center" | ✓
| align="center" | ✓
|-
|-
| align="center" |Week 7
| align="center" |Week 7
|
|
* Manually labelling chapter 3
* Evaluate summarization results
* GPT-4 Prompt Engineering
| align="center" |  
| align="center" |
|-
|-
| align="center" |Week 8
| align="center" |Week 8
|
|
* Working on mapping with QGIS
* TBD
| align="center" |
| align="center" |  
|-
|-
| align="center" |Week 9
| align="center" |Week 9
|
|
* Finalizing GPT-4 Prompt
* TBD
* Automating Wikipedia Page Search
| align="center" |  
| align="center" |
|-
|-
| align="center" |Week 10
| align="center" |Week 10
|
|
* Finalizing the list of manually detected locations
* TBD
* Evaluation of GPT-4 and Spacy Results for chapter 3
| align="center" |  
| align="center" |
|-
|-
| align="center" |Week 11
| align="center" |Week 11
|
|
* Matching the coordinates of the locations from chapter 3
* TBD
* QGIS mapping of the locations from chapter 3
| align="center" | ✓
|-
|-


| align="center" |Week 12
| align="center" |Week 12
|
|
* Visualizing the full chapter 3 journey
* TBD
* Retrieving the locations from the entire book
| align="center" |  
| align="center" |
|-
|-


| align="center" |Week 13
| align="center" |Week 13
|
|
* Matching the coordinates of the locations from the entire book
* TBD
* Retrieving coordinates from matched Wikipedia pages
| align="center" |  
* GeoPandas Mapping of the locations from the entire book
* Visualizing the full journey
| align="center" |
|-
|-


| align="center" |Week 14
| align="center" |Week 14
|
|
* Develop a platform to display outputs
* TBD
* Complete GitHub repository
| align="center" |
* Complete Wiki page
|-
* Complete presentation
|}
| align="center" |
 
=Methodology=
==Generating Summarization Formats for In-context Learning==
 
 
==Choosing NLP Model for Translation==
We tried several state-of-the-art (SOTA) models for translation, but the results were unsatisfactory because the source text is written in an archaic form of the Venetian dialect. Preliminary results with 10 examples:
 
{| class="wikitable" style="background-color:#ffffff; color:#383838;"
|-
! Original text
! mBART
! Google Translate
! GPT-4
|- style="vertical-align:middle;"
| style="background-color:#dae8fc;" | casa e bottega da barbier
| and a barber pole
| House and Bottega da Barbier
| House and barber shop
|-
| style="background-color:#dae8fc;" | casa
| style="vertical-align:middle;" | home
| house
| style="vertical-align:middle;" | House
|-
| style="background-color:#dae8fc;" | bottega da strazariol
| style="vertical-align:middle;" | a strawberry bottle
| Bottega da Strazariol
| style="vertical-align:middle;" | Rag dealer's shop
|-
| style="background-color:#dae8fc;" | casa e bottega da tentor
| style="vertical-align:middle;" | home and pushbutton
| House and Bottega da Tentor
| style="vertical-align:middle;" | House and dyer’s shop
|-
| style="background-color:#dae8fc;" | magazen
| style="vertical-align:middle;" | warehouse
| magazen
| style="vertical-align:middle;" | Warehouse
|-
| style="background-color:#dae8fc;" | mezà
| style="vertical-align:middle;" | Eight
| mezà
| style="vertical-align:middle;" | Halfway house or mezzanine level
|-
| style="background-color:#dae8fc;" | cas vuota
| style="vertical-align:middle;" | empty house
| Cas empty
| style="vertical-align:middle;" | Empty house
|-
| style="background-color:#dae8fc;" | casa a pepian
| style="vertical-align:middle;" | the pepian house
| House in Pepian
| style="vertical-align:middle;" | House on the ground floor
|-
| style="background-color:#dae8fc;" | bottega da confetti
| style="vertical-align:middle;" | Packaging bottle
| Bottega da sugaredi
| style="vertical-align:middle;" | Confectioner’s shop
|-
|-
| style="background-color:#dae8fc;" | casa e bottega
| style="vertical-align:middle;" | home and doorbell
| House and Bottega
| style="vertical-align:middle;" | House and shop
|}
|}
Based on these preliminary results, we will use GPT-4 for translation.
=Results=
=Limitations and further work=
=Conclusion=


==Methodology==
=Credits=
==Results==
==Limitations and further work==
==Conclusion==
==Credits==

Latest revision as of 15:46, 2 November 2024

Introduction

Motivation

Deliverables

Project Timeline & Milestones

Timeframe Task Completion
Week 4
  • Exploring the dataset
  • Exploring in-context learning models for text summarization
Week 5
  • Identify patterns and edge cases from the dataset (e.g. missing fields, "odd" values)
  • Define corresponding summarization formats to be used for in-context learning
  • Explore the connection between the Catastici and Sommarioni datasets
Week 6
  • Refine summarization formats
  • Construct a pipeline connecting translation generation, summarization and validation
Week 7
  • Evaluate summarization results
Week 8
  • TBD
Week 9
  • TBD
Week 10
  • TBD
Week 11
  • TBD
Week 12
  • TBD
Week 13
  • TBD
Week 14
  • TBD

Methodology

Generating Summarization Formats for In-context Learning

Choosing NLP Model for Translation

We tried several state-of-the-art (SOTA) models for translation, but the results were unsatisfactory because the source text is written in an archaic form of the Venetian dialect. Preliminary results with 10 examples:

{| class="wikitable"
|-
! Original text
! mBART
! Google Translate
! GPT-4
|-
| casa e bottega da barbier
| and a barber pole
| House and Bottega da Barbier
| House and barber shop
|-
| casa
| home
| house
| House
|-
| bottega da strazariol
| a strawberry bottle
| Bottega da Strazariol
| Rag dealer's shop
|-
| casa e bottega da tentor
| home and pushbutton
| House and Bottega da Tentor
| House and dyer’s shop
|-
| magazen
| warehouse
| magazen
| Warehouse
|-
| mezà
| Eight
| mezà
| Halfway house or mezzanine level
|-
| cas vuota
| empty house
| Cas empty
| Empty house
|-
| casa a pepian
| the pepian house
| House in Pepian
| House on the ground floor
|-
| bottega da confetti
| Packaging bottle
| Bottega da sugaredi
| Confectioner’s shop
|-
| casa e bottega
| home and doorbell
| House and Bottega
| House and shop
|}

Based on these preliminary results, we will use GPT-4 for translation.
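One rough way to quantify the comparison above is to count, for each system, how many outputs still contain a word copied from the Venetian source. The sketch below is only an illustration and not the evaluation used in this project: the 3-character minimum word length and the names `ROWS` and `count_untranslated` are assumptions introduced here. The heuristic only detects untranslated carry-over. It says nothing about whether a translation is correct, so "a strawberry bottle" is not flagged.

```python
import re

# Rows copied from the comparison table: (original, mBART, Google Translate, GPT-4).
ROWS = [
    ("casa e bottega da barbier", "and a barber pole", "House and Bottega da Barbier", "House and barber shop"),
    ("casa", "home", "house", "House"),
    ("bottega da strazariol", "a strawberry bottle", "Bottega da Strazariol", "Rag dealer's shop"),
    ("casa e bottega da tentor", "home and pushbutton", "House and Bottega da Tentor", "House and dyer’s shop"),
    ("magazen", "warehouse", "magazen", "Warehouse"),
    ("mezà", "Eight", "mezà", "Halfway house or mezzanine level"),
    ("cas vuota", "empty house", "Cas empty", "Empty house"),
    ("casa a pepian", "the pepian house", "House in Pepian", "House on the ground floor"),
    ("bottega da confetti", "Packaging bottle", "Bottega da sugaredi", "Confectioner’s shop"),
    ("casa e bottega", "home and doorbell", "House and Bottega", "House and shop"),
]
SYSTEMS = ["mBART", "Google Translate", "GPT-4"]
MIN_LEN = 3  # assumption: ignore short function words such as "e", "da", "a"


def content_words(text):
    """Lowercased word tokens of at least MIN_LEN characters."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) >= MIN_LEN}


def count_untranslated(rows):
    """Per system, count outputs that reuse at least one source word verbatim."""
    counts = {name: 0 for name in SYSTEMS}
    for source, *outputs in rows:
        source_words = content_words(source)
        for name, output in zip(SYSTEMS, outputs):
            if source_words & content_words(output):
                counts[name] += 1
    return counts


untranslated_counts = count_untranslated(ROWS)
print(untranslated_counts)
```

On the ten rows above, this heuristic flags 1 output for mBART, 9 for Google Translate and 0 for GPT-4, which is consistent with the observation that Google Translate tends to leave dialect terms such as "bottega" and "magazen" untranslated.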

Results

Limitations and further work

Conclusion

Credits