Generation of Textual Description for Parcels
Introduction
The historical records of land management, cadastre, and taxation provide invaluable insights into the socio-economic and administrative evolution of regions over time. Among the most significant resources for understanding such systems in historical Venice are the Catastici (1740) and the Sommarioni (1808), two books that offer different perspectives on Venetian land parcels, their ownership, and their taxation structures.
The Catastici (1740)
The Sommarioni (1808), compiled during a time of significant political and social upheaval following the fall of the Venetian Republic and under Napoleonic administration, presents a transformed landscape.
Our goal is to combine data from these two books, which date from different periods, to generate a clear and comprehensive description for each parcel.
Project Plan and Milestones
Week | Task | Status |
---|---|---|
07.10 - 13.10 | Define research questions; review relevant literature | Done |
14.10 - 20.10 | Perform initial data checking and cleaning; address dataset-related questions | Done |
21.10 - 27.10 | Autumn vacation | Done |
28.10 - 03.11 | Align Catastici and Sommarioni datasets; continue data cleaning | Done |
04.11 - 10.11 | Develop description templates and prompts; prepare for the midterm presentation | Done |
11.11 - 17.11 | Midterm presentation (14.11); refine the description template and prompts | Done |
18.11 - 24.11 | Translate Italian data into English | Done |
25.11 - 01.12 | Design an evaluation plan; evaluate the prompts | |
02.12 - 08.12 | Generate final results; evaluate the prompts and translation; begin writing the wikipage | |
09.12 - 15.12 | Write the wikipage; organize GitHub code; prepare for the final presentation | |
16.12 - 22.12 | Deliver GitHub repository and wikipage (18.12); final presentation (19.12) | |
Methodology
Our project is based on the Catastici dataset (1740) and the Sommarioni dataset (1808). We cleaned and reorganized both datasets to generate descriptions for each Catastici point and Sommarioni parcel. We then linked the two descriptions based on their geographical locations and created a summary.
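The linking step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each Catastici entry is a point with (x, y) coordinates, that each Sommarioni parcel is a simple polygon, and that a point is linked to the parcel that contains it. The function names, identifiers, and coordinates are hypothetical and are not taken from the project code; the actual matching rule used in the project may differ.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order, without repeating the first one


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count how many polygon edges a ray going
    right from the point crosses; an odd count means 'inside'."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_points_to_parcels(
    points: Dict[str, Point], parcels: Dict[str, Polygon]
) -> Dict[str, Optional[str]]:
    """Map each point id to the id of the first parcel containing it,
    or None when no parcel contains the point."""
    links: Dict[str, Optional[str]] = {}
    for point_id, coords in points.items():
        links[point_id] = next(
            (pid for pid, poly in parcels.items() if point_in_polygon(coords, poly)),
            None,
        )
    return links


# Hypothetical toy data: two square parcels and three points.
parcels = {
    "parcel_A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "parcel_B": [(5, 0), (9, 0), (9, 4), (5, 4)],
}
points = {"pt_1": (2, 2), "pt_2": (6, 1), "pt_3": (20, 20)}

links = link_points_to_parcels(points, parcels)
print(links)
```

Points that fall outside every parcel are kept with a `None` link rather than dropped, so unmatched Catastici entries remain visible when the two descriptions are later summarized.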
Pipeline
Data Preprocessing
Data Cleaning and Translation
For each Catastici point and Sommarioni parcel, we aim to generate descriptions that are accurate, concise, and fluent, conveying the available information about each location without introducing any fabricated data. To achieve this, we cleaned the datasets to address inconsistencies and errors. Additionally, since most of the data is in Italian, we translated it into English to make it easier to understand.
The criteria for data cleaning include: