Generation of Textual Description for Parcels

Introduction

The historical records of land management, cadastre, and taxation provide invaluable insights into the socio-economic and administrative evolution of regions over time. Among the most significant resources for understanding such systems in ancient Venice are the Catastici (1740) and the Sommarioni (1808), two books that offer different perspectives on Venetian land parcels, their ownership, and their taxation structures.

The Catastici (1740)

The Sommarioni (1808), compiled during a time of significant political and social upheaval following the fall of the Venetian Republic and under Napoleonic administration, presents a transformed landscape.

Our goal is to combine data from these two books, compiled in different periods, to generate a clear and comprehensive description of each parcel.

Project Plan and Milestones

Workflow
Week	Task	Status
07.10 - 13.10	Define research questions; review relevant literature	Done
14.10 - 20.10	Perform initial data checking and cleaning; address dataset-related questions	Done
21.10 - 27.10	Autumn vacation	Done
28.10 - 03.11	Align Catastici and Sommarioni datasets; continue data cleaning	Done
04.11 - 10.11	Develop description templates and prompts; prepare for the midterm presentation	Done
11.11 - 17.11	Midterm presentation (14.11); refine the description template and prompts	Done
18.11 - 24.11	Translate Italian data into English	Done
25.11 - 01.12	Design an evaluation plan; evaluate the prompts
02.12 - 08.12	Generate final results; evaluate the prompts and translation; begin writing the wikipage
09.12 - 15.12	Write the wikipage; organize GitHub code; prepare for the final presentation
16.12 - 22.12	Deliver GitHub repository and wikipage (18.12); final presentation (19.12)

Methodology

Our project is based on the Catastici dataset (1740) and the Sommarioni dataset (1808). We cleaned and reorganized both datasets to generate descriptions for each Catastici point and Sommarioni parcel. We then linked the two descriptions based on their geographical locations and created a summary.

Pipeline

Data Preprocessing

Data Cleaning and Translation

For each Catastici point and Sommarioni parcel, we aim to generate a description that is accurate and concise: it should give comprehensive and precise information about the location, introduce no fabricated data, and remain clear and fluent. To achieve this, we cleaned the datasets to remove inconsistencies and errors. Since most of the data is in Italian, we also translated it into English to make the descriptions easier to understand.

The criteria for data cleaning include:

Content Selection

Content Ordering

Geographical Connection

To analyze the changes in ancient Venice between 1740 and 1808, we link the two datasets through their geographical coordinates. The Catastici dataset consists of point data, while the Sommarioni dataset consists of polygon data, so a Catastici point is linked to a Sommarioni parcel when the point falls inside that parcel's polygon. Not every Sommarioni parcel contains a Catastici point, and not every Catastici point falls inside a Sommarioni parcel. The summary description is therefore generated only for the overlapping cases, where Catastici points are contained within Sommarioni parcels.
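The point-in-polygon linking described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the function names, the identifiers, and the toy coordinates are all assumptions, and a real pipeline would more likely rely on a geospatial library. The sketch uses the standard ray-casting test and assumes that parcels do not overlap.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_points_to_parcels(points, parcels):
    """Map each parcel id to the ids of the points it contains."""
    links = defaultdict(list)
    for point_id, (x, y) in points.items():
        for parcel_id, ring in parcels.items():
            if point_in_polygon(x, y, ring):
                links[parcel_id].append(point_id)
                break  # assumption: parcels do not overlap
    return dict(links)


# Hypothetical toy data, not taken from either dataset
catastici_points = {"c1": (1.0, 1.0), "c2": (5.0, 5.0), "c3": (2.5, 0.5)}
sommarioni_parcels = {
    "s1": [(0, 0), (3, 0), (3, 3), (0, 3)],
    "s2": [(10, 10), (12, 10), (12, 12), (10, 12)],
}
links = link_points_to_parcels(catastici_points, sommarioni_parcels)
```

In this toy example, parcel `s1` is linked to two points, point `c2` matches no parcel, and parcel `s2` contains no point, so only `s1` would receive a summary description.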

Template Design

Our project utilizes GPT-4 to generate descriptions. To provide the language model with clear instructions and produce well-structured content, we divide the description into three paragraphs. The first paragraph introduces the information from Catastici (1740), the second paragraph details the data from Sommarioni (1808), and the third paragraph connects the two datasets, summarizing their relationship.

We aim for Paragraphs 1 and 2 to be accurate and concise, including all relevant information about the parcels in the dataset while excluding fictional or irrelevant data. For Paragraph 3, our goal is plausibility: the paragraph should logically connect the changes and similarities between the two descriptions, avoid unreasonable or speculative elements, and adequately address the implications or transitions that can be inferred from the original data.
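The three-paragraph structure can be illustrated with a small prompt builder. This is a sketch under stated assumptions, not the project's templates: the prompt wording, the function names, and the record fields (`owner`, `function`, `tenant`) are hypothetical, and the call to the language model itself is left out.

```python
def format_record(record):
    """Render a record as '- field: value' lines, skipping empty values."""
    return "\n".join(
        f"- {field}: {value}"
        for field, value in record.items()
        if value not in (None, "")
    )


# Illustrative instructions reflecting the accuracy and conciseness goals
RULES = (
    "Use only the data provided. Do not add facts that are not in the data. "
    "Write one concise paragraph."
)


def build_source_prompt(source_name, year, record):
    """Prompt for Paragraph 1 or 2: describe a single source record."""
    return (
        f"Describe the following {source_name} ({year}) entry.\n"
        f"{RULES}\n"
        f"Data:\n{format_record(record)}"
    )


def build_summary_prompt(catastici_text, sommarioni_text):
    """Prompt for Paragraph 3: relate the two generated descriptions."""
    return (
        "Compare the two descriptions of the same location below. "
        "Summarize the changes and similarities between 1740 and 1808, "
        "and avoid speculation that the descriptions do not support.\n"
        f"1740 description:\n{catastici_text}\n"
        f"1808 description:\n{sommarioni_text}"
    )


# Hypothetical record; the real field names may differ
example = {"owner": "Example Owner", "function": "casa", "tenant": ""}
prompt_1 = build_source_prompt("Catastici", 1740, example)
```

Dropping empty fields before the prompt is built is one simple way to avoid inviting the model to fill gaps with invented details.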

To ensure the language model effectively understands our instructions, we designed a specific prompt template for each paragraph. The final versions of our templates are shown below, and we discuss how the prompt templates were improved in the next section.

Prompt Evaluation

Since the requirements and objectives of each paragraph differ, we designed separate evaluation metrics for each paragraph.

Evaluation Criteria for Catastici and Sommarioni Paragraph

Since the first two paragraphs primarily focus on the information in the dataset, we prioritize accuracy and conciseness as the main evaluation indicators. The detailed criteria are explained below.

Evaluate Description for Accuracy

For each description, assess whether:

All facts are correctly represented.
The description aligns precisely with the metadata, without misrepresentation or omission of key details.
There are no fabricated or incorrect elements.

Assign a score of:

0: If the description contains inaccuracies, misrepresentation, omissions, or fabricated elements.
1: If the description is fully accurate, complete, and faithfully represents the metadata.

Evaluate Description for Conciseness

For each description, assess whether:

The description is free from redundant or repetitive content.
The information is presented succinctly, without unnecessary elaboration or verbosity.
The description focuses on delivering the key details without including irrelevant information.

Assign a score of:

0: If the description contains redundancy, repetition, or excessive elaboration.
1: If the description is concise, focused, and avoids unnecessary content.

To evaluate the generated descriptions more thoroughly, we divide the content of each description into aspects that correspond to the original data and assess the accuracy and conciseness of each aspect separately. The evaluation metrics are shown below.

Evaluation Metrics for Catastici
	Location	Features	Owner Name	Owner title & job	Tenant	Payment
Accurate	0 or 1	0 or 1	0 or 1	0 or 1	0 or 1	0 or 1
Concise	0 or 1	0 or 1	0 or 1	0 or 1	0 or 1	0 or 1

Evaluation Metrics for Sommarioni
	Location	Features	Owner and Ownership	Owner Family	Other notes
Accurate	0 or 1	0 or 1	0 or 1	0 or 1	0 or 1
Concise	0 or 1	0 or 1	0 or 1	0 or 1	0 or 1
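Binary scores of this kind can be aggregated into a per-aspect rate, that is, the fraction of evaluated descriptions that received a 1 for each aspect. The sketch below is illustrative only: the function name and the sample scores are assumptions, and the source does not state how the scores were aggregated.

```python
def aspect_rates(score_rows):
    """Return the fraction of 1-scores per aspect.

    score_rows: list of dicts mapping an aspect name to 0 or 1,
    one dict per evaluated description.
    """
    totals = {}
    for row in score_rows:
        for aspect, value in row.items():
            if value not in (0, 1):
                raise ValueError(f"score for {aspect!r} must be 0 or 1")
            passed, count = totals.get(aspect, (0, 0))
            totals[aspect] = (passed + value, count + 1)
    return {aspect: passed / count for aspect, (passed, count) in totals.items()}


# Hypothetical accuracy scores for two descriptions
accuracy_scores = [
    {"Location": 1, "Features": 1, "Tenant": 0},
    {"Location": 1, "Features": 0, "Tenant": 1},
]
rates = aspect_rates(accuracy_scores)
```

The same function can be applied to the conciseness scores, which keeps the two criteria separate rather than merging them into a single number.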
