Revision as of 21:44, 11 December 2024

Introduction

Motivation

Deliverables

Project Timeline & Milestones

Timeframe	Task	Completion
Week 4	Explore the dataset; explore in-context learning models for text summarization	✓
Week 5	Identify patterns and edge cases in the dataset (e.g. missing fields, "odd" values); define summarization formats accordingly for use in in-context learning; explore the connection between the Catastici and Sommarioni datasets	✓
Week 6	Refine summarization formats; construct a pipeline connecting translation generation, summarization and validation	✓
Week 7	Evaluate summarization results
Week 8	Prepare for mid-term presentation
Week 9	Explore father-son relationships across the Catastici and Sommarioni datasets
Week 10	Standardize the monthly rent column
Week 11	Verify and refine the standardized rent values
Week 12	Add a district mean rent column and integrate additional information into the data
Week 13	TBD
Week 14	TBD

Methodology

Generating Summarization Formats for In-context Learning

Functionality of Parcels


Standardization of Monthly Rent

Inferring accurate values from rent data can offer valuable insights into a parcel's history, such as its price relative to its neighborhood, which may reflect its status at the time. Of the 35,946 rows in the dataset, only 28,610 (~80%) contained numeric values; the remaining entries were either null or free text.
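The split between numeric, null, and text entries can be reproduced with a small helper. The following is a minimal sketch, not the project's actual code: the function name `split_rent_values` and the sample list are hypothetical, and it assumes the rent column has been loaded as a plain Python list.

```python
def split_rent_values(values):
    """Partition raw rent entries into numeric, null, and free-text groups."""
    numeric, null, text = [], [], []
    for value in values:
        # Treat None and empty/whitespace-only strings as null entries.
        if value is None or str(value).strip() == "":
            null.append(value)
            continue
        try:
            numeric.append(float(str(value).strip()))
        except ValueError:
            # Anything that is not a plain number is kept as text for later parsing.
            text.append(str(value).strip())
    return numeric, null, text


# Hypothetical sample, mixing the kinds of entries described above.
sample = ["20", 35, None, "22 ducati, 22 grossi", "", "casa in soler"]
numeric, null, text = split_rent_values(sample)
print(f"numeric share: {len(numeric) / len(sample):.0%}")
```

Only the `text` group needs the pattern-based standardization; the `numeric` group can be used directly.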

Upon exploration, we encountered examples of text data such as:

```
'1022 lire de piccoli' '3776 lire' '22 ducati, 22 grossi'
'10 ducati, 19 grossi' '40 ducati e 14 grossi' 'casa in soler'
'libertà di traghetto' '20 lire' '26 lire' '15 lire'...
```

These examples illustrate the diversity of information present in the text entries, including multiple currencies, descriptive text about the parcel’s function, and potential typographical errors.

To standardize the non-numeric rent values, we developed an iterative approach. Using regular expressions (regex), we captured patterns within the data. After each iteration, we matched the patterns against the dataset, identified any emerging new patterns, and incorporated them into our existing pattern set. This process was repeated until no new patterns could be found.
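One round of this iterative process could look like the sketch below. It is illustrative only: the helper `unmatched_entries`, the two regular expressions, and the four sample entries are assumptions rather than the patterns actually used in the project.

```python
import re


def unmatched_entries(entries, patterns):
    """Return the entries that none of the current patterns fully match."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [e for e in entries if not any(c.fullmatch(e.strip()) for c in compiled)]


entries = ["20 lire", "1022 lire de piccoli", "22 ducati, 22 grossi", "casa in soler"]

# Iteration 1: start from a single-currency pattern only.
patterns = [r"\d+ (lire|ducati|grossi|soldi)( d[ei] piccoli)?"]
remaining_1 = unmatched_entries(entries, patterns)

# Iteration 2: inspecting the leftovers reveals a dual-currency form, so add it.
patterns.append(r"\d+ ducati(,| e) \d+ grossi")
remaining_2 = unmatched_entries(entries, patterns)

print(remaining_1)
print(remaining_2)
```

Each pass shrinks the list of unmatched entries; the loop stops when the leftovers no longer suggest a new pattern.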

In the final iteration, we identified six main patterns as follows:

Pattern Name	Example	Notes
Single currency	30 lire	with optional "de piccoli" or "di piccoli"
Dual currency	7 ducati, 18 grossi	with optional "de piccoli" or "di piccoli"
Three-part currency	10 ducati, 2 lire, 8 soldi
Fractional or "e mezzo" units	8 ducati e mezzo or 8 e mezzo
Time-related mentions	al mese, ogni tre mesi, per metà
Function or Ownership	casa, bottega
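The six patterns in the table can be expressed as a small classifier. This is a hedged sketch under stated assumptions: the regular expressions, the pattern labels, and the function `classify_rent` are hypothetical reconstructions from the table's examples, and the singular unit forms (e.g. "ducato", "lira") are an added assumption not confirmed by the source.

```python
import re

# Building blocks (assumed spellings; singular forms are a guess).
UNIT = r"(?:ducat[oi]|lir[ae]|gross[oi]|sold[oi])"
AMOUNT = rf"\d+ {UNIT}"
SEP = r"(?:, | e )"
PICCOLI = r"(?: d[ei] piccoli)?"

# Currency-like patterns must match the whole entry.
FULL_PATTERNS = [
    ("single_currency", rf"{AMOUNT}{PICCOLI}"),
    ("dual_currency", rf"{AMOUNT}{SEP}{AMOUNT}{PICCOLI}"),
    ("three_part_currency", rf"{AMOUNT}{SEP}{AMOUNT}{SEP}{AMOUNT}"),
    ("fractional", rf"\d+(?: {UNIT})? e mezzo"),
]
# Descriptive patterns may appear anywhere in the entry.
PARTIAL_PATTERNS = [
    ("time_related", r"\b(?:al mese|ogni \w+ mesi|per metà)\b"),
    ("function_or_ownership", r"\b(?:casa|bottega)\b"),
]


def classify_rent(text):
    """Return the label of the first pattern that matches, or 'unmatched'."""
    cleaned = text.strip()
    for name, pattern in FULL_PATTERNS:
        if re.fullmatch(pattern, cleaned, flags=re.IGNORECASE):
            return name
    for name, pattern in PARTIAL_PATTERNS:
        if re.search(pattern, cleaned, flags=re.IGNORECASE):
            return name
    return "unmatched"


for entry in ["30 lire", "7 ducati, 18 grossi", "8 e mezzo", "casa in soler"]:
    print(entry, "->", classify_rent(entry))
```

Entries that fall through to `"unmatched"` (for example 'libertà di traghetto') are the ones that would need manual review or a further pattern.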

Results

Limitations and further work


Generation of Textual Description: Difference between revisions
