Generation of Textual Description
Revision as of 21:44, 11 December 2024
Introduction
Motivation
Deliverables
Project Timeline & Milestones
Timeframe | Task | Completion |
---|---|---|
Week 4 | | ✓ |
Week 5 | | ✓ |
Week 6 | | ✓ |
Week 7 | | |
Week 8 | | |
Week 9 | | |
Week 10 | | |
Week 11 | | |
Week 12 | | |
Week 13 | | |
Week 14 | | |
Methodology
Generating Summarization Formats for In-context Learning
Functionality of Parcels
Standardization of Monthly Rent
Inferring accurate values from rent data can offer valuable insights into a parcel's history, such as its price relative to its neighborhood, which may reflect its status at that time. Of the 35,946 data rows, only 28,610 (~80%) contained numeric values; the remaining entries were either null or free text.
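As a rough illustration of how numeric and non-numeric rent entries might be separated, the sketch below uses a small helper. The name `is_numeric` and the sample rows are hypothetical and are not taken from the project's code or dataset.

```python
def is_numeric(value):
    """Return True if the value parses as a number; None and free text return False."""
    if value is None:
        return False
    try:
        float(str(value).strip())
        return True
    except ValueError:
        return False

# Hypothetical sample rows mixing numbers, nulls, and text entries.
rows = ["20", 15.5, None, "3776 lire", "casa in soler"]
numeric = [r for r in rows if is_numeric(r)]
share = len(numeric) / len(rows)

# The share reported in the text: 28,610 numeric rows out of 35,946.
reported_share = 28610 / 35946
```

Note that `float` also accepts strings such as "nan" or "inf", so a real pipeline would likely need an extra check for those.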
Upon exploration, we encountered examples of text data such as:
```
'1022 lire de piccoli' '3776 lire' '22 ducati, 22 grossi'
'10 ducati, 19 grossi' '40 ducati e 14 grossi' 'casa in soler' 'libertà di traghetto' '20 lire' '26 lire' '15 lire'...
```
These examples illustrate the diversity of information present in the text entries, including multiple currencies, descriptive text about the parcel’s function, and potential typographical errors.
To standardize the non-numeric rent values, we developed an iterative approach. Using regular expressions (regex), we captured patterns within the data. After each iteration, we matched the patterns against the dataset, identified any emerging new patterns, and incorporated them into our existing pattern set. This process was repeated until no new patterns could be found.
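The iterative loop described above can be sketched as follows. This is a minimal illustration under assumptions: the two regular expressions and the helper `unmatched` are hypothetical and are not the project's actual pattern set. The sample strings come from the examples listed earlier.

```python
import re

# Hypothetical starting pattern set with a single pattern.
patterns = {
    "single_currency": re.compile(r"\d+ (?:lire|ducati|grossi|soldi)(?: d[ei] piccoli)?"),
}

def unmatched(values, patterns):
    """Return the entries that none of the current patterns matches in full."""
    return [
        v for v in values
        if not any(p.fullmatch(v.strip()) for p in patterns.values())
    ]

sample = ["20 lire", "1022 lire de piccoli", "22 ducati, 22 grossi", "casa in soler"]

# Iteration 1: look at what is left over to spot new patterns.
leftover = unmatched(sample, patterns)

# Iteration 2: add a dual-currency pattern suggested by the leftovers, then re-check.
patterns["dual_currency"] = re.compile(
    r"\d+ ducati(?:,| e) \d+ grossi(?: d[ei] piccoli)?"
)
leftover_after = unmatched(sample, patterns)
```

In this sketch the review of `leftover` between iterations is manual. The loop ends when the remaining entries no longer suggest a new pattern.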
In the final iteration, we identified six main patterns as follows:
Pattern Name | Example | Notes |
---|---|---|
Single currency | 30 lire | with optional "de piccoli" or "di piccoli" |
Dual currency | 7 ducati, 18 grossi | with optional "de piccoli" or "di piccoli" |
Three-part currency | 10 ducati, 2 lire, 8 soldi | |
Fractional or "e mezzo" units | 8 ducati e mezzo or 8 e mezzo | |
Time-related mentions | al mese, ogni tre mesi, per metà | |
Function or Ownership | casa, bottega | |
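One possible way to express the six patterns in the table as regular expressions is sketched below. The expressions, the category labels, and the `classify` helper are assumptions made for illustration. They cover only the currency names and phrases that appear in the examples above and are not the project's actual rules.

```python
import re

CURRENCY = r"(?:ducati|lire|grossi|soldi)"   # only currency names seen in the examples
PICCOLI = r"(?: d[ei] piccoli)?"             # optional "de piccoli" / "di piccoli"
AMOUNT = rf"\d+ {CURRENCY}{PICCOLI}"
SEP = r"(?:, | e )"                          # amounts joined by a comma or by "e"

# Order matters: more specific patterns are tried first.
PATTERNS = [
    ("three_part_currency", re.compile(rf"{AMOUNT}{SEP}{AMOUNT}{SEP}{AMOUNT}")),
    ("dual_currency", re.compile(rf"{AMOUNT}{SEP}{AMOUNT}")),
    ("fractional", re.compile(rf"\d+(?: {CURRENCY})? e mezzo")),
    ("single_currency", re.compile(AMOUNT)),
    ("time_related", re.compile(r".*\b(?:al mese|ogni tre mesi|per metà)\b.*")),
    ("function_or_ownership", re.compile(r".*\b(?:casa|bottega)\b.*")),
]

def classify(text):
    """Return the name of the first pattern matching the whole entry, or None."""
    cleaned = text.strip().lower()
    for name, pattern in PATTERNS:
        if pattern.fullmatch(cleaned):
            return name
    return None
```

For example, `classify("7 ducati, 18 grossi")` returns `"dual_currency"`, while an entry such as `'libertà di traghetto'` matches none of these sketched patterns and returns `None`. In the iterative process such an entry would be a candidate for a new pattern.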