Spatialising Sarah Barclay Johnson's travelogue around Jerusalem (1858)
Revision as of 13:30, 12 December 2023
Project Timeline
Timeframe | Task | Completion |
---|---|---|
Week 11 | | ✓ |
Week 12 | | |
Week 13 | | |
Week 14 | | |
Abstract
Introduction
Methodology
Working with the Book / Extracting Book Information
Detecting Locations
NER with Spacy
//LIST or visual
The main problem is mislabelling. In principle we only need to retrieve the toponyms, i.e. entities labelled “GPE” or “LOC”, but spaCy labelled some toponyms as “PERSON” or “ORG”. If we select only “GPE” and “LOC”, we lose some toponyms; if we also select “ORG” and “PERSON”, we pick up some non-toponyms.
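The trade-off between the two label selections can be illustrated with a small sketch. It assumes the spaCy entities have already been exported as (text, label) pairs; the sample entities, their labels, the reference set, and the helper names `select_by_labels` and `missed_and_spurious` are hypothetical and are not results from the book.

```python
# Hypothetical (text, label) pairs standing in for exported spaCy entities.
ENTITIES = [
    ("Jerusalem", "GPE"),
    ("Mount of Olives", "LOC"),
    ("Bethany", "PERSON"),  # toponym mislabelled as a person (illustrative)
    ("Siloam", "ORG"),      # toponym mislabelled as an organisation (illustrative)
    ("Solomon", "PERSON"),  # genuine non-toponym
]

# Hypothetical hand-made reference set of the true toponyms.
TRUE_TOPONYMS = {"Jerusalem", "Mount of Olives", "Bethany", "Siloam"}


def select_by_labels(entities, labels):
    """Return the set of entity texts whose label is in `labels`."""
    return {text for text, label in entities if label in labels}


def missed_and_spurious(selected, truth):
    """Return (toponyms that were not selected, selected non-toponyms)."""
    return truth - selected, selected - truth


narrow = select_by_labels(ENTITIES, {"GPE", "LOC"})
wide = select_by_labels(ENTITIES, {"GPE", "LOC", "PERSON", "ORG"})

print("GPE+LOC only:", missed_and_spurious(narrow, TRUE_TOPONYMS))
print("plus PERSON+ORG:", missed_and_spurious(wide, TRUE_TOPONYMS))
```

With this toy data the narrow selection misses the two mislabelled toponyms, while the wide selection recovers them at the cost of one non-toponym.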
Difficulties when working with historical content
Too many mislabels due to:
- historical and biblical references
- place names changing over time
- multiple languages
Some passages are difficult to understand even for a human reader.
It is important to understand how places relate to the meaning of the book.
GPT-4
Matching Wikipedia Pages
Preliminary Analysis for Model Selection - Assessment Focusing on Chapter 3
Manual detection
Spacy Results
GPT-4 Results
Since GPT-4 outperformed spaCy NER in this assessment, the presented GPT-4 prompt was used to retrieve the locations in the book.
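One way to make such a comparison against the manual detection concrete is sketched below. The source does not state which metric was used, so precision, recall and F1 are an assumption here; the place lists and the function name `precision_recall_f1` are hypothetical and do not reproduce the Chapter 3 results.

```python
def precision_recall_f1(predicted, reference):
    """Compare two lists of place names, ignoring case and surrounding spaces."""
    predicted = {p.strip().lower() for p in predicted}
    reference = {r.strip().lower() for r in reference}
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical lists: a manual reference and two model outputs.
manual = ["Jerusalem", "Bethlehem", "Jaffa", "Hebron"]
model_a = ["Jerusalem", "Jaffa", "Solomon"]
model_b = ["jerusalem", "Bethlehem", "Jaffa", "Hebron", "Damascus"]

for name, output in [("model A", model_a), ("model B", model_b)]:
    p, r, f = precision_recall_f1(output, manual)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact string comparison after lower-casing is a simplification; spelling variants would need normalising before scoring.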
Tracking Author's Route on Maps
Adding coordinates to the JSON
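A minimal sketch of this step, assuming the extracted locations are stored as a JSON list of objects with a `name` field. The record layout, the `GAZETTEER` lookup table with its approximate coordinates, and the function name `add_coordinates` are all hypothetical; the project's actual JSON structure is not shown in the source.

```python
import json

# Hypothetical lookup table: place name -> (latitude, longitude), approximate values.
GAZETTEER = {
    "Jerusalem": (31.78, 35.22),
    "Jaffa": (32.05, 34.75),
}


def add_coordinates(records, gazetteer):
    """Return copies of the records with 'lat' and 'lon' added (None if unknown)."""
    enriched = []
    for record in records:
        lat, lon = gazetteer.get(record["name"], (None, None))
        enriched.append({**record, "lat": lat, "lon": lon})
    return enriched


# Hypothetical input in the assumed layout.
raw = """[
  {"name": "Jerusalem", "chapter": 3},
  {"name": "Jaffa", "chapter": 1},
  {"name": "Unknown Village", "chapter": 3}
]"""

records = json.loads(raw)
enriched = add_coordinates(records, GAZETTEER)
print(json.dumps(enriched, indent=2))
```

Places without a match keep `null` coordinates in the output, so they can be reviewed by hand rather than silently dropped.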
Fuzzy matching GPT-4 results with an existing location list
- QGIS
- GeoPandas
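The fuzzy-matching step named above can be sketched with Python's standard `difflib` module. The source does not say which matching library or threshold was used, so `difflib.get_close_matches`, the `cutoff` of 0.8, the location list, and the function name `fuzzy_match` are assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical existing location list.
KNOWN_LOCATIONS = ["Jerusalem", "Bethlehem", "Jaffa", "Hebron", "Mount of Olives"]


def fuzzy_match(name, candidates, cutoff=0.8):
    """Return the closest candidate for `name`, or None if nothing is similar enough."""
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Hypothetical names as they might come back from the extraction step.
for extracted in ["Jerusalm", "Bethelehem", "Joppa"]:
    print(extracted, "->", fuzzy_match(extracted, KNOWN_LOCATIONS))
```

String similarity handles spelling variants but not renamed places: "Joppa", a historical name for Jaffa, finds no match here and would need an alias table or manual review.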