Spatialising Sarah Barclay Johnson's travelogue around Jerusalem (1858)
Revision as of 08:51, 17 December 2023
Abstract
Introduction
This project digitally maps the toponyms in Sarah Barclay Johnson's "Hadji in Syria: or, Three years in Jerusalem", her 19th-century account of Jerusalem, with the aim of connecting the past and the present. By visualizing the toponyms Johnson recorded, the project aims to offer a dynamic tool for scholars and enthusiasts and to contribute to the ongoing dialogue on the city's historical evolution.
This spatialization pays homage to Johnson's literary contribution and serves as a digital window into Jerusalem as a cultural crossroads. The project invites users to engage with the city's history, fostering a deeper understanding of its heritage and of the interconnected narratives that have shaped it. By combining literature, history, and technology, we hope to enrich our collective understanding of Jerusalem's intricate past.
Motivation
Deliverables
Methodology
Working with the Book / Extracting Book Information
Detecting Locations
NER with spaCy
//LIST or visual
The main problem is mislabelling: in theory we only need to retrieve the toponyms, i.e. the entities labelled “GPE” or “LOC”, but spaCy labelled some toponyms as “PERSON” or “ORG”. In other words, if we select only “GPE” and “LOC”, we lose some toponyms; if we also select “ORG” and “PERSON”, we pick up some non-toponyms.
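The trade-off can be illustrated with a minimal sketch. It assumes the entities are available as (text, label) pairs, which is the shape spaCy exposes through `ent.text` and `ent.label_`. The pairs and the reference list below are made up for illustration and are not actual model output or project data; `filter_by_label` is a hypothetical helper.

```python
# Illustrative (text, label) pairs; the labels are invented to mimic the
# mislabelling described above and are NOT real spaCy output.
entities = [
    ("Jerusalem", "GPE"),
    ("Mount of Olives", "LOC"),
    ("Bethany", "PERSON"),   # toponym mislabelled as a person
    ("Siloam", "ORG"),       # toponym mislabelled as an organisation
    ("Solomon", "PERSON"),   # genuine person, not a toponym
]

# Hypothetical hand-made reference list of true toponyms.
gold_toponyms = {"Jerusalem", "Mount of Olives", "Bethany", "Siloam"}


def filter_by_label(ents, labels):
    """Return the set of entity texts whose label is in `labels`."""
    return {text for text, label in ents if label in labels}


strict = filter_by_label(entities, {"GPE", "LOC"})
loose = filter_by_label(entities, {"GPE", "LOC", "PERSON", "ORG"})

missed_by_strict = gold_toponyms - strict  # toponyms lost by the strict filter
noise_in_loose = loose - gold_toponyms     # non-toponyms let in by the loose filter

print("missed with GPE/LOC only:", sorted(missed_by_strict))
print("noise with PERSON/ORG added:", sorted(noise_in_loose))
```

Neither label set recovers exactly the reference list, which is the problem described above.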
Difficulties when working with historical content
Too many mislabels due to historical and biblical references, place names changing over time, and the use of multiple languages
Some passages are difficult to interpret even for a human reader
Importance of understanding how places relate to the meaning in the book
GPT-4
Matching Wikipedia Pages
Preliminary Analysis for Model Selection - Assessment Focusing on Chapter 3
Manual detection
spaCy Results
GPT-4 Results
Since GPT-4 outperformed spaCy NER in this assessment, the presented GPT-4 prompt was used to retrieve the locations in the book.
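The page does not state which metric was used for the comparison. One common option, sketched here as an assumption, is to score each system's location list against the manually detected list with precision, recall, and F1. The function names and the example lists are hypothetical placeholders, not the project's results.

```python
def normalise(name):
    """Lower-case a name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def evaluate(predicted, gold):
    """Return (precision, recall, f1) of `predicted` against the `gold` list."""
    pred = {normalise(p) for p in predicted}
    ref = {normalise(g) for g in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder lists for illustration only (not chapter 3 data).
manual_list = ["Jerusalem", "Mount of Olives", "Bethany", "Siloam"]
system_output = ["jerusalem", "Bethany", "Solomon"]

precision, recall, f1 = evaluate(system_output, manual_list)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running `evaluate` once per system on the same manual list gives directly comparable scores.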
Tracking Author's Route on Maps
Finalizing the List of Coordinates
- Fuzzy matching GPT-4 results with an existing location list
- Retrieving coordinates by matched Wikipedia pages
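The fuzzy-matching step could look roughly like the sketch below. It uses `difflib.get_close_matches` from the Python standard library as a stand-in, since the page does not say which matching method or library was used. The location list, the `match_location` helper, and the 0.8 cutoff are assumptions for illustration. Retrieving coordinates from the matched Wikipedia pages is not covered here.

```python
import difflib

# Hypothetical stand-in for the existing location list.
known_locations = ["Jerusalem", "Bethlehem", "Mount of Olives", "Jaffa"]


def match_location(name, candidates, cutoff=0.8):
    """Return the candidate closest to `name`, or None if nothing is similar enough.

    Matching is case-insensitive; `cutoff` is the minimum similarity ratio
    (between 0 and 1) passed to difflib.get_close_matches.
    """
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Example inputs with a trailing period, a spelling variant, and an unknown place.
for raw in ["Jerusalem.", "Bethlehm", "Damascus"]:
    print(raw, "->", match_location(raw, known_locations))
```

A higher cutoff reduces false matches between similar but distinct place names, at the cost of leaving more names unmatched for manual review.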
Visualization by GeoPandas
- QGIS
- GeoPandas
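As a minimal sketch of how an ordered list of coordinates might be prepared for both tools, the snippet below builds a GeoJSON FeatureCollection with the standard library only. GeoJSON files can be added as a layer in QGIS or loaded with `geopandas.read_file`. The stops, their order, and their coordinates are approximate placeholders, not the author's actual route, and `route_to_geojson` is a hypothetical helper.

```python
import json

# Approximate, illustrative (name, latitude, longitude) stops; NOT project data.
stops = [
    ("Jaffa", 32.05, 34.75),
    ("Jerusalem", 31.78, 35.22),
    ("Bethlehem", 31.70, 35.20),
]


def route_to_geojson(stops):
    """Build a FeatureCollection with one Point per stop and one LineString route."""
    features = [
        {
            "type": "Feature",
            "properties": {"name": name, "order": i},
            # GeoJSON stores positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        }
        for i, (name, lat, lon) in enumerate(stops, start=1)
    ]
    features.append(
        {
            "type": "Feature",
            "properties": {"name": "route"},
            "geometry": {
                "type": "LineString",
                "coordinates": [[lon, lat] for _, lat, lon in stops],
            },
        }
    )
    return {"type": "FeatureCollection", "features": features}


geojson = route_to_geojson(stops)
geojson_text = json.dumps(geojson, indent=2)
print(geojson_text[:200])
```

Keeping the visiting order in the `order` property lets the points be labelled or styled by sequence in either tool.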
Creating a Platform for Final Output
Results
Limitations and Further Work
- API for a smoother process
- Improve with the full book instead of chapter by chapter
Conclusion
Project Timeline & Milestones
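Timeframe | Task | Completion |
---|---|---|
Week 4 | Exploring literature of NER; Finding textual data of the book | ✓ |
Week 5 | Pre-processing text; Quality assessment of the data | ✓ |
Week 6 | Applying NER using spaCy | ✓ |
Week 7 | Manually labelling chapter 3; GPT-4 Prompt Engineering | ✓ |
Week 8 | Working on mapping with QGIS | ✓ |
Week 9 | Finalizing GPT-4 Prompt; Automating Wikipedia Page Search | ✓ |
Week 10 | Finalizing the list of manually detected locations; Evaluation of GPT-4 and spaCy Results for chapter 3 | ✓ |
Week 11 | | ✓ |
Week 12 | | |
Week 13 | | |
Week 14 | | |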
GitHub Repository
https://github.com/jiaming-jiang/FDH-G8.git