Spatialising Sarah Barclay Johnson's travelogue around Jerusalem (1858)

From FDHwiki
Revision as of 09:50, 17 December 2023

Abstract

Introduction

Drawing on "Hadji in Syria: or, Three years in Jerusalem" by Sarah Barclay Johnson, this project digitally maps the toponyms recorded in Johnson's 19th-century exploration of Jerusalem, connecting the past and the present. By visualizing these toponyms, the project aims to offer a dynamic tool for scholars and enthusiasts and to contribute to the ongoing dialogue on the city's historical evolution.

This spatialization pays homage to Johnson's literary contribution and serves as a digital window into Jerusalem as a cultural crossroads. The project invites users to engage with the city's history, fostering a deeper understanding of its rich heritage and of the interconnected narratives that have shaped it. By combining literature, history and technology, we hope to build a narrative that spans time and enriches our collective understanding of Jerusalem's intricate past.

Motivation

Deliverables

Methodology

Working with the Book / Extracting Book Information

Detecting Locations

NER with spaCy


spaCy API

The main problem is mislabelling. In principle, only the toponyms need to be retrieved, that is, entities labelled GPE and LOC, but spaCy labelled some toponyms as PERSON or ORG. If only GPE and LOC are selected, some toponyms are lost; if ORG and PERSON are selected as well, some non-toponyms are included.
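The trade-off can be made concrete with a small sketch. The (text, label) pairs below are hypothetical stand-ins for spaCy output (in spaCy each entity in `doc.ents` exposes `ent.text` and `ent.label_`), and the gold set of toponyms is invented for illustration; neither is the project's data.

```python
# Hypothetical (text, label) pairs standing in for spaCy's doc.ents output,
# including the kind of mislabels described above.
ENTITIES = [
    ("Jerusalem", "GPE"),
    ("Jordan", "LOC"),
    ("Bethlehem", "PERSON"),       # toponym mislabelled as a person
    ("Mount Zion", "ORG"),         # toponym mislabelled as an organisation
    ("Johnson", "PERSON"),         # a real person, not a toponym
    ("Church of England", "ORG"),  # a real organisation, not a toponym
]
# Hypothetical gold set: the entities above that really are toponyms.
GOLD_TOPONYMS = {"Jerusalem", "Jordan", "Bethlehem", "Mount Zion"}


def select_entities(entities, labels):
    """Keep the text of every entity whose label is in `labels`."""
    return {text for text, label in entities if label in labels}


def missed_and_spurious(selected, gold):
    """Return (toponyms that were missed, non-toponyms that were kept)."""
    return gold - selected, selected - gold


for labels in ({"GPE", "LOC"}, {"GPE", "LOC", "ORG", "PERSON"}):
    selected = select_entities(ENTITIES, labels)
    missed, spurious = missed_and_spurious(selected, GOLD_TOPONYMS)
    print(sorted(labels), "missed:", sorted(missed), "spurious:", sorted(spurious))
```

With the narrow label set the mislabelled toponyms are missed; with the wide set they are recovered, but the person and the organisation are wrongly kept.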

Difficulties when working with historical content

  • Many mislabels, caused by:
      • historical and biblical references
      • place names that have changed over time
      • names appearing in multiple languages
  • Some passages are difficult to understand even on close reading
  • The need to understand how each place relates to the meaning of the book

GPT-4

Preliminary Analysis for Model Selection - Assessment Focusing on Chapter 3

Manual detection

spaCy Results

GPT-4 Results

Because GPT-4 outperformed spaCy NER in the chapter 3 assessment, the GPT-4 prompt was used to retrieve the locations from the book.
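The section does not state which metric was used to compare the systems. One common option, shown here purely as an assumption, is to score each system's set of detected place names against the manually detected set with precision, recall and F1. The function name and all place lists below are hypothetical placeholders, not the project's chapter 3 results.

```python
def normalise(name: str) -> str:
    """Case- and whitespace-insensitive form used when comparing names."""
    return " ".join(name.lower().split())


def precision_recall_f1(predicted, gold):
    """Score detected place names against a manually labelled gold list."""
    pred = {normalise(n) for n in predicted}
    ref = {normalise(n) for n in gold}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder data only.
manual = ["Jerusalem", "Jaffa Gate", "Mount of Olives", "Bethany"]
system_a = ["Jerusalem", "Jaffa gate", "Johnson"]
system_b = ["Jerusalem", "Jaffa Gate", "Mount of Olives", "Bethany", "Zion"]
for label, pred in (("system A", system_a), ("system B", system_b)):
    p, r, f = precision_recall_f1(pred, manual)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Set-based scoring ignores how often a place is mentioned; if repeated mentions matter for the route, a per-mention comparison would be needed instead.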


Matching Wikipedia Pages

Using the Wikipedia API, each location identified by GPT-4 was searched on Wikipedia, and the first relevant result was recorded. The first image found on each recorded page was also retrieved. This approach was first used to verify the accuracy of the manually determined locations. Once all locations had been obtained, it was used both to visualize the author's path and to acquire coordinates for locations that had none.
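A minimal sketch of the two requests involved, assuming the English Wikipedia endpoint of the MediaWiki Action API: a `list=search` query that keeps only the top hit, and a `prop=coordinates|pageimages` query for the matched page. The helper names are hypothetical. Note that `piprop=original` returns the page's lead image, which is an assumption standing in for "the first image found on the page" and may not always be the same picture.

```python
from urllib.parse import urlencode

# Assumed endpoint: English Wikipedia's MediaWiki Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def search_url(place: str) -> str:
    """URL for a full-text search that returns only the top hit."""
    params = {"action": "query", "list": "search", "srsearch": place,
              "srlimit": 1, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def details_url(title: str) -> str:
    """URL requesting the coordinates and lead image of a single page."""
    params = {"action": "query", "prop": "coordinates|pageimages",
              "piprop": "original", "titles": title, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def first_title(search_response: dict):
    """Title of the first search hit, or None when nothing matched."""
    hits = search_response.get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None


def coordinates_and_image(details_response: dict):
    """Return (lat, lon, image_url); any missing part is None."""
    # With the default formatversion=1, "pages" is a dict keyed by page id.
    pages = details_response.get("query", {}).get("pages", {})
    page = next(iter(pages.values()), None)
    if page is None:
        return None, None, None
    coords = page.get("coordinates", [])
    lat, lon = (coords[0]["lat"], coords[0]["lon"]) if coords else (None, None)
    image = page.get("original", {}).get("source")
    return lat, lon, image
```

In practice each URL would be fetched (for example with `urllib.request.urlopen` and `json.load`) with a descriptive User-Agent header, as Wikimedia's API etiquette requests, and the parsed JSON passed to the two parsing helpers.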

Tracking Author's Route on Maps

Finalizing the List of Coordinates

  1. Fuzzy matching the GPT-4 results against an existing location list
  2. Retrieving coordinates from the matched Wikipedia pages
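The two steps above can be sketched as follows. This assumes Python's standard-library `difflib` as the fuzzy matcher; the original does not name the matcher or a similarity threshold, so `cutoff=0.8` and the function name `finalise_coordinates` are illustrative. All names and coordinates in the example are placeholders.

```python
from difflib import get_close_matches


def finalise_coordinates(places, known, wiki_coords, cutoff=0.8):
    """Map each place name to (lat, lon), or None if unresolved.

    1. Fuzzy-match the name, ignoring case, against an existing location
       list `known` (a dict of name -> (lat, lon)).
    2. Otherwise fall back to `wiki_coords`, coordinates taken from the
       matched Wikipedia pages (a dict of name -> (lat, lon)).
    """
    by_lower = {name.lower(): name for name in known}
    result = {}
    for place in places:
        match = get_close_matches(place.lower(), list(by_lower), n=1, cutoff=cutoff)
        if match:
            result[place] = known[by_lower[match[0]]]
        else:
            result[place] = wiki_coords.get(place)
    return result


# Placeholder names and coordinates, for illustration only.
known = {"Jaffa Gate": (31.776, 35.227), "Mount of Olives": (31.778, 35.244)}
wiki = {"Bethany": (31.771, 35.261)}
print(finalise_coordinates(["Jaffa gate", "Mt. of Olives", "Bethany", "Petra"], known, wiki))
```

Names that resolve to None are left for manual checking rather than guessed, and raising `cutoff` trades fewer false matches for more manual work.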

Visualization with QGIS and GeoPandas

  1. QGIS
  2. GeoPandas
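Both QGIS and GeoPandas read GeoJSON, so one tool-neutral way to prepare the author's route is to export the ordered locations as Point features plus a single LineString connecting them. This is a sketch that assumes the stops are already sorted in the author's visiting order; the function name `route_to_geojson` and the sample data are hypothetical.

```python
import json


def route_to_geojson(stops):
    """Build a GeoJSON FeatureCollection from an ordered list of stops.

    `stops` is a list of (name, lat, lon) in visiting order. Each stop
    becomes a Point, and consecutive stops are joined by one LineString.
    GeoJSON positions are [longitude, latitude] (RFC 7946).
    """
    features = [
        {"type": "Feature",
         "properties": {"name": name, "order": i},
         "geometry": {"type": "Point", "coordinates": [lon, lat]}}
        for i, (name, lat, lon) in enumerate(stops, start=1)
    ]
    # A LineString needs at least two positions.
    if len(stops) >= 2:
        features.append({
            "type": "Feature",
            "properties": {"name": "route"},
            "geometry": {"type": "LineString",
                         "coordinates": [[lon, lat] for _, lat, lon in stops]},
        })
    return {"type": "FeatureCollection", "features": features}


# Placeholder stops, not the project's data.
stops = [("Jaffa Gate", 31.776, 35.227), ("Mount of Olives", 31.778, 35.244)]
print(json.dumps(route_to_geojson(stops), indent=2))
```

Written to a `.geojson` file with `json.dump`, the result opens in QGIS as a vector layer and loads in GeoPandas with `geopandas.read_file`.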

Creating a Platform for Final Output

Results

Limitations and Further Work

  • Use an API for a smoother process
  • Improve the results by processing the full book instead of chapter by chapter

Conclusion

Project Timeline & Milestones

Timeframe: tasks completed
Week 4
  • Exploring the NER literature
  • Finding textual data of the book
Week 5
  • Pre-processing text
  • Quality assessment of the data
Week 6
  • Applying NER using spaCy
Week 7
  • Manually labelling chapter 3
  • GPT-4 Prompt Engineering
Week 8
  • Working on mapping with QGIS
Week 9
  • Finalizing the GPT-4 prompt
  • Automating the Wikipedia page search
Week 10
  • Finalizing the list of manually detected locations
  • Evaluating the GPT-4 and spaCy results for chapter 3
Week 11
  • Matching the coordinates of the locations from chapter 3
  • QGIS mapping of the locations from chapter 3
Week 12
  • Visualizing the full chapter 3 journey
  • Retrieving the locations from the entire book
Week 13
  • Matching the coordinates of the locations from the entire book
  • Retrieving coordinates from matched Wikipedia pages
  • QGIS mapping of the locations from the entire book
  • Visualizing the full journey
Week 14
  • Complete GitHub repository
  • Complete Wiki page
  • Complete presentation

GitHub Repository

https://github.com/jiaming-jiang/FDH-G8.git

References