Revision as of 13:30, 12 December 2023

Project Timeline

Timeframe	Tasks	Completion
Week 11	Matching the coordinates of the locations from chapter 3; QGIS mapping of the locations from chapter 3	✓
Week 12	Visualizing the full chapter 3 journey; Retrieving the locations from the entire book	
Week 13	Matching the coordinates of the locations from the entire book; QGIS mapping of the locations from the entire book; Visualizing the full journey	
Week 14	Complete GitHub repository; Complete Wiki page; Complete presentation	

Abstract

Introduction

Methodology

Working with the Book / Extracting Book Information

Detecting Locations

NER with Spacy

//LIST or visual

We can see that the problem is mislabeling: in theory we only need to retrieve the toponyms, i.e. entities labelled “GPE” or “LOC”, but spaCy labelled some toponyms as “PERSON” or “ORG”. In other words, if we select only “GPE” and “LOC”, we lose some toponyms; if we also select “ORG” and “PERSON”, we pick up some non-toponyms.
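The trade-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entity pairs and the manually annotated toponym set are invented for the example (they are not actual spaCy output or project data), and the helper names `select` and `evaluate` are hypothetical. With spaCy, pairs of this shape would typically be obtained as `[(ent.text, ent.label_) for ent in nlp(text).ents]`.

```python
# Hypothetical (text, label) pairs, shaped like spaCy's
# [(ent.text, ent.label_) for ent in nlp(chapter_text).ents].
# The labels are invented for illustration, not real model output.
entities = [
    ("Jerusalem", "GPE"),
    ("Mount of Olives", "LOC"),
    ("Siloam", "PERSON"),   # toponym mislabelled as a person
    ("Kedron", "ORG"),      # toponym mislabelled as an organisation
    ("Solomon", "PERSON"),  # genuine person, not a toponym
]

# Hypothetical manual annotation of the toponyms in the same passage.
gold_toponyms = {"Jerusalem", "Mount of Olives", "Siloam", "Kedron"}


def select(entities, labels):
    """Return the set of entity texts whose label is in `labels`."""
    return {text for text, label in entities if label in labels}


def evaluate(selected, gold):
    """Return (toponyms that were missed, selected non-toponyms)."""
    return gold - selected, selected - gold


# Strict filter: only the labels that should denote places.
strict = select(entities, {"GPE", "LOC"})
# Loose filter: also accept the labels that toponyms were mislabelled as.
loose = select(entities, {"GPE", "LOC", "PERSON", "ORG"})

strict_missed, strict_extra = evaluate(strict, gold_toponyms)
loose_missed, loose_extra = evaluate(loose, gold_toponyms)

print("strict filter misses:", sorted(strict_missed))
print("loose filter adds non-toponyms:", sorted(loose_extra))
```

In this toy example the strict filter loses the two mislabelled toponyms, while the loose filter recovers them at the cost of admitting a person name, which is the same dilemma described in the paragraph above.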

Difficulties when working with historical content

Too many mislabels due to: historical and biblical references; place names changing over time; multiple languages

Some passages are difficult to interpret even for a human reader

Importance of understanding how places relate to the meaning in the book

GPT-4

Matching Wikipedia Pages

Preliminary Analysis for Model Selection - Assessment Focusing on Chapter 3

Manual detection

Spacy Results

GPT-4 Results

Since GPT-4 outperformed spaCy NER in this assessment, the presented GPT-4 prompt was used to retrieve the locations in the book.

@@ Line 43: / Line 43: @@
 == Working with the Book / Extracting Book Information ==
 == Detecting Locations ==
 === NER with Spacy ===
+//LIST or visual
+So, we can see the problem is mislabeling: in theory we only need to retrieve the toponyms, i.e. “GPE” & “LOC”, but SpaCy labelled some of them as “PERSON” or “ORG”. In other words, if we only select “GPE” and “LOC”, we’ll lose some toponyms; if we also select “ORG” and “PERSON”, we’ll get some non-toponyms.
+==== Difficulties when working with historical content ====
+Too many mis-labels due to
+historical and biblical references
+place names changing over time
+multiple languages
+Difficult to understand even by reading sometimes
+Importance of understanding how places relate to the meaning in the book
 === GPT-4 ===
 === Matching Wikipedia Pages ===
@@ Line 61: / Line 80: @@
 == Tracking Author's Route on Maps ==
-* Fuzzy matching GPT-4 results with an existing location list
+Adding coordinates to the JSON
+=== Fuzzy matching GPT-4 results with an existing location list===
 * QGIS
 * GeoPandas
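
The hunk above mentions adding coordinates to the JSON and fuzzy matching GPT-4 results with an existing location list. A minimal sketch of how the two steps could fit together is given below, assuming a small reference list keyed by place name. It uses `difflib.get_close_matches` from the Python standard library, which may differ from the matching method actually used in the project. The gazetteer, its approximate coordinates, the extracted records, the field names and the `add_coordinates` helper are all hypothetical.

```python
import difflib
import json

# Hypothetical reference list: name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
gazetteer = {
    "Jerusalem": (31.78, 35.22),
    "Bethlehem": (31.70, 35.20),
    "Jericho": (31.87, 35.44),
}

# Hypothetical extracted locations; "Jerico" stands in for a variant spelling.
extracted = [
    {"name": "Jerusalem"},
    {"name": "Bethlehem"},
    {"name": "Jerico"},
    {"name": "Gihon"},  # not in the reference list
]


def add_coordinates(records, gazetteer, cutoff=0.8):
    """Attach the coordinates of the closest gazetteer name to each record.

    Records with no match at or above `cutoff` get None values, so they
    can be reviewed manually instead of being matched to a wrong place.
    """
    names = list(gazetteer)
    enriched = []
    for record in records:
        matches = difflib.get_close_matches(
            record["name"], names, n=1, cutoff=cutoff
        )
        entry = dict(record)  # copy, so the input records stay unchanged
        if matches:
            lat, lon = gazetteer[matches[0]]
            entry.update(matched_name=matches[0], lat=lat, lon=lon)
        else:
            entry.update(matched_name=None, lat=None, lon=None)
        enriched.append(entry)
    return enriched


enriched = add_coordinates(extracted, gazetteer)
print(json.dumps(enriched, indent=2))
```

The `cutoff` value is a judgment call: a lower threshold matches more variant spellings but also risks pairing an unrelated name with a known place. Records enriched this way could then be loaded into the QGIS or GeoPandas steps listed above.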

Spatialising Sarah Barclay Johnson's travelogue around Jerusalem (1858): Difference between revisions

Contents

Project Timeline

Abstract

Introduction

Methodology

Working with the Book / Extracting Book Information

Detecting Locations

NER with Spacy

Difficulties when working with historical content

GPT-4

Matching Wikipedia Pages

Preliminary Analysis for Model Selection - Assessment Focusing on Chapter 3

Manual detection

Spacy Results

GPT-4 Results

Tracking Author's Route on Maps

Fuzzy matching GPT-4 results with an existing location list

Results

Limitations and Further Work

Conclusion

Project Timeline & Milestones

GitHub Repository

References
