Extracting Toponyms from Maps of Jerusalem: Difference between revisions

From FDHwiki
Jump to navigation Jump to search
Line 130: Line 130:
==Limitations and Future Work==
==Limitations and Future Work==


One chief limitation of our project is the limited evidence pointing toward its generalizability. With only X ground truth labels developed from Y maps – maps chosen in part for their relatively high resolution – and marked inter-map variation in accuracy boosts, it is not evident that our pipeline would assuredly yield significant benefits across other historical maps of Jerusalem, not to mention other maps in general. Furthermore, defining our baseline as the accuracy of mapKurator’s pre-trained extraction module on historical maps of Jerusalem is problematic given that it was developed using a different map archive. A more appropriate evaluation regime would involve applying our pipeline to the same set of maps and labels the mapKurator team used to create and evaluate their model, thus expanding the number and diversity of ground truth labels, decoupling the marginal effect of our pipeline from the marginal effect of substituting the mapKurator’s map archive with another, and (ideally) making a stronger case for our pipeline’s generalizability. Though this evaluation regime is outside the scope of this Jerusalem-based project, we consider it to be the most important step toward the further validation of our approach.
One chief limitation of our project is the limited evidence pointing toward its generalizability. With only X ground truth labels developed from 3 maps – maps chosen in part for their relatively high resolution – and marked inter-map variation in accuracy boosts, it is not evident that our pipeline would assuredly yield significant benefits across other historical maps of Jerusalem, not to mention other maps in general. Furthermore, defining our baseline as the accuracy of mapKurator’s pre-trained extraction module on historical maps of Jerusalem is problematic given that it was developed using a different map archive. A more appropriate evaluation regime would involve applying our pipeline to the same set of maps and labels the mapKurator team used to create and evaluate their model, thus expanding the number and diversity of ground truth labels, decoupling the marginal effect of our pipeline from the marginal effect of substituting the mapKurator’s map archive with another, and (ideally) making a stronger case for our pipeline’s generalizability. Though this evaluation regime is outside the scope of this Jerusalem-based project, we consider it to be the most important step toward the further validation of our approach.


Certain stages of our pipeline could benefit from further tuning. Our spline-based approach toward single line sequence recovery does not accommodate cases where the centroid of two consecutive words is vertically mismatched (such as when a small, short word follows a larger word and both are bottom-justified, for example). One of our chosen maps uses a pervasive graphical element to represent trees that is consistently mistaken by the mapKurator spotter tool for a letter of text. Many such unresolved corner cases persist due to time limitations; addressing them would boost accuracy statistics. So too would developing further labels for font discriminator tool, which still misclassifies approximately X% of label pairs despite some Y manually labelled comparisons, likely due to imbalance in the number of matching fonts in the label set. In fact, given the narrow map coverage and imbalanced label set used to train the discriminator, it may still take considerable signal from the map the text is pulled from rather than the type of font used alone. Identifying and tackling each processing stage’s distinct imperfections is a necessary if time-intensive path toward making our pipeline production ready.
Certain stages of our pipeline could benefit from further tuning. Our spline-based approach toward single line sequence recovery does not accommodate cases where the centroid of two consecutive words is vertically mismatched (such as when a small, short word follows a larger word and both are bottom-justified, for example). One of our chosen maps uses a pervasive graphical element to represent trees that is consistently mistaken by the mapKurator spotter tool for a letter of text. Many such unresolved corner cases persist due to time limitations; addressing them would boost accuracy statistics. So too would developing further labels for font discriminator tool, which still misclassifies approximately X% of label pairs despite some Y manually labelled comparisons, likely due to imbalance in the number of matching fonts in the label set. In fact, given the narrow map coverage and imbalanced label set used to train the discriminator, it may still take considerable signal from the map the text is pulled from rather than the type of font used alone. Identifying and tackling each processing stage’s distinct imperfections is a necessary if time-intensive path toward making our pipeline production ready.

Revision as of 10:57, 20 December 2023

Introduction

We aim to programmatically and accurately extract toponyms (place names) from historical maps of Jerusalem. Building from mapKurator, a scene text recognition pipeline developed specifically for toponym extraction by the University of Minnesota's Knowledge Computing Lab, we develop a novel label extraction and processing pipeline capable of significant accuracy improvements relative to mapKurator's spotter module alone. We then explore the success of our pipeline during generalization, describe the limitations of our approach, and suggest possibilities for future progress in accuracy or end-user interactivity.

Motivation

A sample of the linguistic, geometrical, and typographical diversity of toponyms in 19th-century maps of Jerusalem.

Development of an accurate, accessible, and generally applicable tool for extraction of map toponyms is an unrealized goal. Indeed, despite the deep-learning induced boon to scene text recognition (STR) seen in the past decade, STR in general remains difficult due to the sheer variety of shapes, fonts, colors, languages, and text backgrounds encountered.[1] In the context of historical maps and Jerusalem especially, STR issues are compounded by the fact that there is not necessarily agreement between cartographers over toponym spellings (in the case of toponyms transliterated from arabic, for example) or, as is true among biblical toponyms, even the precise location of certain places[2]. Properly collecting toponym locations and spellings from historical maps of Old Jerusalem would thus permit researchers to understand how such ambiguities have manifested as agreement or disagreement in geographic depiction throughout time.

In an effort to automate text recognition on historical map archives - our exact use case - the University of Minnesota's Knowledge Computing Lab has developed an end-to-end toponym extraction and rectification pipeline dubbed mapKurator. [3] While we do build on their methods, we identify three issues with simply implementing mapKurator on historical maps of Jerusalem to achieve our intended result. First, some elements of the full mapKurator pipeline (namely the use of Open Street Map to correct extracted labels) are not suitable for historical maps of Jerusalem due to the ambiguities described above. Second, the success of the chief tool used to extract text in mapKurator depends heavily on its model weights - weights that users may not have the computational power or ground truth labels to train. Third, mapKurator's text spotting tool only extracts individual words, requiring further processing to capture multi-word toponyms. We therefore decide to augment mapKurator's text spotting tool with a pipeline involving repeated sampling and methods for recovering proper toponyms from said sampling.

The contributions of this project are thus inroads into the collection of accurate toponyms from historical maps of Jerusalem and a general, accessible pipeline that can be mounted on a pre-trained STR label extractor to improve accuracy.

Deliverables

  • A novel, general-use, and computationally accessible toponym extraction pipeline capable of improving accuracy for pre-trained SOTA neural network models.
  • A set of X ground truth toponyms across 3 historical maps of Jerusalem.
  • A set of X extracted toponyms across 3 historical maps of Jerusalem.
  • A set of X labels denoting font similarity across 3 images of toponyms.

These deliverables are available via the Git repository linked at the bottom of this page.

Methodology

MapKurator

Pyramid Scan

Extracted Label Processing

Introduction & Definitions

Subword Deduplication

Nested Word Flattening

Single-Line Sequence Recovery

Multi-Line Sequence Recovery

Single Line
Multiple Lines
Curved Line

Evaluation Metrics

Unmentioned thus far is how we develop our ground truth label set G. Using the online image annotating software VIA, we manually label X labels across Y maps. These Y maps were chosen for their high resolution, their preponderance of toponyms located directly over place locations (as opposed to inside legends), their visual diversity, and their linguistic diversity. Ground truth labels are annotated at the line level rather than the toponym level. For example, if "Church of" appears directly above "the Flagellation", we annotate each as a separate ground truth label - though we manually indicate that they belong to the same toponym by assigning a group id. Line-level annotations with the option for multiline combination allows us to report accuracy statistics at both the line- and multiline-level, shedding light on the power of our pipeline in different contexts.

To evaluate Ω for a given map, we first retain only those extracted and processed labels located in the same pixel space as a crop of the map which has all possible ground truth toponyms in G. We then calculate all pairwise geometrical IoUs between these subsets of Ω and G, find the 1-to-1 matched set that maximizes the sum of said IoUs, and divide this sum by the number of extracted labels (in the case of precision) or the number of ground truth labels (in the case of recall). Text-based accuracy statistics, meanwhile, report the sum of normalized Levenshtein edit distances from the 1-to-1 geometrically matched set divided by the pertinent number of labels.

Results

Kiepert 1845 (# ground truth labels)
Precision Recall
Geometric Textual Geometric Textual
Baseline
Pyramid + Processing


Saunders 1874 (# ground truth labels)
Precision Recall
Geometric Textual Geometric Textual
Baseline
Pyramid + Processing


Vandevelde 1846 (# ground truth labels)
Precision Recall
Geometric Textual Geometric Textual
Baseline
Pyramid + Processing

Limitations and Future Work

One chief limitation of our project is the limited evidence pointing toward its generalizability. With only X ground truth labels developed from 3 maps – maps chosen in part for their relatively high resolution – and marked inter-map variation in accuracy boosts, it is not evident that our pipeline would assuredly yield significant benefits across other historical maps of Jerusalem, not to mention other maps in general. Furthermore, defining our baseline as the accuracy of mapKurator’s pre-trained extraction module on historical maps of Jerusalem is problematic given that it was developed using a different map archive. A more appropriate evaluation regime would involve applying our pipeline to the same set of maps and labels the mapKurator team used to create and evaluate their model, thus expanding the number and diversity of ground truth labels, decoupling the marginal effect of our pipeline from the marginal effect of substituting the mapKurator’s map archive with another, and (ideally) making a stronger case for our pipeline’s generalizability. Though this evaluation regime is outside the scope of this Jerusalem-based project, we consider it to be the most important step toward the further validation of our approach.

Certain stages of our pipeline could benefit from further tuning. Our spline-based approach toward single line sequence recovery does not accommodate cases where the centroid of two consecutive words is vertically mismatched (such as when a small, short word follows a larger word and both are bottom-justified, for example). One of our chosen maps uses a pervasive graphical element to represent trees that is consistently mistaken by the mapKurator spotter tool for a letter of text. Many such unresolved corner cases persist due to time limitations; addressing them would boost accuracy statistics. So too would developing further labels for font discriminator tool, which still misclassifies approximately X% of label pairs despite some Y manually labelled comparisons, likely due to imbalance in the number of matching fonts in the label set. In fact, given the narrow map coverage and imbalanced label set used to train the discriminator, it may still take considerable signal from the map the text is pulled from rather than the type of font used alone. Identifying and tackling each processing stage’s distinct imperfections is a necessary if time-intensive path toward making our pipeline production ready.

Finally, we recognize that certain elements of the mapKurator pipeline which we disregard due to their inability to directly apply to our map archive would be beneficial if adapted for our project. We refer primarily to the PostOCR module, in which extracted text is mapped to verified toponyms from OpenStreetMap using a fuzzy query function (the issue with directly using this module is described in the Motivation section above). One can imagine emulating this process with a different set of “valid” toponyms: simply run the spotter module over a vast set of Jerusalem maps, apply our label processing pipeline, retain a set of the most prolific toponyms (or perhaps words) across all maps as the substitute for OpenStreetMap’s dictionary, and revisit all extracted toponyms to modify or at least annotate any which exhibit minor spelling changes from this novel dictionary. Such a process would invoke the same sampling justification behind the pyramid scanning regime to create a global, archive-wide dictionary in an unsupervised manner, create a lens through which to view potential spelling disagreements, and facilitate an environment in which our pipeline actually improves as more unlabeled data is processed.

We believe the pipeline we have produced to be useful and sound despite the limitations listed above, and are eager to see how resolving them might further improve the extraction of toponyms from historical maps.

Project Timeline

Timeframe Task Completion
Week 4
  • Finalize and present project proposals.
    • Toponym extraction project selected.
Week 5
  • Survey SOTA toponym extraction tools.
Week 6
  • Port MapKurator's Spotter tool and model weights into Windows-based Python.
  • Select two (later three) maps to use when implementing, evaluating, and fine-tuning MapKurator's model.
Week 7
  • Create ground truth labels for first map with VIA's online interface.
Week 8
  • Create ground truth labels for second map.
  • Implement 1:1-matched precision and recall via IoU (geometry) and normalized Levenshtein (text).
  • Calculate baseline accuracy statistics.
Week 9
  • Implement multi-layer pyramid application of MapKurator's Spotter.
Week 10
  • Create ground truth labels for third map.
  • Implement subword deduplication and nested word flattening on pyramid-derived toponyms.
Week 11
  • Calculate pyramid accuracy statistics.
  • Fine-tune subword deduplication and nested word flattening.
  • Deliver Midterm presentation.
Week 12
  • Launch Wiki.
  • Implement PCA on extracted label polygons to assist in font discriminator crops and Bezier splines.
  • Create manual labels for font discriminator.
  • Train font discriminator.
Week 13
  • Add ground truth labels to our three maps.
  • Implement Bezier spline
  • Implement iterative sequence recovery using font and Bezier scores
Week 14
  • Fine-tune Sequence Recovery.
  • Calculate final accuracy statistics.
  • Finalize Wiki and final presentation.
  • Postponed for future work:
    • Hierarchize final toponyms and develop Voronoi map.
    • Prototype toponym-disagreement visualizer.

Github Repository

Jerusalem Maps EPFL DH405

Acknowledgements

We thank Professor Frédéric Kaplan, Sven Najem-Meyer, and Beatrice Vaienti of the DHLAB for their valuable guidance over the course of this project.

References

Webpages

Chief Literature

  • Kim, Jina, et al. "The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps." arXiv preprint arXiv:2306.17059 (2023).
  • Li, Zekun, et al. "An automatic approach for generating rich, linked geo-metadata from historical map images." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020

Footnotes

  1. Liao, Minghui, et al. "Scene text recognition from two-dimensional perspective." Proceedings of the AAAI conference on artificial intelligence. Vol. 33. No. 01. 2019.
  2. Smith, George Adam. Jerusalem: The Topography, Economics and History from the Earliest Times to AD 70. Vol. 2. Cambridge University Press, 2013.
  3. Kim, Jina, et al. "The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps." arXiv preprint arXiv:2306.17059 (2023).