Influencers of the past

From FDHwiki

On this page, we present our project, Influencers of the past. Our goal is to show who the notable people of Paris were in 1884 and 1908, and where they lived. Here is a sketch of our project: [Figure: Sketch of Influencers of the past]

Abstract

Our expected output is a webpage showing maps of Paris from 1884 and 1908, with clusters indicating the number of listed inhabitants per neighbourhood. The further you zoom in, the more detail you see. Clicking on a point shows more information about a person (e.g. their name). We will also provide an analysis of the results.

Planning

Task                                  Status        Deadline
Extract the data                      Done          (:
Clean the data                        Done          (:
Get coordinates of the addresses      Done          22.11.19
Georeference old maps                 Done          22.11.19
Display people on maps                Done          29.11.19
Web interface and analysis            In progress   06.11.19

Main steps

Extracting the data from the directories

Our first step is to extract all the names and addresses from the two directories. To do so, we use Transkribus to obtain the OCR output, which we then parse.

Cleaning the data

This is the principal step of our project. The data produced by the OCR is quite messy: it contains many errors, which we must correct if we hope to obtain geocoordinates for our addresses. We also need to harmonise our results. For instance, we want to treat 'r.' and 'rue' (French for 'street') as the same, and likewise 'bd' and 'boulevard'. Having all our addresses in a standardised form also makes it easier to retrieve the corresponding geocoordinates. The main challenge of this step is that the OCR output differs between the two years (1884 and 1908), so we had to implement two specific parsers.
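The harmonisation described above can be sketched as follows. This is only a minimal illustration, not our actual parser: the function name `normalise_address` and the table `ABBREVIATIONS` are hypothetical, and only the 'r.' and 'bd' entries come from the text.

```python
import re

# Hypothetical abbreviation table; only 'r.' -> 'rue' and 'bd' -> 'boulevard'
# are taken from the text, a real table would be longer.
ABBREVIATIONS = {
    "r": "rue",
    "bd": "boulevard",
}


def normalise_address(raw: str) -> str:
    """Lower-case an address, collapse whitespace and expand known abbreviations."""
    tokens = re.split(r"\s+", raw.strip().lower())
    expanded = []
    for token in tokens:
        key = token.rstrip(".")  # treat 'r.' and 'r' alike
        expanded.append(ABBREVIATIONS.get(key, token))
    return " ".join(expanded)


print(normalise_address("12 R. de Rivoli"))
print(normalise_address("Bd  Haussmann 35"))
```

A lookup table keeps the rules easy to extend as new abbreviations turn up in the OCR output; correcting OCR misspellings would need a separate step.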

Finding the geolocation of the addresses

To show the addresses on the map, we need their geolocation (latitude/longitude coordinates). We proceeded in two steps. First, we used the list of Paris addresses created by the DHLab. This database lists historical Paris addresses with their start and end dates (if known) and their geocoordinates (latitude and longitude, directly in the format EPSG:3857 handled by Leaflet). This first step gave us ADD PERCENTAGE % of our addresses. To complete our database, we then used GeoPy [1], a Python client library for geocoding services, which takes each remaining address and returns its geocoordinates. With this second step, we managed to geolocate 92% of our addresses.
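The two-step lookup can be sketched as below. This is an illustration under assumptions, not our actual code: the names `geocode_two_step`, `geopy_fallback` and `coverage` are hypothetical, the local table stands in for the DHLab database, and the choice of the Nominatim service behind GeoPy is an assumption (the text does not say which geocoder was used).

```python
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def geocode_two_step(
    address: str,
    known: Dict[str, Coordinates],
    fallback: Callable[[str], Optional[Coordinates]],
) -> Optional[Coordinates]:
    """Look the address up in the local table first, then ask the fallback."""
    if address in known:
        return known[address]
    return fallback(address)


def geopy_fallback(address: str) -> Optional[Coordinates]:
    """Query Nominatim through GeoPy (needs the geopy package and network access)."""
    # Imported lazily so the rest of the sketch runs without geopy installed.
    from geopy.geocoders import Nominatim

    # The user agent string is a hypothetical placeholder.
    geolocator = Nominatim(user_agent="influencers-of-the-past-example")
    location = geolocator.geocode(f"{address}, Paris, France")
    if location is None:
        return None
    return (location.latitude, location.longitude)


def coverage(
    addresses: List[str],
    known: Dict[str, Coordinates],
    fallback: Callable[[str], Optional[Coordinates]],
) -> float:
    """Fraction of addresses for which some coordinates were found."""
    if not addresses:
        return 0.0
    found = sum(
        1 for a in addresses if geocode_two_step(a, known, fallback) is not None
    )
    return found / len(addresses)
```

Calling `geocode_two_step(address, known, geopy_fallback)` sends only the addresses missing from the local table to the online service. Passing the fallback as a parameter also makes it easy to swap in another geocoder or a stub for testing.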

Georeference old maps of Paris

Once we have the geocoordinates of our addresses, we need to georeference old maps of Paris. To do so, we use Georeferencer. By matching homologous points between the old map and a present-day map, this tool makes it possible to project geocoordinates onto the old map. The result can then be used with Leaflet and the Python module Folium [2] to visualise our results.
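To illustrate how homologous points can define such a projection, here is a minimal sketch that fits an affine transform from exactly three control points. This is an assumption made for illustration: the text does not state which transformation Georeferencer applies, and the names `fit_affine`, `_det3` and `_solve3` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def _solve3(rows: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [list(r) for r in rows]
        for i in range(3):
            replaced[i][col] = rhs[i]
        solution.append(_det3(replaced) / d)
    return solution


def fit_affine(geo: List[Point], pixel: List[Point]) -> Callable[[float, float], Point]:
    """Return a function mapping (lon, lat) to (x, y) on the old map.

    geo and pixel each hold three homologous points, in the same order.
    """
    if len(geo) != 3 or len(pixel) != 3:
        raise ValueError("exactly three control points are required")
    rows = [[lon, lat, 1.0] for lon, lat in geo]
    ax = _solve3(rows, [p[0] for p in pixel])  # coefficients for x
    ay = _solve3(rows, [p[1] for p in pixel])  # coefficients for y

    def transform(lon: float, lat: float) -> Point:
        return (
            ax[0] * lon + ax[1] * lat + ax[2],
            ay[0] * lon + ay[1] * lat + ay[2],
        )

    return transform
```

Three non-collinear points determine an affine transform exactly. With more control points, a least-squares fit or a non-linear warp would be needed to absorb the distortions of an old map.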

Visualise results

Once we have all our elements, we can start visualising our results. The naive way would be to simply put all our addresses on the map, but given the large number of addresses we have (approximately 10000), this would result in an overcrowded map (and it would also be very slow). Our first idea is therefore to cluster addresses that are near each other. At low zoom levels, this makes it possible to see 'influential' neighbourhoods, for instance. As users zoom in, they eventually reach a level where each person is shown as a dot. At that point, clicking on a dot opens a pop-up with additional information about the person (such as their name).
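The idea of zoom-dependent clustering can be sketched with a simple grid: points falling in the same cell are counted together, and the cells shrink as the zoom level grows. This only illustrates the principle and is not the algorithm used by our interface; in a Folium map, the `MarkerCluster` plugin provides this kind of clustering. The function name `cluster_counts` and the cell-size rule are assumptions.

```python
from collections import Counter
from math import floor
from typing import Iterable, Tuple


def cluster_counts(points: Iterable[Tuple[float, float]], zoom: int) -> Counter:
    """Count (lat, lon) points per grid cell; cells halve in size at each zoom level."""
    cell = 360.0 / (2 ** zoom)  # cell size in degrees (assumed rule)
    counts: Counter = Counter()
    for lat, lon in points:
        counts[(floor(lat / cell), floor(lon / cell))] += 1
    return counts


# Illustrative coordinates in Paris, not taken from our data.
people = [(48.8566, 2.3522), (48.8570, 2.3530), (48.8800, 2.3000)]

for zoom in (0, 12, 22):
    print(zoom, len(cluster_counts(people, zoom)), "cluster(s)")
```

At a low zoom level, nearby addresses collapse into a single count, which is what reveals dense neighbourhoods. At a high zoom level, each cell holds at most a few people, so individual dots can be drawn.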

References

  1. GeoPy Contributors, "GeoPy Documentation", 26/05/2019
  2. "Folium documentation"