Influencers of the past

From FDHwiki

Revision as of 07:48, 21 November 2019

On this page, we present our project Influencers of the past. Our goal is to show who the notable people in Paris were in 1888 and 1908, and where they lived. Here is a sketch of our project (figure: Sketch of Influencers of the past).

Main steps

Extracting the data from the directories

Our first step is to extract all the names and addresses from the two directories. To do so, we use Transkribus to obtain the OCR text and then parse the information it contains.
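The parsing step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each OCR line follows the layout `Surname (First names), profession, street, number`. The actual layout of the directories and of the Transkribus export may differ. `Entry`, `ENTRY_RE`, `parse_entry` and `parse_lines` are illustrative names of our own and are not part of Transkribus.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """One parsed directory line (hypothetical structure)."""
    name: str
    profession: str
    address: str


# Hypothetical layout: "Surname (First names), profession, street, number".
# Everything after the profession is kept as the raw address for later cleaning;
# a trailing full stop is dropped.
ENTRY_RE = re.compile(
    r"^\s*(?P<name>[^,]+?)\s*,\s*(?P<profession>[^,]+?)\s*,\s*(?P<address>.+?)\s*\.?\s*$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Split one OCR line into name, profession and raw address, or return None."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None  # keep unparsable lines aside for manual review
    return Entry(match["name"], match["profession"], match["address"])


def parse_lines(lines: List[str]) -> Tuple[List[Entry], List[str]]:
    """Parse every non-empty OCR line; return (parsed entries, rejected lines)."""
    parsed: List[Entry] = []
    rejected: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines left by the OCR
        entry = parse_entry(line)
        if entry is None:
            rejected.append(line)
        else:
            parsed.append(entry)
    return parsed, rejected
```

Lines that do not match are returned separately instead of being dropped. This lets OCR errors be inspected and fixed during the cleaning step.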

Cleaning the data

This is the main step of our project. The OCR output is quite messy. It contains many errors, and we must correct them if we hope to obtain the geocoordinates of our addresses. We also need to harmonise our results. For instance, we want to treat 'r.' and 'rue' (French for 'street') as the same, and likewise 'bd' and 'boulevard'. Having all our addresses in a standardised form also makes it easier to retrieve the corresponding geocoordinates.
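A minimal Python sketch of this harmonisation follows. It lower-cases the address, strips accents and punctuation, and then expands abbreviations from a lookup table. Only the mappings 'r.' to 'rue' and 'bd' to 'boulevard' come from the text above. The other table entries and the name `normalise_address` are assumptions for illustration.

```python
import re
import unicodedata

# Abbreviation table. 'r' (from 'r.') -> 'rue' and 'bd' -> 'boulevard' come from the text;
# the other entries are illustrative assumptions to be extended from the real data.
ABBREVIATIONS = {
    "r": "rue",
    "bd": "boulevard",
    "boul": "boulevard",
    "av": "avenue",
    "pl": "place",
    "fg": "faubourg",
}

# A token is a run of lower-case letters, digits or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalise_address(raw: str) -> str:
    """Return a standardised, lower-case form of an OCR address string."""
    # Lower-case and strip accents, so that 'Honoré' and 'honore' compare equal.
    text = unicodedata.normalize("NFKD", raw.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Split into tokens; punctuation such as '.', ',' and '-' is dropped.
    tokens = TOKEN_RE.findall(text)
    # Expand known abbreviations token by token.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)
```

The function does not reorder tokens, so '12, rue de Rivoli' and 'rue de Rivoli, 12' still differ. A fuller pipeline may also need to move the house number to a fixed position.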

Finding the geolocation of the addresses

To show the addresses on the map, we need to find their geolocation (latitude/longitude coordinates). We proceeded in two steps. First, we used the list of addresses of Paris (http://fdh.epfl.ch/index.php/Lists_of_addresses_of_Paris) created by the DHLab. This database lists old Paris addresses with their start and end dates (when known) and their geocoordinates (latitude and longitude), given directly in the EPSG:3857 (Web Mercator) format handled by Leaflet. This first step has given us ADD PERCENTAGE % of our addresses.

To complete our database, we then used GeoPy [1], a Python client library for several geocoding web services. It takes our remaining addresses and returns their geocoordinates. With this second step, we have managed to geolocate ADD PERCENTAGE % of our addresses.
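The two-step lookup can be sketched in Python as follows. The sketch makes several assumptions:

- The DHLab list has been loaded as rows keyed by standardised address, and the sketch ignores its start and end dates.
- Its coordinates are projected EPSG:3857 metres. The sketch converts them to WGS84 latitude/longitude, because GeoPy returns WGS84 and Leaflet markers take WGS84. If the list already stores degrees, the conversion is unnecessary.
- Nominatim serves only as an example GeoPy service.

The function names (`build_reference`, `geolocate_all`, `share_found_by`, `make_geopy_fallback`) are hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS84 semi-major axis)


def wgs84_to_web_mercator(lat: float, lon: float) -> Tuple[float, float]:
    """Project WGS84 degrees to EPSG:3857 (Web Mercator) x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_wgs84(x: float, y: float) -> LatLon:
    """Inverse projection: EPSG:3857 x/y in metres back to WGS84 (lat, lon) degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lat, lon


def build_reference(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, LatLon]:
    """Hypothetical loader: (standardised address, x, y) rows in EPSG:3857 -> lat/lon table."""
    return {address: web_mercator_to_wgs84(x, y) for address, x, y in rows}


def geolocate_all(
    addresses: Iterable[str],
    reference: Dict[str, LatLon],
    fallback: Callable[[str], Optional[LatLon]],
) -> Tuple[Dict[str, LatLon], Dict[str, str]]:
    """Step 1: reference list; step 2: fallback geocoder for the remaining addresses.

    Returns the coordinates found and, for every address, which step resolved it
    ('reference', 'fallback' or 'missing').
    """
    coords: Dict[str, LatLon] = {}
    source: Dict[str, str] = {}
    for address in addresses:
        if address in reference:
            coords[address] = reference[address]
            source[address] = "reference"
            continue
        found = fallback(address)
        if found is None:
            source[address] = "missing"
        else:
            coords[address] = found
            source[address] = "fallback"
    return coords, source


def share_found_by(source: Dict[str, str], step: str) -> float:
    """Percentage of addresses resolved by `step` ('reference' or 'fallback')."""
    if not source:
        return 0.0
    return 100.0 * sum(1 for s in source.values() if s == step) / len(source)


def make_geopy_fallback(user_agent: str) -> Callable[[str], Optional[LatLon]]:
    """Fallback backed by GeoPy's Nominatim geocoder (needs `pip install geopy` and network)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    # Nominatim's usage policy allows at most one request per second.
    geocode = RateLimiter(Nominatim(user_agent=user_agent).geocode, min_delay_seconds=1)

    def lookup(address: str) -> Optional[LatLon]:
        # Assumption: appending the city helps disambiguate bare street names.
        location = geocode(address + ", Paris, France")
        return None if location is None else (location.latitude, location.longitude)

    return lookup
```

The fallback is passed in as a function, which keeps both steps testable offline: a plain dictionary's `get` method can stand in for GeoPy. `share_found_by(source, "reference")` and `share_found_by(source, "fallback")` compute the kind of per-step percentage the text reports.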


Georeference old maps of Paris

Visualise results

  1. GeoPy Contributors, "GeoPy Documentation", https://buildmedia.readthedocs.org/media/pdf/geopy/stable/geopy.pdf, 26/05/2019