Projects
Projects 2018
- (1) Choose a Map on Gallica https://gallica.bnf.fr/accueil/?mode=desktop
- (2) Extract as much information as possible from it (train a segmenter, train a handwriting recognition system)
- (3) Export this information to other places on the Web or build a website with specific services
Shortest-Path Route Extraction From City Map
This group will extract roads and intersections from a city map, assign coordinates to the intersections, and build a web interface that lets a user find the shortest-path route between two points.
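The routing step could be sketched as follows. This is a minimal illustration, not the group's implementation: it assumes the extraction stage yields a dictionary of intersection coordinates and a list of road segments between them (the names `coords`, `roads`, `build_graph` and `shortest_path` are hypothetical), and it uses Dijkstra's algorithm with straight-line segment lengths as edge weights.

```python
import heapq
import math


def build_graph(coords, roads):
    """Build an undirected weighted graph from intersections and road segments.

    coords: {intersection_id: (x, y)}  (assumed output of the map extraction)
    roads:  [(intersection_a, intersection_b), ...]
    """
    graph = {name: {} for name in coords}
    for a, b in roads:
        length = math.dist(coords[a], coords[b])  # straight-line length
        graph[a][b] = length
        graph[b][a] = length
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (path, distance) or (None, inf)."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if goal not in dist:
        return None, math.inf
    # Walk back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]


# Invented toy data: four intersections in arbitrary map units.
coords = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 10)}
roads = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
graph = build_graph(coords, roads)
print(shortest_path(graph, "A", "C"))
```

On a real map the edge weight would more plausibly be the length of the traced road rather than the straight line between its endpoints.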
Members: Jonathan and Florian
Train schedules
This project aims to compare how one could travel between Paris, Geneva and Marseille in the middle of the 19th century with how one travels today. From the routes, schedules and prices, one will see how much easier it is now to travel through Europe and how large the impact of the evolution of both the network and the technology has been. The data will be extracted from a document from 1858 using modern tools such as a handwriting recognition system. Eventually, a small program similar to the CFF app will be created so that one can put oneself in the shoes of a railway user of the 1850s.
Members: Anna Fredrikson and Olivier Dietrich
Recreating a cultural geography of Paris at the beginning of the XX century
A century in Beijing (Jimin & Anton)
Using this map by France's Service Géographique de l'Armée from 1900, we will follow the evolution of the urban landscape of the central part of China's capital at several milestones, from the decline of the Qing dynasty, the birth of a republic and the establishment of communism to modern times. The city grew very quickly over the last hundred years and is now way past its old borders. While some things in the downtown area remained largely unchanged, like the Tiananmen Imperial Palace, numerous old constructions were demolished in an effort to modernize the town.
The planned goals
- To align maps from different time periods and see how the landscape changed. The town's straightforward rectangular planning will allow us to make matches more easily.
- The map has a rich legend with toponymic information in French and the dated French system of transliteration of Mandarin. The plan is to extract and match these place names with their modern counterparts.
- Add old pictures of significant buildings that no longer exist, if they can be found.
Coal supply in the German Empire
Main idea
- To study the coal supply of the German Empire for the year 1881.
- Interactive visualization of the main coal production and consumption centres.
- Visualization of dynamic coal transport flows according to the different mining basins and transport routes.
- Creation of a website to present the results.
Further possible upgrades
- Compare the supply of the time with an optimized computerized supply.
- Further analyze coal consumption data by city in relation to the main industries of the time.
- Observe the correlation of coal production and consumption at the time with the level of subsequent economic development of cities, in an attempt to quantify the economic impact of this strategic resource.
Members: Axel Matthey and Rémi Petitpierre
Paris Metropolitan, an evolution
This group will analyze the evolution of the Paris Metropolitan system from its inception. The group will look at the maps of the planning as well as the execution of the metro. The goal is to analyze how areas of high population density, due to cultural attractions, evolved around the metro stations, in effect addressing the chicken-and-egg question. The group will also look at the impact of the metro system during catastrophic events such as wars.
The group will look at different cultural institutions and how they evolved hand in hand with the Paris Metro system.
The different maps selected for the project are the following:
Supplément au journal "le Temps" du 14 avril 1886. Chemin de fer métropolitain de Paris
Paris, chemin de fer métropolitain ; lignes en exploitation, 1908
Paris Nouveau plan de Paris avec toutes les lignes du Métropolitain et du Nord-Sud, 1915
Paris. Plan d'ensemble par arrondissements. Métropolitain : [vers 1950]
Members: Evgeniy Chervonenko and Valentine Bernasconi
Past projects
All the projects are pieces of a larger puzzle. The goal is to experiment with a new approach to knowledge production and negotiation based on a platform that sits between Wikipedia and Twitter.
The platform is called ClioWire
ClioWire: Platform management and development
This group will manage the experimental platform of the course. They will have to run the platform and develop additional features for processing and presenting the pulses. The initial code base is Mastodon.
The group will write bots for rewriting pulses and progressively converging towards articulation/datafication of the pulses.
Knowledge required: Python, JavaScript, basic Linux administration.
Resp. Vincent and Orlin
- Albane
- Cédric
Platform management and development : State of the art and Bibliography
Platform management and development : methodology
Platform management and development : Quantitative analysis of performance
GitHub page of the project : [1]
Secondary sources
The goal is to extract, from a collection of 3000 scanned books about Venice, all the sentences containing at least two named entities and to transform them into pulses. This should constitute a de facto set of relevant information drawn from a large base of Venetian documents.
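As a rough illustration of the filtering step, the sketch below keeps only the sentences that mention at least two entities. It assumes the book text is already available as plain text and that the entities are given as a simple list of strings. The function names, the naive sentence splitter and the sample text are all invented for the example, and a real pipeline would rely on a proper named entity recognizer.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_with_entities(text, entities, minimum=2):
    """Return (sentence, entities_found) for sentences with enough entities."""
    results = []
    for sentence in split_sentences(text):
        found = [
            entity
            for entity in entities
            if re.search(r"\b" + re.escape(entity) + r"\b", sentence)
        ]
        if len(found) >= minimum:
            results.append((sentence, found))
    return results


# Invented sample text and entity list, for illustration only.
sample = (
    "Francesco Raspi met Battista Nanni near the Rialto. "
    "The weather was mild. "
    "Battista Nanni left the city the next day."
)
entities = ["Francesco Raspi", "Battista Nanni", "Rialto"]

for sentence, found in sentences_with_entities(sample, entities):
    print(found, "->", sentence)
```

Each retained sentence could then be posted as one pulse. Abbreviations and OCR noise will confuse a splitter this simple, so the sentence boundaries would need checking on real scans.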
Resp. Giovanni / Matteo
- Hakim - Marion
GitHub page of the project : [2]
Primary sources
This group will look for named entities in digitized manuscripts and post pulses about these mentions.
- The group will use Wordspotting methods based on a commercial algorithm. During the project, the group will have to set up a dedicated pipeline for indexing and searching the documents digitized in the Venice Time Machine project and other primary sources, using the software component provided.
- The group will have to search for lists of names or regular expressions. A method based on a predefined list will be compared with a recursive method based on the results provided by the Wordspotting components.
- Two types of pulses will be produced: (a) "Mention of Francesco Raspi in document X" (b) "Francesco Raspi and Battista Nanni linked (document Y)"
- Creating a simple web front end to test the Wordspotting algorithm would help assess the quality of the method.
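The two pulse types could be generated along the following lines. This is only a sketch under a strong assumption: it works on text that has already been transcribed, whereas the Wordspotting component itself searches the page images and is not modelled here. The query format (a canonical name mapped to a regular expression that tolerates spelling variants), the function names and the sample text are hypothetical.

```python
import re
from itertools import combinations


def find_names(text, queries):
    """Return the canonical names whose pattern occurs in the text.

    queries: {canonical_name: regular_expression}  (hypothetical format)
    """
    return [
        name
        for name, pattern in queries.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    ]


def make_pulses(doc_id, text, queries):
    """Produce 'mention' pulses, then 'linked' pulses for co-occurring names."""
    found = find_names(text, queries)
    pulses = [f"Mention of {name} in document {doc_id}" for name in found]
    for first, second in combinations(found, 2):
        pulses.append(f"{first} and {second} linked (document {doc_id})")
    return pulses


# Hypothetical queries: the first pattern also accepts a variant spelling.
queries = {
    "Francesco Raspi": r"Fran(ces|se)co\s+Raspi",
    "Battista Nanni": r"Battista\s+Nanni",
}

# Invented transcription, for illustration only.
text = "Contract between Franseco Raspi and Battista Nanni."
for pulse in make_pulses("Y", text, queries):
    print(pulse)
```

A predefined list corresponds to the `queries` dictionary above; the recursive method mentioned in the list would feed newly spotted names back into it.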
Supervisor : Sofia
Skills : Java, simple Linux administration
- Raphael - Mathieu
Image banks
The goal is to transform the OCRed metadata of CINI into pulses. One challenge is dealing with OCR errors and possible disambiguation.
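One simple way to approach the OCR errors is fuzzy matching against a list of known values. The sketch below uses `difflib.get_close_matches` from the Python standard library. The authority list, the cutoff value and the function name are assumptions made for the example, not part of the CINI data.

```python
import difflib


def candidate_matches(ocr_value, authority, cutoff=0.8, limit=3):
    """Return known values that closely resemble an OCRed string.

    An empty list means the value is unresolved; more than one candidate
    means it is ambiguous and needs further disambiguation.
    """
    return difflib.get_close_matches(ocr_value, authority, n=limit, cutoff=cutoff)


# Hypothetical authority list of artist names.
authority = ["Tiziano Vecellio", "Jacopo Tintoretto", "Paolo Veronese"]

print(candidate_matches("Tizian0 Vecellio", authority))  # one OCR error: 0 for o
print(candidate_matches("illegible", authority))
```

The cutoff trades recall against false matches and would have to be tuned on a sample of the actual metadata.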
Supervision: Lia
Newspaper, Wikipedia, Semantic Web
The goal is to find all the sentences in a large newspaper archive that contain at least two named entities. These sentences should be posted as pulses.
The named entity detection has already been done. The only challenge is to retrieve the corresponding sentences in the digitized transcriptions.
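Retrieving the sentences could look like the sketch below. It assumes that the existing detections are available as character offsets `(start, end)` into the transcription, which is a guess about the data format. The function names and the sample sentence are invented, and the sentence boundaries come from a naive punctuation rule.

```python
import re
from collections import defaultdict


def sentence_spans(text):
    """Return (start, end) character spans of sentences, split on . ! ?"""
    spans = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        spans.append((start, match.end()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def sentences_for_mentions(text, mentions, minimum=2):
    """Group entity mentions by sentence; keep sentences with enough of them.

    mentions: [(start, end), ...] character offsets (assumed input format)
    """
    spans = sentence_spans(text)
    grouped = defaultdict(list)
    for start, end in mentions:
        for index, (s, e) in enumerate(spans):
            if s <= start and end <= e:
                grouped[index].append(text[start:end])
                break
    return [
        (text[s:e].strip(), grouped[index])
        for index, (s, e) in enumerate(spans)
        if len(grouped[index]) >= minimum
    ]


# Invented transcription; offsets are computed here only to build the example.
text = "Geneva wrote to Lausanne. Rain fell all day. Paris was quiet."
mentions = [
    (text.index(name), text.index(name) + len(name))
    for name in ("Geneva", "Lausanne", "Paris")
]
print(sentences_for_mentions(text, mentions))
```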
In addition, this group should look for ways to import elements of knowledge on a large scale from other sources (DBpedia, RDF databases).
Resp. Maud
Skills: Python or Java
- Laurene and Santiago
Newspaper, Wikipedia, Semantic Web : State of the art and Bibliography