Train schedules
Introduction
This project aims to show how a railway user in the middle of the 19th century could travel between cities such as Paris, Lyon, Geneva or Marseille, compared with today. Several aspects of the trip have been examined: the route, the train schedules and the ticket prices. All of this information comes from a single document that contains both a timetable and a map of the railway network. It has been compiled into a website where one can search for a trip and easily find all the information needed. The project puts into perspective how the evolution of transport, in this case trains, has made the world much more connected and travelling much easier.
The document
This project focuses on a train timetable from 1858, found in Gallica, the digital library of the Bibliothèque nationale de France (BNF). The document belongs to the BNF's department of maps and, like all the documents presented in Gallica, is freely accessible.
It was made by Alfred Potiquet (1820-1883), a civil engineer known for producing the world's first stamp catalogue, in 1861. Its exact title is Chemin de fer de Lyon à Genève. Marche des trains depuis le 10 novembre 1858. Correspondances avec les chemins de fer de France, de Suisse & d'Italie (Lyon-Geneva railway. Train times from 10 November 1858. Connections with the railways of France, Switzerland and Italy). As the title implies, it contains the complete timetable for trains between Lyon and Geneva, but also the connections with French, Swiss and Italian trains. We decided to focus the project on this last aspect, since it is more relevant to us to know how long it took to go from Paris to Geneva than from Montluel to Bellegarde, for example.
This document shows different possible paths between "important" cities, with detailed stops on the section between Mâcon, Geneva and Lyon, as seen in the picture zoom on detailed path. For each of these paths, one can read the price for each class (first, second and third), the distances between the stops, and the departure and arrival times. Some mental arithmetic was required, as prices and distances were always given relative to the first city of the path.
A few observations can be made about the document. First, some trains are marked "express" and are first-class only. Second, the schedules use a 12-hour format, with the indications "matin" (morning) and "soir" (evening) to tell a.m. from p.m. However, these indications are not always clear, and it can be quite confusing to know whether a train leaves in the morning or in the evening. Finally, some small inconsistencies were observed in the timetables: for example, the distance from A to B is sometimes different from the distance from B to A, and the same goes for prices. These differences, however, are usually no more than a couple of kilometers or about ten centimes.
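Since prices and distances are cumulative from the first city of a path, the value between any two stops is the difference of their cumulative values. A minimal JavaScript sketch of that arithmetic follows. The array layout, the function name `between` and all the numbers are hypothetical placeholders, not data read from the 1858 document:

```javascript
// Hypothetical cumulative figures along one path, each measured from the first
// city of the path, as printed in the timetable. Placeholder values only.
const path = [
  { city: "A", km: 0, price1: 0 },
  { city: "B", km: 45, price1: 5.05 },
  { city: "C", km: 112, price1: 12.55 },
];

// Distance and first-class price between two stops of the same path:
// subtract the cumulative values. Math.abs lets the stops come in either order.
function between(stops, fromCity, toCity) {
  const i = stops.findIndex((s) => s.city === fromCity);
  const j = stops.findIndex((s) => s.city === toCity);
  if (i < 0 || j < 0) throw new Error("both cities must lie on this path");
  return {
    km: Math.abs(stops[j].km - stops[i].km),
    // Round to two decimals to remove floating-point noise from the subtraction.
    price1: Math.round(Math.abs(stops[j].price1 - stops[i].price1) * 100) / 100,
  };
}

console.log(between(path, "B", "C")); // { km: 67, price1: 7.5 }
```

The second- and third-class columns would be handled the same way, with one cumulative field per class.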
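Entering the data means turning every "matin"/"soir" time into one 24-hour value. The following is a minimal sketch under two assumptions: "matin" is read as a.m. and "soir" as p.m., and the usual a.m./p.m. rule applies, so "12 matin" is midnight and "12 soir" is noon. The document itself does not settle that edge case. The function names `toMinutes` and `formatHHMM` are hypothetical:

```javascript
// Convert a 12-hour time with the "matin"/"soir" marker into minutes since midnight.
// Assumption: "matin" = a.m., "soir" = p.m.; 12 matin = midnight, 12 soir = noon.
function toMinutes(hour, minute, period) {
  if (hour < 1 || hour > 12 || minute < 0 || minute > 59) {
    throw new RangeError(`invalid time ${hour}:${minute}`);
  }
  if (period !== "matin" && period !== "soir") {
    throw new Error(`unknown period "${period}"`);
  }
  const h24 = (hour % 12) + (period === "soir" ? 12 : 0);
  return h24 * 60 + minute;
}

// Format minutes since midnight as "HH:MM" for display.
function formatHHMM(minutes) {
  const h = Math.floor(minutes / 60) % 24;
  const m = minutes % 60;
  return `${String(h).padStart(2, "0")}:${String(m).padStart(2, "0")}`;
}

console.log(formatHHMM(toMinutes(7, 45, "soir"))); // 19:45
```

Once every time is a single number, any leg whose arrival is earlier than its departure can be flagged for a manual check. Such a leg is either an overnight train or a misread matin/soir marker.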
Milestones
This section summarizes the different milestones achieved and the corresponding dates. Organizing the project as a series of dated milestones was necessary, first to ensure that it was feasible, and then to encourage the group to work on a weekly basis even though the project itself was only due at the very end of the semester. The first obvious step with such a document was to understand how it works, i.e. what information it contains and how it is organized. The document was printed in A3 format to facilitate its study and the manual data extraction. The main focus then switched to the coding part of the project. Some basic knowledge of web languages (HTML, JavaScript, JSON, CSS, ...) was essential, so the group first had to become familiar with them. Then a JavaScript library (jQuery) was identified as potentially helpful, and it was decided to enter the data as JSON objects. As this is a rather tedious job, it was decided to first test the program with only one route and then, only if it worked well, to enter more data to extend it. The feasibility of converting the data into the GTFS (General Transit Feed Specification) format was evaluated, but we concluded that we did not have the appropriate skills to do so. The aesthetic aspect of the interface and the completion of the wiki article were dealt with at the end of the project, once the program was fully working. All these tasks were split evenly so that each member of the group spent the same amount of time on the project.
The following list summarizes how the project was planned:
- 12.10.2018: Formation of the group, choice of map.
- 17.10.2018: Decision about what to do with this map, resulting in a CFF-like website.
- 19.10.2018: Presentation of the idea to the class.
- 24.10.2018: Understanding of the document, A3-model printed.
- 31.10.2018: Getting familiar with web languages. Basic HTML program. Choice of system to store the data (JSON file). Creation of the graph.
- 07.11.2018: Preparation of the slides for the midterm evaluation.
- 09.11.2018: Mid-term presentation.
- 14.11.2018: (Unsuccessful) attempt to use the GTFS format.
- 21.11.2018: Working program with one route only and departure search only (Paris - Bourg). Creation of the website, with different tabs (Search, Document, About). First work on the aesthetic aspect (CSS file).
- 28.11.2018: Arrival option added; the search program is now fully operational. Half of the data entered in the JSON file.
- 05.12.2018: Full set of data.
- 12.12.2018: Aesthetic aspect (CSS file). Website finished.
- 14.12.2018: Completion of the wiki article.
Extraction of the data
The best way to work with this kind of data is to build a graph, which requires the connections between the nodes of this graph (i.e. the cities). The format chosen to store the data was JSON. GTFS (General Transit Feed Specification) had been considered and would have been more relevant. It is the standard format for public transport data around the world and would thus have made this project easier to reproduce and export. Unfortunately, it appeared that the only way to use this format was to set up a server running Node.js. After several unsuccessful attempts, we concluded that we lacked the required skills and that spending more time on it would have been too time-consuming.
JSON files are well suited to storing this kind of data. An object was created for each city, containing all the cities linked to it. For each of them, it holds the schedules of the trains going there (including a boolean express indicating whether the train is first-class only), the ticket price for each class and the distance. The example shown here is for Paris: it is only connected to Mâcon, and there are two trains per day, one of which is first-class only.
This process was quite tedious. For practical reasons, in the French part of what we defined as the detailed zone, only the cities printed in bold were kept. All the destinations in Switzerland, however, were kept, because they were the most relevant to us and to the CFF-like website. Even with this restriction, the timetable stored in the JSON file is still substantial and reflects well what the railway network looked like in 1858. Overall, the schedules for more than thirty cities were extracted. The JSON file is easily extensible if other schedules are added to the cities or if the paths are extended. Adding a city between two existing cities is also feasible but slightly less trivial, since the neighboring cities must be modified as well.
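A city object of this kind can be sketched in JavaScript as follows. The field names (`distance`, `prices`, `trains`, `express`) are hypothetical and are not the ones used in the project's JSON file. The two trains mirror the Paris example, with one ordinary train and one first-class-only express, but the city names, times and prices are placeholders invented for illustration. The helper `usableTrains` is also hypothetical. It shows how the express flag can hide first-class-only trains from other travellers:

```javascript
// Hypothetical shape: city -> linked city -> { distance, prices, trains }.
// Times are "HH:MM" strings; all values are placeholders, not 1858 data.
const network = {
  CityA: {
    CityB: {
      distance: 100,
      prices: { first: 11.2, second: 8.4, third: 6.15 },
      trains: [
        { departure: "07:00", arrival: "10:30", express: false },
        { departure: "20:00", arrival: "22:45", express: true }, // first class only
      ],
    },
  },
};

// Trains from `from` to its neighbor `to` that a traveller in `travelClass` can take.
// Express trains are first-class only, so they are skipped for the other classes.
function usableTrains(net, from, to, travelClass) {
  const link = net[from] && net[from][to];
  if (!link) return [];
  return link.trains.filter((t) => travelClass === "first" || !t.express);
}

console.log(usableTrains(network, "CityA", "CityB", "second").length); // 1
```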
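Because each city object stores its own links, inserting a city between two linked cities means editing both neighbors. A minimal sketch of that update follows. It assumes a simplified adjacency map that holds only distances, with symmetric links. The city names, the numbers and the function `insertBetween` are hypothetical. Timetables and prices would need the same kind of split, which is not shown:

```javascript
// Simplified, hypothetical adjacency map: city -> neighbor -> { distance } in km.
const net = {
  A: { B: { distance: 90 } },
  B: { A: { distance: 90 } },
};

// Insert city `mid` on the direct link between `a` and `b`, `kmFromA` km from `a`.
// The direct a-b link is removed from BOTH neighbors and replaced by links to `mid`.
function insertBetween(graph, a, b, mid, kmFromA) {
  if (!graph[a] || !graph[a][b]) throw new Error(`${a} and ${b} are not directly linked`);
  const total = graph[a][b].distance;
  if (kmFromA <= 0 || kmFromA >= total) throw new RangeError("mid must lie strictly between a and b");
  delete graph[a][b];
  if (graph[b]) delete graph[b][a];
  graph[b] = graph[b] || {};
  graph[mid] = graph[mid] || {};
  graph[a][mid] = { distance: kmFromA };
  graph[mid][a] = { distance: kmFromA };
  graph[mid][b] = { distance: total - kmFromA };
  graph[b][mid] = { distance: total - kmFromA };
  return graph;
}

insertBetween(net, "A", "B", "M", 35);
console.log(Object.keys(net.A), Object.keys(net.B)); // [ 'M' ] [ 'M' ]
```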
As with any data extraction, manual or not, the extraction performance must be evaluated quantitatively. Since the extraction was manual, some inadvertent mistakes may exist. A random sample of XXX schedules was chosen to check our extraction statistically. Over these XXX schedules, we found XXX mistakes in the schedules, XXX in the prices and XXX in the distances. This leads us to estimate that our extraction process is XXX% accurate. Furthermore, as mentioned above, the morning/evening distinction is sometimes very confusing, which leads to greater uncertainty about the accuracy of our data.
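The accuracy figure can be derived from the manual checks. The following is a minimal sketch that assumes each sampled schedule is checked on three fields (time, price, distance) and that accuracy means the share of correct fields. The 95% Wilson score interval is an addition not mentioned in the original. It is shown because it indicates how far a small sample can be trusted. The function name and its inputs are hypothetical, and no project results are filled in:

```javascript
// Estimate extraction accuracy from a manually checked random sample.
// Assumption: every sampled schedule is checked on three fields (time, price,
// distance), and accuracy = correct fields / checked fields.
function extractionAccuracy(sampleSize, errors) {
  const n = 3 * sampleSize; // number of checked fields
  const wrong = errors.time + errors.price + errors.distance;
  const p = 1 - wrong / n;
  // 95% Wilson score interval for a binomial proportion.
  const z = 1.96;
  const denom = 1 + (z * z) / n;
  const centre = (p + (z * z) / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n));
  return { accuracy: p, low: centre - half, high: centre + half };
}

// Hypothetical counts, for illustration only (not the project's results).
console.log(extractionAccuracy(100, { time: 2, price: 1, distance: 0 }));
```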
The Website
The website was designed to look like the CFF website. It features three tabs (Search, Original document and About). Original document and About contain information that is also available in this wiki. The Search tab is the main one and is the home screen when one opens the website. It allows the user to find an itinerary between two cities: the user types the names of the two cities and can give a time at which they want to depart or arrive. If no time is entered, the current time is used by default.
A few additional clarifications about the inputs are needed. The document contains no information on special days or weekends, which makes the timetable day-independent. This is why the date is not an input on the website: the schedules would not change accordingly. City names are suggested to the user while typing. Capitalization does not matter, but spelling and accents do: the search will not work if "Mâcon" is written "Macon".
From these parameters, the user is given the best itinerary, that is to say the one that takes the least time to go from A to B. If the result differs between first class and second/third class, the user is given both options. The itinerary contains the departure time, the expected arrival time (assuming the 1858 trains were on time), the total journey time, the number of changes, the classes available on the train with the corresponding prices, and the distances. Clicking on the itinerary gives more details about the schedule, such as all the cities passed through and the times of the changes.
A few words about the code now. The data from the JSON file were stored in a graph, where each node is one city object. We then used the breadth-first search (BFS) algorithm to traverse and search our data. BFS starts at a given node of the graph (here, either the departure or the arrival city entered by the user) and explores all the neighbor nodes at the current depth before moving on to the nodes at the next level. Each no
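The input rules above can be sketched as follows: names are case-insensitive but accent-sensitive, and suggestions appear while the user types. The city list, the prefix-matching rule and the function names `suggest` and `findCity` are assumptions for illustration, not the website's actual code. Unicode NFC normalization keeps accents, so "Mâcon" still differs from "Macon". It only makes the precomposed and decomposed spellings of the same accented letter compare equal:

```javascript
// Hypothetical list of station names as stored in the data.
const cities = ["Paris", "Mâcon", "Lyon", "Marseille", "Montluel", "Bellegarde"];

// Case-insensitive but accent-sensitive key: "MÂCON" matches "Mâcon", "Macon" does not.
function normalise(name) {
  return name.normalize("NFC").trim().toLowerCase();
}

// Names proposed while the user is typing: every city whose name starts with the input.
function suggest(input, names) {
  const prefix = normalise(input);
  return prefix === "" ? [] : names.filter((n) => normalise(n).startsWith(prefix));
}

// Exact lookup when the search is launched; undefined if nothing matches.
function findCity(input, names) {
  const key = normalise(input);
  return names.find((n) => normalise(n) === key);
}

console.log(suggest("m", cities)); // [ 'Mâcon', 'Marseille', 'Montluel' ]
console.log(findCity("Macon", cities)); // undefined
```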
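One way to find the itinerary that takes the least time in a day-independent timetable is an earliest-arrival search. The sketch below uses a label-correcting search with a FIFO queue. This is a swapped-in technique and not necessarily what the project's code does. The data layout, the function name `earliestArrival` and all the times are hypothetical. Express trains are skipped unless the traveller has a first-class ticket, so running the search once per class group yields the two options the site shows:

```javascript
// Hypothetical timetable: from -> to -> daily trains, times in minutes since midnight.
const timetable = {
  X: {
    Y: [
      { dep: 480, arr: 660, express: false }, // 08:00 -> 11:00
      { dep: 510, arr: 600, express: true }, // 08:30 -> 10:00, first class only
    ],
  },
  Y: {
    Z: [
      { dep: 620, arr: 720, express: false }, // 10:20 -> 12:00
      { dep: 780, arr: 870, express: false }, // 13:00 -> 14:30
    ],
  },
};

const DAY = 1440; // minutes per day; the timetable repeats every day

// Label-correcting earliest-arrival search. All times are absolute minutes
// counted from midnight of the first day.
function earliestArrival(tt, origin, destination, startTime, firstClass) {
  if (origin === destination) return null;
  const best = { [origin]: startTime }; // earliest known arrival per city
  const prev = {}; // leg used to reach each city at its best time
  const queue = [origin];
  while (queue.length > 0) {
    const city = queue.shift();
    const t = best[city];
    for (const [next, trains] of Object.entries(tt[city] || {})) {
      for (const train of trains) {
        if (train.express && !firstClass) continue; // express = first class only
        // Next daily run of this train leaving at or after time t.
        const dep = train.dep + DAY * Math.ceil((t - train.dep) / DAY);
        // An arrival earlier in the day than the departure means an overnight run.
        const arr = dep + ((train.arr - train.dep + DAY) % DAY);
        if (best[next] === undefined || arr < best[next]) {
          best[next] = arr;
          prev[next] = { from: city, to: next, dep, arr };
          queue.push(next);
        }
      }
    }
  }
  if (best[destination] === undefined) return null;
  const legs = [];
  for (let c = destination; c !== origin; c = prev[c].from) legs.unshift(prev[c]);
  return { arrival: best[destination], duration: best[destination] - legs[0].dep, legs };
}

console.log(earliestArrival(timetable, "X", "Z", 420, true).arrival); // 720
console.log(earliestArrival(timetable, "X", "Z", 420, false).arrival); // 870
```

This returns the earliest arrival only. Among itineraries that arrive at the same time, a CFF-style result page would usually also prefer the one that departs latest, which this sketch does not attempt.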
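The level-by-level traversal described above can be sketched as a standard breadth-first search. This is an illustrative JavaScript version, not the project's code. The adjacency list is hypothetical, and its links are not read from the document. On its own, BFS finds the route with the fewest city-to-city legs, not the fastest timed one:

```javascript
// Hypothetical adjacency list: each city lists the cities it is directly linked to.
const graph = {
  "Paris": ["Mâcon"],
  "Mâcon": ["Paris", "Lyon", "Bourg"],
  "Lyon": ["Mâcon", "Marseille"],
  "Bourg": ["Mâcon"],
  "Marseille": ["Lyon"],
};

// Breadth-first search: visit every neighbor at the current depth before going
// one level deeper, remembering from which city each city was first reached.
function bfsPath(adj, start, goal) {
  const parent = new Map([[start, null]]);
  const queue = [start];
  for (let head = 0; head < queue.length; head++) {
    const city = queue[head];
    if (city === goal) {
      // Follow the parent links back to the start to rebuild the route.
      const path = [];
      for (let c = goal; c !== null; c = parent.get(c)) path.unshift(c);
      return path;
    }
    for (const next of adj[city] || []) {
      if (!parent.has(next)) {
        parent.set(next, city);
        queue.push(next);
      }
    }
  }
  return null; // goal not reachable from start
}

console.log(bfsPath(graph, "Paris", "Marseille")); // [ 'Paris', 'Mâcon', 'Lyon', 'Marseille' ]
```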
Comparison with modern trains
- comparison of current networks
- comparison of travel times, including examples (screenshot of both 1858 and 2018)
- ...
Further possible developments
- assess the impact of trains on the development of cities
- create a game with trains