Train schedules

From FDHwiki
Revision as of 10:17, 12 December 2018 by Odietric (talk | contribs)

Introduction

This project examines how a railway passenger in the middle of the 19th century could travel between cities such as Paris, Lyon, Geneva or Marseille, compared with today. Three aspects of a trip were studied: the route, the train schedules and the ticket prices. All of this information comes from a single document that contains both a timetable and a map of the railway network. It has been encoded into a website where one can search for a trip and easily find all the information needed. The project puts into perspective how the evolution of transport, here trains, has made the world much more connected and travelling much easier.

Recto of the document
Verso of the document
Zoom on the map

The document

zoom on detailed path

This project focuses on an old train schedule from 1858. It can be found in Gallica, the digital library of the Bibliothèque Nationale de France (BNF). The document belongs to the BNF's department of maps and is free to access, like all the documents available in Gallica.

The map was created by Alfred Potiquet (1820-1883), a civil engineer known for producing the first stamp catalogue in the world, in 1861. The full title is Chemin de fer de Lyon à Genève. Marche des trains depuis le 10 novembre 1858. Correspondances avec les chemins de fer de France, de Suisse & d'Italie. As the title implies, it contains the complete timetable for trains between Lyon and Geneva, as well as the connections between Swiss, French and Italian trains.

The document can be divided into two parts. The first part is the timetable, with all the information about stops, departures, prices, distances and arrivals for each route. For each route, one can read the prices for each class (first, second and third), the distance between the destinations, and the departure and arrival times. The prices and distances are given relative to the first city of the route. The second part is a map of the entire network on which all the stops are marked. The bigger cities, where one could change trains, are marked in a bigger font, as seen in the picture "zoom on detailed path".

A few observations were made while working with the document. Firstly, some trains are marked as "express" and are first-class only. These are normally faster and offer better connections with other trains. Secondly, the schedules use a 12-hour format, with the indications "matin" (morning) and "soir" (evening) to tell whether a time is am or pm. However, these indications are not always clear, and it was sometimes confusing to know whether a train left in the morning or in the evening. Furthermore, some small inconsistencies were observed in the timetables: for example, the distance sometimes differs depending on whether one goes from A to B or from B to A, and the same goes for the prices. However, these differences are usually no more than a couple of kilometres or a few centimes, so they were harmonised during the extraction phase.
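The 12-hour notation can be turned into minutes after midnight, which makes times easier to compare. The following is only a sketch: the helper names `toMinutes` and `formatTime` are hypothetical, and it assumes that "matin" behaves like am and "soir" like pm (so 12 h matin is midnight and 12 h soir is noon), which the document does not state explicitly.

```javascript
// Convert a 12-hour time with a "matin"/"soir" marker to minutes after midnight.
// Assumption: "matin" is treated like am and "soir" like pm.
function toMinutes(hour, minute, period) {
  if (period !== "matin" && period !== "soir") {
    throw new Error("period must be 'matin' or 'soir'");
  }
  // hour % 12 maps 12 o'clock to 0 before the pm offset is added.
  const hour24 = (hour % 12) + (period === "soir" ? 12 : 0);
  return hour24 * 60 + minute;
}

// Format minutes after midnight as HH:MM on a 24-hour clock.
function formatTime(totalMinutes) {
  const h = Math.floor(totalMinutes / 60) % 24;
  const m = totalMinutes % 60;
  return String(h).padStart(2, "0") + ":" + String(m).padStart(2, "0");
}

console.log(formatTime(toMinutes(7, 30, "soir"))); // prints "19:30"
```

When the "matin"/"soir" marker is unclear in the source, a conversion like this cannot resolve the ambiguity; the choice still has to be made by hand during extraction.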

Milestones

This section summarizes the milestones achieved and the corresponding dates. The project was organized as a series of dated milestones, first to ensure that it was feasible, and then to encourage the group to work on a weekly basis even though the project itself was only due at the very end of the semester. The first step with such a document was to understand how it works, i.e. what information it contains and how that information is organized. The document was printed in A3 format to make it easier to study and to extract the data manually.

The main focus then switched to the coding part of the project. Some basic knowledge of web languages (HTML, JavaScript, JSON, CSS, ...) was essential, so the group first had to become familiar with them. Next, some JavaScript libraries, such as jQuery, were identified as potentially helpful, and it was decided to enter the data as JSON objects. As this is quite a tedious job, the program was first tested with only one route; more data were entered only once this test had worked well. The feasibility of converting the data into GTFS (General Transit Feed Specification) format was evaluated, but it was concluded that we did not have the appropriate skills to do so. The aesthetic aspect of the interface and the completion of the wiki article were dealt with at the end of the project, when the program was fully working. All these tasks were split evenly so that each member of the group spent the same amount of time on the project.

The following list summarizes how the project was planned:

  • 12.10.2018: Formation of group, choice of map.
  • 17.10.2018: Decision about what to do with this map. Resulted in a CFF-like website.
  • 19.10.2018: Presentation of the idea to the class.
  • 24.10.2018: Understanding of the document, A3-model printed.
  • 31.10.2018: Getting familiar with web languages. Basic HTML program. Choice of system to store the data (JSON file). Creation of the graph.
  • 07.11.2018: Preparation of the slides for the midterm evaluation.
  • 09.11.2018: Mid-term presentation.
  • 14.11.2018: (Unsuccessful) attempt at using the GTFS format.
  • 21.11.2018: Working program with one route only and departure only (Paris - Bourg). Creation of the website, with different tabs (Search, Document, About). First consideration of aesthetic aspect (CSS file).
  • 28.11.2018: Arrival option added. The search program is now fully operational. Half of the data entered in the JSON file.
  • 05.12.2018: Full set of data.
  • 12.12.2018: Aesthetic aspect (CSS file). Website finished.
  • 14.12.2018: Completion of the wiki article.

Extraction of the data

Data are stored as objects in the JSON file
Early in the project it was decided that the extraction should be done manually, partly because it was easier to get started that way, but above all because the document does not contain that much data. Setting up an extraction program would have taken more time than doing all the extraction by hand. Since the data were already laid out in tables, it was quite straightforward to extract the information needed. Moreover, with the morning/evening indications not always clear and some data requiring basic calculations during the extraction, it simply seemed faster to do it manually.

The chosen representation of the data is a graph in which each city is a node and each connection is an edge. Even though it is possible to go both from node A to B and from B to A, the data were structured as a directed graph. This was done because each direction has its own departures, travel times and costs.

The format chosen to store the data was JSON. GTFS (General Transit Feed Specification) was also considered, since it is a commonly used format and using it would have made the data reusable by others. Unfortunately, GTFS is built on a structure with predefined zones for the routes, and the price of a trip is calculated from these zones. This did not fit our data, since each route has its own cost and the total trip cost is calculated by adding all sub-costs.
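The additive pricing described above can be sketched in a few lines. This is an illustration only: the function name `tripTotals`, the leg fields and all numbers are placeholders invented for the example, not values from the 1858 timetable or from the project's code.

```javascript
// Sum per-class fares and distances over the consecutive legs of a trip.
function tripTotals(legs) {
  const totals = { first: 0, second: 0, third: 0, distanceKm: 0 };
  for (const leg of legs) {
    totals.first += leg.prices.first;
    totals.second += leg.prices.second;
    totals.third += leg.prices.third;
    totals.distanceKm += leg.distanceKm;
  }
  return totals;
}

// Placeholder legs A -> B -> C (invented fares in francs, distances in km).
const legs = [
  { from: "A", to: "B", distanceKm: 30, prices: { first: 3, second: 2, third: 1 } },
  { from: "B", to: "C", distanceKm: 50, prices: { first: 5, second: 4, third: 3 } }
];

console.log(tripTotals(legs));
// prints { first: 8, second: 6, third: 4, distanceKm: 80 }
```

A leg served by an express train would offer first class only, so a real implementation would also have to decide how to report the second and third class totals for such a trip.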

Using JSON, an object was created for each city, containing all the cities linked to it, the distance between them, the schedules of the trains going there (including a boolean express to know whether the train is first-class only) and the price of the ticket for each class. The example shown is for Paris. Paris is only connected to Mâcon and there are two trains per day, one being a first-class only train.
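As a rough illustration of this description, a city object could look like the sketch below. The field names (`name`, `connections`, `to`, `distanceKm`, `prices`, `trains`, `departure`, `arrival`, `express`) are assumptions made for this example, and every number is a placeholder rather than a value from the timetable or from the project's JSON file.

```javascript
// Hypothetical shape of one city object. Field names and all values are
// illustrative placeholders, not data from the 1858 timetable.
const paris = {
  name: "Paris",
  connections: [
    {
      to: "Mâcon",
      distanceKm: 100,                              // placeholder distance
      prices: { first: 10, second: 8, third: 5 },   // placeholder fares in francs
      trains: [
        { departure: "08:00", arrival: "16:00", express: true },  // first class only
        { departure: "20:00", arrival: "06:00", express: false }
      ]
    }
  ]
};

// Classes accepted on a given train: express trains carry first class only.
function classesFor(train) {
  return train.express ? ["first"] : ["first", "second", "third"];
}

console.log(classesFor(paris.connections[0].trains[0])); // prints [ 'first' ]
```

Keeping the `express` flag on each train, rather than on the connection, matches the observation that only some departures on a route are first-class only.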

In the end, the extraction process was quite tedious, and the original timetable was a bit inconsistent at times. For example, different departures sometimes had different costs even though the route was the same. Therefore some simplifications had to be made in order not to create too many edge cases. For practical reasons, in the French part of what we defined as the detailed zone, only the cities printed in bold were kept. These were the cities where one had to change trains to continue travelling. All the destinations in Switzerland were kept, however, because they were the most relevant to us and to the CFF-like website. Even with this restriction, the timetable stored in the JSON file is still substantial and reflects well what the train network looked like in 1858. Overall, the schedules for more than thirty cities have been extracted. The JSON file is easy to extend if other schedules were to be added to the cities or if the path were extended. Adding cities between two existing cities is feasible but slightly less trivial, as changes would have to be made in the neighbouring cities as well.

Quantitative analysis of extraction

As with any data extraction, manual or not, the extraction performance must be evaluated quantitatively. In this project, since the extraction was done manually, some inadvertent mistakes may exist. A set of XXX random schedules was chosen to check our extraction statistically. Among these XXX schedules, we found XXX mistakes in the schedules, XXX in the prices and XXX in the distances. This leads us to estimate our extraction process to be XXX% accurate. Furthermore, as mentioned above, the morning/evening distinctions are sometimes very confusing and lead to greater uncertainty about the accuracy of our data.

The Website

The website has been designed to look like the CFF website. It features three tabs (Search, Original document and About). The Original document and About tabs contain information that is also available in this wiki. The Search tab is the main one and the home screen of the website. It lets the user find an itinerary between two cities. The user must type the names of the two cities and can give a time at which they want to depart or arrive. If no time is entered, the current time is taken as the default value.

A few additional clarifications about the inputs are needed. The document gives no information on special days or weekends, so the timetable is independent of the day. This is why the date is not an input on the website: the schedules would not change with it. City names are suggested to the user while typing. Capitalization does not matter, but spelling and accents do: the search will not work if "Mâcon" is written "Macon".
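This behaviour (case ignored, accents significant) is what a plain lower-case comparison produces. The sketch below shows one way it could be written; the helper names `findCity` and `suggestCities` and the shortened city list are assumptions for the example, not the website's actual code.

```javascript
// Shortened list of city names for the example.
const cities = ["Paris", "Mâcon", "Lyon", "Culoz"];

// Case-insensitive but accent-sensitive exact lookup.
function findCity(input, cityList) {
  const wanted = input.trim().toLowerCase();
  return cityList.find(city => city.toLowerCase() === wanted) || null;
}

// Suggestions offered while the user is typing a prefix.
function suggestCities(prefix, cityList) {
  const p = prefix.trim().toLowerCase();
  return cityList.filter(city => city.toLowerCase().startsWith(p));
}

console.log(findCity("mâcon", cities)); // prints "Mâcon"
console.log(findCity("Macon", cities)); // prints null
```

If accent-insensitive matching were wanted, both strings could be normalized with the standard `String.prototype.normalize("NFD")` and stripped of combining marks before the comparison.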

From these parameters, the user is given the best itinerary, that is to say the one that takes the least time to go from A to B. If the result differs between first class and second/third class, both options are shown. The itinerary contains the departure time, the expected arrival time (assuming the 1858 trains were on time), the total journey time, the number of changes, the classes available on the train with the corresponding prices, and the distance. Clicking on the itinerary gives more details about the schedule, such as all the cities passed through and the times of the changes.

A few words about the code. The data from the JSON file are loaded into a graph. Each node of the graph is one city object, as explained in the previous section. Each node contains the list of the nodes linked to it, and the edges in between carry the corresponding schedules, prices and distances. We then use the breadth-first search (BFS) algorithm to traverse the graph. It starts at one node of the graph (either the departure or the arrival city entered by the user) and explores all the neighbouring nodes at the current depth before moving on to the nodes at the next level. Once the path from the departure city to the arrival city is found (nodes 1 to N), we look for the fastest itinerary along it according to the schedules. If the user gives a departure time, the program finds the first train from node 1 to node 2 that leaves after that time. Then, at node 2, it looks for the first train to node 3 that leaves after the arrival from node 1, and so on until node N is reached. Every time midnight is crossed, the date is updated in order to know how many days have passed. The total price and distance are the sums of the prices and distances of the edges along the path.
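The two steps for a departure-time search can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's code: the graph layout (`graph[from][to]` holding daily trains with `dep` and `dur` in minutes), the function names `bfsPath` and `earliestItinerary`, and all times are invented for the example. It also assumes that every edge has at least one train and that a train leaving exactly at the current time can still be boarded.

```javascript
const DAY = 24 * 60; // minutes in a day

// Toy directed graph; dep = departure in minutes after midnight, dur = duration.
// All values are placeholders, not 1858 data.
const graph = {
  A: { B: [{ dep: 480, dur: 120 }, { dep: 1080, dur: 120 }] },
  B: { A: [{ dep: 600, dur: 130 }], C: [{ dep: 540, dur: 60 }, { dep: 660, dur: 60 }] },
  C: { B: [{ dep: 420, dur: 60 }], D: [{ dep: 300, dur: 90 }] },
  D: { C: [{ dep: 900, dur: 90 }] }
};

// Breadth-first search: returns a path with the fewest edges, or null.
function bfsPath(graph, start, goal) {
  const previous = new Map([[start, null]]);
  const queue = [start];
  while (queue.length > 0) {
    const city = queue.shift();
    if (city === goal) {
      const path = [];
      for (let c = goal; c !== null; c = previous.get(c)) path.unshift(c);
      return path;
    }
    for (const next of Object.keys(graph[city] || {})) {
      if (!previous.has(next)) {
        previous.set(next, city);
        queue.push(next);
      }
    }
  }
  return null;
}

// Walk the path, boarding at each node the first train that leaves at or
// after the current time. Times are absolute minutes from midnight of day 0.
function earliestItinerary(graph, path, startMinutes) {
  let now = startMinutes;
  const legs = [];
  for (let i = 0; i + 1 < path.length; i++) {
    const timeOfDay = now % DAY;
    const dayStart = now - timeOfDay;
    let best = null;
    for (const train of graph[path[i]][path[i + 1]]) {
      // A train already gone today is taken on the following day.
      const departure = dayStart + train.dep + (train.dep >= timeOfDay ? 0 : DAY);
      if (best === null || departure < best.departure) {
        best = { from: path[i], to: path[i + 1], departure, arrival: departure + train.dur };
      }
    }
    legs.push(best);
    now = best.arrival;
  }
  return { legs, arrival: now, days: Math.floor(now / DAY) };
}

const path = bfsPath(graph, "A", "D");
const trip = earliestItinerary(graph, path, 7 * 60); // leave at or after 07:00
console.log(path.join(" -> "), trip.arrival, trip.days);
// prints "A -> B -> C -> D 1830 1"
```

In this toy run the last leg is only available the next morning, so the arrival falls on day 1 (1830 minutes, i.e. 06:30 the following day). Note that BFS minimises the number of edges, not the travel time, so the schedule walk only finds the earliest trains along that particular path.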

Comparison with modern trains

It is interesting to see how train travel has changed between 1858 and today.

  • Railway network: Some routes from 160 years ago are still used today. For example, the trains between Geneva and Mâcon follow exactly the same route as in 1858, going through Culoz and Ambérieu. Of course the railways have been modernized and the number of stops has been reduced for express trains, but the journey is the same. On the other hand, some connections have been added, such as the one between Lausanne and Salins, which saved a great deal of time between Lausanne and Paris.
  • Prices: The prices given in the document are in French francs. The average salary of a worker at that time was between 2 and 3 fr. a day.[1]

Further possible development

  • Assess the impact of trains on the development of cities.
  • Create a game with trains.

References
