Paris 1909 TripAdvisor
Latest revision as of 09:21, 19 December 2018
The project intends to recreate the cultural geography of Paris in the Belle Époque, immersing a user in the world of cabarets, balls and theatres, the universe described by Zola and Proust and painted by Renoir and Toulouse-Lautrec. In order to give a new perspective on how this legendary world was actually structured and perceived, we are going to digitize the authentic Plan des plaisirs et attractions de Paris created in 1909 and augment it with evidence and descriptions from the Guide des plaisirs à Paris: timetables, advice and guidance on what to wear, what to say, and where to go.
Historical introduction
Plan des plaisirs et attractions de Paris traces the cultural geography of Paris at the end of the 1900s, mapping hand-picked theatres, restaurants, concerts, public balls and other amusements of the period. This sketch map was drawn by the bureau Le Natur to illustrate the Pleasure Guide to Paris (Guide des plaisirs à Paris) published in 1909. The Guide was reissued on multiple occasions (according to the BNF catalogue, at least twelve editions appeared between 1899 and 1931) and was translated into several languages (notably, at least two English editions were brought out, in 1903 and 1925).
The map and the Guide constitute the historical grounds of the project, providing both a complex, richly detailed narrative and a spatial view of Paris in the Belle Époque. Issued at the dawn of the era of global tourism, these documents reflect the image of the city as it was created for the eyes of a (primarily male) foreigner of that time. Notably, one will hardly find any historical sights on the map, or typical accounts of them in the Guide. Instead, the Guide offers a kind of anthropological introduction to Parisian nightlife, combining everyday recommendations on how to dress and what to say with descriptions of such "curious" and "piquant" places as working-class balls, small artistic inns, exotic cabarets of Montmartre, amusements of "the worst parts of Paris" ("Dessous de Paris"), etc. Interestingly enough, the document reflects the spirit of a time when "marginal" artistic communities, as well as the "exotic" culture of the "lower classes," began to be recognized as a source of pleasure and desire.
At the turn of the century, similar guidebooks representing early forms of mass culture were written for other European capitals [1]; these could be used to extend the Paris 1909 TripAdvisor project.
Project Plan and Milestones
Milestone 1: Concept (up to 4 Nov)
- First brainstorming and first sketch of ideas
- Exploring the map and the sources
- Developing the overall concept of the project
Milestone 2: Data extraction (9 Nov)
- Selecting the types of data to be extracted
- 20% of the data extracted and annotated
- Verifying the data quality
- Searching for missing data and additional data (e.g. posters and photos)
- Scheduling and drawing up the project plan
- Link places with Wikidata pages
Milestone 3: Design and Data Processing (14 Nov)
- Designing the prototype on paper (wireframes)
- Selecting the features of the application
- Searching for the best technical way to show clickable points corresponding to the places on the map
- Preparing the midterm presentation
Milestone 4: Database implementation (28 Nov)
- Developing a web app for entering the data:
  - Initializing the Django application
  - Designing and implementing the database in SQL
  - Deploying the first version of the web application: the website will be deployed on a public URL and team members will have access to input extracted information to the database.
- Extracting the rest of the data
- Adding new data to the app database
- Improving the map
Milestone 5: Development of the platform (14 Dec)
- Designing the entry story
- Implementing the services of the app:
  - Displaying the map and the project description
  - Implementing the UI to explore the map and the places
  - Adding search and filter functionalities
  - Displaying the annotation of each place
  - Displaying additional data and tutorial pop-ups
  - Adding the recommendations page
- Refactoring the GitHub repository for submission
Milestone 6: Presenting the Results (19 Dec)
- Preparing the final presentation
- Writing the report
- Deploying the final version of the web application for public usage
Extraction methods
Text extraction
A large part of the work consisted of text extraction. We had two digitized versions of the Guide at our disposal: the French edition of 1909, digitized by Gallica, and the English edition of 1925, found on the web. The French edition is the authentic one, the edition for which the map was intended. However, the quality of its digitization is rather poor (66% according to Gallica): large portions of the text are unrecognized and have to be retyped by hand. As not all team members are francophone, this would have meant not only many hours of work but also poor-quality texts. The English digital version is of better quality, but as it was brought out 16 years after the French one, it lacks some pieces of information and reflects the Parisian cultural life of the mid-1920s. After long hesitation, we decided to base the project on the English edition, taking into account that English would make the application accessible to many more users.
The data were extracted manually (by retyping, or by copying when possible). All precise data (prices, working hours, phone numbers, menus) were nevertheless taken from the French edition of the Guide in order to preserve the authenticity of the image of the past. All the English descriptions were compared with the French ones in order to bring historical inconsistencies to light and eliminate them.
The overall protocol consisted in:
- preliminary observation of the data in both French and English editions of the Guide;
- comparing the two editions, indicating the data that are missing in the English version or that have to be corrected because of the historical distance between the two editions;
- extracting 20% of the data;
- deciding on the types of the data to be extracted and creating the database;
- filling the gaps of the English version: translating the pieces of information from the French guide;
- filling the database using both French and English editions.
Data Augmentation
In order to extend the information on the places found in the Guide, we retrieved, when possible, the Wikidata ID associated with each place. For each place, the protocol consisted of:
- search in Wikidata for a page with the name of the place;
- if the place appeared under a valid name and the information on the page matched that recorded in the Guide (the address was the same, or the page linked to other records, e.g. Carthalia or the BNF archive, giving the same address), the code was retrieved;
- if it appeared under a valid name but with a different address, we checked on the map whether the address given in the Guide and the one on the Wikipedia page were close enough; if so, the code was retrieved;
- if it appeared under a valid name but referred to a different place at a different address (usually revealed by the disambiguation in its Commons category), the code was discarded;
- if it did not appear under a valid name, the content of the page (and of its linked Wikipedia page) was analyzed to find alternative names for the place; if one matched, the code was retrieved.
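The matching decision above was made by hand, but its logic can be written down as a small function. This is a minimal sketch under stated assumptions: the `WikidataCandidate` record, its field names, and the 200 m threshold for "close enough" are all hypothetical, since the protocol does not give a number. The order of checks (name first, then "different place", then address) follows the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WikidataCandidate:
    """Hypothetical record of what was checked by hand for one Wikidata search hit."""
    qid: str
    name_matches: bool                  # label, or an alternative name found on the page / linked Wikipedia page
    same_address: bool                  # same address on the page or in linked records (e.g. Carthalia, BNF)
    distance_m: Optional[float] = None  # distance on the map between the two addresses, if they differ
    other_place: bool = False           # disambiguation (e.g. Commons category) shows it is another place


# Assumed threshold for "close enough"; not stated in the protocol.
MAX_DISTANCE_M = 200.0


def accept_wikidata(c: WikidataCandidate) -> Optional[str]:
    """Return the Wikidata ID if the candidate passes the matching protocol, else None."""
    if not c.name_matches or c.other_place:
        return None                     # no valid (or alternative) name, or a different place
    if c.same_address:
        return c.qid                    # information matches the Guide
    if c.distance_m is not None and c.distance_m <= MAX_DISTANCE_M:
        return c.qid                    # different address, but close enough on the map
    return None
```

For example, `accept_wikidata(WikidataCandidate("Q-example", True, False, 120.0))` returns `"Q-example"`, while the same candidate flagged with `other_place=True` is rejected.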
Georeferencing
For finding the coordinates associated with the places, the following protocol was used:
- If the place had an associated Wikidata ID and the Wikidata page had coordinates, they were retrieved (these are usually imported from a Wikimedia project);
- If that was not the case but the place had an associated address, the address was searched on https://www.latlong.net/convert-address-to-lat-long.html and the coordinates were taken from there;
- If the place had no address, its coordinates were interpolated from the map. To this end:
  - The high-resolution image of the map was downloaded from the BNF website.
  - The image was cleaned up (removing the parts corresponding to folds).
  - The image was loaded into QGIS with the Georeferencer tool. Twenty-five control points on the map were matched to real-world coordinates; the points selected were usually key points on the boundaries of the quartiers. The map was then transformed using the "Thin Plate Spline" transformation type and the "Cubic Spline" resampling method.
  - The coordinates associated with the place on the map were retrieved.
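The protocol above is a priority chain: each source is tried only if the previous one gave nothing. A minimal sketch of that chain follows. `geocode` and `georeference` are hypothetical callables standing in for the manual lookup on latlong.net and for reading a position off the map georeferenced in QGIS; neither is a real API of those tools. One small assumption goes beyond the text: if geocoding an address fails, the sketch also falls back to the map.

```python
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]      # (latitude, longitude)
MapPos = Tuple[float, float]      # (column, row) on the scanned 1909 map


def locate_place(
    wikidata_coords: Optional[Coords],
    address: Optional[str],
    geocode: Callable[[str], Optional[Coords]],    # hypothetical stand-in for the address lookup
    map_position: Optional[MapPos],
    georeference: Callable[[MapPos], Coords],      # hypothetical stand-in for the QGIS transformation
) -> Tuple[Optional[Coords], str]:
    """Return (coordinates, source) following the priority order of the protocol."""
    if wikidata_coords is not None:
        return wikidata_coords, "wikidata"
    if address:
        coords = geocode(address)
        if coords is not None:
            return coords, "geocoded address"
    if map_position is not None:
        return georeference(map_position), "georeferenced map"
    return None, "unresolved"
```

Recording the source alongside the coordinates also makes it easy to sample each source separately during quality assessment.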
Once all the places had coordinates, the dataset was exported in CSV format and loaded into QGIS. Using this dataset and a vector layer of the arrondissements of Paris, a join of attributes by location was performed, so that each point was assigned an arrondissement.
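At its core, a join by location asks, for each point, which polygon contains it. The sketch below illustrates that idea with the even-odd ray-casting test in plain Python. It is not the QGIS implementation. It assumes simple polygons (one outer ring, no holes) with longitude/latitude treated as planar coordinates, which is acceptable at the scale of a city. The place and polygon names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]    # (x, y), e.g. (longitude, latitude)
Polygon = Sequence[Point]      # outer ring, vertices in order, not repeated at the end


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd rule: count how many edges a horizontal ray from p to the right crosses."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):                  # edge straddles the ray's line (so y1 != y2)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                       # crossing lies to the right of p
                inside = not inside
    return inside


def join_by_location(points: Dict[str, Point], areas: Dict[str, Polygon]) -> Dict[str, Optional[str]]:
    """Assign each place the first area whose polygon contains it, or None."""
    return {
        place: next((name for name, poly in areas.items() if point_in_polygon(pt, poly)), None)
        for place, pt in points.items()
    }


# Toy example: two adjacent unit squares standing in for arrondissements.
AREAS = {"A": [(0, 0), (1, 0), (1, 1), (0, 1)], "B": [(1, 0), (2, 0), (2, 1), (1, 1)]}
print(join_by_location({"p1": (0.5, 0.5), "p2": (1.5, 0.5), "p3": (3, 3)}, AREAS))
# {'p1': 'A', 'p2': 'B', 'p3': None}
```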
The database
The database is organized around the places marked on the map. Each place has its map id (the number assigned to it on the map), an address, geographical coordinates and the respective arrondissement, and is assigned to a particular category (theatre, ball, cabaret, etc.). It may also have a detailed description, a phone number, prices, working hours, a menu, the nearest metro station, a photograph, a poster and a Wikidata ID.
The costs and working hours were particularly difficult to formalize as they vary greatly. For instance, balls often have different entry prices for men and women, theatre prices depend on the type of seat chosen, cabaret prices often include a drink, etc. Similarly, the working hours depend on the day of the week, holidays and the season. In designing the database we tried to preserve this complex structure of the data as much as possible. We also intended to augment the Guide data by adding authentic photographs of a place (taken around 1909) and its posters, playbills and placards, which were deeply characteristic of the period. The annotated images were searched for on the web and assigned to the places in the database (in the end, only 10% of places have no original image).
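One way to keep varying prices and hours in a relational schema is to store each price variant and each opening rule as its own row, instead of squeezing them into a single column. The sketch below shows that idea. It uses Python's built-in `sqlite3` so it runs standalone, while the project itself used Django with PostgreSQL. All table and column names are hypothetical, and the inserted values are placeholders, not data from the Guide.

```python
import sqlite3

# Hypothetical schema; illustrative only, not the project's actual tables.
SCHEMA = """
CREATE TABLE place (
    map_id         INTEGER PRIMARY KEY,  -- number assigned on the 1909 map
    name           TEXT NOT NULL,
    category       TEXT NOT NULL,        -- theatre, ball, cabaret, ...
    address        TEXT,
    latitude       REAL,
    longitude      REAL,
    arrondissement INTEGER,
    description    TEXT,
    phone          TEXT,
    menu           TEXT,
    metro_station  TEXT,
    photo_url      TEXT,
    poster_url     TEXT,
    wikidata_id    TEXT
);
-- One row per price variant instead of a single free-text price.
CREATE TABLE price (
    id             INTEGER PRIMARY KEY,
    place_id       INTEGER NOT NULL REFERENCES place(map_id),
    audience       TEXT,     -- e.g. 'men' or 'women'; NULL if the same for everyone
    seat_type      TEXT,     -- category of theatre seat; NULL if not applicable
    amount_francs  REAL NOT NULL,
    includes_drink INTEGER NOT NULL DEFAULT 0  -- 1 if a drink is included
);
-- One row per opening rule, so days, holidays and seasons can differ.
CREATE TABLE opening_hours (
    id       INTEGER PRIMARY KEY,
    place_id INTEGER NOT NULL REFERENCES place(map_id),
    days     TEXT NOT NULL,  -- e.g. 'Sunday', 'weekdays', 'holidays'
    season   TEXT,           -- e.g. 'winter'; NULL if all year
    opens    TEXT NOT NULL,  -- 'HH:MM'
    closes   TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder values for illustration only (not taken from the Guide).
conn.execute("INSERT INTO place (map_id, name, category) VALUES (1, 'Example ball', 'ball')")
conn.executemany(
    "INSERT INTO price (place_id, audience, amount_francs) VALUES (?, ?, ?)",
    [(1, "men", 2.0), (1, "women", 1.0)],
)
rows = conn.execute(
    "SELECT audience, amount_francs FROM price WHERE place_id = 1 ORDER BY audience"
).fetchall()
print(rows)  # [('men', 2.0), ('women', 1.0)]
```

The cost is an extra join when displaying a place. In exchange, a ball with separate prices for men and women and a theatre with several seat categories fit the same table.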
Quality assessment of the extraction
As the data extraction involved multiple procedures, each stage is a possible source of mistakes. For each one, a quality assessment is performed: identifying possible errors, sampling, verifying, and quantifying how accurate the extraction was. At the end of this section, an overall weighted accuracy is presented.
Regarding the text extraction, possible mistakes include:
- Marking a place as missing from the English guide when it was in fact present, or extracting the information in the guide (description or address) incompletely or incorrectly.
- Failing to notice badly translated words in the English guide, especially those caused by limited OCR precision; this also includes missing accents in names of people and places.
- Stylistic inconsistencies or semantic inaccuracies in the 31% of place annotations that had to be translated manually from French.
In the data augmentation stage, 52.7% of the places have an associated Wikidata page. For these, the possible errors identified are:
- The Wikidata ID of a place is wrong (e.g. typos)
- The Wikidata ID of a place seems to point to the right entry, but a detailed study of the Wikidata page (and its associated links) shows that it actually refers to another place (e.g. places with the same name in different time periods)
In the georeferencing stage, the possible errors identified are:
- Coordinates taken from an external source are wrong (they point to a different street from the one indicated on the map)
- Coordinates obtained with the Georeferencer tool in QGIS do not lie on the same street as the one indicated on the map
- The arrondissement associated with a place is not correct
In the database, the possible error identified is:
- The images linked to a place do not correspond to it
Accuracy is measured for most of the error-prone processes identified above; only the stylistic dimension is skipped, as it is difficult to quantify. For each type of error, 10% of the places were sampled at random. Each sampled place whose information is free of that specific error scores one point. The points summed over all sampled places, divided by the number of places sampled, give an accuracy score between 0% and 100% for the related process (100% being the best score).
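The scoring rule above can be sketched in a few lines. `draw_sample` and `accuracy_score` are hypothetical helper names, and the fixed seed only makes the random draw reproducible. As a side note, 81.8%, 72.7% and 90.9% in the table below equal 9/11, 8/11 and 10/11 rounded to one decimal. This is consistent with samples of 11 places, although the sample size is not stated.

```python
import random
from typing import List, Sequence


def draw_sample(place_ids: Sequence[int], fraction: float = 0.10, seed: int = 0) -> List[int]:
    """Pick a random share of the places (at least one) to check for one type of error."""
    k = max(1, round(fraction * len(place_ids)))
    return random.Random(seed).sample(list(place_ids), k)


def accuracy_score(error_free: Sequence[bool]) -> float:
    """One point per sampled place without the error; points / sample size, as a percentage."""
    if not error_free:
        raise ValueError("the sample is empty")
    return 100.0 * sum(error_free) / len(error_free)


# Example: 9 of 11 sampled places are free of the error.
print(round(accuracy_score([True] * 9 + [False] * 2), 1))  # 81.8
```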
Possible error | Accuracy of process |
---|---|
Incomplete or incorrect information | 81.8% |
Badly translated words or missing accents | 72.7% |
The WikidataID of a place is wrong | 90.9% |
The WikidataID points to a different place (despite a similar name and description) | 100% |
The coordinates of a place are on a different street | 81.8% |
The arrondissement associated to a place is not correct | 81.8% |
The image linked to a place is not correct | 100% |
From these scores, the overall accuracy of the information extraction is 87%.
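The exact weighting behind the overall figure is not stated. However, the reported 87% coincides with the unweighted mean of the seven scores: (81.8 + 72.7 + 90.9 + 100 + 81.8 + 81.8 + 100) / 7 = 609 / 7 = 87.0. The sketch below reproduces this with equal weights and accepts optional weights (for example, the sample sizes) should a different weighting be intended. The dictionary keys are shortened labels for the table rows.

```python
from typing import Dict, Optional

# Scores from the table above, in percent.
SCORES: Dict[str, float] = {
    "Incomplete or incorrect information": 81.8,
    "Badly translated words or missing accents": 72.7,
    "Wrong Wikidata ID": 90.9,
    "Wikidata ID points to a different place": 100.0,
    "Coordinates on a different street": 81.8,
    "Wrong arrondissement": 81.8,
    "Wrong image": 100.0,
}


def global_accuracy(scores: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of the per-process scores; equal weights when none are given."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


print(round(global_accuracy(SCORES), 1))  # 87.0
```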
Services provided
The project follows the logic of modern tourist services such as TripAdvisor, inviting a user to orient him- or herself in Paris in the Belle Époque. By transferring the historical narrative of the map and the Guide into the digital environment, we intend to make it accessible and comprehensible to a very general public.
The application contains three basic entry points: "Places", "Neighbourhoods", and "Experiences", each offering a distinctive perspective on the Parisian cultural life.
Places
The project is built around the places of interest marked on the map and described in the Guide. A user starts by observing the distribution of places on the map. She may filter the places by type (theatres, cabarets, balls, restaurants, etc.) or search for a particular place. Every place has a brief annotation, which gives an overview without distracting from the whole picture, and an extensive annotation comprising a detailed description, images and other data associated with the place. Thus, starting from the general overview, one can go on to explore particular places in depth.
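The filter-and-search behaviour described above can be sketched as a single function over place records. This is a hypothetical illustration, not the application's Django code. It assumes a case-insensitive substring match on the name and returns results in map order, and the `Place` record and sample names are invented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Place:
    """Hypothetical, minimal view of a place record."""
    map_id: int     # number printed on the 1909 map
    name: str
    category: str   # e.g. "theatre", "ball", "cabaret"


def filter_places(
    places: Iterable[Place],
    category: Optional[str] = None,
    query: Optional[str] = None,
) -> List[Place]:
    """Keep places of the given category whose name contains the query (case-insensitive)."""
    needle = query.casefold() if query else None
    selected = [
        p for p in places
        if (category is None or p.category == category)
        and (needle is None or needle in p.name.casefold())
    ]
    # Present results in map order, as on the printed plan.
    return sorted(selected, key=lambda p: p.map_id)


PLACES = [Place(3, "Cabaret B", "cabaret"), Place(1, "Bal A", "ball"), Place(2, "Cabaret C", "cabaret")]
print([p.map_id for p in filter_places(PLACES, category="cabaret")])  # [2, 3]
```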
Neighborhoods
Another option is to "take a stroll" around a particular quarter, following the path proposed by the Guide. One might choose between Montmartre, Quartier Latin, Grand Boulevards, Rue Gaité or La Tournée des Grand Ducs, and be shown a selection of places as well as an overall description of the chosen neighborhood and its life in 1909. This way of exploring the map allows one to concentrate on a particular district of the city, say, Montmartre, which became an emblem of the Parisian Belle Époque.
Experiences
Finally, the Experiences section invites a user to immerse him- or herself in the daily routine of the Parisian public in 1909. Parisian uses are sorted here according to the time of day and described in detail. For instance, when picking tea-dance time (between 4 p.m. and 7 p.m.), one gets a full description of the tea-dance practice as well as the best places where it takes place. With this service, we seek to extend the "spatial" exploration of the city and to augment the map with the vivid details and descriptions found in the Guide.
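Sorting experiences by time of day amounts to checking which time windows contain a given hour. Nightlife windows may run past midnight, and the sketch below handles that case. It is a hypothetical illustration: only the tea-dance window (4 p.m. to 7 p.m.) comes from the text above, and the late-night entry is an invented placeholder.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class Experience:
    name: str
    start: time
    end: time   # may be earlier than start for windows that run past midnight


def is_active(exp: Experience, at: time) -> bool:
    """True if `at` falls in [start, end), handling windows that cross midnight."""
    if exp.start <= exp.end:
        return exp.start <= at < exp.end
    # Window such as 23:00-04:00: active late in the evening or early in the morning.
    return at >= exp.start or at < exp.end


def experiences_at(experiences: List[Experience], at: time) -> List[str]:
    return [e.name for e in experiences if is_active(e, at)]


EXPERIENCES = [
    Experience("Tea dance", time(16, 0), time(19, 0)),                         # 4 p.m. to 7 p.m.
    Experience("Late-night outing (hypothetical)", time(23, 0), time(4, 0)),   # placeholder
]
print(experiences_at(EXPERIENCES, time(17, 30)))  # ['Tea dance']
```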
Application development
The final version of the application is available at:
http://174.138.13.133:1909
The web application was developed with Python and Django.
Data are stored in a relational database implemented in PostgreSQL.
The code is available at:
https://github.com/MrymZakani/Paris1909TripAdvisor
Team
- Alina, Maryam, and Paola
[1] See, for example, Little, Charles. London Pleasure Guide. London: Simpkin, Marshall, Hamilton, Kent, 1898.