Paris 1909 TripAdvisor

From FDHwiki

Revision as of 09:15, 14 December 2018

Plan des plaisirs et attractions de Paris (1909)
Guide des plaisirs à Paris

The project intends to recreate the cultural geography of Paris in the Belle Époque, immersing a user into the world of cabarets, balls, theatres, the universe described by Zola and Proust, and painted by Renoir and Toulouse-Lautrec. In order to give a new perspective on how this legendary world was actually structured and perceived, we are going to digitize the authentic Plan des plaisirs et attractions de Paris created in 1909 and augment it with the pieces of evidence and descriptions from the Guide des plaisirs à Paris: timetables, advice and guidance on what to wear, what to say, and where to go.


Historical introduction

Plan des plaisirs et attractions de Paris traces the cultural geography of Paris at the end of the first decade of the twentieth century, mapping hand-picked theatres, restaurants, concerts, public balls and other amusements of the period. This sketch map was drawn by the bureau Le Natur to illustrate the Pleasure Guide to Paris (Guide des plaisirs à Paris), published in 1909. The Guide was reissued on multiple occasions (according to the BNF catalogue, at least twelve editions appeared between 1899 and 1931) and was translated into several languages (notably, at least two English editions were published, in 1903 and 1925).

The map and the Guide constitute the historical groundwork of the project, providing both a complex, richly detailed narrative and a spatial view of Paris in the Belle Époque. Published at the dawn of the era of global tourism, these documents reflect the image of the city as it was constructed for the gaze of a (primarily male) foreign visitor of the time. Notably, one will hardly find any historical sights on the map, or the typical accounts of them in the Guide. Instead, the Guide offers a kind of anthropological introduction to Parisian nightlife, combining everyday recommendations on how to dress and what to say with descriptions of such "curious" and "piquant" places as working-class balls, small artistic inns, exotic cabarets of Montmartre, the amusements of "the worst parts of Paris" ("Dessous de Paris"), etc. Interestingly, the document reflects the spirit of a time when "marginal" artistic communities, as well as the "exotic" culture of the "lower classes," began to be recognized as a source of pleasure and desire.

At the turn of the century, similar guidebooks representing early forms of mass culture were written for other European capitals [1], which may be considered for the potential extension of the Paris 1909 TripAdvisor project.

Project Plan and Milestones

Milestone 1: Concept (up to 4 Nov)

  • First brainstorming and first sketch of ideas
  • Exploring the map and the sources
  • Developing the overall concept of the project

Milestone 2: Data extraction (9 Nov)

  • Selecting the types of data to be extracted
  • 20% of the data extracted and annotated
  • Verifying the data quality
  • Searching for missing data and additional data (e.g. posters and photos)
  • Scheduling and drawing up the project plan

Milestone 3: Design and Data Processing (14 Nov)

  • Designing the prototype on paper (wireframes)
  • Selecting the features of the application
  • Searching for the best technical way to show clickable points corresponding to the places on the map
  • Preparing the midterm presentation

Milestone 4: Database implementation (28 Nov)

  • Developing a web app for entering the data:
    • Initialization of Django application
    • Design and implementation of the database in SQL
    • Deploying the first version of the web application: the website will be deployed on a public URL and team members will have access to input extracted information to the database.
  • Extracting the rest of the data
  • Adding new data to the app database
  • Improving the map

Milestone 5: Development of the platform (14 Dec)

  • Designing the entry story
  • Implementing the services of the app:
    • Displaying the map and the project description
    • Implementing the UI to explore the map and the places
    • Adding search and filter functionalities
    • Displaying the annotation of each place
    • Displaying additional data and tutorial pop-ups
    • Adding the recommendations page
  • Refactoring the GitHub repository for submission

Milestone 6: Presenting the Results (19 Dec)

  • Preparing the final presentation
  • Writing the report
  • Deploying the final version of the web application for public usage

Extraction methods

Text extraction

A large part of the work consisted of text extraction. We had two digitized versions of the Guide at our disposal: the French edition of 1909, digitized by Gallica, and the English edition of 1925, found on the web. The French edition is the authentic one, the edition for which the map was intended. However, the quality of its digitization is rather poor (rated 66% by Gallica): large pieces of text are not recognized and have to be retyped by hand. As not all members of the team are francophone, this would have meant not only many hours of work but also texts of poor quality. The English digital version is of better quality, but since it was published 16 years after the French one, it lacks some information and reflects the Parisian cultural life of the mid-1920s. After long hesitation, we decided to base our work on the English edition, taking into account that English would make the application accessible to many more users.

The data were extracted manually (by retyping, or by copying when possible). All precise data (prices, working hours, phone numbers, menus) were nevertheless taken from the French edition of the Guide in order to preserve the authenticity of the recreated image of the past. All English descriptions were compared with the French ones in order to bring historical inconsistencies to light and eliminate them.

The overall protocol consisted of:

  1. preliminary observation of the data in both French and English editions of the Guide;
  2. comparing the two editions, indicating the data that are missing in the English version or that have to be corrected because of the historical distance between the two editions;
  3. extracting 20% of the data;
  4. deciding on the types of the data to be extracted and creating the database;
  5. filling the gaps of the English version: translating the pieces of information from the French guide;
  6. filling the database using both French and English editions.
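The merging rule behind steps 2, 5 and 6 can be sketched in Python as follows. This is a minimal illustration, not the project's code. It assumes that each edition has been typed up as a dictionary of places keyed by their map number, and the field names (`prices`, `working_hours`, `phone`, `menu`, `description`) are hypothetical. The English text is the base, the precise fields always come from the French edition, and translated French passages only fill gaps.

```python
# Fields always taken from the 1909 French edition (hypothetical field names).
FRENCH_AUTHORITATIVE = ("prices", "working_hours", "phone", "menu")


def missing_in_english(french: dict, english: dict) -> list:
    """Step 2: map numbers of places described in French but absent from the English edition."""
    return sorted(set(french) - set(english))


def merge_place(english: dict, french: dict, translated: dict) -> dict:
    """Steps 5-6: one database record from English text, French precise data and translations."""
    record = dict(english)  # the English wording is the base text
    for key in FRENCH_AUTHORITATIVE:
        if french.get(key) is not None:
            record[key] = french[key]  # precise data follow the 1909 French edition
    for key, value in translated.items():
        if not record.get(key):  # translations only fill gaps left by the English edition
            record[key] = value
    return record
```

With this design, a 1925 price that survives in the English text is always overwritten by the 1909 value, while an English description is never replaced by a translation.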

Data Augmentation

In order to extend the information on the places found in the Guide, we retrieved, when possible, the Wikidata ID associated with each place. The protocol for each place was as follows:

  1. search Wikidata for a page with the place's name;
  2. if the page appears with a valid name and its information matches that recorded in the Guide (the address on the page is the same, or the page links to other sources, e.g. Carthalia or the BNF archive, that give the same address), the ID is taken;
  3. if the page appears with a valid name but gives a different address, the two addresses (from the Guide and from the Wikipedia page) are checked on the map, and the ID is taken if they are close enough;
  4. if the page appears with a valid name but indicates that it refers to a specific place located within another place (usually revealed by the disambiguation in its Commons category), the ID is not taken;
  5. if the page does not appear with a valid name, its content (and its linked Wikipedia page) is analyzed; if the name turns out to be an alternative name of the same place, the ID is taken.
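The decision rules above can be sketched in Python as follows. The Wikidata `wbsearchentities` API module is real, but here it is only used to build a search URL and nothing is fetched. The `Candidate` structure, the exact-match name comparison and the 200 m "close enough" threshold are all assumptions made for illustration. In the project, these judgements were made by hand.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Tuple
from urllib.parse import urlencode

LatLon = Tuple[float, float]


def search_url(name: str, language: str = "fr") -> str:
    """Step 1: URL of a Wikidata wbsearchentities query for a place name (not fetched here)."""
    params = {"action": "wbsearchentities", "search": name,
              "language": language, "format": "json"}
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


@dataclass
class Candidate:
    """A search hit summarised from its Wikidata page and links (hypothetical structure)."""
    qid: str
    label: str
    aliases: list = field(default_factory=list)
    addresses: list = field(default_factory=list)  # page address plus e.g. Carthalia/BNF ones
    coords: Optional[LatLon] = None  # location of the page's address
    part_of_other_place: bool = False  # e.g. disambiguated in its Commons category


def _norm(text: str) -> str:
    return " ".join(text.lower().split())


def distance_m(a: LatLon, b: LatLon) -> float:
    """Great-circle (haversine) distance in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def accept(name: str, address: str, coords: Optional[LatLon],
           c: Candidate, max_dist_m: float = 200.0) -> bool:
    """Decide whether the candidate's ID is taken for the Guide place (name, address, coords)."""
    same_address = any(_norm(a) == _norm(address) for a in c.addresses)
    if _norm(c.label) == _norm(name):
        if c.part_of_other_place:  # step 4: a specific place inside another one
            return False
        if same_address:  # step 2: same address on the page or in linked archives
            return True
        # step 3: different address, accepted only if both are close on the map
        return (c.coords is not None and coords is not None
                and distance_m(c.coords, coords) <= max_dist_m)
    # step 5: no exact name match, but an alternative name of the same place
    return any(_norm(a) == _norm(name) for a in c.aliases) and same_address
```

The sketch checks step 4 before the address tests, so a sub-place is rejected even when its address happens to match.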

Georeferencing

To find the coordinates of the places, the following protocol was used:

  1. If the place had an associated Wikidata ID and the Wikidata page gave coordinates, these were taken (they are usually imported from a Wikimedia project).
  2. Otherwise, if the place had an associated address, the address was searched on https://www.latlong.net/convert-address-to-lat-long.html and the coordinates were taken from there.
  3. If the place had no associated address, the coordinates were interpolated from the map:
    1. The high-resolution image of the map is downloaded from the BNF page.
    2. The image is cleaned up (the parts of the image corresponding to folds are removed).
    3. The image is loaded into QGIS with the Georeferencer tool. Seven control points on the map are matched to real-world coordinates, and the mapping is computed using the "Thin Plate Spline" transformation type and the "Cubic Spline" resampling method.
    4. The coordinates of the place are read from the georeferenced map.
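The interpolation in step 3 was done in QGIS. The Python sketch below is not QGIS's implementation; it only illustrates what a thin plate spline georeferencing does. One spline is fitted per output coordinate through the control points (a pixel position on the scanned map paired with a known latitude and longitude). The control points are therefore reproduced exactly, and other pixels are interpolated smoothly. The `locate` function mirrors the three-step precedence above. In it, `geocode` is a hypothetical stand-in for the manual latlong.net lookup, and the dictionary keys are assumptions. Resampling (the "Cubic Spline" option) only affects how the warped raster is rendered, so it is not modelled here.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def _u(r2: float) -> float:
    """Thin plate spline kernel U = r^2 log(r^2), with U(0) = 0."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("control points are collinear or duplicated")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_tps(src: Sequence[Point], values: Sequence[float]) -> Callable[[float, float], float]:
    """Thin plate spline f with f(src[i]) == values[i]: affine part plus radial bending."""
    n = len(src)
    # Centre and scale pixel coordinates: better conditioning, same interpolant.
    cx, cy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    s = max(max(abs(x - cx), abs(y - cy)) for x, y in src) or 1.0
    pts = [((x - cx) / s, (y - cy) / s) for x, y in src]
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(pts):
        for j, (xj, yj) in enumerate(pts):
            a[i][j] = _u((xi - xj) ** 2 + (yi - yj) ** 2)
        a[i][n:] = [1.0, xi, yi]
        a[n][i], a[n + 1][i], a[n + 2][i] = 1.0, xi, yi
    coef = _solve(a, list(values) + [0.0, 0.0, 0.0])
    w, (a0, ax, ay) = coef[:n], coef[n:]

    def f(x: float, y: float) -> float:
        u, v = (x - cx) / s, (y - cy) / s
        bend = sum(wi * _u((u - xi) ** 2 + (v - yi) ** 2) for wi, (xi, yi) in zip(w, pts))
        return a0 + ax * u + ay * v + bend
    return f


def georeferencer(pixels: Sequence[Point], latlons: Sequence[Point]) -> Callable[[Point], Point]:
    """Map a pixel (x, y) of the scanned map to (lat, lon), one spline per coordinate."""
    f_lat = fit_tps(pixels, [lat for lat, _ in latlons])
    f_lon = fit_tps(pixels, [lon for _, lon in latlons])
    return lambda px: (f_lat(*px), f_lon(*px))


def locate(place: Dict, geocode: Callable[[str], Optional[Point]],
           pixel_to_latlon: Callable[[Point], Point]) -> Optional[Point]:
    """Steps 1-3: Wikidata coordinates, else address lookup, else the georeferenced map."""
    if place.get("wikidata_coords"):
        return place["wikidata_coords"]
    if place.get("address"):
        return geocode(place["address"])  # stand-in for the manual latlong.net lookup
    if place.get("map_pixel"):
        return pixel_to_latlon(place["map_pixel"])
    return None
```

With seven well-spread control points, `georeferencer(pixels, latlons)` returns a function from pixel to (lat, lon). If the control points happen to be related by a purely affine transform, the spline reduces to that affine map.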

The database

The database is organized around the places marked on the map. Each place has its map ID (the number assigned to it on the map), an address and geographical coordinates, and is assigned to a particular category (theatre, ball, cabaret, etc.). It may also have a detailed description, a phone number, prices, working hours, a menu, the neighbourhood and the nearest metro station, a photograph, a poster, and a Wikidata ID.

Costs and working hours were particularly difficult to formalize, as they vary greatly. For instance, balls often have different entry prices for men and women, theatre prices depend on the type of seat chosen, cabaret prices often include a drink, etc. Similarly, working hours depend on the day of the week, holidays and seasons. In creating the database we tried to preserve this complex structure of the data as much as possible. We also sought to augment the Guide data with authentic photographs of the places (taken around 1909) and with their posters, playbills and placards, which were deeply characteristic of the period. The annotated images were searched for on the web and assigned to the places in the database (in the end, only 10% of places have no original image).
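One way to keep this structure is to store prices and working hours in their own tables, with one row per variant, instead of as single columns on the place. The sketch below uses Python's built-in `sqlite3` module and is a hypothetical schema, not the project's actual Django models. The column names and example values are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE place (
    map_id INTEGER PRIMARY KEY,          -- number assigned to the place on the map
    name TEXT NOT NULL,
    category TEXT NOT NULL,              -- theatre, ball, cabaret, ...
    address TEXT,
    lat REAL, lon REAL,
    description TEXT, phone TEXT, menu TEXT,
    neighbourhood TEXT, metro TEXT,
    photo_url TEXT, poster_url TEXT, wikidata_id TEXT
);
CREATE TABLE price (
    id INTEGER PRIMARY KEY,
    map_id INTEGER NOT NULL REFERENCES place(map_id),
    amount_francs REAL NOT NULL,
    audience TEXT,                       -- e.g. 'men' / 'women' at balls
    seat_type TEXT,                      -- e.g. a seat category in theatres
    includes_drink INTEGER NOT NULL DEFAULT 0  -- 1 if a drink is included (cabarets)
);
CREATE TABLE opening_hours (
    id INTEGER PRIMARY KEY,
    map_id INTEGER NOT NULL REFERENCES place(map_id),
    days TEXT NOT NULL,                  -- e.g. 'Sat,Sun' or 'holidays'
    season TEXT,                         -- NULL if all year round
    opens TEXT NOT NULL,                 -- 'HH:MM'
    closes TEXT NOT NULL                 -- may be earlier than opens (after midnight)
);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical schema in a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

For example, a ball with separate entry prices for men and women becomes two `price` rows with the same `map_id` and different `audience` values.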

Quality assessment of the extraction

As the extraction of information involved several processes, each of these stages is a possible source of error. For each one, a quality assessment is carried out: identifying possible errors, sampling, verifying, and quantifying how accurate the extraction was. At the end of this section, a weighted accuracy is presented.

In the "text extraction" section, the identified possible errors are:

  • marking a place as missing from the English guide when it was in fact present;
  • not noticing badly translated words in the English guide, especially those resulting from imprecise OCR;
  • inadequate categorization of places.

In the "data augmentation" section, the identified possible errors are:

  • the Wikidata ID of a place is wrong;
  • the Wikidata ID of a place seems to point to the right entry, but on studying the associated links, the place referenced turns out to be another one (e.g. places with the same name in different time periods).
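The weighted accuracy mentioned above could be computed along the following lines. This sketch assumes two things: each stage's accuracy is estimated from a random sample of entries checked by hand, and stages are weighted by the number of entries they process. Both the weighting scheme and the field names are assumptions, and no actual counts from the project are used.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Sequence


def draw_sample(entries: Iterable, k: int, seed: int = 0) -> List:
    """Draw k entries at random (reproducibly) for manual verification."""
    return random.Random(seed).sample(list(entries), k)


@dataclass
class StageSample:
    stage: str       # e.g. "text extraction", "data augmentation"
    population: int  # number of entries processed at this stage
    sampled: int     # entries drawn at random and checked by hand
    errors: int      # sampled entries showing one of the errors listed above

    @property
    def accuracy(self) -> float:
        return 1.0 - self.errors / self.sampled


def weighted_accuracy(stages: Sequence[StageSample]) -> float:
    """Average of per-stage sample accuracies, weighted by stage population."""
    total = sum(s.population for s in stages)
    return sum(s.population * s.accuracy for s in stages) / total
```

With hypothetical counts, a stage of 300 entries at 90% sampled accuracy and a stage of 100 entries at 80% give (300 × 0.9 + 100 × 0.8) / 400 = 0.875.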

Services provided

The project follows the logic of modern tourist services such as TripAdvisor, inviting a user to orient him- or herself in Paris in the Belle Époque. By transferring the historical narrative of the map and the Guide into a digital environment, we intend to make it accessible and comprehensible to the general public.

The application contains three basic entry points: "Places", "Neighbourhoods", and "Experiences", each offering a distinctive perspective on the Parisian cultural life.

Places

The project is built around the places of interest marked on the map and described in the Guide. A user starts by observing the distribution of places on the map. She may sort the places by type (theatres, cabarets, balls, restaurants, etc.), cost, district, etc. Every place has a brief annotation, which lets the user keep the whole picture in view without getting distracted, and an extensive annotation comprising a detailed description, images and other data associated with the place. Thus, starting from the general overview, one can dig deeper into particular places.
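The filtering and sorting described here can be sketched in Python as follows. The `Place` fields are a hypothetical simplification and do not reflect the application's actual code. Each place is reduced to its cheapest known entry price in francs, and places with unknown prices are placed last when sorting by cost.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Place:
    map_id: int
    name: str
    category: str               # theatre, cabaret, ball, restaurant, ...
    district: str
    min_price: Optional[float]  # cheapest entry in francs; None if unknown
    short_note: str             # the brief annotation shown on the overview


def browse(places: Iterable[Place], category: Optional[str] = None,
           district: Optional[str] = None, max_price: Optional[float] = None,
           sort_by: str = "map_id") -> List[Place]:
    """Filter places by type, district and cost, then sort them."""
    result = [p for p in places
              if (category is None or p.category == category)
              and (district is None or p.district == district)
              and (max_price is None
                   or (p.min_price is not None and p.min_price <= max_price))]
    if sort_by == "price":
        # unknown prices sort after all known ones
        result.sort(key=lambda p: (p.min_price is None, p.min_price or 0.0))
    else:
        result.sort(key=lambda p: getattr(p, sort_by))
    return result
```

The same `district` filter would also serve the neighbourhood view.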

Neighbourhoods 

Another option is to "take a stroll" around a particular quarter, following the path proposed by the Guide. One can choose between Montmartre, Quartier Latin, Grand Boulevards and "Dessous de Paris" and be shown the selection of places together with an overall description of the chosen neighbourhood and its life in 1909. This way of exploring the map allows the user to concentrate on a particular district of the city, say Montmartre, which became an emblem of the Parisian Belle Époque.

Experiences

Finally, the Experiences section invites a user to immerse him- or herself in the daily routine of the Parisian public in 1909. Parisian customs are sorted here by time of day, then described in detail and shown on the map. For instance, when choosing the tea dance (between 4 p.m. and 7 p.m.), one gets a full description of the tea-dancing practice as well as the best places where it takes place. With such a service, we seek to extend the "spatial" exploration of the city and to augment the map with the vivid details and descriptions found in the Guide.
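Sorting experiences by time of day comes down to checking which time windows contain a given moment. The sketch below is hypothetical. It assumes that each experience stores a start and end time and the map IDs of the places offering it. Windows that run past midnight wrap around, which matters for nightlife.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List


@dataclass
class Experience:
    name: str
    start: time
    end: time
    place_ids: List[int] = field(default_factory=list)  # map IDs of places offering it


def covers(start: time, end: time, t: time) -> bool:
    """True if t falls in [start, end); windows ending after midnight wrap around."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def experiences_at(experiences: List[Experience], t: time) -> List[Experience]:
    """Experiences available at time t, in the order given."""
    return [e for e in experiences if covers(e.start, e.end, t)]
```

For example, a tea dance stored as 16:00 to 19:00 is returned for 17:30 but not for 19:00. A hypothetical late supper stored as 23:00 to 03:00 is returned for 01:00.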

Code

https://github.com/MrymZakani/Paris1909TripAdvisor

Team

- Alina, Maryam, and Paola

  1. See, for example, Little, Charles. London Pleasure Guide. London: Simpkin, Marshall, Hamilton, Kent, 1898.