France: Exploring Historical Cookbooks


Revision as of 00:27, 22 December 2022

Introduction

Cuisine has an important place in the cultural heritage of France. In the 21st century, the great classics of French cuisine can be found in starred restaurants in most French cities and around the world. Above all, French cuisine owes its current prestige to the regional cuisines that developed over several hundred years, each taking advantage of the geographical and cultural specificities of its region.

This is at least the point of view of Mr. Curnonsky, who travelled the regions of France in the early 20th century in search of the traditional regional recipes that are the pillars of the French cuisine we know today. His book Recettes des Provinces de France, written in 1962 [Figure 1], gathers many traditional recipes that he collected all around France.

At a time when so much knowledge is shared online, it has become easy to find information on the history of French cuisine or on contemporary recipes. However, a significant amount of culinary knowledge and practice is still stored in books that are much more difficult to access. This knowledge would benefit from being digitalized, both to share it with as many people as possible and to take advantage of the latest computational techniques for more in-depth analyses.

This project is hence an exploration of a historical French cookbook. Our main focus is the digitalization of the cookbook and its challenges, from the physical book to a clean, structured dataset. In addition, we analyze the collected data to better understand the French cuisine of the previous century. We use Mr. Curnonsky's cookbook, mentioned above, as the example with which we answer our research questions.

Research Questions

More specifically, we aim to answer the following research questions:

  • What are the steps and difficulties when digitalizing an old cookbook?
  • What knowledge can be extracted from a cookbook and what information can it provide about the culture and practices of the region at that time?
  • From Mr. Curnonsky's cookbook, what can we say about the regional cuisine of France in the early 20th century?

Project Plan and Milestones

Date Data Collection Data Processing Data Analysis Research Questions
Week 3
  • Collect multiple historical cookbooks in French or English.
  • Prepare project proposals.
Week 4
  • Compare the different cookbooks, considering the layout extraction & OCR.
  • Discuss the objectives of the project.
Week 5
  • Decide on one French cookbook.
  • Scan the physical book page by page.
  • Sanitize the initial dataset.
Week 6-7
  • Separate .png files by subregions.
  • Perform the OCR of each page.
  • Start to construct the dataset.
Week 8-9
  • Construct the dataset and design data structures that store the information efficiently.

Milestone 2: Midterm presentation

Week 10
  • Set up the GitHub repository.
  • Finish the creation of the dataset.
Week 11
  • Go through the dataset and make changes to facilitate the data processing.
  • Create categories for ingredients.
Week 12
  • Perform the data processing of the ingredients.
  • Exploratory analysis of the dataset.
Week 13
  • Further improve the dataset.
  • Overall analysis and per-region analysis.
Week 14
  • Prepare the final presentation.
  • Finish the Wikipedia page.

Milestone 3: Final presentation

Methodology

Data collection and digitalization

We started by scanning a physical French cookbook page by page. This 20th-century book lists the ingredients of each recipe in the page margin.

We then ran a basic OCR pass over the scanned files. In the sample OCR output, the recipes and their ingredients are noticeably mismatched.

Data processing

For each ingredient in a recipe, we extract or construct the following information:

  • quantity: the amount of the ingredient
  • unit: the unit in which the quantity is measured
  • ingredient: the name of the ingredient as it appears in the recipe
  • category: the major category the ingredient belongs to
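
The four fields above could be held in a small record type. The following is only a sketch: the class name IngredientEntry, the use of None for missing values, and the sample entry are assumptions for illustration, not the structure used in the project's repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IngredientEntry:
    """One ingredient of one recipe (hypothetical record layout)."""
    quantity: Optional[float]  # None when the recipe gives no amount
    unit: Optional[str]        # None for counted items such as "3 œufs"
    ingredient: str            # name as it appears in the recipe
    category: Optional[str]    # None until a category has been assigned


# Illustrative entry: "250 g de farine", with "farine" listed under "Céréale".
entry = IngredientEntry(quantity=250.0, unit="g", ingredient="farine", category="Céréale")
```

Keeping quantity and unit optional lets entries such as "sel" or "3 œufs" share the same structure as fully specified ones.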


Units

Type | Unit
Spoons | cuil. à café, cuil. café, cuil. à soupe, cuil. soupe, petite cuil., grande cuil., cuil.
Glasses | petit verre, verre à liqueur, verres à liqueur, verres, verre, tasses, tasse
Bottles | bout., bouteilles, bouteille
Containers | grande boîte, boîtes, boîte, tubes, tube
Spices & Aromatic plants | gousses, gousse, branches, branche, bâtons, bâton, pincée
Meat related | membres, membre, tronçons, tronçon, tranches, tranche
Standard measures | litres, litre, cl, dl, kg, g, l
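
As a rough illustration of how the unit list can be used, the sketch below splits one ingredient line into quantity, unit, and ingredient. The function name parse_ingredient_line, the regular expression, and the sample lines are assumptions, not the project's actual code, and "grande boîte" assumes that "g rande" in the table is an extraction artifact.

```python
import re

# Units copied from the table above.
UNITS = [
    "cuil. à café", "cuil. café", "cuil. à soupe", "cuil. soupe",
    "petite cuil.", "grande cuil.", "cuil.",
    "petit verre", "verre à liqueur", "verres à liqueur", "verres", "verre",
    "tasses", "tasse", "bout.", "bouteilles", "bouteille",
    "grande boîte", "boîtes", "boîte", "tubes", "tube",
    "gousses", "gousse", "branches", "branche", "bâtons", "bâton", "pincée",
    "membres", "membre", "tronçons", "tronçon", "tranches", "tranche",
    "litres", "litre", "cl", "dl", "kg", "g", "l",
]

# Try longer units first so "cuil. à café" wins over "cuil." and "gousses" over "g".
_UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

LINE_RE = re.compile(
    r"^\s*(?P<qty>\d+(?:[.,]\d+)?)?\s*"        # optional number, decimal comma allowed
    rf"(?:(?P<unit>{_UNIT_ALT})(?=\s|$)\s*)?"  # optional unit that ends at a word break
    r"(?:de\s+|d['’])?"                        # optional "de" / "d'" before the ingredient
    r"(?P<ingredient>.+?)\s*$"
)


def parse_ingredient_line(line):
    """Return (quantity, unit, ingredient); quantity and unit may be None."""
    match = LINE_RE.match(line)
    if match is None:  # empty or whitespace-only line
        return None
    qty = match.group("qty")
    quantity = float(qty.replace(",", ".")) if qty else None
    return quantity, match.group("unit"), match.group("ingredient")
```

Sorting the alternatives by length matters here: without it, the single-letter units "g" and "l" would be tried before longer units that start with the same letter.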

Categories

We map each ingredient to one of several major categories.

Category | Ingredient
Viande (Meat) | viande, oie, canard, oiseau, lard, bœuf, veau, poule, poulet, poularde, volaille, porc, caille, caneton, mouton, cochon, coq, chevreuil, lièvre, levraut, lapin, faisan, gibier, jambon, chorizo, cervelas, agneau, escargot, grenouille
Poisson (Fish) | poisson, brochet, carpe, morue, lamproie, lotte, maquereau, omble, rouget, sardine, thon, truite, anchois, anguille, merlan, sole, barbue, turbot, raie, perche, saumon, colin, goujon, loup, congre, rascasse, grondin, merlu, merluza, hareng, alose, brême
Fruit de mer (Seafood) | crevette, langouste, moule, écrevisse, palourde, homard, chiperon, seiche, huître, coquille, poulpe
Alcool (Alcohol) | alcool, bière, vin, cidre, fine, liqueur
Plante aromatique (Aromatic plant) | bouquet garni, ail, anis, aromate, angélique, basilic, persil, sarriette, cerfeuil, ciboule, ciboulette, clou de girofle, clous de girofle, girofle, cive, câpre, estragon, feuille de vigne, fines herbes, laurier, menthe, pissenlit, romarin, thym
Epice (Spice) | cannelle, coriandre, curry, safran, poivre, sel, moutarde, muscade, paprika, piment, sauge, serpolet, épices
Produit laitier (Dairy product) | lait, crème, fromage, gruyère, parmesan
Légume (Vegetable) | artichaut, asperge, aubergine, bette, betterave, cardon, chou, cornichon, courgette, cresson, céleri, fenouil, légume, navet, panais, poireau, pomme de terre, pommes de terre, potiron, rave, salade, tomate, échalote, épinard
Fruit (Fruit) | abricot, banane, cerise, coing, fraise, framboise, groseille, raisin, olive, orange, pomme
Agrume (Citrus) | citron, cédrat, fleur d'oranger, fleurs d'oranger
Céréale (Cereal) | farine, pain, pâte, riz
Légumineuse (Legume) | févette, haricot, pois
Fruit sec (Nut) | amande, noix, noisette
Champignon (Mushroom) | champignon, truffe, cèpe, girolle, morille, levure, oronge, duelle
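
One way such a mapping could be applied is a keyword lookup over the ingredient text. The sketch below uses only an excerpt of the table, and the function name categorize, the word-boundary matching, and the simple plural handling are assumptions rather than the project's method.

```python
import re

# Excerpt of the category table above (not the full lists).
CATEGORIES = {
    "Viande": ["bœuf", "veau", "poulet", "lard", "jambon", "volaille"],
    "Poisson": ["poisson", "truite", "sole", "morue"],
    "Alcool": ["vin", "cidre", "bière", "liqueur"],
    "Plante aromatique": ["bouquet garni", "ail", "persil", "thym", "clou de girofle"],
    "Epice": ["sel", "poivre", "safran", "muscade"],
    "Légumineuse": ["haricot", "pois"],
}

# Longest keywords first, so "poisson" is tested before "pois".
KEYWORDS = sorted(
    ((kw, cat) for cat, kws in CATEGORIES.items() for kw in kws),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def categorize(ingredient):
    """Return the category of the first matching keyword, or None if nothing matches."""
    text = ingredient.lower()
    for keyword, category in KEYWORDS:
        # Whole-word match with an optional plural ending, so "ail" does not
        # fire inside "volaille" and "pois" does not fire inside "poisson".
        if re.search(rf"\b{re.escape(keyword)}(?:s|x)?\b", text):
            return category
    return None
```

For example, categorize("vin blanc") yields "Alcool", while an ingredient absent from the lists yields None and can be reviewed by hand. Plurals that fall inside a phrase, such as "clous de girofle", are not covered by the optional ending, which may be why the table lists them explicitly.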


Region

We map the subregions to 6 major regions of early 20th-century France.

Region | Subregion
Paris, Ile-de-France, Val de Loire | Paris, Ile-de-France, Orléans, Touraine
Pays de l’Ouest | Anjou, Bretagne, Poitou Vendée, Charentes
Sud-Ouest & Pyrénées | Bordelais, Gascogne, Pays Basque, Roussillon, Périgord, Languedoc
Sud-Est & Méditerranée | Provence, Nice, Corse, Dauphiné, Savoie, Lyon, Auvergne, Limousin
Bourgogne, Champagne, Bresse, Franche-Comté, Alsace, Lorraine | Bourgogne, Champagne, Bresse, Franche-Comté, Alsace, Lorraine
Nord & Normandie | Nord, Normandie
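
The region table translates directly into a lookup structure. This is a minimal sketch in which the names REGIONS and SUBREGION_TO_REGION are assumptions; the region and subregion labels are taken from the table, with "Méditerranée" spelled in its standard form.

```python
# Region -> subregions, as listed in the table above.
REGIONS = {
    "Paris, Ile-de-France, Val de Loire": ["Paris", "Ile-de-France", "Orléans", "Touraine"],
    "Pays de l’Ouest": ["Anjou", "Bretagne", "Poitou Vendée", "Charentes"],
    "Sud-Ouest & Pyrénées": ["Bordelais", "Gascogne", "Pays Basque", "Roussillon",
                             "Périgord", "Languedoc"],
    "Sud-Est & Méditerranée": ["Provence", "Nice", "Corse", "Dauphiné", "Savoie",
                               "Lyon", "Auvergne", "Limousin"],
    "Bourgogne, Champagne, Bresse, Franche-Comté, Alsace, Lorraine": [
        "Bourgogne", "Champagne", "Bresse", "Franche-Comté", "Alsace", "Lorraine"],
    "Nord & Normandie": ["Nord", "Normandie"],
}

# Inverted lookup: subregion -> region, used to aggregate recipes by major region.
SUBREGION_TO_REGION = {
    subregion: region
    for region, subregions in REGIONS.items()
    for subregion in subregions
}
```

The inverted dictionary holds 30 subregions across 6 regions, which matches the counts given in the dataset overview.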

Data analysis and visualization

Dataset Overview

The dataset contains a total of 352 different recipes from 30 subregions, grouped into 6 major regions.

Top 10 most used ingredients.

Rank | Ingredient | Number of occurrences | Picture
1 | Beurre | 180 | Beurre.png
2 | Sel | 167 | Sel.jpg
3 | Poivre | 146 | Poivre.jpeg
4 | Œufs | 101 | Œufs.jpeg
5 | Oignons | 95 | Oignons.jpeg
6 | Farine | 89 | Farine.png
7 | Persil | 82 | Persil.png
8 | Ail | 76 | Ail.jpeg
9 | Vin blanc | 69 | Vin balanc.jpeg
10 | Bouquet garni | 46 | Bouquet garni.png
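
A ranking like the one above can be obtained with a simple frequency count. The sketch below assumes each recipe is available as a list of ingredient names and counts an ingredient at most once per recipe; whether the project counted this way is not stated, and the function name and toy data are hypothetical.

```python
from collections import Counter


def top_ingredients(recipes, n=10):
    """Return the n most frequent ingredients, counting each once per recipe."""
    counts = Counter()
    for ingredients in recipes:
        counts.update(set(ingredients))  # set() drops repeats within one recipe
    return counts.most_common(n)


# Toy data for illustration only, not taken from the cookbook.
toy_recipes = [
    ["beurre", "sel", "poivre"],
    ["beurre", "sel"],
    ["beurre", "farine", "beurre"],
]
top_two = top_ingredients(toy_recipes, n=2)
```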


Region Analysis

The heatmap shows that "Plante aromatique" and "Epice" are frequently used in all six major regions, while "Fruit sec" is the least frequently used category.

Heatmap of categories by region.
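
The values behind such a heatmap can be sketched as a region-by-category count table. The function names, the (region, category) input format, and the toy rows below are assumptions for illustration; the counts are raw occurrences, with no normalization by the number of recipes per region.

```python
from collections import Counter, defaultdict


def category_counts_by_region(rows):
    """rows: iterable of (region, category) pairs, one per ingredient occurrence."""
    table = defaultdict(Counter)
    for region, category in rows:
        table[region][category] += 1
    return table


def to_matrix(table, regions, categories):
    """Arrange the counts as a list of rows (one per region) for plotting."""
    return [[table[region][category] for category in categories] for region in regions]


# Toy rows for illustration only.
toy_rows = [
    ("Nord & Normandie", "Epice"),
    ("Nord & Normandie", "Epice"),
    ("Nord & Normandie", "Viande"),
    ("Pays de l'Ouest", "Poisson"),
]
toy_table = category_counts_by_region(toy_rows)
toy_matrix = to_matrix(
    toy_table,
    regions=["Nord & Normandie", "Pays de l'Ouest"],
    categories=["Epice", "Poisson", "Viande"],
)
```

The resulting nested list can be passed to any plotting library that draws heatmaps from a 2D array.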

Subregion Analysis

Heatmap of categories by subregion.

Co-occurrence Analysis

The co-occurrence matrix shows that "Plante aromatique" and "Epice" appear together most often, and that both frequently appear together with "Viande", "Légume", and "Céréale".

Matrix of co-occurrences by subregion.
Map of France.
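
Co-occurrence counts of this kind can be computed by counting, for every recipe, each unordered pair of categories it contains. This is a sketch under the assumption that each recipe is reduced to its list of categories; the function name and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def category_cooccurrence(recipes):
    """Count how many recipes contain each unordered pair of categories."""
    pairs = Counter()
    for categories in recipes:
        # sorted(set(...)) gives every pair a single canonical order and
        # counts a pair at most once per recipe.
        for first, second in combinations(sorted(set(categories)), 2):
            pairs[(first, second)] += 1
    return pairs


# Toy data for illustration only.
toy_recipes = [
    ["Epice", "Plante aromatique", "Viande"],
    ["Plante aromatique", "Epice", "Epice"],
    ["Légume", "Epice"],
]
toy_pairs = category_cooccurrence(toy_recipes)
```

Restricting the input to the recipes of one subregion gives the per-subregion version of the matrix.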

Discussion and limitations

Like any research project, this one has limitations. For example, the data analysis relies on a rough count of category occurrences and does not take ingredient quantities into account.
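
Taking quantities into account would first require bringing them to a common scale. The sketch below covers only the standard measures from the Units table, converting weights to grams and volumes to millilitres; the function name normalize and the choice of base units are assumptions, and units such as "gousses" or "cuil." are left unconverted.

```python
# Conversion factors for the standard measures only: unit -> (factor, base unit).
TO_BASE = {
    "kg": (1000.0, "g"),
    "g": (1.0, "g"),
    "litres": (1000.0, "ml"),
    "litre": (1000.0, "ml"),
    "l": (1000.0, "ml"),
    "dl": (100.0, "ml"),
    "cl": (10.0, "ml"),
}


def normalize(quantity, unit):
    """Return (amount, base_unit), or None when no conversion is defined."""
    if quantity is None or unit not in TO_BASE:
        return None
    factor, base_unit = TO_BASE[unit]
    return quantity * factor, base_unit
```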

Future work

  • Build a search engine that displays the recipes, with filters to search them by name, region, or ingredient.
  • Build a user-friendly interface to visualize the results of the analysis.
  • Compare with other cookbooks from different periods or different countries.

Links

GitHub repository

Scanned book

OCR result