France: Exploring Historical Cookbooks

From FDHwiki

Revision as of 16:43, 21 December 2022

Introduction

This project explores historical French cookbooks containing recipes from the 19th century. By analyzing these cookbooks, we identify the most frequently used ingredients and food categories by region. We also examine the co-occurrence of ingredients and food categories.

Research questions

1. What were the main ingredients used in 1900 in France?

2. Can we observe a difference per region?

Project Plan and Milestones

Date Tasks Completion
Week 3

Milestone 1: Project proposals

  • Find multiple cookbooks in French or English from different times.
  • Prepare project proposals.
Week 4
  • Compare different cookbooks, consider the OCR scan and think of possible research questions.
  • Discuss the goal and implementation of the project.
Week 5
  • Decide on one French cookbook.
  • Scan the physical book page by page.
  • Sanitize the initial dataset.
Week 6-7
  • Separate .png files by subregions.
  • Perform the OCR of each page.
  • Start to construct the dataset.
Week 8-9

Milestone 2: Midterm presentation

  • Construct the dataset and choose data structures that store the information efficiently.
Week 10
  • Set up the GitHub repository.
  • Finish the creation of the dataset.
Week 11
  • Go through the dataset and make changes to facilitate the data processing.
  • Create categories for ingredients.
Week 12
  • Perform the data processing of the ingredients.
  • Exploratory analysis of the dataset.
Week 13
  • Further improve the dataset.
  • Overall analysis and per-region analysis.
Week 14

Milestone 3: Final presentation

  • Prepare the final presentation
  • Finish the Wikipedia page

Methodology

Data digitization

As a first step, we scanned a physical French cookbook.


Sample OCR.

Then we ran a basic OCR on the scanned files. Here is a sample output from the OCR. Note that there is a mismatch between recipes and ingredients.

Template output of the digitization

Data processing

In our project, we will extract the following information from the recipes:

  • quantity: the amount of the ingredient
  • unit: the unit of measurement for that amount
  • ingredient: the ingredient that appears in the recipe


Units

  • Spoons: 'cuil. à café', 'cuil. café', 'cuil. à soupe', 'cuil. soupe', 'petite cuil.', 'grande cuil.', 'cuil.',
  • Glasses: 'petit verre', 'verre à liqueur', 'verres à liqueur', 'verres', 'verre', 'tasses', 'tasse',
  • Bottles: 'bout.', 'bouteilles', 'bouteille',
  • Containers: 'grande boîte', 'boîtes', 'boîte', 'tubes', 'tube',
  • Spices & Aromatic plants: 'gousses', 'gousse', 'branches', 'branche', 'bâtons', 'bâton', 'pincée',
  • Meat related: 'membres', 'membre', 'tronçons', 'tronçon', 'tranches', 'tranche',
  • Standard measures: 'litres ', 'litre ', 'cl ', 'dl ', 'kg ', 'g ' , 'l'
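
As an illustration of how the three fields could be extracted from one ingredient line, here is a minimal sketch. It assumes a regular-expression approach, which is not necessarily what the project code does. The names `UNITS`, `LINE_RE` and `parse_ingredient_line` are hypothetical, and only a subset of the unit strings above is used.

```python
import re

# Subset of the unit strings listed above (illustration only).
UNITS = [
    "cuil. à café", "cuil. à soupe", "cuil.",
    "verre à liqueur", "verres", "verre",
    "gousses", "gousse", "pincée",
    "litres", "litre", "kg", "cl", "dl", "g", "l",
]

# Try longer units first so that "cuil. à soupe" wins over "cuil.".
_unit_alternatives = "|".join(
    re.escape(unit) for unit in sorted(UNITS, key=len, reverse=True)
)

# Optional quantity, optional unit, optional "de"/"d'", then the ingredient.
# The lookahead after the unit prevents "l" from matching the start of "lait".
LINE_RE = re.compile(
    rf"^\s*(?P<quantity>\d+(?:[.,]\d+)?)?\s*"
    rf"(?:(?P<unit>{_unit_alternatives})(?=\s|$)\s*)?"
    rf"(?:de\s+|d['’]\s*)?"
    rf"(?P<ingredient>.+?)\s*$",
    re.IGNORECASE,
)


def parse_ingredient_line(line):
    """Return a dict with quantity, unit and ingredient, or None for a blank line."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    quantity = match.group("quantity")
    return {
        # French decimals use a comma, e.g. "1,5 kg".
        "quantity": float(quantity.replace(",", ".")) if quantity else None,
        "unit": match.group("unit"),
        "ingredient": match.group("ingredient"),
    }
```

For example, `parse_ingredient_line("250 g de farine")` yields the quantity `250.0`, the unit `g` and the ingredient `farine`. Lines without a quantity or unit keep those fields as `None`.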


Categories

We map the ingredients to several major categories:

  • 'Viande': ['viande', 'oie', 'canard', 'oiseau', 'lard', 'bœuf', 'veau', 'poule', 'poulet', 'poularde', 'volaille', 'porc', 'caille', 'canard', 'caneton', 'mouton', 'cochon', 'coq', 'chevreuil', 'lièvre', 'levraut', 'lapin', 'faisan', 'gibier', 'jambon', 'chorizo', 'cervelas', 'agneau', 'escargot', 'grenouille'],
  • 'Poisson': ['poisson', 'brochet', 'carpe', 'morue', 'lamproie', 'lotte', 'maquereau', 'omble', 'rouget', 'sardine', 'thon', 'truite', 'anchois', 'anguille', 'merlan', 'sole', 'barbue', 'turbot', 'raie', 'perche', 'saumon', 'colin', 'goujon', 'loup', 'congre', 'rascasse', 'grondin', 'merlu', 'merluza', 'hareng', 'alose', 'brême'],
  • "Fruit de mer": ['crevette', 'langouste', 'moule', 'écrevisse', 'palourde', 'homard', 'chiperon', 'seiche', 'huître', 'coquille', 'poulpe'],
  • 'Alcool': ['alcool', 'bière', 'vin', 'cidre', 'fine', 'liqueur'],
  • "Plante aromatique": ["bouquet garni", 'ail', 'anis', 'aromate', 'angélique', 'basilic', 'persil', 'sarriette', 'cerfeuil', 'ciboule', 'ciboulette', "clou de girofle", "clous de girofle", 'girofle', 'cive', 'câpre', 'estragon', "feuille de vigne", "fines herbes", 'laurier', 'menthe', 'pissenlit', 'romarin', 'thym'],
  • 'Epice': ['cannelle', 'coriandre', 'curry', 'safran', 'poivre', 'sel', 'moutarde', 'muscade', 'paprika', 'piment', 'sauge', 'serpolet', 'épices'],
  • "Produit laitier": ['lait', 'crème', 'fromage', 'gruyère', 'parmesan'],
  • 'Légume': ['artichaut', 'asperge', 'aubergine', 'bette', 'betterave', 'cardon', 'chou', 'cornichon', 'courgette', 'cresson', 'céleri', 'fenouil', 'légume', 'navet', 'panais', 'poireau', "pomme de terre", "pommes de terre", 'potiron', 'rave', 'salade', 'tomate', 'échalote', 'épinard'],
  • 'Fruit': ['abricot', 'banane', 'cerise', 'coing', 'fraise', 'framboise', 'groseille', 'raisin', 'olive', 'orange', 'pomme'],
  • 'Agrume': ['citron', 'cédrat', "fleur d'oranger", "fleurs d'oranger"],
  • 'Céréale': ['farine', 'pain', 'pâte', 'riz'],
  • 'Légumineuse': ['févette', 'haricot', 'pois'],
  • 'Fruit sec': ['amande', 'noix', 'noisette'],
  • 'Champignon': ['champignon', 'truffe', 'cèpe', 'girolle', 'morille', 'levure', 'oronge', 'duelle']
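
The mapping above can be applied with a simple keyword lookup. The sketch below is one possible way to do it, assuming whole-word matching with an optional plural ending. The names `CATEGORIES`, `KEYWORDS` and `categorize` are hypothetical, and only an excerpt of the dictionary is reproduced.

```python
import re

# Excerpt of the category dictionary above (illustration only).
CATEGORIES = {
    "Viande": ["bœuf", "veau", "poulet", "lard"],
    "Poisson": ["truite", "saumon", "morue"],
    "Plante aromatique": ["bouquet garni", "ail", "persil", "thym"],
    "Légume": ["pomme de terre", "pommes de terre", "tomate", "chou"],
    "Fruit": ["pomme", "cerise"],
    "Produit laitier": ["lait", "crème"],
}

# Flatten to (keyword, category) pairs, longest keyword first, so that
# "pomme de terre" is tested before "pomme".
KEYWORDS = sorted(
    ((keyword, category)
     for category, keywords in CATEGORIES.items()
     for keyword in keywords),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def categorize(ingredient):
    """Return the first matching category name, or None if nothing matches."""
    text = ingredient.lower()
    for keyword, category in KEYWORDS:
        # Word boundaries keep "ail" from matching inside "volaille";
        # [sx]? accepts simple plurals such as "tomates" or "choux".
        if re.search(rf"\b{re.escape(keyword)}[sx]?\b", text):
            return category
    return None
```

Matching the longest keyword first matters for overlapping entries: with this ordering `categorize("pommes de terre")` returns `Légume`, while `categorize("pomme")` returns `Fruit`.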


Regions

We map subregions to 6 major regions of 19th-century France:

  • "Paris, Ile-de-France, Val de Loire": ['Paris', 'Ile-de-France', 'Orléans', 'Touraine'],
  • "Pays de l’Ouest": ['Anjou', 'Bretagne', 'Poitou Vendée', 'Charentes'],
  • "Sud-Ouest & Pyrénées": ['Bordelais', 'Gascogne', 'Pays Basque', 'Roussillon', 'Périgord', 'Languedoc'],
  • "Sud-Est & Méditerranée": ['Provence', 'Nice', 'Corse', 'Dauphiné', 'Savoie', 'Lyon', 'Auvergne', 'Limousin'],
  • "Bourgogne, Champagne, Bresse, Franche-Comté, Alsace, Lorraine": ['Bourgogne', 'Champagne', 'Bresse', 'Franche-Comté', 'Alsace', 'Lorraine'],
  • "Nord & Normandie": ['Nord', 'Normandie']
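
Since each recipe is attached to a subregion, looking up its major region is easier with the dictionary inverted. The following is a minimal sketch of that inversion. The names `REGIONS`, `SUBREGION_TO_REGION` and `region_of` are hypothetical, and only three of the six regions are reproduced.

```python
# Excerpt of the region dictionary above (illustration only).
REGIONS = {
    "Paris, Ile-de-France, Val de Loire": [
        "Paris", "Ile-de-France", "Orléans", "Touraine",
    ],
    "Pays de l’Ouest": ["Anjou", "Bretagne", "Poitou Vendée", "Charentes"],
    "Nord & Normandie": ["Nord", "Normandie"],
}

# Invert the mapping: each subregion points to its major region.
SUBREGION_TO_REGION = {
    subregion: region
    for region, subregions in REGIONS.items()
    for subregion in subregions
}


def region_of(subregion):
    """Return the major region of a subregion, or None if it is unknown."""
    return SUBREGION_TO_REGION.get(subregion.strip())
```

The inversion relies on every subregion appearing under exactly one region, which holds for the lists above.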

Data analysis
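
The frequency and co-occurrence counts mentioned in the introduction could be computed along the following lines. This is a sketch under assumptions, not the project's actual analysis code: it assumes each recipe is available as a list of ingredient names, the function names are hypothetical, and `sample_recipes` is made-up toy data.

```python
from collections import Counter
from itertools import combinations


def ingredient_counts(recipes):
    """Count in how many recipes each ingredient appears."""
    counts = Counter()
    for ingredients in recipes:
        # set() so that an ingredient repeated within a recipe counts once.
        counts.update(set(ingredients))
    return counts


def cooccurrence_counts(recipes):
    """Count in how many recipes each unordered pair of ingredients appears."""
    pairs = Counter()
    for ingredients in recipes:
        # Sorting gives every pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(set(ingredients)), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up toy data, not taken from the cookbook.
sample_recipes = [
    ["beurre", "farine", "lait"],
    ["beurre", "farine", "sel"],
    ["ail", "beurre", "persil"],
]

top_ingredients = ingredient_counts(sample_recipes).most_common(3)
top_pairs = cooccurrence_counts(sample_recipes).most_common(3)
```

The same functions work on category names instead of ingredient names, and a per-region analysis would amount to calling them on the recipes of one region at a time.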

Data visualization

Map of France.

Discussion and limitations

Like most research, this project has its limitations, for example in the data processing and data analysis parts.

Future work

  • Build a search engine that would display the recipes and add filters to search them by name, region or ingredients
  • User-friendly interface to visualize the results of the analysis
  • Comparison with other cookbooks from different periods or different countries

Links

GitHub repository: Historical Cookbook