France: Exploring Historical Cookbooks
Contributor: Xiaotian.su
Revision as of 16:39, 21 December 2022
Introduction
This project explores historical French cookbooks containing recipes from the 19th century. By analyzing these cookbooks, we identify the most frequently used ingredients and food categories in each region. We also examine the co-occurrence of ingredients and food categories.
Research questions
1. What were the main ingredients used in 1900 in France?
2. Can we observe a difference per region?
Project Plan and Milestones
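Date | Tasks | Completion |
---|---|---|
Week 3 | Milestone 1: Project proposals. Find multiple cookbooks in French or English from different times; prepare project proposals. | ✓ |
Week 4 | Compare different cookbooks, consider the OCR scan, and think of possible research questions; discuss the goal and implementation of the project. | ✓ |
Week 5 | | ✓ |
Week 6-7 | Separate .png files by subregion; perform the OCR of each page; start to construct the dataset. | ✓ |
Week 8-9 | Midterm presentation. Construct the dataset and think of data structures to store the information efficiently. | ✓ |
Week 10 | Set up the GitHub repository; finish the creation of the dataset. | ✓ |
Week 11 | Go through the dataset and make changes to facilitate data processing; create categories for ingredients. | ✓ |
Week 12 | | ✓ |
Week 13 | Further improve the dataset; overall analysis and per-region analysis. | ✓ |
Week 14 | | ✓ |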
Methodology
Data digitalization
We started by scanning a physical French cookbook.
We then ran a basic OCR on the scanned files. Here is a sample output from the OCR. Note the mismatch between recipes and ingredients.
Template output of the digitization
Data processing
We extract the following information from each recipe:
- quantity: the amount of the ingredient
- unit: the unit of measurement for that amount
- ingredient: the ingredient named in the recipe
Units
- Spoons: 'cuil. à café', 'cuil. café', 'cuil. à soupe', 'cuil. soupe', 'petite cuil.', 'grande cuil.', 'cuil.',
- Glasses: 'petit verre', 'verre à liqueur', 'verres à liqueur', 'verres', 'verre', 'tasses', 'tasse',
- Bottles: 'bout.', 'bouteilles', 'bouteille',
- Containers: 'grande boîte', 'boîtes', 'boîte', 'tubes', 'tube',
- Spices & Aromatic plants: 'gousses', 'gousse', 'branches', 'branche', 'bâtons', 'bâton', 'pincée',
- Meat related: 'membres', 'membre', 'tronçons', 'tronçon', 'tranches', 'tranche',
- Standard measures: 'litres ', 'litre ', 'cl ', 'dl ', 'kg ', 'g ', 'l'
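As an illustration of how the quantity, unit, and ingredient fields could be pulled out of a single ingredient line, here is a minimal sketch based on a regular expression. It is not taken from the project's repository: the function name `parse_ingredient_line`, the regular expression, and the example lines are assumptions, and the unit strings are copied from the list above with surrounding whitespace trimmed and 'grande boîte' spelled without the stray space.

```python
import re

# Unit strings from the list above (whitespace-trimmed; an assumption of this sketch).
UNITS = [
    "cuil. à café", "cuil. café", "cuil. à soupe", "cuil. soupe",
    "petite cuil.", "grande cuil.", "cuil.",
    "petit verre", "verre à liqueur", "verres à liqueur", "verres", "verre",
    "tasses", "tasse", "bout.", "bouteilles", "bouteille",
    "grande boîte", "boîtes", "boîte", "tubes", "tube",
    "gousses", "gousse", "branches", "branche", "bâtons", "bâton", "pincée",
    "membres", "membre", "tronçons", "tronçon", "tranches", "tranche",
    "litres", "litre", "cl", "dl", "kg", "g", "l",
]

# Try longer units first so that 'cuil. à soupe' is preferred over 'cuil.'.
_UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

LINE_RE = re.compile(
    r"^(?P<quantity>\d+(?:[.,]\d+)?(?:/\d+)?)?\s*"   # e.g. 2, 1,5 or 1/2
    rf"(?:(?P<unit>{_UNIT_ALT})(?=\s|$)\s*)?"         # unit must end at a space
    r"(?:de\s+|d['’]\s*)?"                            # optional French 'de' / "d'"
    r"(?P<ingredient>.+?)\s*$",
    re.IGNORECASE,
)


def parse_ingredient_line(line):
    """Return a dict with quantity, unit and ingredient, or None for empty input."""
    line = line.strip()
    if not line:
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "quantity": match["quantity"],  # kept as text because of fractions
        "unit": match["unit"],
        "ingredient": match["ingredient"],
    }


# Hypothetical example lines, not taken from the scanned cookbook.
for example in ["2 cuil. à soupe de farine", "3 gousses d'ail", "sel"]:
    print(parse_ingredient_line(example))
```

The lookahead after the unit keeps short units such as 'g' or 'l' from matching the first letter of an ingredient like 'gruyère' or 'lait'. Real OCR output would likely need extra cleaning before a pattern like this applies.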
Categories
- 'Viande': ['viande', 'oie', 'canard', 'oiseau', 'lard', 'bœuf', 'veau', 'poule', 'poulet', 'poularde', 'volaille', 'porc', 'caille', 'canard', 'caneton', 'mouton', 'cochon', 'coq', 'chevreuil', 'lièvre', 'levraut', 'lapin', 'faisan', 'gibier', 'jambon', 'chorizo', 'cervelas', 'agneau', 'escargot', 'grenouille'],
- 'Poisson': ['poisson', 'brochet', 'carpe', 'morue', 'lamproie', 'lotte', 'maquereau', 'omble', 'rouget', 'sardine', 'thon', 'truite', 'anchois', 'anguille', 'merlan', 'sole', 'barbue', 'turbot', 'raie', 'perche', 'saumon', 'colin', 'goujon', 'loup', 'congre', 'rascasse', 'grondin', 'merlu', 'merluza', 'hareng', 'alose', 'brême'],
- "Fruit de mer": ['crevette', 'langouste', 'moule', 'écrevisse', 'palourde', 'homard', 'chiperon', 'seiche', 'huître', 'coquille', 'poulpe'],
- 'Alcool': ['alcool', 'bière', 'vin', 'cidre', 'fine', 'liqueur'],
- "Plante aromatique": ["bouquet garni", 'ail', 'anis', 'aromate', 'angélique', 'basilic', 'persil', 'sarriette', 'cerfeuil', 'ciboule', 'ciboulette', "clou de girofle", "clous de girofle", 'girofle', 'cive', 'câpre', 'estragon', "feuille de vigne", "fines herbes", 'laurier', 'menthe', 'pissenlit', 'romarin', 'thym'],
- 'Epice': ['cannelle', 'coriandre', 'curry', 'safran', 'poivre', 'sel', 'moutarde', 'muscade', 'paprika', 'piment', 'sauge', 'serpolet', 'épices'],
- "Produit laitier": ['lait', 'crème', 'fromage', 'gruyère', 'parmesan'],
- 'Légume': ['artichaut', 'asperge', 'aubergine', 'bette', 'betterave', 'cardon', 'chou', 'cornichon', 'courgette', 'cresson', 'céleri', 'fenouil', 'légume', 'navet', 'panais', 'poireau', "pomme de terre", "pommes de terre", 'potiron', 'rave', 'salade', 'tomate', 'échalote', 'épinard'],
- 'Fruit': ['abricot', 'banane', 'cerise', 'coing', 'fraise', 'framboise', 'groseille', 'raisin', 'olive', 'orange', 'pomme'],
- 'Agrume': ['citron', 'cédrat', "fleur d'oranger", "fleurs d'oranger"],
- 'Céréale': ['farine', 'pain', 'pâte', 'riz'],
- 'Légumineuse': ['févette', 'haricot', 'pois'],
- 'Fruit sec': ['amande', 'noix', 'noisette'],
- 'Champignon': ['champignon', 'truffe', 'cèpe', 'girofle', 'morille', 'levure', 'oronge', 'duelle']
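The category lists above can be used as a keyword lookup. The following sketch shows one possible way to assign a category to an ingredient string. It uses only a small excerpt of the lists, and the function name `categorize` and the matching rule (whole-word match, optional plural "s", longest keyword first) are assumptions rather than the project's actual implementation.

```python
import re

# Excerpt of the category lists above; the full dictionary would be used in practice.
CATEGORIES = {
    "Viande": ["bœuf", "veau", "poulet", "lard"],
    "Produit laitier": ["lait", "crème", "fromage"],
    "Légume": ["pomme de terre", "pommes de terre", "tomate", "chou"],
    "Fruit": ["pomme", "cerise", "orange"],
    "Agrume": ["citron", "fleur d'oranger"],
}

# (keyword, category) pairs, longest keyword first, so that
# 'pomme de terre' is tested before 'pomme'.
_KEYWORDS = sorted(
    ((kw, cat) for cat, kws in CATEGORIES.items() for kw in kws),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def categorize(ingredient):
    """Return the first matching category name, or None if no keyword matches."""
    text = ingredient.lower()
    for keyword, category in _KEYWORDS:
        # Whole-word match with an optional plural 's'.
        if re.search(rf"\b{re.escape(keyword)}s?\b", text):
            return category
    return None


print(categorize("pommes de terre"))  # Légume
print(categorize("pomme"))            # Fruit
print(categorize("laitue"))           # None
```

Testing the longest keywords first matters because some keywords contain others: a plain substring search would file 'pomme de terre' under 'Fruit' and 'laitue' under 'Produit laitier'.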
Region2SubRegion
- "Paris, Ile-de-France, Val de Loire": ['Paris', 'Ile-de-France', 'Orléans', 'Touraine'],
- "Pays de l’Ouest": ['Anjou', 'Bretagne', 'Poitou Vendée', 'Charentes'],
- "Sud-Ouest & Pyrénées": ['Bordelais', 'Gascogne', 'Pays Basque', 'Roussillon', 'Périgord', 'Languedoc'],
- "Sud-Est & Méditérannée": ['Provence', 'Nice', 'Corse', 'Dauphiné', 'Savoie', 'Lyon', 'Auvergne', 'Limousin'],
- "Bourgogne, Champagne, Bresse, Franche-Comté, Alsace, Lorraine": ['Bourgogne', 'Champagne', 'Bresse', 'Franche-Comté', 'Alsace', 'Lorraine'],
- "Nord & Normandie": ['Nord', 'Normandie']
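For per-region analysis it is usually more convenient to look up the region from a subregion than the other way round. The sketch below inverts the mapping above. The names `REGION_TO_SUBREGIONS`, `SUBREGION_TO_REGION`, and `region_of` are assumptions made for this example, and the keys are copied exactly as they appear in the list.

```python
# Mapping copied from the list above (keys kept exactly as written there).
REGION_TO_SUBREGIONS = {
    "Paris, Ile-de-France, Val de Loire": ["Paris", "Ile-de-France", "Orléans", "Touraine"],
    "Pays de l’Ouest": ["Anjou", "Bretagne", "Poitou Vendée", "Charentes"],
    "Sud-Ouest & Pyrénées": ["Bordelais", "Gascogne", "Pays Basque", "Roussillon",
                             "Périgord", "Languedoc"],
    "Sud-Est & Méditérannée": ["Provence", "Nice", "Corse", "Dauphiné", "Savoie",
                               "Lyon", "Auvergne", "Limousin"],
    "Bourgogne, Champagne, Bresse, Franche-Comté, Alsace, Lorraine": [
        "Bourgogne", "Champagne", "Bresse", "Franche-Comté", "Alsace", "Lorraine"],
    "Nord & Normandie": ["Nord", "Normandie"],
}

# Inverse lookup: each subregion points back to its parent region.
SUBREGION_TO_REGION = {
    subregion: region
    for region, subregions in REGION_TO_SUBREGIONS.items()
    for subregion in subregions
}


def region_of(subregion):
    """Return the parent region of a subregion, or None if it is unknown."""
    return SUBREGION_TO_REGION.get(subregion)


print(region_of("Bretagne"))  # Pays de l’Ouest
```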
Data analysis
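The two analyses named in the introduction, ingredient frequency by region and ingredient co-occurrence, could be computed along the following lines. This is a sketch under stated assumptions, not the project's code: the record layout (a `region` string and an `ingredients` list per recipe), the function names, and the three toy recipes are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records; the real dataset layout may differ.
recipes = [
    {"region": "Nord & Normandie", "ingredients": ["beurre", "crème", "pomme"]},
    {"region": "Nord & Normandie", "ingredients": ["beurre", "cidre"]},
    {"region": "Pays de l’Ouest", "ingredients": ["beurre", "farine", "beurre"]},
]


def ingredient_counts_by_region(recipes):
    """Count, per region, in how many recipes each ingredient appears."""
    counts = {}
    for recipe in recipes:
        # set() so that an ingredient listed twice in one recipe counts once.
        counts.setdefault(recipe["region"], Counter()).update(set(recipe["ingredients"]))
    return counts


def cooccurrence_counts(recipes):
    """Count in how many recipes each unordered pair of ingredients appears together."""
    pairs = Counter()
    for recipe in recipes:
        # Sorting gives every pair a canonical (alphabetical) order.
        pairs.update(combinations(sorted(set(recipe["ingredients"])), 2))
    return pairs


by_region = ingredient_counts_by_region(recipes)
print(by_region["Nord & Normandie"].most_common(1))  # [('beurre', 2)]
print(cooccurrence_counts(recipes)[("beurre", "crème")])  # 1
```

Counting recipes rather than raw mentions keeps a single long recipe from dominating a region's ranking. The same functions work on food categories if each ingredient is first replaced by its category.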
Data visualization
Discussion and limitations
Future work
- Build a search engine that displays the recipes, with filters to search them by name, region, or ingredient
- Build a user-friendly interface to visualize the results of the analysis
- Compare with other cookbooks from different periods or different countries
Links
GitHub repository: Historical Cookbook