France: Exploring Historical Cookbooks

From FDHwiki

Introduction

Research questions

1. What were the main ingredients used in 1900 in France?

2. Can we observe a difference per region?

Project Plan and Milestones

Date Tasks Completion
Week 3
  • Find multiple French cookbooks, in French or English, from different periods.
  • Prepare slides for the initial project idea presentation.
Week 4
  • Compare the different cookbooks, consider their OCR scans, and think of possible research questions.
  • Discuss the goal and implementation of the project with the TAs.
Week 5
  • Decide on one French cookbook.
  • Scan the physical book.
Week 6-7
  • Run OCR on the scanned pages.
  • Start to construct the dataset.
Week 8-9
  • Prepare for the midterm presentation.
  • Construct the dataset and think of a data structure to store the information.
Week 10
  • Set up the GitHub repository.
  • Finish the creation of the dataset.
Week 11
  • Fix bugs in the extraction script and take exceptional cases into consideration.
  • Create categories for ingredients.
Week 12
  • Perform the data processing of the ingredients.
  • Perform an exploratory analysis of the dataset.
Week 13
  • Further improve the dataset.
  • Overall analysis and per-region analysis.
Week 14
  • Prepare the final presentation.
  • Finish the Wikipedia page.

Methodology

Data collection

To start, we scanned a physical French cookbook.

French cookbook.
A page of the scanned book.

Then we ran a basic OCR on the scanned files. Here is a sample of the OCR output.

Sample OCR.

Data digitization

Template output of the digitization

Data processing

In our project, we will extract the following information from the recipes:

  • quantity
  • unit
  • ingredient

Recognized units include: 'litre', 'litres', 'l', 'cl', 'dl', 'kg', 'g', 'pincée', 'cuil.', 'cuil. café', 'cuil. soupe', 'cuil. à soupe', 'petite cuil.', 'grande cuil.', 'verre', 'verres', 'petit verre', 'verre à liqueur', 'verres à liqueur', 'tasse', 'tasses', 'bout.', 'bouteille', 'bouteilles', 'grande boîte', 'gousse', 'gousses', 'branche', 'branches', 'membre', 'membres', 'tronçon', 'tronçons', 'tranche', 'tranches', 'tube', 'tubes'.
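As an illustration of how a single ingredient line could be split into quantity, unit and ingredient, here is a minimal Python sketch. It is not the project's extraction script. It assumes that each ingredient line looks like "250 g de beurre" (an optional number, an optional unit from the list above, an optional "de" or "d'", then the ingredient name), which may not match the actual digitized format. The names parse_ingredient and ParsedIngredient are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Units copied from the list above.
UNITS = [
    'litre', 'litres', 'l', 'cl', 'dl', 'kg', 'g', 'pincée', 'cuil.',
    'cuil. café', 'cuil. soupe', 'cuil. à soupe', 'petite cuil.',
    'grande cuil.', 'verre', 'verres', 'petit verre', 'verre à liqueur',
    'verres à liqueur', 'tasse', 'tasses', 'bout.', 'bouteille',
    'bouteilles', 'grande boîte', 'gousse', 'gousses', 'branche',
    'branches', 'membre', 'membres', 'tronçon', 'tronçons', 'tranche',
    'tranches', 'tube', 'tubes',
]

# Try longer units first so 'cuil. à soupe' wins over 'cuil.'
# and 'litres' wins over 'l'.
_UNIT_ALTERNATION = "|".join(
    re.escape(unit) for unit in sorted(UNITS, key=len, reverse=True)
)

# Assumed line shape: [quantity] [unit] [de | d'] ingredient
_LINE_RE = re.compile(
    r"^\s*(?P<qty>\d+(?:[.,]\d+)?)?\s*"              # optional number, e.g. 2 or 0,5
    rf"(?:(?P<unit>{_UNIT_ALTERNATION})(?=\s|$))?\s*"  # optional unit, then space or end
    r"(?:de\s+|d['’])?"                              # optional French partitive
    r"(?P<ingredient>.+?)\s*$"                       # the rest is the ingredient
)


class ParsedIngredient(NamedTuple):
    quantity: Optional[float]
    unit: Optional[str]
    ingredient: str


def parse_ingredient(line: str) -> Optional[ParsedIngredient]:
    """Split one ingredient line; return None if the line is empty."""
    match = _LINE_RE.match(line)
    if match is None:
        return None
    qty = match.group("qty")
    # French decimals use a comma, so normalise it before converting.
    quantity = float(qty.replace(",", ".")) if qty else None
    return ParsedIngredient(quantity, match.group("unit"), match.group("ingredient"))


if __name__ == "__main__":
    for example in ["250 g de beurre", "3 gousses d'ail", "sel"]:
        print(example, "->", parse_ingredient(example))
```

The unit must be followed by whitespace or the end of the line, so a line such as "2 grandes carottes" is not misread as the unit 'g' followed by "randes carottes". Lines that do not follow the assumed shape (ranges, fractions, quantities written out in words) would need extra handling.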

Data analysis

Data visualization

Links

GitHub repository: Historical Cookbook

Future work

  • Build a search engine that displays the recipes, with filters to search them by name, region or ingredients.
  • Build a user-friendly interface to visualize the results of the analysis.
  • Compare with other cookbooks from different periods or different countries.