France: Exploring Historical Cookbooks

From FDHwiki
Revision as of 08:36, 21 December 2022

Introduction

Research questions

1. What were the main ingredients used in France in 1900?

2. Can we observe differences in ingredient use between regions?

Project Plan and Milestones

Date, tasks and completion status (✓ = completed):
Week 3 ✓
  • Find several French cookbooks, written in French or English, from different periods.
  • Prepare slides for the initial project-idea presentation.
Week 4 ✓
  • Compare different cookbooks, consider OCR scanning, and think of possible research questions.
  • Discuss the goals and implementation of the project with the TAs.
Week 5 ✓
  • Choose one French cookbook.
  • Scan the physical book.
Week 6-7 ✓
  • Run OCR on the scanned pages.
  • Start constructing the dataset.
Week 8-9 ✓
  • Prepare for the midterm presentation.
  • Construct the dataset and design a data structure to store the information.
Week 10 ✓
  • Set up the GitHub repository.
  • Finish creating the dataset.
Week 11 ✓
  • Fix bugs in the extraction script and handle exceptional cases.
  • Create categories for ingredients.
Week 12 ✓
  • Process the ingredient data.
  • Perform an exploratory analysis of the dataset.
Week 13 ✓
  • Further improve the dataset.
  • Perform the overall and per-region analyses.
Week 14 ✓
  • Prepare the final presentation.
  • Finish the wiki page.

Methodology

Data collection

First, we scanned a physical French cookbook.

French cookbook.
A page of the scanned book.

Then we ran basic OCR on the scanned files. A sample of the OCR output is shown below.

Sample OCR.

Data digitization

Template output of the digitization
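The digitization template is only shown as an image here. The snippet below is a rough sketch of one way to store a digitized recipe: a Python dataclass serialized to JSON. The class `Recipe` and its fields `title`, `region` and `ingredient_lines` are hypothetical. `region` is included only because the analysis is done per region, and the project's actual template may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Recipe:
    """Hypothetical record for one digitized recipe (field names are assumptions)."""
    title: str
    region: str  # assumed field: region the recipe is attributed to
    ingredient_lines: list[str] = field(default_factory=list)  # raw lines from the OCR text


def to_json(recipes: list[Recipe]) -> str:
    """Serialize recipes to JSON, keeping accented French characters readable."""
    return json.dumps([asdict(r) for r in recipes], ensure_ascii=False, indent=2)


# Illustrative record only; it is not taken from the scanned book.
example = Recipe(
    title="Exemple de recette",
    region="Bretagne",
    ingredient_lines=["250 g de farine", "1 pincée de sel"],
)
print(to_json([example]))
```

With `ensure_ascii=False`, characters such as "é" are written to the JSON as-is instead of being escaped.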

Data processing

From the ingredient list of each recipe, we extract the following information:

  • quantity
  • unit
  • ingredient

Recognized units include: 'litre', 'litres', 'l', 'cl', 'dl', 'kg', 'g', 'pincée', 'cuil.', 'cuil. café', 'cuil. soupe', 'cuil. à soupe', 'petite cuil.', 'grande cuil.', 'verre', 'verres', 'petit verre', 'verre à liqueur', 'verres à liqueur', 'tasse', 'tasses', 'bout.', 'bouteille', 'bouteilles', 'grande boîte', 'gousse', 'gousses', 'branche', 'branches', 'membre', 'membres', 'tronçon', 'tronçons', 'tranche', 'tranches', 'tube', 'tubes'.
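The following is a minimal sketch of this step. It is not the project's actual extraction script. It uses a regular expression to split one ingredient line into quantity, unit and ingredient, and it assumes lines of the form `250 g de farine`: an optional number, an optional unit from the list above, an optional `de`/`d'`, then the ingredient. `parse_ingredient` is a hypothetical name, and the code includes only a subset of the units.

```python
import re

# A subset of the unit list above (assumption: the full list would be pasted here).
UNITS = [
    "litre", "litres", "l", "cl", "dl", "kg", "g", "pincée",
    "cuil.", "cuil. café", "cuil. soupe", "cuil. à soupe",
    "petite cuil.", "grande cuil.", "verre", "verres",
    "verre à liqueur", "verres à liqueur", "tasse", "tasses",
    "gousse", "gousses", "tranche", "tranches",
]

# Longest units first, so "cuil. à soupe" is tried before "cuil.".
_UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

_LINE_RE = re.compile(
    r"^(?P<qty>\d+(?:[.,]\d+)?(?:/\d+)?)?\s*"  # optional quantity: 2, 1,5 or 1/2
    rf"(?:(?P<unit>{_UNIT_ALT})(?=\s|$))?\s*"  # optional unit, followed by a space or the end
    r"(?:de\s+|d['’])?"                         # optional French "de " / "d'" before the name
    r"(?P<ing>.+?)\s*$",                        # the rest is the ingredient name
    re.IGNORECASE,
)


def parse_ingredient(line: str) -> dict:
    """Split one ingredient line into quantity, unit and ingredient (hypothetical helper)."""
    line = line.strip()
    match = _LINE_RE.match(line)
    if match is None:  # no match, e.g. an empty line
        return {"quantity": None, "unit": None, "ingredient": ""}
    qty = match.group("qty")
    if qty is not None:
        qty = qty.replace(",", ".")  # French decimal comma
        if "/" in qty:
            num, den = qty.split("/")
            qty = float(num) / float(den)
        else:
            qty = float(qty)
    unit = match.group("unit")
    return {
        "quantity": qty,
        "unit": unit.lower() if unit else None,
        "ingredient": match.group("ing"),
    }


print(parse_ingredient("250 g de farine"))
# {'quantity': 250.0, 'unit': 'g', 'ingredient': 'farine'}
```

The code tries the longest units first because several units share a prefix (`cuil.` and `cuil. à soupe`, `verre` and `verres`). The lookahead `(?=\s|$)` keeps a short unit such as `l` from matching the start of a word like `lapin`.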

Data analysis
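This section does not yet describe the analysis. A hedged sketch for the two research questions could use `collections.Counter` to count how many recipes each ingredient appears in, both overall and per region. The input format is an assumption: a list of dicts with hypothetical `region` and `ingredients` keys, where `ingredients` holds normalized ingredient names. The three recipes below are toy data, not results from the cookbook.

```python
from collections import Counter, defaultdict


def top_ingredients(recipes, n=10):
    """Count how many recipes each ingredient appears in (once per recipe)."""
    counts = Counter()
    for recipe in recipes:
        counts.update(set(recipe["ingredients"]))
    return counts.most_common(n)


def top_ingredients_by_region(recipes, n=10):
    """Same count, grouped by the recipe's (assumed) region field."""
    by_region = defaultdict(Counter)
    for recipe in recipes:
        by_region[recipe["region"]].update(set(recipe["ingredients"]))
    return {region: counts.most_common(n) for region, counts in by_region.items()}


# Toy input for illustration only; these are not results from the cookbook.
recipes = [
    {"region": "Normandie", "ingredients": ["beurre", "crème", "pomme"]},
    {"region": "Normandie", "ingredients": ["beurre", "sel"]},
    {"region": "Provence", "ingredients": ["huile d'olive", "ail", "sel"]},
]
print(top_ingredients(recipes, n=2))
print(top_ingredients_by_region(recipes, n=1))
```

This sketch counts whether an ingredient is present in a recipe rather than summing quantities, because the units listed above (litres, grammes, pincées, gousses) are not directly comparable.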

Data visualization

Links

GitHub repository: Historical Cookbook

Future work

  • A search engine that displays the recipes, with filters to search them by name, region or ingredient
  • A user-friendly interface to visualize the results of the analysis
  • Comparison with cookbooks from other periods or countries