Chinese Cookbook: Difference between revisions

From FDHwiki
Revision as of 00:57, 18 December 2023

Introduction

Motivation and description of the deliverables

Project Plan and Milestones

Weekly Project Plan

{| class="wikitable"
! Date !! Data Collection !! Data Processing !! Data Analysis !! Web Construction
|-
| Week 3 || Search historical Chinese cookbooks and compare them || || ||
|-
| Week 4 || Choose one historical Chinese cookbook; select the chapter to work on || || ||
|-
| Week 5 || Get data from the website || First clean and sort the data; construct the dataset of recipes; find an Ancient Chinese to Modern Chinese translation model || ||
|-
| Week 6 || || Construct the dataset of ingredients; translate from Ancient Chinese to Modern Chinese || ||
|-
| Week 7 || || Categorize data || ||
|-
| Week 8 || || || Analyse cooking method, effect, category, and ingredient frequency; visualization || Start web construction
|-
| Week 9 || || || Analyse ingredient and ingredient category pairing; visualization || Continue web construction
|-
| Week 10 || || || Analyse effect and ingredient pairing; visualization || Continue web construction; construct recipe filtering and recommendation system
|-
| Week 11 || || Modern Chinese to English translation || || Continue web construction; construct recipe filtering and recommendation system; add data analysis and visualization to the website
|-
| Week 12 || || Modern Chinese to English translation || || Continue web construction; add data analysis and visualization to the website
|-
| Week 13 || || || || Finalize and improve the website
|-
| Week 14
| colspan="4" | Prepare the Wikipedia page & final presentation
|}

Milestone 1

  • Prepare a project proposal that states the goal and objectives of the project
  • Get Chinese cooking book data from the website

Milestone 2

  • Clean the data and construct the datasets for the Chinese cooking book
  • Translate from Ancient Chinese to Modern Chinese
  • Categorize the data depending on ingredient, effect, category, and cooking method

Milestone 3

  • Data Analysis
  • Web construction and recipe filtering and recommendation system
  • Prepare final presentation and Wikipedia page

Methods

Data Collection

Figure 1. Chinese Text Project website pages for the book "Yinshanzhengyao"

"Yinshanzhengyao" was published in 1330 during the Yuan dynasty, and all existing editions derive from the Ming dynasty edition of 1456. Although a scanned version of the book is available online, Optical Character Recognition (OCR) is difficult because of the ancient Chinese text and the illustrations. Fortunately, the Chinese Text Project (中國哲學書電子書計劃) provides open access to ancient Chinese books for both Chinese and non-Chinese scholars. Its database currently holds over thirty thousand books, making it the largest database of historical Chinese literature.

"Yinshanzhengyao" is among the books in this database. Taking advantage of the database's well-defined structure, we scrape the data from the website. Because the project focuses on the recipes in the book, we extract only the recipe content, which comprises the following chapters:

  • Strange Delicacies of Combined Flavours 1, 2, 3
  • Various Hot Beverages and Concentrates
  • Foods that Cure Various Illnesses

In total, there are 210 recipes, each with information on its effects, its ingredients with quantities, and step-by-step instructions. Because the book is a medical text, the "effect" describes the benefits of the food and the precautions to take, which gives insight into the medicinal properties attributed to each recipe.

Data Processing

Construct dataset

After obtaining data from the website, it is necessary to clean and organize the data into the following structure: Food_Name, Effect, Ingredients, Steps.
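The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Recipe` class, the helper names, and the assumption that the scraper yields one dictionary per recipe are all hypothetical; only the four column names come from the text above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Recipe:
    """One dataset row; the field names follow the structure named above."""
    Food_Name: str
    Effect: str
    Ingredients: str
    Steps: str


def clean(text: str) -> str:
    """Collapse runs of whitespace (str.split also treats full-width spaces as whitespace) and trim."""
    return " ".join(text.split())


def build_dataset(raw_records):
    """Turn raw scraped dicts (assumed input shape) into cleaned rows, skipping unnamed entries."""
    rows = []
    for raw in raw_records:
        name = clean(raw.get("Food_Name", ""))
        if not name:
            continue  # a record without a recipe name cannot be used
        rows.append(Recipe(
            Food_Name=name,
            Effect=clean(raw.get("Effect", "")),
            Ingredients=clean(raw.get("Ingredients", "")),
            Steps=clean(raw.get("Steps", "")),
        ))
    return rows


def to_csv(rows) -> str:
    """Serialize the rows to CSV text with the four columns in order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Recipe)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

Calling `to_csv(build_dataset(records))` on the scraped records would then give a table with exactly the columns Food_Name, Effect, Ingredients, Steps.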

Categorize

To support further data analysis and the website's search function, we first categorize the data. The categorization uses 15 recipe categories, 10 cooking methods, 13 ingredient categories, and 9 effect categories, listed below:

  • Recipe categories: 'Paste', 'Pan-fry', 'Dish', 'Thick soup', 'Conge', 'Meat', 'Soup', 'Noodles', 'Rice noodles', 'Pancake', 'Thick liquid', 'Oil', 'Tea', 'Wonton', 'Steamed bun'
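{| class="wikitable"
! Category_en !! Food_Name_en !! Category !! Food_Name
|-
| Pancake || Cow’s Milk Buns || 饼 || 牛奶子烧饼
|-
| Dish || Boiled Sheep’s Breast || 菜品 || 熬羊胸子
|-
| Steamed bun || Eggplant Manta || 馒头 || 茄子馒头
|-
| Soup || Carp Soup || 汤 || 鲤鱼汤
|-
| Tea || Jade Mortar Tea || 茶 || 玉磨茶
|}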
  • Cooking methods: 'Boil', 'Simmer', 'Steam', 'Pan-fry', 'Bake', 'Braise', 'Broil', 'Stir-fry', 'Roast', 'Deep-fry'
  • Ingredient categories: 'Chinese Medicinal Material', 'Plant', 'Spice', 'Fruit', 'Vegetable', 'Seafood', 'Meat', 'Dairy Product', 'Grain', 'Juice', 'Condiment', 'Tea', 'Other'
  • Effect categories: 'Gastrointestinal Issues', 'Neurological and Mental Health', 'General Health and Wellness', 'Musculoskeletal Issues', 'Speech-related', 'Heat-clearing', 'Genitourinary Issues', 'Respiratory Issues', 'Others'
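The four vocabularies listed above can be kept as fixed sets so that every label assigned to a recipe is checked against them. The sketch below is only an illustration under assumptions: the constant and function names are hypothetical, and the normalization (lowercase, underscores treated as spaces) is assumed from the way ingredient labels such as 'chinese_medicinal_material' appear in the example rows on this page.

```python
# Vocabularies copied from the lists above (spelling as in the source).
RECIPE_CATEGORIES = {
    "Paste", "Pan-fry", "Dish", "Thick soup", "Conge", "Meat", "Soup",
    "Noodles", "Rice noodles", "Pancake", "Thick liquid", "Oil", "Tea",
    "Wonton", "Steamed bun",
}
COOKING_METHODS = {
    "Boil", "Simmer", "Steam", "Pan-fry", "Bake", "Braise", "Broil",
    "Stir-fry", "Roast", "Deep-fry",
}
INGREDIENT_CATEGORIES = {
    "Chinese Medicinal Material", "Plant", "Spice", "Fruit", "Vegetable",
    "Seafood", "Meat", "Dairy Product", "Grain", "Juice", "Condiment",
    "Tea", "Other",
}
EFFECT_CATEGORIES = {
    "Gastrointestinal Issues", "Neurological and Mental Health",
    "General Health and Wellness", "Musculoskeletal Issues",
    "Speech-related", "Heat-clearing", "Genitourinary Issues",
    "Respiratory Issues", "Others",
}


def normalize(label: str) -> str:
    """Compare labels case-insensitively and treat underscores as spaces."""
    return label.replace("_", " ").strip().lower()


def unknown_labels(labels, vocabulary):
    """Return the labels that do not belong to the given vocabulary."""
    known = {normalize(entry) for entry in vocabulary}
    return [label for label in labels if normalize(label) not in known]


# A label outside the vocabulary is reported; valid ones pass silently.
print(unknown_labels(["meat", "chinese_medicinal_material"], INGREDIENT_CATEGORIES))  # []
print(unknown_labels(["Steam", "Grill"], COOKING_METHODS))  # ['Grill']
```

A check like this catches typos in hand-assigned labels before they reach the analysis or the website's search filters.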


The following five example recipes are shown with their assigned categories.

{| class="wikitable"
! Recipe_name !! Category !! Cooking Method !! Effect Category !! Ingredient Category
|-
| Sprouting Chinese Foxglove Chicken || Dish || 'Steam' || 'Musculoskeletal Issues', 'General Health and Wellness' || 'meat', 'condiment', 'chinese_medicinal_material'
|-
| Carp Soup || Soup || 'Braise', 'Boil' || 'Gastrointestinal Issues', 'Genitourinary Issues', 'Others' || 'seafood', 'spice', 'plant'
|-
| Oil Rape Shoots Broth || Thick soup || 'Simmer' || 'General Health and Wellness' || 'spice', 'meat'
|-
| Barley Samsa Noodles || Rice noodles || 'Stir-fry', 'Simmer' || 'General Health and Wellness', 'Gastrointestinal Issues' || 'plant', 'meat'
|-
| Sheep Bone Congee || Conge || 'Simmer' || 'General Health and Wellness', 'Musculoskeletal Issues' || 'condiment', 'spice', 'plant'
|}
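To show how categorized rows like the five examples above could drive the search function and the frequency analysis, here is a minimal sketch. The data is copied from the example rows; the dictionary keys and the functions `filter_recipes` and `frequency` are hypothetical names chosen for illustration, not taken from the project's repository.

```python
from collections import Counter

# The five example recipes above, one dict per row (key names are assumptions).
RECIPES = [
    {"name": "Sprouting Chinese Foxglove Chicken", "category": "Dish",
     "methods": ["Steam"],
     "effects": ["Musculoskeletal Issues", "General Health and Wellness"],
     "ingredients": ["meat", "condiment", "chinese_medicinal_material"]},
    {"name": "Carp Soup", "category": "Soup",
     "methods": ["Braise", "Boil"],
     "effects": ["Gastrointestinal Issues", "Genitourinary Issues", "Others"],
     "ingredients": ["seafood", "spice", "plant"]},
    {"name": "Oil Rape Shoots Broth", "category": "Thick soup",
     "methods": ["Simmer"],
     "effects": ["General Health and Wellness"],
     "ingredients": ["spice", "meat"]},
    {"name": "Barley Samsa Noodles", "category": "Rice noodles",
     "methods": ["Stir-fry", "Simmer"],
     "effects": ["General Health and Wellness", "Gastrointestinal Issues"],
     "ingredients": ["plant", "meat"]},
    {"name": "Sheep Bone Congee", "category": "Conge",
     "methods": ["Simmer"],
     "effects": ["General Health and Wellness", "Musculoskeletal Issues"],
     "ingredients": ["condiment", "spice", "plant"]},
]


def filter_recipes(recipes, method=None, effect=None, ingredient=None):
    """Keep recipes that match every criterion that was given."""
    matches = []
    for recipe in recipes:
        if method is not None and method not in recipe["methods"]:
            continue
        if effect is not None and effect not in recipe["effects"]:
            continue
        if ingredient is not None and ingredient not in recipe["ingredients"]:
            continue
        matches.append(recipe)
    return matches


def frequency(recipes, field):
    """Count how often each label occurs in a multi-valued field."""
    return Counter(label for recipe in recipes for label in recipe[field])


hits = filter_recipes(RECIPES, method="Simmer", effect="Gastrointestinal Issues")
print([recipe["name"] for recipe in hits])        # ['Barley Samsa Noodles']
print(frequency(RECIPES, "methods").most_common(1))  # [('Simmer', 3)]
```

Storing the multi-valued columns as lists keeps filtering and counting simple; the same two functions would apply unchanged to the full set of 210 recipes.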

Translation: Ancient Chinese to Modern Chinese

Figure 2. Ancient Chinese to Modern Chinese model
Figure 3. ChatGPT

The text is written in ancient Chinese, whereas contemporary readers use modern Chinese. For in-depth data analysis, the recipes therefore need to be translated from ancient Chinese to modern Chinese. We compared a dedicated ancient Chinese to modern Chinese translation model (Figure 2) with ChatGPT 3.5 (Figure 3) and found that the ChatGPT translations are more fluent and closer to contemporary usage. Based on this observation, we adopted ChatGPT as our primary translation tool.

English Translation

Data Analysis

Website

Quality assessment

Discussion and limitations

Links

- GitHub: https://github.com/changchuntzu0618/DH405-CookingBook