Chinese Cookbook: Difference between revisions

From FDHwiki
Revision as of 22:41, 17 December 2023

Introduction

Motivation and description of the deliverables

Project Plan and Milestones

Weekly Project Plan

Date | Data Collection | Data Processing | Data Analysis | Web Construction
Week 3 | Search historical Chinese cookbooks and compare them | | |
Week 4 | Choose one historical Chinese cookbook; select the chapter to work on | | |
Week 5 | Get data from the website | First clean and sort the data; construct the dataset of recipes; find Ancient Chinese to Modern Chinese translation model | |
Week 6 | | Construct the dataset of ingredients; translate from Ancient Chinese to Modern Chinese | |
Week 7 | | Categorize data | |
Week 8 | | | Analyse cooking method, effect, category, and ingredient frequency; visualization | Start web construction
Week 9 | | | Analyse ingredient and ingredient category pairing; visualization | Continue web construction
Week 10 | | | Analyse effect and ingredient pairing; visualization | Continue web construction; construct recipe filtering and recommendation system
Week 11 | | Modern Chinese to English translation | | Continue web construction; construct recipe filtering and recommendation system; add data analysis and visualization to the website
Week 12 | | Modern Chinese to English translation | | Continue web construction; add data analysis and visualization to the website
Week 13 | | | | Finalize and improve the website
Week 14 | Prepare the Wikipedia page & final presentation

Milestone 1

  • Prepare a project proposal and define the goal and objectives of the project
  • Get the Chinese cookbook data from the website

Milestone 2

  • Clean the data and construct the datasets for the Chinese cookbook
  • Translate from Ancient Chinese to Modern Chinese
  • Categorize the data depending on ingredient, effect, category, and cooking method

Milestone 3

  • Data Analysis
  • Web construction and recipe filtering and recommendation system
  • Prepare final presentation and Wikipedia page

Methods

Data Collection

Figure 1. Chinese Text Project website pages for the book "Yinshanzhengyao"

"Yinshanzhengyao" was published in 1330 during the Yuan dynasty, and all existing editions are derived from the Ming dynasty edition of 1456. Although a scanned version of the book is available online, Optical Character Recognition (OCR) is difficult because the text is in ancient Chinese and the pages include illustrations. Fortunately, the Chinese Text Project (中國哲學書電子化計劃) provides open access to ancient Chinese books for both Chinese and non-Chinese scholars through a comprehensive database. It currently holds over thirty thousand books, making it the largest database of historical Chinese literature.

"Yinshanzhengyao" is among the books included in this database. Leveraging the well-defined structure of the database, we scrape data from the website. Because the project focuses on the recipes within the book, our data extraction is limited to the recipe content, which comprises the following chapters:

  • Strange Delicacies of Combined Flavours 1, 2, 3
  • Various Hot Beverages and Concentrates
  • Foods that Cure Various Illnesses

In total, there are 210 recipes, each accompanied by information on its effects, ingredients with quantities, and step-by-step instructions. Because the book is a medical text, the "effect" describes the benefits of the food and the precautions to be taken, providing insight into the medicinal properties of the recipes.
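The scraping step described above could look roughly like the following sketch. It is not the project's actual scraper: the `ctext` class name on table cells is an assumption about the page layout, the sample HTML is invented for illustration, and downloading the pages is left out entirely. The sketch only shows how passage text can be pulled out of a regularly structured page using Python's standard `html.parser` module.

```python
from html.parser import HTMLParser
from typing import List


class CellTextExtractor(HTMLParser):
    """Collect the text of every <td> whose class attribute equals `cell_class`."""

    def __init__(self, cell_class: str) -> None:
        super().__init__()
        self.cell_class = cell_class
        self._depth = 0  # greater than 0 while inside a matching <td>
        self._parts: List[str] = []
        self.cells: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self._depth > 0:
            # Nested <td> inside a matching cell: track it so the end tags balance.
            self._depth += 1
        elif dict(attrs).get("class") == self.cell_class:
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "td" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._parts).strip()
                if text:
                    self.cells.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._parts.append(data)


def extract_passages(html: str, cell_class: str = "ctext") -> List[str]:
    """Return the text of all matching cells, in document order.

    The default class name is an assumption, not a documented guarantee.
    """
    parser = CellTextExtractor(cell_class)
    parser.feed(html)
    parser.close()
    return parser.cells


# Invented sample markup, standing in for a downloaded chapter page.
SAMPLE_HTML = """
<table>
  <tr><td class="ctext">passage one</td></tr>
  <tr><td class="other">navigation</td></tr>
  <tr><td class="ctext">passage <a href="#">two</a></td></tr>
</table>
"""

if __name__ == "__main__":
    for passage in extract_passages(SAMPLE_HTML):
        print(passage)
```

Keeping the parsing separate from the download makes it possible to check the extraction logic on saved pages before running it over the recipe chapters.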

Data Processing

Construct dataset

After obtaining data from the website, it is necessary to clean and organize the data into the following structure: Food_Name, Effect, Ingredients, Steps.
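As an illustration of this target structure, here is a minimal Python sketch. Only the four field names come from the text above; the `Recipe` class, the `clean_text` and `build_recipe` helpers, the whitespace-normalization rule, and the CSV output format are assumptions made for the example, not the project's actual code.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class Recipe:
    # Field names follow the structure described in the text.
    Food_Name: str
    Effect: str
    Ingredients: str
    Steps: str


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace (including full-width spaces) and trim the ends.

    This cleaning rule is an assumption; the real cleaning may differ.
    """
    return re.sub(r"\s+", " ", raw).strip()


def build_recipe(name: str, effect: str, ingredients: str, steps: str) -> Recipe:
    """Clean each raw field and pack the four fields into one record."""
    return Recipe(*(clean_text(value) for value in (name, effect, ingredients, steps)))


def to_csv(recipes: Iterable[Recipe]) -> str:
    """Serialize the records as CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(Recipe)],
        lineterminator="\n",
    )
    writer.writeheader()
    for recipe in recipes:
        writer.writerow(asdict(recipe))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values, not taken from the book.
    sample = build_recipe("  Example soup \n", "effect text", "item A  1; item B 2", "step one\n step two")
    print(to_csv([sample]))
```

A flat record with one row per recipe keeps the dataset easy to load for the later categorization and analysis steps.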

Categorize

To support further data analysis and the website's search function, we first categorize the data. The categorization comprises 15 recipe categories, 10 cooking methods, 13 ingredient categories, and 9 effects. The detailed categories are as follows:
- Recipe categories:
- Cooking methods:
- Ingredient categories:
- Effect categories:
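The page does not say how categories were assigned, so the following sketch only illustrates one possible approach: keyword matching on the recipe text. The category names and keywords in `COOKING_METHOD_KEYWORDS` are placeholders invented for the example, not the project's real categories, and `categorize` and `filter_recipes` are hypothetical helper names.

```python
from typing import Dict, List

# Hypothetical keyword table: NOT the project's actual cooking-method categories.
COOKING_METHOD_KEYWORDS: Dict[str, List[str]] = {
    "boil": ["煮", "熬"],
    "roast": ["炙", "烤"],
    "steam": ["蒸"],
}


def categorize(text: str, keyword_table: Dict[str, List[str]]) -> List[str]:
    """Return every category whose keywords occur in the text, in table order."""
    return [
        category
        for category, keywords in keyword_table.items()
        if any(keyword in text for keyword in keywords)
    ]


def filter_recipes(
    recipes: List[Dict[str, str]],
    wanted: str,
    keyword_table: Dict[str, List[str]],
) -> List[Dict[str, str]]:
    """Keep recipes whose Steps text matches the wanted category.

    This is the kind of lookup a category-based search filter could build on.
    """
    return [r for r in recipes if wanted in categorize(r["Steps"], keyword_table)]
```

A recipe can fall into several categories at once under this scheme, which suits filtering: a search for one cooking method still returns recipes that also use another.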

Translation: Ancient Chinese to Modern Chinese

Figure 2. Ancient Chinese to Modern Chinese model
Figure 3. ChatGPT

The text is written in ancient Chinese, but contemporary communication predominantly employs modern Chinese. Consequently, for in-depth data analysis, it is essential to translate the recipes from ancient Chinese to modern Chinese. In our evaluation, we compared the proficiency of an ancient Chinese to modern Chinese translation model (Figure 2) against ChatGPT 3.5 (Figure 3). Our findings indicate that the translations generated by ChatGPT are more fluent and closely aligned with contemporary language usage. Based on this observation, we have chosen to adopt ChatGPT as our primary translation tool.

English Translation

Data Analysis

Website

Quality assessment??

Discussion and limitations

Links

- GitHub: https://github.com/changchuntzu0618/DH405-CookingBook