Chinese Cookbook
Introduction
Motivation and description of the deliverables
Project Plan and Milestones
Weekly Project Plan
Date | Data Collection | Data Processing | Data Analysis | Web Construction |
---|---|---|---|---|
Week 3 | Search historical Chinese cookbooks and compare them | | | |
Week 4 | Choose one historical Chinese cookbook | | | |
Week 5 | Get data from the website | First clean and sort the data | | |
Week 6 | | Construct the dataset of ingredients | | |
Week 7 | | Categorize data | | |
Week 8 | | | Analyse cooking method, effect, category, and ingredient frequency | Start web construction |
Week 9 | | | Analyse ingredient and ingredient category pairing | Continue web construction |
Week 10 | | | Analyse effect and ingredient pairing | Continue web construction |
Week 11 | | Modern Chinese to English translation | | Continue web construction |
Week 12 | | Modern Chinese to English translation | | Continue web construction |
Week 13 | | | | Finalize and improve the website |
Week 14 | Prepare the Wikipedia page & final presentation | | | |
Milestone 1
- Prepare a project proposal and define the goal and objectives of the project
- Get the Chinese cookbook data from the website
Milestone 2
- Clean the data and construct the datasets for the Chinese cookbook
- Translate from Ancient Chinese to Modern Chinese
- Categorize the data depending on ingredient, effect, category, and cooking method
Milestone 3
- Data Analysis
- Web construction, including the recipe filtering and recommendation system
- Prepare final presentation and Wikipedia page
Methods
Data Collection
"Yinshanzhengyao" was published in 1330 during the Yuan dynasty, and all existing editions derive from the Ming dynasty edition of 1456. Although scanned versions of the book are available online, Optical Character Recognition (OCR) is difficult because the text is in ancient Chinese and the pages include illustrations. Fortunately, the Chinese Text Project (中國哲學書電子書計劃) provides open access to ancient Chinese books for both Chinese and non-Chinese scholars. Its database currently holds over thirty thousand books, making it the largest database of historical Chinese literature.
"Yinshanzhengyao" is one of the books in this database. Because the database is well structured, we scrape the data directly from the website. Since the project focuses on the recipes in the book, we extract only the recipe content, which comprises the following chapters:
- Strange Delicacies of Combined Flavours 1, 2, 3
- Various Hot Beverages and Concentrates
- Foods that Cure Various Illnesses
In total, there are 210 recipes, each with information on its effects, ingredients with quantities, and step-by-step instructions. Because the book is a medical text, the "effect" field describes the benefits of the food and the precautions to be taken, which gives insight into the medicinal properties of each recipe.
Data Processing
Construct dataset
After obtaining the data from the website, we clean it and organize it into the following structure: Food_Name, Effect, Ingredients, Steps.
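The cleaning and structuring step could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the `Recipe` dataclass, the `clean`, `build_dataset`, and `to_csv` helpers, and the shape of the raw input (one dict per scraped recipe) are all assumptions made for the example. Only the four column names come from the structure described above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Recipe:
    # Column names follow the dataset structure described in the text.
    Food_Name: str
    Effect: str
    Ingredients: str
    Steps: str


def clean(text: str) -> str:
    """Trim the text and collapse whitespace runs into single spaces.

    str.split() with no arguments splits on any Unicode whitespace,
    including the full-width space (U+3000) common in Chinese text.
    """
    return " ".join(text.split())


def build_dataset(raw_records):
    """Turn raw scraped dicts (hypothetical input shape) into cleaned rows.

    Records without a recipe name are skipped.
    """
    rows = []
    for raw in raw_records:
        name = clean(raw.get("Food_Name", ""))
        if not name:
            continue
        rows.append(
            Recipe(
                Food_Name=name,
                Effect=clean(raw.get("Effect", "")),
                Ingredients=clean(raw.get("Ingredients", "")),
                Steps=clean(raw.get("Steps", "")),
            )
        )
    return rows


def to_csv(rows) -> str:
    """Serialize the rows to CSV text with one header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Recipe)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

Keeping the four fields as plain strings at this stage leaves the categorization and translation steps free to add their own columns later.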
Categorize
To support the data analysis and the website's search function, we first categorize the data into 15 recipe categories, 10 cooking methods, 13 ingredient categories, and 9 effect categories. The categories are as follows:
- Recipe categories: 'Paste', 'Pan-fry', 'Dish', 'Thick soup', 'Conge', 'Meat', 'Soup', 'Noodles', 'Rice noodles', 'Pancake', 'Thick liquid', 'Oil', 'Tea', 'Wonton', 'Steamed bun'
Category_en | Food_Name_en | Category | Food_Name |
---|---|---|---|
Pancake | Cow’s Milk Buns | 饼 | 牛奶子烧饼 |
Dish | Boiled Sheep’s Breast | 菜品 | 熬羊胸子 |
Steamed bun | Eggplant Manta | 馒头 | 茄子馒头 |
Soup | Carp Soup | 汤 | 鲤鱼汤 |
Tea | Jade Mortar Tea | 茶 | 玉磨茶 |
- Cooking methods: 'Boil', 'Simmer', 'Steam', 'Pan-fry', 'Bake', 'Braise', 'Broil', 'Stir-fry', 'Roast', 'Deep-fry'
- Ingredient categories: 'Chinese Medicinal Material', 'Plant', 'Spice', 'Fruit', 'Vegetable', 'Seafood', 'Meat', 'Dairy Product', 'Grain', 'Juice', 'Condiment', 'Tea', 'Other'
- Effect categories: 'Gastrointestinal Issues', 'Neurological and Mental Health', 'General Health and Wellness', 'Musculoskeletal Issues', 'Speech-related', 'Heat-clearing', 'Genitourinary Issues', 'Respiratory Issues', 'Others'
The table below shows five example recipes with their categorized information.
Recipe_name | Category | Cooking Method | Effect Category | Ingredient Category |
---|---|---|---|---|
Sprouting Chinese Foxglove Chicken | Dish | 'Steam' | 'Musculoskeletal Issues', 'General Health and Wellness' | 'meat', 'condiment', 'chinese_medicinal_material' |
Carp Soup | Soup | 'Braise', 'Boil' | 'Gastrointestinal Issues', 'Genitourinary Issues', 'Others' | 'seafood', 'spice', 'plant' |
Oil Rape Shoots Broth | Thick soup | 'Simmer' | 'General Health and Wellness' | 'spice', 'meat' |
Barley Samsa Noodles | Rice noodles | 'Stir-fry', 'Simmer' | 'General Health and Wellness', 'Gastrointestinal Issues' | 'plant', 'meat' |
Sheep Bone Congee | Conge | 'Simmer' | 'General Health and Wellness', 'Musculoskeletal Issues' | 'condiment', 'spice', 'plant' |
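To illustrate how these categories can drive the search function and the frequency analysis, here is a small sketch built on the five example recipes from the table. The dictionary layout and the `filter_recipes` and `ingredient_category_counts` helpers are hypothetical names chosen for this example and are not taken from the project's code. The category values are copied from the table as they appear.

```python
from collections import Counter

# The five example recipes, with multi-valued fields stored as sets.
RECIPES = [
    {"name": "Sprouting Chinese Foxglove Chicken", "category": "Dish",
     "cooking_methods": {"Steam"},
     "effects": {"Musculoskeletal Issues", "General Health and Wellness"},
     "ingredient_categories": {"meat", "condiment", "chinese_medicinal_material"}},
    {"name": "Carp Soup", "category": "Soup",
     "cooking_methods": {"Braise", "Boil"},
     "effects": {"Gastrointestinal Issues", "Genitourinary Issues", "Others"},
     "ingredient_categories": {"seafood", "spice", "plant"}},
    {"name": "Oil Rape Shoots Broth", "category": "Thick soup",
     "cooking_methods": {"Simmer"},
     "effects": {"General Health and Wellness"},
     "ingredient_categories": {"spice", "meat"}},
    {"name": "Barley Samsa Noodles", "category": "Rice noodles",
     "cooking_methods": {"Stir-fry", "Simmer"},
     "effects": {"General Health and Wellness", "Gastrointestinal Issues"},
     "ingredient_categories": {"plant", "meat"}},
    {"name": "Sheep Bone Congee", "category": "Conge",
     "cooking_methods": {"Simmer"},
     "effects": {"General Health and Wellness", "Musculoskeletal Issues"},
     "ingredient_categories": {"condiment", "spice", "plant"}},
]


def filter_recipes(recipes, category=None, cooking_method=None,
                   effect=None, ingredient_category=None):
    """Return recipes matching every criterion that is not None."""
    matches = []
    for recipe in recipes:
        if category is not None and recipe["category"] != category:
            continue
        if cooking_method is not None and cooking_method not in recipe["cooking_methods"]:
            continue
        if effect is not None and effect not in recipe["effects"]:
            continue
        if (ingredient_category is not None
                and ingredient_category not in recipe["ingredient_categories"]):
            continue
        matches.append(recipe)
    return matches


def ingredient_category_counts(recipes):
    """Count how many recipes use each ingredient category."""
    return Counter(c for recipe in recipes for c in recipe["ingredient_categories"])


simmered_with_meat = filter_recipes(
    RECIPES, cooking_method="Simmer", ingredient_category="meat"
)
print([recipe["name"] for recipe in simmered_with_meat])
# prints ['Oil Rape Shoots Broth', 'Barley Samsa Noodles']
```

Storing the multi-valued fields as sets keeps membership tests simple. On the full dataset of 210 recipes the same two helpers would apply unchanged.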
Translation: Ancient Chinese to Modern Chinese
The text is written in ancient Chinese, whereas modern Chinese is the form in use today. To support in-depth data analysis, we therefore translate the recipes from ancient Chinese into modern Chinese. We compared a dedicated ancient-to-modern Chinese translation model (Figure 2) with ChatGPT 3.5 (Figure 3) and found that the ChatGPT translations are more fluent and closer to contemporary usage. Based on this observation, we adopted ChatGPT as our primary translation tool.
English Translation
Data Analysis
Website
Quality assessment
Discussion and limitations
Links
- GitHub: https://github.com/changchuntzu0618/DH405-CookingBook