Chinese Cookbook: Difference between revisions
Line 122: | Line 122: | ||
== Methods == | == Methods == | ||
===Data Collection=== | ===Data Collection=== | ||
[[File:Screenshot from 2023-12-16 22-34-27.png|thumb|right|400px|[Figure 1] Chinese Text project website pages for "Yinshanzhengyao"]] | |||
"Yinshanzhengyao" was published in 1330 during the Yuan dynasty, and all existing editions are derived from the Ming dynasty edition of 1456. Despite the presence of a scanned version of the book on the internet, Optical Character Recognition (OCR) poses a challenge due to the ancient Chinese text and the inclusion of illustrations. Fortunately, the Chinese Text Project (中國哲學書電子書計劃) has undertaken the noble initiative of providing open access to ancient Chinese books for both Chinese and non-Chinese scholars, resulting in the creation of a comprehensive database. Currently, it encompasses over thirty thousand books, making it the largest among historical Chinese literature databases. | "Yinshanzhengyao" was published in 1330 during the Yuan dynasty, and all existing editions are derived from the Ming dynasty edition of 1456. Despite the presence of a scanned version of the book on the internet, Optical Character Recognition (OCR) poses a challenge due to the ancient Chinese text and the inclusion of illustrations. Fortunately, the Chinese Text Project (中國哲學書電子書計劃) has undertaken the noble initiative of providing open access to ancient Chinese books for both Chinese and non-Chinese scholars, resulting in the creation of a comprehensive database. Currently, it encompasses over thirty thousand books, making it the largest among historical Chinese literature databases. | ||
Line 130: | Line 131: | ||
In total, there are 210 recipes, each accompanied by information on its effects, ingredients with quantities, and step-by-step instructions. As a medical text, the "effect" refers to the benefits of the food and precautions to be taken, providing valuable insights into the medicinal properties of the recipes. | In total, there are 210 recipes, each accompanied by information on its effects, ingredients with quantities, and step-by-step instructions. As a medical text, the "effect" refers to the benefits of the food and precautions to be taken, providing valuable insights into the medicinal properties of the recipes. | ||
=== Data Processing === | === Data Processing === |
Revision as of 22:14, 16 December 2023
Project Plan and Milestones
Weekly Project Plan
Date | Data Collection | Data Processing | Data Analysis | Web Construction |
---|---|---|---|---|
Week 3 | Search historical Chinese cookbooks and compare them | |||
Week 4 | Choose one historical Chinese cookbook
|
|||
Week 5 | Get data from the website | First clean and sort the data
|
||
Week 6 | Construct the dataset of ingredients
|
|||
Week 7 | Categorize data | |||
Week 8 | Analyse cooking method, effect, category, and ingredient frequency
|
Start web construction | ||
Week 9 | Analyse ingredient and ingredient category pairing
|
Continue web construction | ||
Week 10 | Analyse effect and ingredient pairing
|
Continue web construction
| ||
Week 11 | Modren Chinese to English translation | Continue web construction
| ||
Week 12 | Modren Chinese to English translation | Continue web construction
| ||
Week 13 | Finalize and improve the website | |||
Week 14 | Prepare the Wikipedia page & final presentation |
Milestone 1
- Prepare a project proposal and the goal and objective of the project
- Get Chinese cooking book data from the website
Milestone 2
- Clean the data and construct the datasets for the Chinese cooking book
- Translate from Ancient Chinese and Modern Chinese
- Categorize the data depending on ingredient, effect, category, and cooking method
Milestone 3
- Data Analysis
- Web construction and recipe filtering and recommendation system
- Prepare final presentation and Wikipedia page
Methods
Data Collection
"Yinshanzhengyao" was published in 1330 during the Yuan dynasty, and all existing editions are derived from the Ming dynasty edition of 1456. Despite the presence of a scanned version of the book on the internet, Optical Character Recognition (OCR) poses a challenge due to the ancient Chinese text and the inclusion of illustrations. Fortunately, the Chinese Text Project (中國哲學書電子書計劃) has undertaken the noble initiative of providing open access to ancient Chinese books for both Chinese and non-Chinese scholars, resulting in the creation of a comprehensive database. Currently, it encompasses over thirty thousand books, making it the largest among historical Chinese literature databases.
"Yinshanzhengyao" is among the books included in this extensive database. Leveraging the well-defined structure of the database, we scrape data from the website. Given the project's specific focus on the recipes within the book, our data extraction is limited to the recipe content, which includes the following chapter:
- Strange Delicacies of Combined Flavours 1, 2, 3
- Various Hot Beverages and Concentrates
- Foods that Cure Various Illnesses
In total, there are 210 recipes, each accompanied by information on its effects, ingredients with quantities, and step-by-step instructions. As a medical text, the "effect" refers to the benefits of the food and precautions to be taken, providing valuable insights into the medicinal properties of the recipes.