Universal Aesthetics (Multimodal Focus)
Latest revision as of 18:14, 30 November 2025
Introduction
Methods
Data
To study the convergence of language models, we need both plain texts and aesthetic texts. For simplicity, we reuse the text-image dataset (https://huggingface.co/datasets/minhuh/prh/tree/wit_1024) that is also used in Huh et al.'s paper, and add a separate poem dataset.
Plain Text
Poems
For poems, we use the Poems dataset from Kaggle (https://www.kaggle.com/datasets/michaelarman/poemsdataset/data). We find this dataset well suited to the project for the following reasons:
- It provides enough poems to yield a substantial amount of data relative to the plain-text dataset, which contains 1,024 entries.
- It categorizes the poems into 135 types based on their form (haiku, sonnet, etc.), which could facilitate further studies.
Cleaning
Despite its high quality, the dataset still needs to be cleaned before use. We identify two problems with the raw data. First, some poems end with a copyright notice, which introduces noise into subsequent processing. Because the copyright information is clearly marked with the symbol ©, it can be removed through rule-based filtering. Second, although most poems are in English, a small portion is not. Since the plain-text dataset contains almost exclusively English texts, we also remove the non-English poems.
Afterward is an unknown term in future
Before that we face the present,
Coming at well future depends on present;
Dismissing hazardous future
Endeavor best early at present.
Copyright © Muzahidul Reza | 29 November,2017
The text above shows an example of a poem with copyright information. We assume that the symbol © does not appear within the poem itself and remove everything starting from the line on which this symbol appears.
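The rule-based removal described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual cleaning script: the function name `strip_copyright` is hypothetical, and the sketch assumes that a line is cut as soon as it contains © anywhere (as in the example, where the notice starts with the word "Copyright").

```python
def strip_copyright(poem: str, mark: str = "©") -> str:
    """Return the poem truncated at the first line that contains `mark`."""
    kept = []
    for line in poem.splitlines():
        if mark in line:
            break  # drop this line and everything after it
        kept.append(line)
    return "\n".join(kept).rstrip()


# Shortened version of the example poem shown above.
example = (
    "Dismissing hazardous future\n"
    "Endeavor best early at present.\n"
    "Copyright © Muzahidul Reza | 29 November,2017"
)
cleaned = strip_copyright(example)
```

Because the check is a substring test, it also catches the emoji-style variant of the symbol (© followed by a variation selector).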
To filter out non-English poems, we use an English word frequency list (https://www.kaggle.com/datasets/rtatman/english-word-frequency) as an auxiliary resource and build an English lexicon from the words whose frequency exceeds a threshold of 10,000. For each poem, we compute the proportion of lemmatized words that appear in this lexicon and apply a threshold to this proportion to identify English poems. We initially experimented with a comprehensive English word list (https://github.com/dwyl/english-words), but it was overly inclusive and contained many non-English words such as "bonjour", so some non-English poems matched a large number of dictionary entries. We therefore adopted the frequency-based lexicon, which excludes words that are borrowed from other languages and appear in English text only occasionally, even though comprehensive dictionaries list them. The table below shows example poems at different detected proportions of English words. To minimize bias while filtering out as much non-English data as possible, we chose a threshold of 0.8.
| Proportion | Poem |
|---|---|
| 0.00 | आज अपने ही खटकने लग गए<br/>रिश्ते नाज़ुक थे चटकने लग गए<br/>रास्तों की मुश्किलें हल हो गईं<br/>आके मंज़िल पर भटकने लग गए<br/>क्या कभी पहले भी ऐसा था हुआ<br/>... |
| 0.40 | illusionary<br/>triskaidekaphobia's<br/>unaccountab le |
| 0.50 | Nazakat husn se mashroot hoti gar zamane main.<br/>To ye muflis pre paker na bikte aane aane main. |
| 0.60 | Woyese to hum mile na kahin, ajnabi se they,<br/>Rishte na jane kaise kahan ke kabhi ke they.<br/>Dekha jo unko aankhon ne chupke se keya kaha,<br/>Alam ajeeb dil pe mere bebasi ke they.<br/>Majboor kar ke jane kahan ja ke chup gaye,<br/>... |
| 0.70 | Nu scylun hergan hefaenricaes uard<br/>metudæs maecti end his modgidanc<br/>uerc uuldurfadur sue he uundra gihuaes<br/>eci dryctin or astelidæ<br/>he aerist scop aelda barnum<br/>... |
| 0.75 | Ne vous étonnez pas, objets sacrés et doux,<br/>Si quelqu'air de tristesse obscurcit mon visage.<br/>Quand un savant crayon dessinait cette image<br/>J'attendais l'échafaud et je pensais à vous. |
| 0.80 | ALLAS! my worthi maister honorable,<br/>This landes verray tresor and richesse!<br/>Deth by thy deth hath harme irreparable<br/>Unto us doon: hir vengeable duresse<br/>Despoiled hath this land of the swetnesse<br/>... |
| 0.85 | N-ew acrostic quatrain<br/>I-s brought in for the first time;<br/>C-ombination of these two forms<br/>K-eeps the beauty so sublime.<br/>Topic: Birthday of Nicole "Nick" Asuncion (March 20)<br/>... |
| 0.90 | (Queer In Quatrain)<br/>Now so near,<br/>Now so far,<br/>You and I are<br/>In what a queer!<br/>... |
| 1.00 | Promise Of A Child (Dramatic Monologue)<br/>March 31, 2020<br/>Believe me my, tribe<br/>I'm your child<br/>I know your dream<br/>... |
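The lexicon-based language filter described above could look roughly like the following Python sketch. It is written under several assumptions that the text does not confirm: the function names are hypothetical, the tokenizer is a simple regular expression, the default lemmatizer is an identity placeholder (a real lemmatizer, for example NLTK's `WordNetLemmatizer`, could be passed in instead), and a poem is kept when its proportion is greater than or equal to the threshold.

```python
import re
from typing import Callable, Dict, Set


def build_lexicon(word_counts: Dict[str, int], min_count: int = 10_000) -> Set[str]:
    """Keep only words whose corpus frequency exceeds `min_count`."""
    return {word.lower() for word, count in word_counts.items() if count > min_count}


def english_proportion(
    poem: str,
    lexicon: Set[str],
    lemmatize: Callable[[str], str] = lambda w: w,  # placeholder, not a real lemmatizer
) -> float:
    """Fraction of (lemmatized) word tokens that are found in the lexicon."""
    tokens = re.findall(r"[^\W\d_]+", poem.lower())  # runs of letters only
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if lemmatize(tok) in lexicon)
    return hits / len(tokens)


def is_english(poem: str, lexicon: Set[str], threshold: float = 0.8) -> bool:
    """Assumed decision rule: keep poems at or above the threshold."""
    return english_proportion(poem, lexicon) >= threshold


# Toy frequency table standing in for the Kaggle word frequency list.
toy_counts = {"the": 100_000, "cat": 50_000, "sat": 20_000, "bonjour": 500}
lexicon = build_lexicon(toy_counts)
```

With the toy table, "bonjour" falls below the frequency cutoff and is left out of the lexicon, which mirrors the motivation for preferring a frequency-based lexicon over a comprehensive word list.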
Aesthetic Rating
To study how models converge on poems of varying aesthetic quality, we need to assign each poem a score reflecting its aesthetic quality. Human rating is labor-intensive, so, given the alignment between large language models (LLMs) and human preferences, we use an LLM to evaluate the poems. To avoid introducing bias, and to balance model performance with the annotation consensus in the research community, we chose the closed-source GPT-4o model and excluded it from the alignment tests in subsequent experiments (only open-source models are used there). Our aesthetic scores follow a 5-point scale similar to a Likert scale (https://en.wikipedia.org/wiki/Likert_scale). Inspired by chain-of-thought (CoT) prompting (https://en.wikipedia.org/wiki/Prompt_engineering#Chain-of-thought), we ask the model to output a short paragraph of reasoning before the score, imitating the close-reading process of a human rater. The prompt is shown below. We also conducted human evaluation tests to verify the consistency between the model's assessments and human preferences.
You are an objective literary evaluator.
Task:
1) Evaluate the beauty of the following poem and give it an integer score from 1 to 5 (1 = not beautiful, 5 = extremely beautiful).
2) First output a concise, non-sensitive rationale (a clear summary explaining why you gave this score). This should be at most ~200 words, avoid step-by-step internal chain-of-thought.
3) Then output the score.
4) Finally return a JSON object EXACTLY in this format (no extra commentary):
{{ "thinking": "<the concise rationale as a string>", "score": <int 1-5> }}
Poem to evaluate:
<poem_text>
Scoring criteria (you MUST apply these; briefly mention which criteria influenced the score in the rationale):
- Imagery & Sensory Detail (weight 30%): quality and vividness of images, sensory language.
- Emotional Impact (weight 25%): emotional resonance, ability to move reader.
- Language & Diction (weight 15%): word choice, originality, metaphors, semantic richness.
- Structure & Rhythm (weight 15%): line breaks, meter/flow, internal cohesion.
- Originality & Depth (weight 15%): fresh perspective or depth of thought.
Scoring method: evaluate each criterion on 0-10, compute weighted sum, map to 1-5:
total_score_0_10 = weighted average (0-10)
final_score = round(total_score_0_10 / 2) # maps 0-10 to 0-5, round to nearest int, but clamp to 1-5
Important instructions for your output:
- Do NOT reveal internal chain-of-thought. Only provide a concise rationale (summary) explaining which criteria mattered and how.
- Output MUST be valid JSON as specified in step 4. 'thinking' must be a string, 'score' an integer.
- Example of allowed rationale: "Strong imagery and emotional resonance; language sometimes cliché; good rhythm; overall score 4."
Now perform the evaluation.
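The scoring arithmetic and the JSON format requested in the prompt can be illustrated with a small Python sketch. In the prompt, the model performs this computation itself, so the code below is only a way to reproduce the mapping and to sanity-check replies, not part of the described pipeline. The dictionary keys, the function names, and the round-half-up rule are assumptions (the prompt only says "round to nearest int").

```python
import json
import math

# Criterion weights copied from the prompt; the key names are hypothetical.
WEIGHTS = {
    "imagery": 0.30,
    "emotion": 0.25,
    "language": 0.15,
    "structure": 0.15,
    "originality": 0.15,
}


def final_score(criteria: dict) -> int:
    """Map per-criterion scores (0-10) to the 1-5 scale described in the prompt."""
    total_0_10 = sum(WEIGHTS[name] * criteria[name] for name in WEIGHTS)
    # Round half up (assumption); Python's built-in round() rounds halves to even.
    rounded = math.floor(total_0_10 / 2 + 0.5)
    return max(1, min(5, rounded))  # clamp to 1-5


def parse_rating(raw: str) -> dict:
    """Parse the model's JSON reply and check the two required fields."""
    data = json.loads(raw)
    if not isinstance(data.get("thinking"), str):
        raise ValueError("'thinking' must be a string")
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("'score' must be an integer from 1 to 5")
    return data


# Made-up criterion scores for illustration only.
scores = {"imagery": 8, "emotion": 7, "language": 6, "structure": 6, "originality": 5}
```

For the made-up scores above, the weighted average is 6.7 on the 0-10 scale, which maps to 3.35 and rounds to a final score of 3.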
After annotating the dataset, we uploaded it to Hugging Face (https://huggingface.co/datasets/SHENJJ1017/poem_aesthetic_eval) and made it publicly available for future use. The table below presents one example poem for each aesthetic score. Although the scores correlate to some extent with the aesthetic quality of the text, the evaluation inevitably introduces additional biases, such as poem length or the rarity of vocabulary. We address these factors in our experiments.
| Score | Poem |
|---|---|
| 1 | Everyone standing leaning by the side of a ditch. |
| 2 | Succeed success<br/>Be successful,<br/>Be happy earning peace<br/>Life is beautiful. |
| 3 | When Gabby Hayes is more than name,<br/>When clean public rest rooms are not academic<br/>When honor over victory is no longer the province of party<br/>When you see the miracle of movement of every limb<br/>When this list is growing. |
| 4 | Carefree paper kites soar with childhood dreams<br/>Still in my mind a distant merriment,<br/>Echoes of our laughter and glee it seems<br/>Return with their lost mirth and enchantment.<br/>Kites flying so free with innocent whims<br/>... |
| 5 | Presiding over a formica counter,<br/>plastic Mother and Child magnetized<br/>to the top of an ancient register,<br/>the heady mix of smells from the open bins<br/>... |
<math>k + 5</math>
Discussion
References
Credits
Course: Foundation of Digital Humanities (DH-405), EPFL
Professor: Frédéric Kaplan
Supervisors: Alexander Rusnak
Authors: