Universal Aesthetics (Multimodal Focus)
Introduction
Methods
Data
To study the convergence of language models, we need both plain texts and aesthetic texts. For simplicity, we reuse the text-image dataset that Huh et al. also use in their paper, and we add a separate poem dataset.
Plain Text
Poems
For poems, we use the Poems dataset from Kaggle. We find this dataset ideal for this project for the following reasons:
- It provides enough poems to yield a substantial amount of data relative to the 1,024 entries in the plain-text dataset.
- It categorizes the poems into 135 types based on their form (haiku, sonnet, etc.), which could facilitate further studies.
However, this dataset still needs to be cleaned before use. We identify two problems with the raw data. First, some poems end with copyright notices, which introduce noise into subsequent processing. Because these notices are clearly marked with the symbol ©, they can easily be removed through rule-based filtering. Second, although most poems are in English, a small portion is not. Since the plain-text dataset contains only English texts, we also remove the non-English poems from this dataset.
Afterward is an unknown term in future Before that we face the present, Coming at well future depends on present; Dismissing hazardous future Endeavor best early at present. Copyright © Muzahidul Reza | 29 November,2017
The text above shows an example of a poem with a copyright notice. We assume that the © symbol never appears within the poem itself, so we remove all content starting from the line that contains this symbol.
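A minimal Python sketch of this rule-based filter is shown below. It assumes each poem is stored as a string that keeps its original line breaks, and it truncates at the first line that contains © anywhere, so that notices starting with "Copyright ©", as in the example above, are caught as well. The function name `strip_copyright` is our own illustration, not part of any library, and the line breaks in the sample are restored by hand.

```python
def strip_copyright(poem: str, mark: str = "\u00a9") -> str:
    """Remove everything from the first line containing the copyright mark onward.

    Assumes the mark never occurs inside the poem proper. The emoji-style
    variant (the mark followed by U+FE0F) still contains the mark, so it is
    caught by the same substring test.
    """
    kept = []
    for line in poem.splitlines():
        if mark in line:
            break  # this line and every following line are treated as the notice
        kept.append(line)
    # Drop trailing blank lines or spaces left in front of the notice.
    return "\n".join(kept).rstrip()


example = (
    "Dismissing hazardous future\n"
    "Endeavor best early at present.\n"
    "Copyright \u00a9 Muzahidul Reza | 29 November,2017"
)
print(strip_copyright(example))
```

Because the check is a plain substring test, it trades precision for simplicity: a poem that legitimately used © in its body would be cut short, which is exactly the case the assumption above rules out.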
To filter out non-English poems, we construct an English lexicon and measure how much of each poem it covers. We initially experimented with an English word list, but it was overly inclusive: it contains many non-English words such as bonjour, so some non-English poems matched a large number of its entries. We therefore adopted a frequency-based approach, which excludes words that are borrowed from other languages and appear in English text only occasionally, even though comprehensive dictionaries include them. Specifically, we use a word frequency list as an auxiliary resource and build the lexicon from only the words whose frequencies exceed 10,000. For each poem, we then compute the proportion of its lemmatized words that appear in this lexicon and apply a threshold to identify English poems.
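The lexicon construction can be sketched as follows. The layout of the frequency list is an assumption here: we treat it as plain text with one `word count` pair per line, and the function name `build_lexicon` is hypothetical. Lines that do not fit this layout, such as a header, are skipped.

```python
from typing import Iterable


def build_lexicon(freq_lines: Iterable[str], min_count: int = 10_000) -> set[str]:
    """Return the lowercased words whose frequency count exceeds min_count.

    Assumes (hypothetically) one "<word> <count>" pair per line.
    """
    lexicon: set[str] = set()
    for line in freq_lines:
        parts = line.split()
        # Skip blank lines, headers, and anything that is not "<word> <count>".
        if len(parts) != 2 or not parts[1].isdecimal():
            continue
        word, count = parts
        if int(count) > min_count:
            lexicon.add(word.lower())
    return lexicon


# Usage with made-up counts: a rare loanword such as "bonjour" falls below the cutoff.
sample = ["word count", "the 23135851162", "poem 4500000", "bonjour 9000"]
print(sorted(build_lexicon(sample)))  # ['poem', 'the']
```

Since a file object iterates line by line, the same function can be applied directly to an opened frequency-list file, e.g. `with open(path, encoding="utf-8") as f: lexicon = build_lexicon(f)`.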
| Proportion of lemmatized words in lexicon | Poem |
|---|---|
| 0.00 | आज अपने ही खटकने लग गए रिश्ते नाज़ुक थे चटकने लग गए रास्तों की मुश्किलें हल हो गईं आके मंज़िल पर भटकने लग गए क्या कभी पहले भी ऐसा था हुआ बात करते ही अटकने लग गए बेदखल जिनको शहर से कर दिया फिर से गलियों में फटकने लग गए जैसे जैसे हिज्र के दिन बढ़ गए ज़ीस्त के दिन हाए घटने लग गए |
| 0.40 | illusionary triskaidekaphobia's unaccountab le |
| 0.50 | Nazakat husn se mashroot hoti gar zamane main. To ye muflis pre paker na bikte aane aane main. |
| 0.60 | Woyese to hum mile na kahin, ajnabi se they, Rishte na jane kaise kahan ke kabhi ke they. Dekha jo unko aankhon ne chupke se keya kaha, Alam ajeeb dil pe mere bebasi ke they. Majboor kar ke jane kahan ja ke chup gaye, Andaz badalon se dhanki roshni se they. Who din bhi kaise din they ke unke liye mere, Asar thore thore se deewangi ke they. Mujhko pata chala hi nahin le gaye woh dil, Dil ke irade unse zara dillagi ke they. Thori si cher char per roothe they kistarah, Andaz thore thore zara berukhi se they. Ghusse ki chadar orhkar kab tak chupao ge, Honton ke zaviey to tumhare hansi ke they. |
| 0.70 | Nu scylun hergan hefaenricaes uard metudæs maecti end his modgidanc uerc uuldurfadur sue he uundra gihuaes eci dryctin or astelidæ he aerist scop aelda barnum heben til hrofe haleg scepen. tha middungeard moncynnæs uard eci dryctin æfter tiadæ firum foldu frea allmectigprimo cantauit Cædmon istud carmen. Nu scilun herga hefenricæs uard metudæs mehti and his modgithanc uerc uuldurfadur sue he uundra gihuæs eci dryctin or astelidæ. he ærist scop ældu barnum hefen to hrofæ halig sceppend tha middingard moncynnæs uard eci dryctin æfter tiadæ firum foldu frea allmehtig MODERN ENGLISH TRANSLATION Now let me praise the keeper of Heaven's kingdom, The might of the Creator, and his thought, The work of the Father of glory, how each of wonders The Eternal Lord established in the beginning. He first created for the sons of men Heaven as a roof, the holy Creator, Then Middle-earth the keeper of mankind, The Eternal Lord, afterwards made, The earth for men, the Almighty Lord. In the beginning Caedmon sang this poem. |
| 0.75 | Ne vous étonnez pas, objets sacrés et doux, Si quelqu'air de tristesse obscurcit mon visage. Quand un savant crayon dessinait cette image J'attendais l'échafaud et je pensais à vous. |
| 0.80 | ALLAS! my worthi maister honorable, This landes verray tresor and richesse! Deth by thy deth hath harme irreparable Unto us doon: hir vengeable duresse Despoiled hath this land of the swetnesse Of rethorik; for unto Tullius Was never man so lyk amonges us. Also who was hier in philosophie To Aristotle in our tonge but thou? The steppes of Virgile in poesie Thou folwedist eeke, men wot wel ynow. Thou combre-worlde that the my maister slow-- Wolde I slayn were!--Deth, was to hastyf To renne on thee and reve the thi lyf... She myghte han taried hir vengeance a while Til that sum man had egal to the be; Nay, lat be that! sche knew wel that this y1e May never man forth brynge lyk to the, And hir office needes do mot she: God bad hir so, I truste as for the beste; O maister, maister, God thi soule reste! |
| 0.85 | N-ew acrostic quatrain I-s brought in for the first time; C-ombination of these two forms K-eeps the beauty so sublime. Topic: Birthday of Nicole "Nick" Asuncion (March 20) Form: Acrostic Quatrain |
| 0.90 | (Queer In Quatrain) Now so near, Now so far, You and I are In what a queer! Copyright © Muzahidul Reza │19 March,2018 |
| 1.00 | Promise Of A Child (Dramatic Monologue) March 31, 2020 Believe me my, tribe I'm your child I know your dream To my eyes it is open I'll try heart and soul Your desire to fulfill. |
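The per-poem score reported in the table can be sketched as follows. Two pieces are assumptions rather than details given above: tokens are extracted as runs of Unicode letters with a regular expression, and lemmatization is passed in as a function that defaults to plain lowercasing, as a stand-in for a real lemmatizer (for example, a wrapper around NLTK's `WordNetLemmatizer.lemmatize`). The cutoff is left as a parameter because its value is not fixed here, and this sketch will not necessarily reproduce the exact proportions in the table, which depend on the actual tokenizer and lemmatizer.

```python
import re
from typing import Callable

# Runs of Unicode letters (digits and underscores excluded); a simplifying assumption.
WORD_RE = re.compile(r"[^\W\d_]+")


def english_proportion(poem: str, lexicon: set[str],
                       lemmatize: Callable[[str], str] = str.lower) -> float:
    """Fraction of the poem's word tokens whose lemma appears in the lexicon."""
    tokens = WORD_RE.findall(poem)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if lemmatize(token) in lexicon)
    return hits / len(tokens)


def is_english(poem: str, lexicon: set[str], threshold: float,
               lemmatize: Callable[[str], str] = str.lower) -> bool:
    """Keep a poem only if its lexicon coverage reaches the threshold."""
    return english_proportion(poem, lexicon, lemmatize) >= threshold


# Toy lexicon for illustration only.
toy_lexicon = {"now", "so", "near", "far", "you", "and", "i", "are"}
print(english_proportion("Now so near, Now so far", toy_lexicon))  # 1.0
print(english_proportion("Ne vous étonnez pas", toy_lexicon))      # 0.0
```

Passing the lemmatizer as a function keeps the scoring logic independent of any particular NLP library; with NLTK, one could pass something like `lambda w: lemmatizer.lemmatize(w.lower())`.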