Latest revision as of 16:45, 6 December 2023

Project Plan and Milestones

ANYTHING TO IMAGE - from Culture Perspective

Weekly Project Plan

Week 4
  • Model programming: identified the type of model to use

Week 5
  • Model programming: ImageBind model development and test
  • Web page development: website first demo/draft; frontend design; work on the backend and model integration

Week 6
  • Model programming: Stable Diffusion model deployment
  • Data set construction: make a dataset search plan

Week 7
  • Model programming: diffusion model test
  • Data set construction: division of subtopics and data levels

Week 8
  • Data set construction: dataset collection in levels, with samples gathered manually

Week 9
  • Web page development: wiki page for the project development
  • Data set construction: dataset collection (continued)

Week 10
  • Web page development: continue website construction
  • Data set construction: dataset collection (continued)

Week 11
  • Web page development: website improvement, integrating frontend and backend
  • Data set construction: categorize and test the datasets
  • Experimentation and evaluation: Search Culture Benchmark and data analysis

Week 12
  • Data set construction: add samples for the different categories to be tested
  • Experimentation and evaluation: Generation Culture Benchmark and data analysis; test the results of the two models

Week 13
  • Web page development: finalize, improve, and publish the website
  • Experimentation and evaluation: test the two functions and models

Week 14
  • Experimentation and evaluation: prepare the Wikipedia page and final report; prepare and deliver the final presentation

Milestone 1

  • ImageBind Model Deployment
  • Website Demo front end


Milestone 2

  • Dataset Collection
  • Diffusion Model Deployment
  • Wiki Page for the project
  • Website Improvement (back end & front end)

Milestone 3

  • Test, evaluation, and data analysis
  • Generation Culture Benchmark
  • Prepare final wiki page
  • Prepare final presentation


Information extra & appendices for final wiki

"ANYTHING TO IMAGE from Culture Perspective"

Objective

The objective of the project is to show how well an AI model can understand the cultural differences between the two cultures presented; we run search and generation processes to probe this understanding.

The model:

  • The model we used for the project is Anything2image, developed by @Adamdad on GitHub

The database

  • The databases were created and defined manually by the team, in 4 different areas, with more than 300 samples.

Features

It uses the Stable Diffusion 2-1-unCLIP model integrated with InternGPT; it can receive audio, image, or text (or a mix of these) as input and either search for or generate an image.


Two functions:

Generation

  • This model can be used to generate and modify images based on text prompts. It is a latent diffusion model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).

Search

  • The search function is enabled with pretrained models that use AI to identify objects and match results from a given database.
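The search step described above can be sketched as nearest-neighbor retrieval over precomputed embeddings: embed the query, compare it against every database entry by cosine similarity, and return the closest matches. This is a minimal illustration, not the project's actual code; the function names, toy 4-dimensional vectors, and labels are all hypothetical stand-ins for real ImageBind embeddings.

```python
import numpy as np

def search(query_emb: np.ndarray, db_embs: np.ndarray, labels: list, top_k: int = 3) -> list:
    """Return the labels of the top_k database entries most similar to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    d = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity of every entry vs. the query
    order = np.argsort(scores)[::-1][:top_k]
    return [labels[i] for i in order]

# Toy 4-dimensional "embeddings" standing in for real ImageBind vectors.
labels = ["guzheng", "violin", "marimba"]
db = np.array([[1.0, 0.1, 0.0, 0.0],
               [0.0, 1.0, 0.1, 0.0],
               [0.0, 0.0, 1.0, 0.1]])
query = np.array([0.9, 0.2, 0.0, 0.0])     # closest to the "guzheng" entry
print(search(query, db, labels, top_k=1))  # -> ['guzheng']
```

Because cosine similarity is used, only the direction of each embedding matters, which matches how joint-embedding models are typically queried.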

Topics:


Part 1  ImageBind & Stable Diffusion Introduction
Part 2  Website
Part 3  Experiment Plan for Search Function
Part 4  Experiment Plan for Generation Function

Current State & future plan:

Finished:

  • ImageBind Model Deployment
  • Stable Diffusion Model Deployment
  • Website Demo
  • Website Frontend Design

Ongoing (1 - 8 Dec)

  • Dataset Collection (Deadline December 5)
  • Wiki Page for the project (Deadline December 6)
  • Website Improvement (Deadline December 8)

Future weeks (8 - 15 Dec)

  • Search Culture Benchmark (Deadline December 10)
  • Generation Culture Benchmark (Deadline December 15)


Part 1 - The Introduction of GenAI Models

ImageBind’s joint embedding space enables novel multimodal capabilities.

This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).
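Why a joint embedding space matters can be shown with a minimal sketch: if the audio, image, and text encoders all project into one shared vector space, then any modality can retrieve any other by simple similarity. The "encoders" below are hypothetical lookup tables standing in for the real per-modality ImageBind networks.

```python
import numpy as np

# Stand-in "encoders": in ImageBind these would be per-modality networks
# that all project into one shared embedding space.
text_space = {"dog": np.array([1.0, 0.0]), "car": np.array([0.0, 1.0])}
audio_space = {"bark.wav": np.array([0.95, 0.05]), "engine.wav": np.array([0.1, 0.9])}

def closest_text(audio_emb: np.ndarray) -> str:
    # Cross-modal retrieval: compare an audio embedding directly
    # against text embeddings, since both live in the same space.
    names = list(text_space)
    sims = [audio_emb @ text_space[n]
            / (np.linalg.norm(audio_emb) * np.linalg.norm(text_space[n]))
            for n in names]
    return names[int(np.argmax(sims))]

print(closest_text(audio_space["bark.wav"]))    # -> dog
print(closest_text(audio_space["engine.wav"]))  # -> car
```

The same mechanism is what lets the project feed sound or images into the search function and get culturally labeled matches back.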

Part 2 - Website

Creation of the project website, mixing the front end with the back end to produce the results: https://dataoni.com/fdh-project-frontend-page/

Part 3 - Experiment Plan for Search Function & Data Set Construction

Disseminate the diverse cultural perspectives of the world

"From Cultural Perspectives": this involves recognizing and accurately depicting cultural symbols, costumes, architecture, and other elements unique to various cultures.

Cultures around the world
Civilizations through time

3.2 Dataset

The dataset was generated manually, with more than 300 samples spanning cultures that differ over time and that occupy different geographical spaces.

Each sample comprises at least two of the following elements: audio, image, text.
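The "at least two of audio, image, text" rule above can be enforced programmatically when assembling the dataset. The record shape below is an assumption for illustration only; the real dataset's layout is not specified in this document.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sample:
    culture: str                   # e.g. "China", "Europe", "The Americas"
    category: str                  # e.g. "instruments", "clothes", "paintings", "food"
    audio: Optional[str] = None    # path to an audio file, if present
    image: Optional[str] = None    # path to an image file, if present
    text: Optional[str] = None     # free-text description, if present

    def is_valid(self) -> bool:
        # A valid sample carries at least two of the three modalities.
        modalities = [self.audio, self.image, self.text]
        return sum(m is not None for m in modalities) >= 2

print(Sample("China", "instruments", audio="guzheng.wav", text="A guzheng solo").is_valid())  # True
print(Sample("Europe", "food", image="paella.jpg").is_valid())                                # False
```

A check like this run once over all 300+ samples would catch any entry that slipped through with only one modality.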

Instruments:

China:

  • String instruments: guzheng, pipa, erhu, gaohu, harp, dulcimer
  • Wind instruments: flute, xiao, suona
  • Percussion instruments: gongs, drums, wooden fish

Europe:

  • String instruments: violin, cello, guitar, harp
  • Woodwind instruments: flute, clarinet, oboe
  • Brass instruments: trumpet, trombone, horn
  • Keyboard instruments: piano, organ

The Americas:

  • String instruments: chalapatita, bajo, cuatro
  • Wind instruments: pan flute, saxophone
  • Percussion instruments: conga drums, marimba

Clothes:

By historical period:

  • Ancient clothing: Egyptian, Greek, and Roman
  • Medieval costume
  • Renaissance clothing
  • 17th to 19th century costumes
  • 20th century clothing

By geography & culture:

  • Asian costumes
  • African clothing
  • European clothing
  • Middle Eastern and Arab clothing
  • Costumes of the Americas

Paintings:

Oil paintings:

  • Renaissance
  • Baroque
  • Rococo
  • Romanticism
  • Impressionism
  • Post-Impressionism
  • Modern and contemporary art

Sketching

Chinese paintings:

  • Landscape
  • Flowers and birds
  • Characters

Food:

  • Asian
  • American
  • European
  • Middle Eastern

Experiment Plan for the Search Part:

Remark: only ImageBind is used here.

Basic Perception: distinguish between different types of items, such as food vs. paintings, or musical instruments vs. clothing.

General Culture Perception: distinguish between items of the same type but from different areas or times, such as Chinese food vs. Mexican food, or medieval costume vs. 20th-century clothing.

Accurate Culture Perception: distinguish between items of the same type but from different specific cultures, such as Renaissance vs. Impressionist oil painting, or Chinese New Year vs. the Mexican Day of the Dead.
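The three perception levels above can share one scoring routine: for each query, check whether the top retrieved label matches the ground truth, and report top-1 accuracy per level. The scoring function is a sketch of how such a benchmark could be tallied; the example results are fabricated placeholders, not real experiment data.

```python
def accuracy_by_level(results):
    """results: list of (level, predicted_label, true_label) tuples.
    Returns top-1 accuracy per perception level."""
    totals, hits = {}, {}
    for level, pred, true in results:
        totals[level] = totals.get(level, 0) + 1
        hits[level] = hits.get(level, 0) + (pred == true)
    return {level: hits[level] / totals[level] for level in totals}

# Hypothetical outcomes for the three perception levels (not real data).
results = [
    ("basic",    "food",         "food"),
    ("basic",    "instruments",  "instruments"),
    ("general",  "chinese_food", "chinese_food"),
    ("general",  "mexican_food", "chinese_food"),   # a miss
    ("accurate", "renaissance",  "impressionism"),  # a miss
]
print(accuracy_by_level(results))  # {'basic': 1.0, 'general': 0.5, 'accurate': 0.0}
```

Reporting one accuracy number per level makes it easy to see where the model's cultural perception degrades from basic to accurate distinctions.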

Part 4 - Experiment Plan for Generation Function

Approach to studying AI model’s intelligence

Key question: how can we measure the intelligence of an LLM that has been trained on an unknown but extremely vast corpus of web-text data? The standard answer is benchmarks.

This approach is designed to separate true learning from mere memorization, and is backed up by a rich theoretical framework [SSBD14, MRT18].

Definition of intelligence: someone who is capable of:

  • Reasoning
  • Planning
  • Problem-solving
  • Abstract thinking
  • Understanding complex ideas
  • Fast learning
  • Learning from experience

4.1 Approach to studying AI model’s intelligence

They propose a different approach which is closer to traditional psychology rather than machine learning, leveraging human creativity and curiosity.

They aim to generate novel and difficult tasks and questions that convincingly demonstrate that GPT-4 goes far beyond memorization, and that it has a deep and flexible understanding of concepts, skills, and domains.

Experiment plan:

Clothes: the ability of the models to recognize, differentiate, and accurately generate traditional and culturally specific attire.

Paintings: assessing the models' capability to identify and replicate cultural themes, styles, and historical contexts of specific cultural art forms.

Food: evaluating the models' proficiency in identifying and visually representing traditional dishes from various cultures.

Multimodal and Integrative Ability

Basic sound & text & image (instruments): provide a sample of an instrument's sound, like a piano piece or violin solo, together with text describing the instrument and the style of music played, such as "a piano concerto from the Romantic period". Request the model to generate an image that integrates the sound and text, depicting the instrument and the style of music.
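Combining a sound embedding with a text embedding before generation can be sketched as a normalized weighted average in the joint space. The weighting scheme here is an assumption for illustration, not the project's actual conditioning method.

```python
import numpy as np

def mix_embeddings(audio_emb: np.ndarray, text_emb: np.ndarray, w_audio: float = 0.5) -> np.ndarray:
    """Blend two modality embeddings into one unit-length conditioning vector."""
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_emb / np.linalg.norm(text_emb)
    mixed = w_audio * a + (1.0 - w_audio) * t
    return mixed / np.linalg.norm(mixed)

audio = np.array([1.0, 0.0])   # e.g. an embedded piano recording (toy vector)
text = np.array([0.0, 1.0])    # e.g. "a piano concerto from the Romantic period" (toy vector)
cond = mix_embeddings(audio, text)
print(cond)  # equal pull from both modalities
```

Adjusting w_audio would let an experimenter probe whether the generated image follows the sound or the text more closely.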

4.3 Evaluation and Coherence

Evaluation:

  • Recognize and differentiate: distinguish between different cultural elements within the same category.
  • Accuracy in representation: portray the identified cultural elements in terms of visual appearance, cultural context, and associated symbolism, maintaining cultural accuracy and sensitivity.

System coherence: put the generation result back into the search model and verify that the original concept is recovered.
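The system-coherence check described above (feed the generated image back into the search model and see whether it retrieves the original concept) is a round trip that can be sketched as follows. The generate and search stubs below are placeholders for the real Stable Diffusion and ImageBind components; the concept embeddings are toy vectors.

```python
import numpy as np

# Placeholder concept embeddings standing in for the real cultural database.
database = {"chinese_new_year": np.array([1.0, 0.0]),
            "day_of_the_dead":  np.array([0.0, 1.0])}

def generate(concept: str) -> np.ndarray:
    # Stub: a real system would run Stable Diffusion on the concept,
    # then re-embed the generated image. Here we return the concept
    # embedding plus a little noise to mimic generation imperfections.
    rng = np.random.default_rng(0)
    return database[concept] + rng.normal(scale=0.05, size=2)

def search(emb: np.ndarray) -> str:
    # Nearest concept by cosine similarity.
    names = list(database)
    sims = [emb @ database[n]
            / (np.linalg.norm(emb) * np.linalg.norm(database[n]))
            for n in names]
    return names[int(np.argmax(sims))]

def is_coherent(concept: str) -> bool:
    # Coherence: generation followed by search recovers the original concept.
    return search(generate(concept)) == concept

print(is_coherent("chinese_new_year"))  # True
```

A coherence rate computed over many concepts would quantify how well the generation and search halves of the system agree with each other.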