= Motivation =  


The project began with a shared goal: transforming how fashion designers work. Aware of the rapidly evolving fashion landscape, our team set out to innovate and to tackle a common challenge: the search for new ideas and simpler design methods.
 
We wanted to empower designers with a modern tool that goes beyond traditional limits. We envisioned an AI-powered tool that not only creates new clothing designs but also inspires designers looking for fresh ideas.
 
Our project focuses on an AI-driven platform that generates unique clothing visuals. By combining technologies such as DragGAN and the Swapping Autoencoder, the platform lets designers explore and personalize these designs to match their artistic vision.
 
In essence, we want to offer designers a valuable resource that nurtures creativity, fosters experimentation, and speeds up design work. By opening up new possibilities for designers, our tool aims to be a catalyst for innovation in the fashion industry.
 
= Deliverables =
 
== Dataset ==
 
A folder with 8249 images from 266 fashion shows, scraped from [https://nowfashion.com/ NowFashion]. The images are cleaned and converted: we removed the pictures in which our algorithm did not recognize a face, and converted all the pictures to 256x256 pixels, since the images need to be square with a side length that is a power of 2. Given our limited computing resources, we chose 256 pixels instead of the more convenient 1024 pixels.
 
== Software ==
 
Since we modified the interface of DragGAN, we have included the updated versions of all the files we changed, with explanations in the README file on how to apply these updates.
 
We have also added to the Github the .pkl files containing the weights and configuration of the model we trained on our dataset, as well as examples of images our model generated.
 
Finally, we provided the Python code we used to scrape the data from the website, as well as the code we used to run face recognition and resize the images.
 
= Milestones and Project Plan =
 
 
==Milestones==
===Milestone 1: DragGAN===
 
* Understand how DragGAN works
* Find a dataset appropriate for our use case
* Train StyleGAN
 
===Milestone 2: Texture swap===
 
* Find a way to apply a texture change on an image
* Train the Swapping Autoencoder for Deep Image Manipulation
* Implement the Texture swap interface in our project
 
 
===Milestone 3: User Interface===
* Change DragGAN's interface to make it more intuitive for our project
 
===Milestone 4: Deliverables===
 
* Deliver the code on Github
* Write the wiki page
* Prepare the presentations
 
==Project Plan==


{| class="wikitable" width="80%"
! Week !! Tasks !! Completion
|-
| align="center" | Week 3
|
* Choose our project among the ones presented in class
* Prepare for the first presentation
| align="center" | ✓
|-
| align="center" | Weeks 4-5-6
|
* Find some clothing datasets
* Read papers, understand how DragGAN works
* Define the project
* Get familiar with the libraries we want to use (selenium, opencv...)
| align="center" | ✓
|-
| align="center" | Weeks 7-8-9
|
* Scrape the data from the NowFashion website
* Resize the images
* Clean the dataset (remove the pictures that are not in the same format as the others)
* Get familiar with StyleGAN and how it works
* Find a textile dataset
| align="center" | ✓
|-
| align="center" | Week 10
|
* Define a precise plan for the following weeks
* Prepare for the presentation
* Write the Project Plan and the Milestones on the wiki page
* Figure out how to implement the texture changes
* Begin training StyleGAN on the dataset
| align="center" | ✓
|-
| align="center" | Week 11
|
* Install a virtual machine to use Linux
* Finish the training on the VM
* Get DragGAN to work on the VM
| align="center" | ✓
|-
| align="center" | Week 12
|
* Work on the texture-changing tool
* Implement the texture-changing interface
* Start writing the description of the methods on the wiki page
| align="center" | ✓
|-
| align="center" | Week 13
|
* Adapt the DragGAN GUI to be more intuitive
* Finalize and clean the code, then commit it to Github
* Finish writing the wiki page
| align="center" |
|}


= Methods =

For our project, we build on the DragGAN project (https://github.com/XingangPan/DragGAN), into which we import a StyleGAN model trained on our dataset.
== Scraping ==
 
We scraped the images from the [https://nowfashion.com/ NowFashion] website, which hosts a very large collection of fashion shows and pictures of the clothes presented in them. We used the Selenium library, iterating over all the shows and saving the pages. The large majority of the pictures share the same format and the same "frame" (the models appear in the same way in the picture).
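
The sketch below shows the general shape of this scraping loop. It is a minimal illustration rather than our exact script: the listing URL and the CSS selector are hypothetical stand-ins for the real structure of the nowfashion.com pages.

<syntaxhighlight lang="python">
# Minimal scraping sketch. The listing URL and the "a.show-link" selector
# are hypothetical; the real page structure of nowfashion.com differs.
import os
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://nowfashion.com/fashion-shows")  # hypothetical listing page

# Collect the links to the individual show pages.
show_links = [a.get_attribute("href")
              for a in driver.find_elements(By.CSS_SELECTOR, "a.show-link")]

os.makedirs("images", exist_ok=True)
for show_url in show_links:
    driver.get(show_url)
    for i, img in enumerate(driver.find_elements(By.TAG_NAME, "img")):
        src = img.get_attribute("src")
        if not src:
            continue
        # Download each runway picture to disk.
        data = requests.get(src, timeout=30).content
        name = f"{show_url.rstrip('/').split('/')[-1]}_{i}.jpg"
        with open(os.path.join("images", name), "wb") as f:
            f.write(data)
driver.quit()
</syntaxhighlight>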
[[File:normal.webp|thumb|"Normal" frame|160px]]
However, a few pictures are completely different from the others, and we wanted to remove them.
[[File:weird_image.jpg|thumb|"Weird" frame|200px]]
The best approach we found to detect these was to use a face recognizer: we removed the pictures in which our algorithm did not recognize a face, and converted all the pictures to 256x256 pixels, since the images need to be square with a side length that is a power of 2. Given our limited computing resources, we chose 256 pixels instead of the more convenient 1024 pixels. All the code is available on our Github.
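
As an illustration, a minimal version of this cleaning step could look as follows. The Haar cascade face detector and its parameters are one plausible choice rather than necessarily the exact ones in our script; the white padding matches the stripe strategy discussed in the Limitations section.

<syntaxhighlight lang="python">
# Sketch of the cleaning step: keep only images in which a face is detected,
# pad the shorter side with white stripes to make the image square, then
# resize to 256x256. Detector choice and parameters are illustrative.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def clean_and_resize(path, out_path, size=256):
    img = cv2.imread(path)
    if img is None:
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False  # "weird" frame: no face found, drop the picture
    # Pad the shorter side with white stripes so the image becomes square.
    h, w = img.shape[:2]
    diff = abs(h - w)
    top, bottom, left, right = 0, 0, 0, 0
    if h > w:
        left, right = diff // 2, diff - diff // 2
    else:
        top, bottom = diff // 2, diff - diff // 2
    img = cv2.copyMakeBorder(img, top, bottom, left, right,
                             cv2.BORDER_CONSTANT, value=(255, 255, 255))
    cv2.imwrite(out_path, cv2.resize(img, (size, size)))
    return True
</syntaxhighlight>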
 
 
=== Copyright ===
 
The images used in this project were scraped from [https://nowfashion.com/ NowFashion] without authorization. We used these images purely for learning and study purposes. Had we wanted to publish a paper or go further, we would have had to either contact NowFashion and find a way to access these images legally, or find another source of images.
 
== DragGAN ==  
 
DragGAN is a pioneering deep learning model designed for unprecedented controllability in image synthesis. Unlike prior methods that rely on 3D models or supervised learning, DragGAN lets users interactively manipulate images by clicking handle and target points and moving them precisely within the image. By operating in the GAN's feature space, this approach allows diverse and precise adjustments of spatial attributes across various object categories. The model enables efficient, real-time editing without additional networks, making interactive layout iterations possible. Evaluated extensively across diverse datasets, DragGAN shows superior manipulation results, deforming images while respecting the underlying object structure and outperforming existing methods in both point tracking and image editing.
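
To give an intuition for the mechanism, here is a deliberately simplified toy sketch of the motion-supervision idea at DragGAN's core. It is not DragGAN's actual implementation (the real method optimizes StyleGAN's latent code and adds a point-tracking step); the tiny convolution stands in for the generator's feature extractor.

<syntaxhighlight lang="python">
# Toy motion supervision: nudge the latent so that the feature just ahead of
# the handle point (toward the target) matches the current handle feature.
import torch
import torch.nn.functional as F

gen = torch.nn.Conv2d(4, 8, 3, padding=1)           # stand-in feature extractor
for p in gen.parameters():
    p.requires_grad_(False)
w = torch.randn(1, 4, 32, 32, requires_grad=True)   # "latent" being optimized

handle = torch.tensor([16.0, 16.0])   # handle point (x, y) clicked by the user
target = torch.tensor([24.0, 16.0])   # where the user dragged it

def sample(feat, xy):
    # Differentiable bilinear sample of the feature map at one (x, y) point.
    h, w_ = feat.shape[-2:]
    grid = torch.stack([xy[0] / (w_ - 1), xy[1] / (h - 1)]) * 2 - 1
    return F.grid_sample(feat, grid.view(1, 1, 1, 2), align_corners=True)

opt = torch.optim.Adam([w], lr=0.01)
d = (target - handle) / (target - handle).norm()    # unit drag direction
for _ in range(50):
    feat = gen(w)
    loss = (sample(feat, handle + d) - sample(feat, handle).detach()).abs().sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
</syntaxhighlight>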
 
== StyleGAN ==
 
StyleGAN is a type of generative adversarial network (GAN) developed by NVIDIA, used primarily to generate high-quality, realistic images; it is renowned for its ability to create lifelike human faces, animals and objects. Of all the existing versions of StyleGAN, we chose to work with StyleGAN2-ADA (https://github.com/NVlabs/stylegan2-ada-pytorch), which is based on StyleGAN2. It is less powerful than StyleGAN3, but given our resources a model based on StyleGAN2 is more appropriate. ADA stands for "adaptive discriminator augmentation" and indicates that this model is better suited to smaller datasets (for example, StyleGAN2-ADA aims to match the results of the original model with a dataset of around 30k images instead of 100k).
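
Once trained, the network can be sampled following the pattern shown in the stylegan2-ada-pytorch README. The sketch below assumes our trained network file (here called <code>network.pkl</code>) and a CUDA-capable GPU, and must be run from inside the stylegan2-ada-pytorch repository so that its <code>dnnlib</code> and <code>torch_utils</code> modules can be unpickled.

<syntaxhighlight lang="python">
# Sample one image from the trained generator, following the usage pattern
# of stylegan2-ada-pytorch; the file name "network.pkl" is a placeholder.
import pickle
import torch
import PIL.Image

with open("network.pkl", "rb") as f:
    G = pickle.load(f)["G_ema"].cuda()  # generator with averaged weights

z = torch.randn([1, G.z_dim]).cuda()    # random latent code
img = G(z, None)                        # second argument: class label (unused)
# Map from [-1, 1] float, NCHW, to [0, 255] uint8, NHWC.
img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
PIL.Image.fromarray(img[0].cpu().numpy(), "RGB").save("generation.png")
</syntaxhighlight>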
 
== Texture Swapping ==
 
The Swapping Autoencoder is a deep model specialized for image manipulation rather than just generating random images. It encodes an image into two separate components and is trained so that swapping these components between images still produces realistic results: one component encodes the structure of the image, while the other represents its texture by capturing patterns across different areas of the image.
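
The following toy sketch illustrates the structure/texture split. The tiny <code>Encoder</code> and <code>Generator</code> here are hypothetical stand-ins for illustration only, not the actual Swapping Autoencoder architecture.

<syntaxhighlight lang="python">
# Toy illustration of texture swapping: encode two images into (structure,
# texture) pairs, then recombine A's structure with B's texture.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in encoder: a spatial structure code plus a global texture code."""
    def __init__(self):
        super().__init__()
        self.structure = nn.Conv2d(3, 8, kernel_size=4, stride=4)  # keeps layout
        self.texture = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(3, 16))             # global vector
    def forward(self, x):
        return self.structure(x), self.texture(x)

class Generator(nn.Module):
    """Stand-in generator: re-renders a structure code modulated by a texture."""
    def __init__(self):
        super().__init__()
        self.up = nn.ConvTranspose2d(8, 3, kernel_size=4, stride=4)
        self.mod = nn.Linear(16, 3)  # texture modulates the output channels
    def forward(self, structure, texture):
        return self.up(structure) * self.mod(texture)[:, :, None, None]

enc, gen = Encoder(), Generator()
image_a = torch.randn(1, 3, 256, 256)  # placeholder images
image_b = torch.randn(1, 3, 256, 256)
s_a, _ = enc(image_a)                  # structure (layout) of A
_, t_b = enc(image_b)                  # texture (fabric/pattern) of B
swapped = gen(s_a, t_b)                # A's garment rendered in B's texture
</syntaxhighlight>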
 
The goal of using this model was to give designers an additional tool for modifying the generated images, applying the style and texture of other images to the image they are working on.
 
= Discussion =
== Limitations ==
 
There were several unexpected issues during our attempt to train StyleGAN on our dataset. Without going into detail, the problems we encountered were related to:
* using the wrong OS
* getting access to and configuring the GPU
* using the wrong compiler
* not being allowed to install modules/libraries
 
Due to these issues, we were only able to start the training 24 hours before the deadline for writing this wiki page, so the results we share here are unfortunately below what we could have obtained with a little more training time. To get the best results in a minimum of time, we also chose to reduce the resolution of the pictures so the training would take less time. Moreover, when rescaling the images from the original resolution, we had to make a choice, as the original images were not square: crop them, add stripes on the sides, or stretch them. We decided to start by adding stripes, planning to test the training with these stripes and also to try the other options for comparison.
 
We also had to sacrifice the implementation of texture swapping for lack of time, as the problems with StyleGAN and DragGAN took longer to resolve than expected.
 
== Results ==
 
Our results are rather unsatisfactory. Our model did not have enough time to train, and ideally we would also have trained on higher-resolution images (1024x1024, for example). As a result, the generated images are of rather poor quality, as the attached examples show. On a positive note, the strategy of white stripes on the sides worked well; we could have continued to train the model on images with white stripes on the sides and removed the stripes before output.
[[File:Generations.png|thumb|Example of generations|250px]]


On the DragGAN side, the same disappointment: based on our trained model, DragGAN does not work very well. We see two possible explanations. First, quite obviously, the model has not trained enough and does not produce images that are good enough for DragGAN to work with. Second, the low variance between the generated images may play a role: DragGAN relies on the diversity of its underlying model, and here our generated images all look the same.
= Github =  
[https://github.com/shayayan/FDH-FashionGAN/tree/main FDH-FashionGAN]