Photorealistic rendering of paintings + Venice Underwater
Abstract
The main goal of our project is to transform old paintings and drawings of Venice into photorealistic representations of the past, thus creating photos that were never taken. To this end we will test different kinds of GANs (Generative Adversarial Networks, e.g. CycleGAN, BigGAN, PGGAN, Contrastive Unpaired Translation (CUT)) and different kinds of loss functions (e.g. EdgeLoss). They will be trained on a dataset of real photos of Venice together with a dataset of either black-and-white drawings, black-and-white paintings, or coloured paintings. The trained models should then take a drawing or painting as input and deliver its photorealistic representation as output. As a subgoal of the project, another GAN will be trained to create a visual representation of Venice as an underwater city. This representation takes the rising sea level into account and allows us to scale time into the future by visualizing the gradual submersion of this world heritage site.
Planning
| Week | Tasks |
|---|---|
| 9.11.2020 - 15.11.2020 (week 9) | |
| 16.11.2020 - 22.11.2020 (week 10) | |
| 23.11.2020 - 29.11.2020 (week 11) | |
| 30.11.2020 - 6.12.2020 (week 12) | |
| 7.12.2020 - 13.12.2020 (week 13) | |
| 14.12.2020 - 16.12.2020 (week 14) | Final project presentation |
Resources
Our data comes from two main sources: the CINI dataset provided by the DH lab, which contains ~800 usable black-and-white paintings and drawings of Venice, and images we scraped from the web ourselves. We sourced the latter primarily from Google Images and Flickr, acquiring ~1300 colour paintings, ~1600 photos of Venice, and ~700 underwater images of shipwrecks or sunken cities. After cleaning, each dataset retained roughly half of these images. In the photography dataset we deliberately overweighted images of Venetian landmarks, since the vast majority of the paintings and drawings depict the same few sites (for example, St. Mark's Square). Each of these datasets could be augmented further in the future, but we are currently focused on architectural improvements to our model. In terms of computational resources, we have made extensive use of the GPUs provided by the DH lab.
Methodology
As mentioned in the Resources section, a substantial portion of our effort goes into data acquisition. We use the Selenium WebDriver for crawling and urllib to download the source images (a sketch of this step follows below). We then preprocess the images, manually removing those that are irrelevant to our task or faulty, and trim away borders either manually or with the Python Imaging Library (PIL), as also sketched below. Once the datasets are ready, we transfer them to the server and begin training with whichever configuration of architecture and parameters we are testing.

We have so far added a bespoke auxiliary loss function that encourages similarity of edges between the input and generated images, in order to reproduce architectural features more faithfully. To build it we used the OpenCV Canny edge detector and the structural similarity index measure (SSIM) for image comparison (see the sketch below). We are currently testing the efficacy of different Canny threshold values and different weights for the auxiliary loss. Having trained a few initial models and visually inspected the outcomes, we now have a better understanding of these effects and are running a sweep to determine which settings minimize the generator and NCE losses most efficiently. We are running sweeps across other parameters with the same loss-minimization target. One issue with GANs is that the loss does not always reflect the perceived quality of the images, due to overfitting, mode collapse, or the learning of perfectly adversarial but visually unappealing strategies. We continue to monitor this, but it has no well-known solution.

In terms of architectural development, we have added spectral normalization to prevent overfitting and are currently working on a self-attention layer (both sketched at the end of this section). These two modifications should allow us to make the model deeper, and thus more effective, while preventing overfitting (in this case expressed as generating wholly new images rather than translating the style onto the input images). Finally, our eventual measures of success will include the NCE and generator loss levels, but will primarily be our subjective judgement of model performance.
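For illustration, a minimal version of the crawling step might look like the following sketch. The search URL, query, CSS selector, and output directory are hypothetical placeholders rather than our actual configuration, and real pages (which lazy-load images) need per-site inspection:

```python
# Minimal scraping sketch (Selenium + urllib). Query, selector, and output
# directory are hypothetical placeholders, not our actual configuration.
import os
import urllib.request

from selenium import webdriver

QUERY = "venice painting"          # hypothetical search term
OUT_DIR = "data/colour_paintings"  # hypothetical output directory

driver = webdriver.Chrome()
driver.get("https://www.flickr.com/search/?text=" + QUERY.replace(" ", "%20"))

os.makedirs(OUT_DIR, exist_ok=True)
# The generic "img" selector is an assumption; real pages need inspection.
for i, img in enumerate(driver.find_elements_by_css_selector("img")):
    src = img.get_attribute("src")
    if src and src.startswith("http"):
        urllib.request.urlretrieve(src, os.path.join(OUT_DIR, f"{i}.jpg"))

driver.quit()
```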
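The PIL border trimming can be automated when the border is a near-uniform colour. Below is a common PIL idiom for this, assuming the top-left corner pixel represents the border colour; the `tolerance` parameter is a hypothetical knob, not a value from our pipeline:

```python
# Sketch of automated border trimming with PIL, assuming borders are a
# near-uniform colour matching the top-left corner pixel.
from PIL import Image, ImageChops

def trim_border(path, out_path, tolerance=10):
    img = Image.open(path).convert("RGB")
    # Diff the image against a solid canvas of the corner colour; the
    # bounding box of the difference is the non-border content.
    bg = Image.new("RGB", img.size, img.getpixel((0, 0)))
    diff = ImageChops.difference(img, bg)
    # Subtract `tolerance` so faint border noise does not widen the box.
    diff = ImageChops.add(diff, diff, 2.0, -tolerance)
    bbox = diff.getbbox()
    if bbox:
        img = img.crop(bbox)
    img.save(out_path)
```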
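The edge-similarity auxiliary loss can be sketched as follows. The Canny thresholds and the weight `lambda_edge` are exactly the values we are sweeping over, so the numbers below are placeholders; note also that Canny itself is not differentiable, so this shows the measure rather than the exact way it is wired into backpropagation:

```python
# Sketch of the edge-similarity auxiliary loss: Canny edge maps of the input
# and generated images are compared with SSIM. Thresholds and lambda_edge
# are placeholder values still being tuned in our sweeps.
import cv2
from skimage.metrics import structural_similarity

def edge_loss(input_np, fake_np, low=100, high=200):
    """input_np, fake_np: HxWx3 uint8 images. Returns 1 - SSIM of edge maps."""
    input_gray = cv2.cvtColor(input_np, cv2.COLOR_RGB2GRAY)
    fake_gray = cv2.cvtColor(fake_np, cv2.COLOR_RGB2GRAY)
    input_edges = cv2.Canny(input_gray, low, high)
    fake_edges = cv2.Canny(fake_gray, low, high)
    # SSIM is 1 for identical edge maps, so 1 - SSIM acts as a penalty.
    return 1.0 - structural_similarity(input_edges, fake_edges)

# Hypothetical placement in the generator objective:
# loss_G = loss_GAN + loss_NCE + lambda_edge * edge_loss(input_img, fake_img)
```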
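Finally, the two architectural additions can be sketched in PyTorch as below. This is a generic SAGAN-style self-attention block with spectral normalization wrapped around its convolutions, not our exact implementation; channel sizes are illustrative:

```python
# Sketch of the architectural additions: spectral normalization around
# convolutions, and a SAGAN-style self-attention layer (illustrative only).
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SelfAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.key   = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.value = spectral_norm(nn.Conv2d(channels, channels, 1))
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend, starts at 0

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w)  # B x C/8 x N
        k = self.key(x).view(b, -1, h * w)    # B x C/8 x N
        v = self.value(x).view(b, -1, h * w)  # B x C   x N
        attn = torch.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # B x N x N
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        # Residual connection lets the layer start as the identity mapping.
        return self.gamma * out + x
```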
Challenges
Data collection
- Need more data than expected (black and white drawings, black and white paintings, coloured paintings, underwater data, Venice photos, Venice landmark photos)
- Need to clean more data than expected (removing watermarks, borders, and unusual images)
- Content in input and output images does not always match. For example:
- Large dense crowds in paintings that do not appear in modern day photos
- Old boat structures vs new boat structures
- Collecting good-quality underwater data that contains details of underwater structures, rather than just images of algae and pond scum 🤿
Model challenges
- Long training time (~20 hours for 400 epochs on one GPU)
- Many hyperparameters to tune: we must carefully choose which interdependent hyperparameters to vary when testing model performance in each sweep
- The CUT model requires more images than described in the paper
Links
https://github.com/taesungp/contrastive-unpaired-translation