Photorealistic rendering of painting + Venice Underwater
Abstract
The main goal of our project is to automatically transform old paintings and drawings of Venice into photorealistic representations of the past, thus creating photos that were never taken. To achieve this, we will use style-transfer generative adversarial networks (GANs) to learn what separates a drawing or painting from a photograph, and then apply the trained model to new drawings and paintings.
After a review of the current literature in the field, we settled on implementing the relatively new Contrastive Unpaired Translation (CUT) GAN with some key modifications. In particular, we plan to build a new auxiliary loss function to enforce geometric continuity between generated and original buildings, and to add spectral normalization and self-attention layers to allow us to effectively expand the size of the network. The target data will be a dataset of real photos of Venice, with datasets of black-and-white drawings, black-and-white paintings, or coloured paintings as the input. Finally, the trained models should take a drawing or painting as input and deliver its photorealistic representation as output.
As a subgoal of the project, another GAN will be trained to create a visual representation of Venice as an underwater city. This representation takes the rising sea level into account and allows us to project forward in time by visualizing the drowning of this world heritage site.
Planning
Week | Tasks |
---|---|
9.11.2020 - 15.11.2020 (week 9) | |
16.11.2020 - 22.11.2020 (week 10) | |
23.11.2020 - 29.11.2020 (week 11) | |
30.11.2020 - 6.12.2020 (week 12) | |
7.12.2020 - 13.12.2020 (week 13) | |
14.12.2020 - 16.12.2020 (week 14) | Final project presentation |
Resources
Data collection
- CINI dataset (provided by DH lab)
- ~800 monochrome drawings of Venice
- 331 paintings of Venice
- Web: Google Images and Flickr (scraped)
- Total data:
- ~1300 colour paintings of Venice
- ~1600 photos of Venice*
- ~700 images of underwater shipwrecks or sunken cities (note 22/11: not yet cleaned)
*We deliberately over-represented landmarks among the photos of Venice, as the vast majority of paintings and drawings depict the same few landmarks, for instance St. Mark's Square.
Data cleaning/processing
1. We removed images that have less "conventional" portrayals of the city of Venice, leaving us with:
- 251 monochrome paintings
- ~750 monochrome drawings
- ~900 colour paintings
- ~600 photos
2. We cut out frames from certain images, as pictured in Figures 1 and 2
Note: there is potential to augment each of the mentioned datasets in the future, but we are currently focused on architectural improvements to our model.
Computational resources
Currently making great use of the GPUs kindly provided by the DH lab. 🖥️
Methodology
As mentioned in the Resources section, we invested a substantial portion of our effort in data acquisition. We primarily used Selenium WebDriver for crawling and urllib to download the source images. We then manually removed images that were faulty or irrelevant to our task, and trimmed borders either manually or using the Python Imaging Library (PIL).
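The automatic border trimming can be sketched as follows. This is a minimal sketch, not our exact script: it assumes the border is a roughly uniform colour that matches the top-left pixel, and the function name `trim_border` and the `tolerance` parameter are hypothetical.

```python
from PIL import Image, ImageChops


def trim_border(img: Image.Image, tolerance: int = 10) -> Image.Image:
    """Crop away a uniform border whose colour matches the top-left pixel.

    Assumption: the border is (nearly) one flat colour. Ornate picture
    frames still need manual cropping.
    """
    rgb = img.convert("RGB")
    # Solid image filled with the assumed border colour.
    background = Image.new("RGB", rgb.size, rgb.getpixel((0, 0)))
    # Per-pixel absolute difference from the border colour, as one channel.
    diff = ImageChops.difference(rgb, background).convert("L")
    # Keep only pixels that differ from the border by more than the tolerance.
    mask = diff.point(lambda v: 255 if v > tolerance else 0)
    bbox = mask.getbbox()  # None when the whole image matches the border
    return rgb.crop(bbox) if bbox else rgb
```

A scanned painting on a white mount would be reduced to the painting itself; images where the top-left pixel already belongs to the picture are returned essentially unchanged, so they still need a manual check.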
Once the datasets are ready, we transfer them to the server and begin training with the architecture or parameter configuration under test. So far we have added a bespoke auxiliary loss function, predicated on maintaining similarity of edges between the input image and the generated image, to reproduce architectural features more faithfully. To create it, we used the OpenCV (cv2) Canny edge filter and the structural similarity index measure (SSIM) for image comparison. We are currently testing the efficacy of different threshold values in the Canny filter and different weightings of the auxiliary loss. To do this, we trained a few initial models and inspected their output visually; now, with a better understanding of the effect of different levels, we are running a sweep to determine which setting minimizes the generator and NCE losses most efficiently. We are also running sweeps across other parameters with the same loss-minimizing target. One issue with GANs is that the loss does not always reflect the perceived quality of the images, due to overfitting, mode collapse, or the learning of strategies that fool the discriminator without being visually appealing. We are continuing to monitor this issue, but it has no well-known solution.
In terms of ongoing architectural development, we have implemented spectral normalization to prevent overfitting and are currently working on a self-attention layer. These two modifications should allow us to make the model deeper (and thus more effective) while preventing overfitting, which in this case means generating wholly new images rather than translating the style of the input images. Finally, our eventual measures of success will include the NCE and generator loss levels, but the primary measure will be our subjective judgement of model performance.
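For illustration, a self-attention layer with spectrally normalized convolutions might look like the sketch below. It assumes PyTorch and follows the commonly used SAGAN-style formulation; the class name `SelfAttention2d` and the channel reduction factor of 8 are assumptions, not a description of our final layer.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over the spatial positions of a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        inner = max(channels // 8, 1)  # reduced width for queries and keys
        self.query = spectral_norm(nn.Conv2d(channels, inner, kernel_size=1))
        self.key = spectral_norm(nn.Conv2d(channels, inner, kernel_size=1))
        self.value = spectral_norm(nn.Conv2d(channels, channels, kernel_size=1))
        # gamma starts at zero, so the layer initially passes its input through.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, inner)
        k = self.key(x).flatten(2)                     # (b, inner, hw)
        v = self.value(x).flatten(2)                   # (b, c, hw)
        attn = torch.softmax(torch.bmm(q, k), dim=-1)  # (b, hw, hw)
        # Each output position is a weighted sum of values at all positions.
        out = torch.bmm(v, attn.transpose(1, 2)).reshape(b, c, h, w)
        return x + self.gamma * out
```

Because `gamma` is initialized to zero, inserting the layer into an existing network does not change its output until training moves `gamma` away from zero. Note that the attention matrix has one entry per pair of spatial positions, so memory grows quadratically with feature-map size.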
Challenges
Data collection
- Need more data than expected (black and white drawings, black and white paintings, coloured paintings, underwater data, Venice photos, Venice landmark photos)
- Need to clean more data than expected (removing watermarks, borders, and unusual images)
- Content in input and output images does not always match. For example:
- Large dense crowds in paintings that do not appear in modern day photos
- Old boat structures vs new boat structures
- Collecting good quality underwater data that contains details of underwater structures, rather than just images of algae and pond scum 🤿
Model challenges
- Long training time (~20 hours for 400 epochs on one GPU)
- Many hyperparameters to tune: we must carefully select dependent hyperparameters to test model performance on each sweep
- The CUT model requires more images than described in the paper
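To make the cost of a sweep concrete, here is a minimal sketch that enumerates a full grid and estimates the GPU time, assuming every run trains for the full 400 epochs (~20 hours on one GPU). The parameter names and values in the grid are hypothetical placeholders.

```python
from itertools import product

HOURS_PER_RUN = 20  # ~20 hours for 400 epochs on one GPU

# Hypothetical grid: placeholder values, not the settings used in the project.
grid = {
    "canny_low": [50, 100],
    "canny_high": [150, 200],
    "edge_weight": [0.1, 1.0, 10.0],
}


def sweep_runs(grid):
    """Expand a dict of value lists into one config dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


runs = sweep_runs(grid)
gpu_hours = len(runs) * HOURS_PER_RUN
print(f"{len(runs)} runs, about {gpu_hours} GPU hours at full length")
```

Even this small grid needs 12 runs, or about 240 GPU hours, which is why dependent hyperparameters have to be chosen carefully for each sweep.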
Links
- Contrastive Unpaired Translation GitHub
- Canny edge detector (for edge loss implementation)