Paintings / Photos geolocalisation

From FDHwiki

Introduction

The goal of this project is to locate a given painting or photo of Venice on a map. On the website we built, a user can upload an image, and the predicted location of the image is shown on the map.

Motivation

Travelling is nowadays a universal hobby. Platforms such as Instagram and Flickr let people post their travel photos and share them with strangers. When bloggers attach geo information to their posts, other users can search for pictures of a specific location and decide whether they want to travel there. But what about an amazing picture with no location indicated? We address this problem in our project. We aim for a solution that locates an image on a map, so that someone who finds a gorgeous picture without geo information can use our method to find the place and plan a trip there. Our method should also work on paintings, as the features in a painting should be similar to those in a photo.

Project Plan and Milestones

Date Task Completion
By Week 3
  • Brainstorm project ideas, come up with at least one feasible innovative idea.
  • Prepare slides for initial project idea presentation.
By Week 8
  • Study related works about geolocalisation.
  • Determine the methods to be used.
  • Obtain geo-tagged images from Flickr as training dataset.
  • Prepare slides for midterm presentation.
By Week 9
  • Implement the SIFT method and try to locate an image based on its feature points.
  • Speed up the SIFT method with multiprocessing.
  • Obtain results with the SIFT method.
By Week 10
  • Evaluate the result from SIFT.
  • Construct our first regression model based on ResNet101 to obtain preliminary results.
By Week 11
  • Try to fine-tune the first model.
  • Try other possible deep learning models.
  • Finalize result of deep learning.
By Week 12
  • Evaluate the results from deep learning.
  • Implement the website with the Streamlit Python package; the website uses the deep learning method.
By Week 13
  • Clean up the code and push it to the GitHub repository.
  • Write project report.
  • Prepare slides for final presentation.
--
By Week 14
  • Finish presentation slides and report writing.
  • Presentation rehearsal and final presentation.

Methodology

Data collection

We use flickrapi to crawl photos of Venice with geo-coordinates from Flickr. In total, we obtain 2387 images.
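The crawl described above could look roughly like the sketch below. It is not our exact script: the bounding box for Venice, the page size, the default limit and the helper names (venice_search_params, crawl_geotagged_photos) are assumptions made for illustration, and a valid Flickr API key and secret are required to actually run the crawl.

```python
def venice_search_params(page_size=250):
    """Build query parameters for flickr.photos.search.

    The bounding box (min_lon,min_lat,max_lon,max_lat) is a rough,
    assumed box around Venice, not the one used in the project.
    """
    return {
        "bbox": "12.30,45.42,12.37,45.45",  # assumption: approximate Venice extent
        "has_geo": 1,                       # only photos that carry geo-coordinates
        "extras": "geo,url_m",              # ask for latitude/longitude and an image URL
        "per_page": page_size,
    }


def crawl_geotagged_photos(api_key, api_secret, limit=100):
    """Collect id, coordinates and URL of geo-tagged photos (hypothetical helper)."""
    import flickrapi  # imported here so the parameter helper works without the package

    flickr = flickrapi.FlickrAPI(api_key, api_secret)
    records = []
    # walk() pages through flickr.photos.search and yields one XML element per photo
    for photo in flickr.walk(**venice_search_params()):
        url = photo.get("url_m")
        if url is None:  # some photos have no medium-size URL
            continue
        records.append({
            "id": photo.get("id"),
            "lat": float(photo.get("latitude")),
            "lon": float(photo.get("longitude")),
            "url": url,
        })
        if len(records) >= limit:
            break
    return records
```

Each returned record keeps the coordinates next to the image URL, so the images can be downloaded later without losing their geo-tags.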

Figure 1: Images Distribution

SIFT

SIFT (scale-invariant feature transform) is a feature detection algorithm that detects and describes local features in images. We use it to detect and describe keypoints both in the image to be geolocalised and in the images with known geo-coordinates. With these keypoints, we can find the most similar geo-tagged image and use its coordinates to geolocalise the query image.

  • Dataset splitting

To check the feasibility of our method, we test it on images whose geo-coordinates are known. We therefore split our dataset into a testing dataset (images whose geo-coordinates are to be found) and a matching dataset (images with geo-coordinates). Because our dataset is not large and matching without parallelisation is time-consuming, we randomly choose 2% of the dataset as the test dataset.
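A minimal sketch of such a random split is given below. The function name split_dataset, the fixed seed and the rounding rule are assumptions for illustration and are not taken from our code.

```python
import random


def split_dataset(items, test_fraction=0.02, seed=42):
    """Randomly split items into (test, matching) lists.

    The seed and the rounding of the test size are illustrative choices.
    """
    rng = random.Random(seed)       # local generator, so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:n_test], shuffled[n_test:]


# Example with 2387 image indices standing in for the images
test_set, matching_set = split_dataset(range(2387))
```

With 2387 images and a 2% fraction, this rounding leaves 48 test images and 2339 images in the matching dataset.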

  • Scale-invariant feature detection and description

We first transform the image into a collection of feature vectors. Keypoints are defined as the local maxima and minima of the difference-of-Gaussians function in scale space, and each keypoint has a descriptor together with its location, scale and orientation. This step can be done with the Python library OpenCV (cv2).

  • Keypoints matching

To find the most similar image with geo-coordinates, we match the keypoints of each test image against those of every image in the matching dataset. For each candidate image, we then sum the distances of the top 50 matched pairs. We choose the image with the smallest summed distance as the most similar image and assign its geo-coordinates to the test image.
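The detection and matching steps above can be sketched as follows. This is an illustrative sketch, not our exact implementation: the helper names are hypothetical, the brute-force matcher with cross-checking is an assumed choice, and treating candidates with fewer than k matches as non-matches is also an assumption (otherwise an image with very few matches would get an artificially small sum).

```python
def top_k_distance_sum(distances, k=50):
    """Sum of the k smallest match distances; lower means more similar.

    Assumption: candidates with fewer than k matches are scored as infinitely far.
    """
    if len(distances) < k:
        return float("inf")
    return sum(sorted(distances)[:k])


def sift_match_distances(path_a, path_b):
    """Descriptor distances of the SIFT matches between two image files."""
    import cv2  # imported here so the scoring helpers work without OpenCV

    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    if img_a is None or img_b is None:
        raise FileNotFoundError("could not read one of the images")
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:  # no keypoints found in one image
        return []
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    return [m.distance for m in matcher.match(des_a, des_b)]


def locate(distances_by_image, coords_by_image, k=50):
    """Return the coordinates of the candidate with the smallest top-k sum."""
    best = min(distances_by_image,
               key=lambda name: top_k_distance_sum(distances_by_image[name], k))
    return coords_by_image[best]
```

In use, distances_by_image would map each image of the matching dataset to sift_match_distances(test_path, candidate_path), and coords_by_image would map the same names to their (latitude, longitude) pairs.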

  • Error analysis

For each matched pair, we calculate the MSE (mean squared error) of latitude and longitude. To assess our result, we visualise the distribution of the MSE and give a 95% CI of the median value of the MAE by bootstrapping.
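A percentile bootstrap is one common way to obtain such an interval; the sketch below assumes that variant. The function names, the number of resamples and the seed are illustrative, and the per-pair error is written as the mean of the squared latitude and longitude differences, which is our reading of the description above rather than a confirmed formula.

```python
import random
import statistics


def mse_pair(predicted, actual):
    """Mean squared error over (latitude, longitude) for one matched pair."""
    d_lat = predicted[0] - actual[0]
    d_lon = predicted[1] - actual[1]
    return (d_lat ** 2 + d_lon ** 2) / 2


def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median of values."""
    rng = random.Random(seed)
    values = list(values)
    n = len(values)
    # Resample with replacement and record the median of each resample
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = medians[int((alpha / 2) * n_resamples)]
    upper = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

The same bootstrap_median_ci function applies to any list of per-image errors, whether squared or absolute.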

Deep Learning

Assessment

Links

[Paintings/Photos geolocalisation GitHub]