Projects: Difference between revisions

From FDHwiki
= Projects 2024 =  


Supervisor contact:
* Alexander Michael Rusnak (alexander.rusnak@epfl.ch)
* Tommy Bruzzese (tommy.bruzzese@epfl.ch)
* Tristan Karch (tristan.karch@epfl.ch)


== Student Groups==
{| class="wikitable"
|- style="font-weight:bold;"
! Group Number
! First Student
! Second Student
! Third Student
! Projects
|-
| 1
| Eglantine Vialaneix
| Lisa Nja
| Nathanael Lambert
| '''[[Widows in Venice]]'''
|-
| 2
| Chiara Delvecchio
| Haotian Fang
| Vittoria Meroni
| '''[[Sanudo's Diary]]'''
|-
| 3
| Sahil Singhvi
| Owen Yu-Wei Ng
| Rasmus Makiniemi
| '''[[Data Ingestion of Guide Commericiale]]'''
|-
| 4
| Nastaran Hashemisanjani
| Fawzia Zeitoun
| Bich Ngoc Doan
| '''[[Generation of Textual Description]]'''
|-
| 5
| Ching-Chi Chou
| André Da Gloria Santiago
| Yi Ren
| '''[[Traghetti in Venice]]'''
|-
| 6
| Zhichen Fang
| Ruyin Feng
|
| '''[[Generation of Textual Description]]'''
|-
| 7
| Yasmine Kroknes-Gomez
| Oscar Goudet
| Weilun Xu
|'''[[Reconstruction of Partial Facades]]'''
|-
| 8
| Kaile Chu
| Jiaqi Ding
| Maitri Anil Dedhia
|  '''[[Sanudo's Diary]]'''


|-
| 9
| Shu Yang
| Lisa Vind
| Clay Cassius Parker Foye
| '''[[Completing Facades]]'''
|-
| 10
| Xin Huang
| Yutaka Osaki
| Ziang Guo
|  '''[[LLM-generated Visualization]]'''
|}


= Projects 2023 =


The supervisors for the projects are:
* Beatrice Vaienti (beatrice.vaienti@epfl.ch)
* Sven Najem-Meyer (sven.najem-meyer@epfl.ch)
* Alexander Michael Rusnak (alexander.rusnak@epfl.ch)


== Student groups ==
{| class="wikitable"
|- style="font-weight:bold;"
! Group Number
! First Student
! Second Student
! Third Student
! Fourth Student
! Projects
|-
| 1
| Arundhati BALASUBRAMANIAM
| Carolina MARUGAN RUBIO
| Davide ROMANO
|
| '''[[Ethical Guidance of LLMs]]'''
|-
| 2
| Jingbang LIU
| Yanzi LIU
| Changkong ZHOU
|
| '''[[Europeana: mapping postcards]]'''
|-
| 3
| Cindy TANG
| Xi LEI
| Yiren CAO
|
| '''[[AI Ethics in Large Language Models 🤖]]''' (commit history can be found [https://fdh.epfl.ch/index.php/Generative_AI:_1._Ethics_2.CLIP here])
|-
| 4
| ChunTzu CHANG
| Yunlong DONG
| Tz-Ching YU
|
| '''[[Digitizing a Chinese Cookbook: Yinshanzhengyao 飲膳正要]]'''
|-
| 5
| Zimu ZHAO
| Xingyu PAN
|
|
| '''[[Tracking a Historic Market Crash through Articles]]'''
|-
| 6
| Haotian WU
| Diego MARQUEZ
| Yiwei LIU
| Maximiliano CARVAJAL
| '''[[Extending Text2Image Models to Accept Multi-Modal Conditions by Encoding to the CLIP Latent Space]]'''
|-
| 7
| Tianhao DAI
| Kaede JOHNSON
| Mengjie ZOU
|
| '''[[Extracting Toponyms from Maps of Jerusalem]]'''
|-
| 8
| Daniele BELFIORE
| Jiaming JIANG
| Nil ILBA
|
| '''[[Spatialising Sarah Barclay Johnson's travelogue around Jerusalem (1858)]]'''
|-
| 9
| Romane CLERC
| Shayan KHAJEHNOURI
|
|
| '''[[FashionGAN]]'''
|}


= Projects 2022 =
 
== Projects list ==
 
# Venice - Numeri Civici Linking (House Numbers). (Paul)
# Venice - Linked Data for Monuments and Image Gallery. (Paul)
# Venice - 1100-1300 Morphology and Events (Paul, Didier)
# Venice - Sea Level Rise Scenarios (Beatrice, Paul)
# Jerusalem - Procedural Modelling of Muqarnas (Beatrice)
# Jerusalem - 1940: From an Atlas to a GIS Database (Beatrice, Paul)
# Jerusalem - Locating Colonies and Neighbourhoods (Beatrice, Sven, Paul)
# Paris - Address Books of the Past (Paul, Didier)
# France/Italy - Exploring Historical Cookbooks (Beatrice, Paul, Sven)
# Switzerland - 3D Modelling of Glaciers from 120,000BC to Today (Didier, Beatrice)
# Switzerland - 1874 Mapping of Glaciers using DeepLearning (Didier, Beatrice)
# Europeana - A New Spatiotemporal Search Engine (Didier, Sven)
# Europeana - Mapping Postcards (Didier, Sven)
# Armenia - Historical Epigraphy (Hamest)
 
The supervisors for the projects are:
* Didier Dupertuis (didier.dupertuis@epfl.ch)
* Beatrice Vaienti (beatrice.vaienti@epfl.ch)
* Paul Guhennec (paul.guhennec@epfl.ch)
* Sven Najem-Meyer (sven.najem-meyer@epfl.ch)
* Hamest Tamrazyan (hamest.tamrazyan@epfl.ch)
 
You can however still do a project outside of the DHLAB's fields of expertise. We will do our best to help you.
 
== Guidelines ==
 
'''For the first presentation (06.10):'''
*(1) Choose two or three projects from the project list. You can invent your own if you wish.
*(2) Form a group of 2-3 people and fill out the [[#Groups|table]].
*(3) Prepare a short presentation (1-2 slides) for your projects.
 
== Student groups ==
{| class="wikitable"
|- style="font-weight:bold;"
! Group Number
! First Student
! Second Student
! Third Student
! Project
|-
| 1
| Lea Marxen
| Ben Kriesel
|
| '''[[Paris: address book of the past]]'''
|-
| 2
| Weier Liu
| Linying Yao
|
| '''[[Jerusalem: locating the colonies and neighborhoods]]'''
|-
| 3
| Ghali Chraïbi
| Su Xiaotian
|
| '''[[France: Exploring Historical Cookbooks]]'''
|-
| 4
| Mariella Daghfal
| Margaux Zwierski
|
| '''[[Procedural modeling of Muqarnas]]'''
|-
| 5
| Xingchen Li
| Xinyi Ding
| Ke Li
| '''[[Europeana: A New Spatiotemporal Search Engine]]'''
|-
| 6
| Zhiye Wang
| Aibin Yu
|
| '''[[Detection of glacier change using dhSegment from the Siegfried map from 1874]]'''
|}
 
= Projects 2021 =
 
== Student groups ==
{| class="wikitable"
|- style="font-weight:bold;"
! Group Number
! First Student
! Second Student
! Third Student
! Project
|-
| 1
| Yichen Wang
| Amina Matt
|
| '''[http://fdh.epfl.ch/index.php/Switzerland_and_the_Transatlantic_Slavery Switzerland and the Transatlantic Slavery]'''
|-
| 2
| Junzhe Tang
| Zhipeng Mao
|
| '''[http://fdh.epfl.ch/index.php/Jerusalem_1840-1949_Historical_Maps_Georeferencing Jerusalem 1840-1949 Historical Maps Georeferencing]'''
|-
| 3
| Veniamin Veselovsky
| Hannah Casey
|
| '''[http://fdh.epfl.ch/index.php/Venice_Families_and_Real_Estate Venice 1741/1808: Families and Real Estate]'''
|-
| 4
| Irina Serenko
| Yinghui Jiang
|
| '''[http://fdh.epfl.ch/index.php/Switzerland_Road_extraction_from_historical_maps Switzerland:Road extraction from historical maps]'''
|-
| 5
| Yuxiao Li
| Yuhan Bi
| Sruti Bhattacharjee
| '''[http://fdh.epfl.ch/index.php/Venice2020_Building_Heights_Detection Venice 2020: Detection of height of building based on drone’s videos]'''
|-
| 6
| Yurui Zhu
| Siyi Wang
|
| '''[http://fdh.epfl.ch/index.php/3DVeniceBirdges Venice 1808: Procedural Modelling of Bridges]'''
|-
| 7
| Loïc Rochat
| Davide Chemin
|
| '''[http://fdh.epfl.ch/index.php/3DVeniceChurches Venice 1808: Procedural Modelling of Well-heads]'''
|-
| 8
| Marin Piguet
| Noé Durandard
|
| '''[http://fdh.epfl.ch/index.php/Alignment_of_XIXth_century_cadasters Alignment of XIXth century cadasters]'''
|}
 
= Projects 2020 =
 
 
== Groups ==
 
{|class="wikitable"
! Group number
! First student name
! Second student name
! Third student name
! Project name 1
! Project name 2
! Project name 3 (Optional)
|-
|Group 1
|Beatriz Borges
|Lilia Ellouz
|André Ghattas
| [[VenBioGen]]: Biography generation  ***
| Chat-a-ghost
|
|-
|Group 2
|Elisa Michelet
|Gonxhe Idrizi
|
| [[Opera Rolandi archive]]: Opera Rolandi Archive ***
| Score Digitization
| Tassini’s toponomastics
|-
|Group 3
|Ravinithesh Annapureddy
|Maxime Jan
|
|[[Terzani online museum]]
|Immersive educational game
|Instavenice (Instagram of Past Venice)
|-
|Group 4
|Mariko Makhmutova
|Alex Rusnak
|Anna Sophia Bauer
|[[Photorealistic rendering of painting + Venice Underwater]] *** (GAN)
|Using historical artifacts to trace trade routes
|Venice underwater
|-
|Group 5
|Jeremy Mion
|Bastien Beuchat
|
|[[Deciphering Venetian handwriting]] ***
|Venice 1808 immersive experience
|Art forgery detector
|-
|Group 6
|Michał Bień
|Graziano Conti Rossini
|
| [[WikiBio]]: Biography generation ***
|Facebook for Venetian Artists
|
|-
|Group 7
|Junhong Li
|Yuanhui Lin
|Zijun Cui
|[[Paintings / Photos geolocalisation]] ***
|Day matters in Venice
|
|-
|Group 8
|Hangqian Li
|Linyida Zhang
|
|[[Procedural Venice]] ***
|Modern people in Venice (GAN)
|
|-
|Group 9
|Maximilian Ben Ali
|Eva Laini
|
|[[Austrian cadastral map]] ***
|
|
|-
|Group 10
|Ludovica Schaerf
|Aurel Ruben
|Harshdeep
|[[Rolandi Librettos]] ***
|Terzani Photo collection
|
|}
 
== Guidelines ==
 
=== For the first presentation (01.10) ===
(1) Choose one or two projects from the project list and invent another one.
 
(2) Form a group of 2-3 people and fill out the [[#Groups|table]].
 
(3) Prepare a short presentation (1-2 slides) for your projects.
 
==== Projects list ====
 
# Deciphering Venetian handwriting (Raphaël)
# Austrian cadastral map (Raphaël)
# Paintings / Photos geolocalisation (Frederic, Paul)
# Toponomastics (Federica)
# Sanudo’s Diaries (Federica)
# Venice 1100 and 1300 (Federica)
# San Giorgio 4D Cloud Points (Albane)
# Photogrammetric reconstruction of Venice based on Youtube movies (Albane)
# Procedural Venice (Fabrice, Paul)
# Photorealistic rendering of painting (GAN) (Frederic, Raphael)
# Coherence Fact-checking (Frederic)
# Biography generation (Frederic)
# Venice IIIF Annotator (Frederic)
# Homologous navigation (Frederic)
# Open Data from Venice 1808 (Raphaël)
# Venice 1808 immersive experience (Fabrice)
# Quora for Venetian history (Frederic)
# Venice underwater (Frederic) - Connection with ongoing project in Venice about Acqua Alta
# Unbuilt Venice (Fabrice)
# Opera Rolandi archive (Frederic) - Connection with ongoing project at Cini Foundation
# Terzani archive (Frederic) - Connection with ongoing project at Cini Foundation
 
The supervisors for the projects are:
* Albane Descombes
* Fabrice Berger
* Federica Pardini
* Raphaël Barman
* Frederic Kaplan

You can find email addresses in the [https://people.epfl.ch EPFL directory]. Please also put Frederic Kaplan, Paul Guhennec and Raphaël Barman in copy for the first exchanges.
 
==== Available resources ====
 
===== Data and sources =====
Here is a small list of the available resources for your projects. Do not hesitate to send an e-mail if you have questions about any of them.
 
* Data for training a handwritten text recognition (HTR) system.
* Data for training a geometries extractor from the cadastral map.
* High-resolution images of Venice's Austrian cadastral map.
* Complete transcription of the registers (Sommarioni) of Venice's 1808 Napoleonic cadaster (parcel number, owner name, typology of parcels).
* Complete extraction of Venice's 1808 Napoleonic cadastral map.
* Partial extraction from the 1740 proto-cadaster (Catastici) of Venice (owner name, tenant name, typology of parcel, link with the 1808 cadaster).
* Large photo and painting archive from the Cini Foundation.
* Digitized and/or physical copies of Tassini’s toponomastics, Sanudo’s Diaries, Venezia romanica.
* 4D point cloud for San Giorgio.
* 3D procedural model (in progress) of Venice.
* Data from the Garzoni project; see [https://garzoni.hypotheses.org here] for more information.
 
===== DHLAB skills =====
The current main skills of the laboratory are:
* Architecture and History of Venice.
* Cadaster information extraction and HTR.
* Photogrammetric reconstruction.
* 3D modeling.
 
You can however still do a project outside of these fields of expertise. We will do our best to help you.
 
 
= Projects 2019 =
 
== Projects ==
 
{|class="wikitable"
! style="text-align:center;"|Group number
! First student name
! Second student name
! Third student name
! Project page
|-
|1
|Guilhem Sicard
|Todor Manev
| -
|[[Love2Late]]
|-
|2
|Giacomo Alliata
|Andrea Scalisi
| -
|[[Influencers of the past]]
|-
|3
|Leonore Guillain
|Haeeun Kim
|Liamarcia Bifano
|[[Humans of Paris 1900]]
|-
|4
|Arthur Parmentier
|Bertil Wicht
| -
|[[Quartiers Livres / Booking Paris]]
|-
|5
|Robin Szymczak
|Cédric Tomasini
|
|[[Virtual Louvre]]
|-
|}
 
== Guidelines ==
 
* (1) Select an image series on Paris
* (2) Make a sketch (drawing + textual description) of a final interface
* (3) Extract the maximum information out of it (train a segmenter, train a handwriting recognition system)
* (4) Export this information in other places in the Web or Build a website with specific services
 
As the focus of this year is Paris and interaction between projects is desired, projects should ideally make use of a common reference system. Examples of a common system are street names, addresses or, in some cases, names of people.
 
== Image collections ==
 
* Gallica offers a large collection of images of a variety of types. The most useful entry point is the [https://gallica.bnf.fr/html/und/france/paris?mode=desktop Paris collection].
* The [https://bibliotheque-numerique.inha.fr/en/ ''Bibliothèque numérique'' of INHA] offers many resources related to art history.
* The [http://archives.paris.fr/r/123/archives-numerisees/ ''Archives de Paris''] have many administrative documents as well as a large photography collection.
 
== Specific examples ==
* Photographs by Eugène Atget document the forgotten Paris of the 19th century. They can be found on [https://gallica.bnf.fr/services/engine/search/sru?operation=searchRetrieve&version=1.2&startRecord=0&maximumRecords=50&page=1&query=(bibliotheque%20adj%20%22Biblioth%C3%A8que%20nationale%20de%20France%22)%20and%20(dc.creator%20all%20%22eug%C3%A8ne%20atget%22%20or%20dc.contributor%20all%20%22eug%C3%A8ne%20atget%22)%20and%20(dc.type%20all%20%22image%22)%20sortby%20dc.date%2Fsort.ascending Gallica] and on [https://bibliotheque-numerique.inha.fr/en/collection/?&refine%5BCategories%5D%5B%5D=Photographies&refine%5BCategories%5D%5B%5D=Photographies$$$Fonds%20Eug%C3%A8ne%20Atget%20(1857-1927) INHA library].
* Architectural photography collection of the [http://archives.paris.fr/f/photos/mosaique/?&crit1=7&v_7_1=Architecture Archives de Paris].
* ''Grand monde de Paris'', an address book of famous people of Paris, found on [https://gallica.bnf.fr/ark:/12148/bpt6k936097z/f11.image Gallica].
* Names and reservations for some theatres of Paris, found on [https://gallica.bnf.fr/ark:/12148/bpt6k205233j/f27.image Gallica], and a plan of the theatre room, also found on [https://gallica.bnf.fr/ark:/12148/bpt6k11808753/f93.image Gallica].
 
== Resources ==
=== Tips and tricks ===
* Be careful about encoding when using OCR from Gallica. The XML declaration says that the encoding is ISO-8859-1, but the actual encoding is UTF-8. Force the encoding when opening the file in Python. Additionally, you can modify the declaration directly inside the XML so that it opens correctly in your favorite text editor.
* In Python, if you run a for loop with an expensive operation, it is useful to know how long each iteration takes and to have an idea of the overall progress. This can be done using the [https://tqdm.github.io tqdm] package. When you wrap an iterable with it, a progress bar appears, e.g. <code>for item in tqdm(list_of_items):</code>. Note that this can also be used with pandas; see the [https://tqdm.github.io/docs/tqdm/#pandas documentation].
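The encoding tip above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical and the sample string is hand-written, not real Gallica output.

```python
import re
from pathlib import Path


def read_gallica_ocr(path):
    """Read an OCR XML file as UTF-8, ignoring the encoding its declaration announces."""
    return Path(path).read_text(encoding="utf-8")


def fix_declared_encoding(xml_text):
    """Rewrite the XML declaration so that it announces UTF-8 (first match only)."""
    return re.sub(
        r"""(<\?xml[^>]*encoding=)(["'])[^"']*\2""",
        r"\g<1>\g<2>UTF-8\g<2>",
        xml_text,
        count=1,
    )


# Hand-written sample standing in for a downloaded OCR file.
sample = '<?xml version="1.0" encoding="ISO-8859-1"?><page>Eugène</page>'
print(fix_declared_encoding(sample))
```

Reading the file as text with an explicit <code>encoding="utf-8"</code> avoids relying on the (incorrect) declaration; rewriting the declaration is only needed if other tools will open the file later.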
 
=== Data extraction tools ===
* [[Gallica wrapper]] to download documents, images and OCR from Gallica directly from Python.
* [https://transkribus.eu Transkribus] to make your own OCR. It also offers a [https://transkribus.eu/wiki/index.php/REST_Interface REST API] if you prefer.
* [http://www.robots.ox.ac.uk/~vgg/software/via/ VGG Image annotator] (VIA) to annotate your documents.
* OCR and HTR are available for documents that do not have them. (Caveat for HTR: the data series should be uniform and needs to be annotated by hand.)
* [https://dhlab-epfl.github.io/dhSegment/ dhSegment] for segmenting visually distinct parts of documents.
 
=== Databases/APIs ===
* [http://api.bnf.fr/api-document-de-gallica Gallica document API], [http://api.bnf.fr/api-iiif-de-recuperation-des-images-de-gallica Gallica image extraction API], [http://api.bnf.fr/wrapper-python-pour-les-api-gallica Python Gallica API wrapper] and [http://api.bnf.fr/extracteur-python-de-corpus-de-periodiques Python periodical corpus extractor]
* INHA offers an [https://www.openarchives.org/pmh/ OAI-PMH] API, see [https://bibliotheque-numerique.inha.fr/oai/?verb=ListSets here] for an entry point.
* [https://data.bnf.fr/ Data BNF], [https://www.wikidata.org WikiData] for general-purpose knowledge base.
* [[Lists of addresses of Paris]]
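As an illustration of how an OAI-PMH endpoint such as the INHA one is queried, here is a minimal sketch using only the standard library. It builds request URLs from a verb and its arguments and parses a <code>ListSets</code> response. The base URL is taken from the entry point above; the sample response is hand-written for illustration (it is not real INHA data), and the helper names are hypothetical. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://bibliotheque-numerique.inha.fr/oai/"


def build_request(verb, **params):
    """Build an OAI-PMH request URL: the verb plus its arguments as a query string."""
    return BASE_URL + "?" + urlencode({"verb": verb, **params})


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (
            s.findtext("oai:setSpec", namespaces=OAI_NS),
            s.findtext("oai:setName", namespaces=OAI_NS),
        )
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]


# Hand-written sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>example</setSpec><setName>Example set</setName></set>
  </ListSets>
</OAI-PMH>"""

print(build_request("ListSets"))
print(build_request("ListRecords", metadataPrefix="oai_dc", set="example"))
print(parse_sets(SAMPLE))
```

To query the live endpoint, the URL returned by <code>build_request</code> could be fetched with <code>urllib.request.urlopen</code> and the body passed to the parser.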
 
== Sketches ==
 
{|class="wikitable"
! First student name
! Second student name
! Third student name
! Sketch 1
! Sketch 2
! Sketch 3
|-
|Guilhem Sicard
|Todor Manev
| -
|[[Sketch of Love2Late]]
|[[Sketch of The 1897 directory of businesses]]
|
|-
|Giacomo Alliata
|Andrea Scalisi
| -
|[[Sketch of Influencers of the past]]
|[[Sketch of Artwork origins in Paris]]
|-
|Leonore Guillain
|Haeeun Kim
|Liamarcia Bifano
|[[Sketch of Humans of Paris 1900]]
|[[Sketch of ImmoSearch Paris 1900]]
|[[Sketch of Job Search Paris 1900]]
|-
|Arthur Parmentier
|Bertil Wicht
| -
|[[Sketch of Book editing in Paris 1847]]
|[[Sketch of Trip advisor Paris]]
|[[Sketch of Bike from Paris]]
|-
|Robin Szymczak
|Cédric Tomasini
|
|[[Sketch of Virtual Louvre]]
|[[Sketch of Diving into the Opera]]
|[[Sketch of Article in context]]
|-
|}
 
 
 
= Projects 2018 =
 
* (1) Choose a Map on Gallica https://gallica.bnf.fr/accueil/?mode=desktop
* (2) Extract the maximum information out of it (train a segmenter, train a handwriting recognition system)
* (3) Export this information in other places in the Web or Build a website with specific services
 
== Projects websites ==
 
* Shortest-Path Route Extraction From City Map
* [https://annfr542.github.io/fdhProject/ Train schedules]
* [http://174.138.13.133:1909/ Paris 1909 TripAdvisor]
* [https://sherstiuk.github.io/beijing/ A century in Beijing]
* [https://remipetitpierre.wixsite.com/coalsupply Coal supply in the German Empire]
* [http://valentinebernasconi.ch/paris_metropolitan/index.html Paris Metropolitan, an evolution]
 
== [http://fdh.epfl.ch/index.php/Shortest-Path_Route_Extraction_From_City_Map Shortest-Path Route Extraction From City Map] ==
 
The goal of this project is to give people an idea of what it might have been like to find one's way around cities of the past, by building a navigation tool not unlike Google Maps or similar services - except it is for the past. The application will create and display the shortest path between user-selected start and end points, using information extracted from an old city map. The finished application will be made available via a [http://www.dhproject.cf/ web-interface].
 
On a technical level, the approach is based on extracting the road network from the map, representing it as an undirected planar graph, then applying Dijkstra's algorithm to solve the routing problem.
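The routing step can be sketched as follows. This is a minimal illustration of Dijkstra's algorithm on an undirected weighted graph, not the project's actual code; the toy road network, the node names and the helper <code>add_road</code> are assumptions made for the example.

```python
import heapq


def add_road(graph, a, b, length):
    """Add an undirected edge: the road can be travelled in both directions."""
    graph.setdefault(a, {})[b] = length
    graph.setdefault(b, {})[a] = length


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on {node: {neighbour: distance}}.

    Returns (total_distance, path), or (inf, []) if the goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry, a shorter route was already settled
        visited.add(node)
        if node == goal:
            break
        for neighbour, weight in graph.get(node, {}).items():
            new_d = d + weight
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                prev[neighbour] = node
                heapq.heappush(queue, (new_d, neighbour))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Toy road network (hypothetical intersections and lengths).
roads = {}
add_road(roads, "A", "B", 4)
add_road(roads, "A", "C", 1)
add_road(roads, "C", "B", 2)
add_road(roads, "B", "D", 5)
print(shortest_path(roads, "A", "D"))
```

In a real setting, the nodes would be intersections extracted from the map and the weights the measured lengths of the street segments between them.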
 
Members: Jonathan and Florian
 
== [[Train schedules]]  ==
 
This project aims to observe how one could travel between France, Switzerland and the North of Italy back in the middle of the 19th century. From the journey paths, schedules and prices, one will realize how easy it is now to travel through Europe and how big the impact of the evolution of both the railway network and the technology has been. These data will be extracted from a [https://gallica.bnf.fr/ark:/12148/btv1b8441301x.r=suisse%20train?rk=21459%3B2 document] from 1858 and presented in a [https://annfr542.github.io/fdhProject/ CFF-like website] so that one can put oneself in the shoes of a railway user of the 1850s.
 
Members: Anna Fredrikson and Olivier Dietrich
 
==[http://fdh.epfl.ch/index.php/Paris_1909_TripAdvisor Paris 1909 TripAdvisor]==
 
The project intends to recreate the cultural geography of Paris in the Belle Époque, immersing a user into the world of cabarets, balls, theatres, the universe described by Zola and Proust, and painted by Renoir and Toulouse-Lautrec. In order to give a new perspective on how this legendary world was actually structured and perceived, we are going to digitize the authentic [https://bibliotheques-specialisees.paris.fr/ark:/73873/pf0000856497/v0001.simple.selectedTab=record Plan des plaisirs et attractions de Paris] created in 1909 and augment it with the pieces of evidence and descriptions from the [https://gallica.bnf.fr/ark:/12148/bpt6k124915s Guide des plaisirs à Paris]: timetables, advice and guidance on what to wear, what to say, and where to go.
 
Members: Alina, Maryam, and Paola
 
== [[A century in Beijing]] ==
Using this [https://gallica.bnf.fr/ark:/12148/btv1b52508921s/f1.item.r=Pekin%20carte.zoom map by France's Service Géographique de l'Armée] from 1900 we will follow the evolution of the urban landscape of the central part of China's capital over the last hundred years.
 
=== The planned goals ===
* To align maps from different time periods and see how the landscape changed. The town's straightforward rectangular planning will allow us to make matches more easily.
* The map has a rich legend with toponymic information in French and the dated French system of transliteration of Mandarin. The plan is to extract and match these place names with their [https://www.gujianw.com/2522.html modern counterparts].
* Add the old pictures of significant buildings that are no longer there if it's possible to find them.
 
Members: Jimin and Anton
 
== [[Coal supply in the German Empire]] ==
=== Main ideas ===
 
* To study the coal supply and demand of the German Empire for the year 1881.
* Interactive visualization of Germany's main coal production and consumption centers.
* Dynamic visualization of coal transport flows according to the different mining basins and transport routes.
* Differentiating the production and consumption centers from the transport hubs.
* Creation of a website to present the results.
 
Members: Axel Matthey and Rémi Petitpierre
 
== [[ Paris Metropolitan, an evolution ]] ==
 
=== Definition of the project ===
 
This group will analyze the evolution of the Paris Metropolitan system from its inception. The group will look at two maps from the planning of the metro, covering the definition of the routes and the addition of stations; a first map from 1908 showing the actual metro after its opening in 1900; a second map from 1915, with already visible impacts of the First World War; and a third map from 1950, a more contemporary look at the metro as we know it today. The goal is to analyze how different areas of major cultural attractions evolved around, or hand in hand with, the metro stations and the overall Paris metro system (basically answering the chicken-and-egg question), and how the metro was impacted by catastrophic events such as wars.
 
=== Selected maps ===
 
The different maps selected for the project are the following:
 
[https://gallica.bnf.fr/ark:/12148/btv1b53021096k/f1.item.r=metro%20paris Plan de Paris, avec le tracé du chemin de fer métropolitain (projet de l'administration) et les différentes lignes d'omnibus et de tramways, 1882]<br>
[https://gallica.bnf.fr/ark:/12148/btv1b530710210/f1.item.r=metro%20paris Plan de Paris (indiquant les lignes projetées du chemin de fer métropolitain en souterrain, tranchée, Viaduc), 1895]<br>
[https://gallica.bnf.fr/ark:/12148/btv1b530620268.r=metro%20paris?rk=128756;0 Paris, chemin de fer métropolitain ; lignes en exploitation, 1908]<br>
[https://gallica.bnf.fr/ark:/12148/btv1b84460498/f1.item.r=metro%20paris Paris Nouveau plan de Paris avec toutes les lignes du Métropolitain et du Nord-Sud, 1915]<br>
[https://bibliotheques-specialisees.paris.fr/ark:/73873/pf0000856404/v0001.simple.selectedTab=record Paris. Plan d'ensemble par arrondissements. Métropolitain : (vers 1950)]<br>
 
Members: Evgeniy Chervonenko and Valentine Bernasconi
 
= Projects 2017 =
 
All the projects are pieces of a larger puzzle.
The goal is to experiment with a new approach to knowledge production and negotiation, based on a platform that sits between Wikipedia and Twitter.
 
The platform is called [[ClioWire]]
 
== ClioWire: Platform management and development ==
 
This group will manage the experimental platform of the course. They will have to run the platform and develop additional features for processing and presenting the pulses.
The initial code base is Mastodon.

The group will write bots for rewriting pulses and progressively converging towards articulation/datafication of the pulses.

Knowledge required: Python, JavaScript, basic Linux administration.
 
Resp. Vincent and Orlin
 
- Albane
- Cédric
<br />
[[Platform management and development : State of art and Bibliography]]
 
[[Platform management and development : methodology]]
 
[[Platform management and development : Quantitative analysis of performance]]
 
GitHub page of the project : [https://github.com/epflProjects/cliowire-bots]
 
== Secondary sources ==
 
The goal is to extract, from a collection of 3000 scanned books about Venice, all the sentences containing at least two named entities and to transform them into pulses.
This should constitute a de facto set of relevant information drawn from a large base of Venetian documents.
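The filtering step described above can be sketched as follows. This is a minimal illustration, not the group's pipeline: a small hand-written list (<code>KNOWN_ENTITIES</code>, hypothetical) stands in for a real named entity recognition system, and sentences are split with a naive regular expression.

```python
import re

# Hypothetical gazetteer standing in for the output of a real NER system.
KNOWN_ENTITIES = ["Venice", "Marco Polo", "Rialto", "Doge"]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def entities_in(sentence, entities=KNOWN_ENTITIES):
    """Return the known entities that occur in the sentence as whole words."""
    return [e for e in entities if re.search(r"\b" + re.escape(e) + r"\b", sentence)]


def candidate_pulses(text, min_entities=2):
    """Keep only the sentences mentioning at least `min_entities` distinct entities."""
    return [s for s in split_sentences(text) if len(entities_in(s)) >= min_entities]


sample = "Marco Polo left Venice in 1271. The weather was fine. The Rialto was crowded."
print(candidate_pulses(sample))
```

Only the first sentence of the sample mentions two entities, so it is the only one kept as a candidate pulse.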
 
Resp. Giovanni / Matteo
 
- Hakim
- Marion
 
[[Named Entity Recognition]]
 
GitHub page of the project : [https://github.com/inverniz/FDH_SecondarySources]
 
== [[Primary sources]] ==
 
This group will look for named entities in digitized manuscripts and post pulses about these mentions.
* The group will use Wordspotting methods based on a commercial algorithm. During the project, the group will have to set up a dedicated pipeline for indexing and searching the documents digitized in the Venice Time Machine project and other primary sources, using the software component provided.
* The group will have to search for lists of names or regular expressions. A method based on a predefined list will be compared with a recursive method based on the results provided by the Wordspotting components.
* Two types of pulses will be produced: (a) "Mention of Francesco Raspi in document X" (b) "Francesco Raspi and Battista Nanni linked (document Y)"
* The creation of a simple web front end to test the Wordspotting algorithm would help assess the quality of the method.
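The two pulse types listed above can be sketched as follows. This is a minimal illustration that assumes a plain-text transcription is already available; the actual project relies on Wordspotting over document images, which is not reproduced here. The function name is hypothetical, and the names are the examples given in the text.

```python
import re
from itertools import combinations

# Predefined list of names to search for (examples from the text above).
NAMES = ["Francesco Raspi", "Battista Nanni"]


def pulses_for_document(doc_id, text, names=NAMES):
    """Produce mention pulses (type a) and link pulses (type b) for one document."""
    found = [n for n in names if re.search(re.escape(n), text, flags=re.IGNORECASE)]
    mentions = [f"Mention of {n} in document {doc_id}" for n in found]
    links = [
        f"{a} and {b} linked (document {doc_id})" for a, b in combinations(found, 2)
    ]
    return mentions + links


for pulse in pulses_for_document("Y", "Contract between Francesco Raspi and Battista Nanni."):
    print(pulse)
```

Here two names are considered linked simply because they occur in the same document; a stricter criterion (same line, same sentence) would be a design choice for the group.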
 
Supervisor : Sofia
 
Skills : Java, simple Linux administration
 
- Raphael
- Mathieu
 
[[Primary sources]]
 
== Image banks ==
 
The goal is to transform the metadata of CINI which have been OCRed into pulses.
One challenge is to deal with OCR errors and possible disambiguation.
 
Supervision: Lia
 
== Newspaper, Wikipedia, Semantic Web ==
The goal is to find all the sentences in a large newspaper archive that contain at least two named entities.
These sentences should be posted as pulses.

The named entity detection has already been done. The only challenge is to retrieve the corresponding sentences in the digitized transcriptions.

In addition, this group should look for ways to import elements of knowledge en masse from other sources (DBpedia, RDF databases).
 
Resp. Maud
 
Skills: Python or Java
 
- Laurene and Santiago
 
<br />
[[Newspaper, Wikipedia, Semantic Web : State of art and Bibliography]]

Latest revision as of 09:13, 13 November 2024

Projects 2024

Supervisor contact:

  • Alexander Michael Rusnak (alexander.rusnak@epfl.ch)
  • Tommy Bruzzese (tommy.bruzzese@epfl.ch)
  • Tristan Karch (tristan.karch@epfl.ch)

Student Groups

Group Number First Student Second Student Third Student Projects
1 Eglantine Vialaneix Lisa Nja Nathanael Lambert Widows in Venice
2 Chiara Delvecchio Haotian Fang Vittoria Meroni Sanudo's Diary
3 Sahil Singhvi Owen Yu-Wei Ng Rasmus Makiniemi Data Ingestion of Guide Commericiale
4 Nastaran Hashemisanjani Fawzia Zeitoun Bich Ngoc Doan Generation of Textual Description
5 Ching-Chi Chou André Da Gloria Santiago Yi Ren Traghetti in Venice
6 Zhichen Fang Ruyin Feng Generation of Textual Description
7 Yasmine Kroknes-Gomez Oscar Goudet Weilun Xu Reconstruction of Partial Facades
8 Kaile Chu Jiaqi Ding Maitri Anil Dedhia Sanudo's Diary
9 Shu Yang Lisa Vind Clay Cassius Parker Foye Completing Facades
10 Xin Huang Yutaka Osaki Ziang Guo LLM-generated Visualization

Projects 2023

The name of the different supervisor for the projects are:

  • Beatrice Vaienti (beatrice.vaienti@epfl.ch)
  • Sven Najem-Meyer (sven.najem-meyer@epfl.ch)
  • Alexander Michael Rusnak (alexander.rusnak@epfl.ch)

Student groups

Group Number First Student Second Student Third Student Fourth Student Projects
1 Arundhati BALASUBRAMANIAM Carolina MARUGAN RUBIO Davide ROMANO Ethical Guidance of LLMs
2 Jingbang LIU Yanzi LIU Changkong ZHOU Europeana: mapping postcards
3 Cindy TANG Xi LEI Yiren CAO AI Ethics in Large Language Models 🤖 commit history can be found here
4 ChunTzu CHANG Yunlong DONG Tz-Ching YU Digitizing a Chinese Cookbook: Yinshanzhengyao 飲膳正要
5 Zimu ZHAO Xingyu PAN Tracking a Historic Market Crash through Articles
6 Haotian WU Diego MARQUEZ Yiwei LIU Maximiliano CARVAJAL Extending Text2Image Models to Accept Multi-Modal Conditions by Encoding to the CLIP Latent Space
7 Tianhao DAI Kaede JOHNSON Mengjie ZOU Extracting Toponyms from Maps of Jerusalem
8 Daniele BELFIORE Jiaming JIANG Nil ILBA Spatialising Sarah Barclay Johnson's travelogue around Jerusalem (1858)
9 Romane CLERC Shayan KHAJEHNOURI FashionGAN

Projects 2022

Projects list

  1. Venice - Numeri Civici Linking (House Numbers). (Paul)
  2. Venice - Linked Data for Monuments and Image Gallery. (Paul)
  3. Venice - 1100-1300 Morphology and Events (Paul, Didier)
  4. Venice - Sea Level Rise Scenarios (Beatrice, Paul)
  5. Jerusalem - Procedural Modelling of Muqarnas (Beatrice)
  6. Jerusalem - 1940: From an Atlas to a GIS Database (Beatrice, Paul)
  7. Jerusalem - Locating Colonies and Neighbourhoods (Beatrice, Sven, Paul)
  8. Paris - Address Books of the Past (Paul, Didier)
  9. France/Italy - Exploring Historical Cookbooks (Beatrice, Paul, Sven)
  10. Switzerland - 3D Modelling of Glaciers from 120,000BC to Today (Didier, Beatrice)
  11. Switzerland - 1874 Mapping of Glaciers using DeepLearning (Didier, Beatrice)
  12. Europeana - A New Spatiotemporal Search Engine (Didier, Sven)
  13. Europeana - Mapping Postcards (Didier, Sven)
  14. Armenia - Historical Epigraphy (Hamest)

The name of the different supervisor for the projects are:

  • Didier Dupertuis (didier.dupertuis@epfl.ch)
  • Beatrice Vaienti (beatrice.vaienti@epfl.ch)
  • Paul Guhennec (paul.guhennec@epfl.ch)
  • Sven Najem-Meyer (sven.najem-meyer@epfl.ch)
  • Hamest Tamrazyan (hamest.tamrazyan@epfl.ch)

You can however still do a project outside of the DHLAB's fields of expertise. We will do our best to help you.

Guidelines

For the first presentation (06.10):

  • (1) Choose two or three projects from the project list. You can invent your own if you wish.
  • (2) Form a group of 2/3 people and fill out the table.
  • (3) Prepare a short presentation (1-2 slides) for your projects.

Student groups

Group Number First Student Second Student Third Student Project
1 Lea Marxen Ben Kriesel Paris: address book of the past
2 Weier Liu Linying Yao Jerusalem: locating the colonies and neighborhoods
3 Ghali Chraïbi Su Xiaotian France: Exploring Historical Cookbooks
4 Mariella Daghfal Margaux Zwierski Procedural modeling of Muqarnas
5 Xingchen Li Xinyi Ding Ke Li Europeana: A New Spatiotemporal Search Engine
6 Zhiye Wang Aibin Yu Detection of glacier change using dhSegment from the Siegfried map from 1874

Projects 2021

Student groups

Group Number | First Student | Second Student | Third Student | Project
1 | Yichen Wang | Amina Matt | | Switzerland and the Transatlantic Slavery
2 | Junzhe Tang | Zhipeng Mao | | Jerusalem 1840-1949 Historical Maps Georeferencing
3 | Veniamin Veselovsky | Hannah Casey | | Venice 1741/1808: Families and Real Estate
4 | Irina Serenko | Yinghui Jiang | | Switzerland: Road extraction from historical maps
5 | Yuxiao Li | Yuhan Bi | Sruti Bhattacharjee | Venice 2020: Detection of height of building based on drone’s videos
6 | Yurui Zhu | Siyi Wang | | Venice 1808: Procedural Modelling of Bridges
7 | Loïc Rochat | Davide Chemin | | Venice 1808: Procedural Modelling of Well-heads
8 | Marin Piguet | Noé Durandard | | Alignment of XIXth century cadasters

Projects 2020

Groups

Group number First student name Second student name Third student name Project name 1 Project name 2 Project name 3 (Optional)
Group 1 Beatriz Borges Lilia Ellouz André Ghattas VenBioGen: Biography generation *** Chat-a-ghost
Group 2 Elisa Michelet Gonxhe Idrizi Opera Rolandi archive": Opera Rolandi Archive *** Score Digitization Tassini’s toponomastics
Group 3 Ravinithesh Annapureddy Maxime Jan Terzani online museum Immersive educational game Instavenice (Instagram of Past Venice)
Group 4 Mariko Makhmutova Alex Rusnak Anna Sophia Bauer Photorealistic rendering of painting + Venice Underwater *** (GAN) Using historical artifacts to trace trade routes Venice underwater
Group 5 Jeremy Mion Bastien Beuchat Deciphering Venetian handwriting *** Venice 1808 immersive experience Art forgery detector
Group 6 Michał Bień Graziano Conti Rossini WikiBio: Biography generation *** Facebook for Venetian Artists
Group 7 Junhong Li Yuanhui Lin Zijun Cui Paintings / Photos geolocalisation *** Day matters in Venice
Group 8 Hangqian Li Linyida Zhang Procedural Venice *** Modern people in Venice(GAN)
Group 9 Maximilian Ben Ali Eva Laini Austrian cadastral map ***
Group 10 Ludovica Schaerf Aurel Ruben Harshdeep Rolandi Librettos *** Terzani Photo collection

Guidelines

For the first presentation (01.10)

(1) Choose one or two projects from the project list and invent another one.

(2) Form a group of 2-3 people and fill out the table.

(3) Prepare a short presentation (1-2 slides) for your projects.

Projects list

  1. Deciphering Venetian handwriting (Raphaël)
  2. Austrian cadastral map (Raphaël)
  3. Paintings / Photos geolocalisation (Frederic, Paul)
  4. Toponomastics (Federica)
  5. Sanudo’s Diaries (Federica)
  6. Venice 1100 and 1300 (Federica)
  7. San Giorgio 4D Cloud Points (Albane)
  8. Photogrammetric reconstruction of Venice based on Youtube movies (Albane)
  9. Procedural Venice (Fabrice, Paul)
  10. Photorealistic rendering of painting (GAN) (Frederic, Raphael)
  11. Coherence Fact-checking (Frederic)
  12. Biography generation (Frederic)
  13. Venice IIIF Annotator (Frederic)
  14. Homologous navigation (Frederic)
  15. Open Data from Venice 1808 (Raphaël)
  16. Venice 1808 immersive experience (Fabrice)
  17. Quora for Venetian history (Frederic)
  18. Venice underwater (Frederic) - Connection with on-going project in Venice about Acqua Alta
  19. Unbuilt Venice (Fabrice)
  20. Opera Rolandi archive (Frederic) - Connection with on-going project at Cini Foundation
  21. Terzani archive (Frederic) - Connection with on-going project at Cini Foundation

The supervisors for the projects are: Albane Descombes, Fabrice Berger, Federica Pardini, Raphaël Barman and Frederic Kaplan.

You can find email addresses in the EPFL directory. Please also put Frederic Kaplan, Paul Guhennec and Raphael Barman in copy for the first exchanges.

Available resources

Data and sources

Here is a small list of the available resources for your projects. Do not hesitate to send an e-mail if you have questions about any of them.

  • Data for training a handwritten text recognition (HTR) system.
  • Data for training a geometry extractor for the cadastral map.
  • High-resolution images of Venice's Austrian cadastral map.
  • Complete transcription of the registers (Sommarioni) of Venice's 1808 Napoleonic cadaster (parcel number, owner name, typology of parcels).
  • Complete extraction of Venice's 1808 Napoleonic cadastral map.
  • Partial extraction from the 1740 proto-cadaster (Catastici) of Venice (owner name, tenant name, typology of parcel, link with the 1808 cadaster).
  • Large photo and painting archive from the Cini Foundation.
  • Digitized and/or physical copies of Tassini’s toponomastics, Sanudo’s Diaries, Venezia romanica.
  • 4D point cloud for San Giorgio.
  • 3D procedural model (in progress) of Venice.
  • Data from the Garzoni project, cf. here for more information.

DHLAB skills

The current main skills of the laboratory are:

  • Architecture and History of Venice.
  • Cadaster information extraction and HTR.
  • Photogrammetric reconstruction.
  • 3D modeling.

You can however still do a project outside of these fields of expertise. We will do our best to help you.


Projects 2019

Projects

Group number | First student name | Second student name | Third student name | Project page
1 | Guilhem Sicard | Todor Manev | - | Love2Late
2 | Giacomo Alliata | Andrea Scalisi | - | Influencers of the past
3 | Leonore Guillain | Haeeun Kim | Liamarcia Bifano | Humans of Paris 1900
4 | Arthur Parmentier | Bertil Wicht | - | Quartiers Livres / Booking Paris
5 | Robin Szymczak | Cédric Tomasini | | Virtual Louvre

Guidelines

  • (1) Select an image series on Paris
  • (2) Make a sketch (drawing + textual description) of a final interface
  • (3) Extract the maximum information out of it (train a segmenter, train a handwriting recognition system)
  • (4) Export this information to other places on the Web or build a website with specific services

As the focus of this year is Paris and interaction between projects is encouraged, projects should ideally make use of a common reference system. Examples of such a system are street names, addresses or, in some cases, names of people.

Image collections

Specific examples

  • Photographs by Eugène Atget document the forgotten Paris of the 19th century. They can be found on Gallica and at the INHA library.
  • Architectural photography collection of the Archives de Paris.
  • Grand monde de Paris, an address book of famous people of Paris, found on Gallica.
  • Name and reservations for some theaters of Paris found on Gallica and plan of the theatre room also found on Gallica.

Resources

Tips and tricks

  • Be careful about encoding when using OCR from Gallica. The XML declaration says that the encoding is ISO-8859-1, but the actual encoding is UTF-8. Force the encoding when opening the file in Python. Additionally, you can modify the declaration directly inside the XML so that it opens correctly in your favorite text editor.
  • In Python, if you launch a for loop with an expensive operation, it can be useful to know how long each iteration takes and to have an idea of the overall progress. This can be done using the tqdm package. When you wrap an iterable with it, a progress bar appears, e.g. for item in tqdm(list_of_items):. Note that this can also be used with pandas, cf. the documentation.
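
The encoding tip above can be sketched as follows. This is a minimal illustration using only the standard library; the sample content, the file name and the helper names read_ocr_xml and fix_declared_encoding are assumptions made for the example, not part of any Gallica tool.

```python
import re
import tempfile
from pathlib import Path


def read_ocr_xml(path, encoding="utf-8"):
    """Read an OCR file with a forced encoding, ignoring the XML declaration."""
    # Decoding the raw bytes ourselves sidesteps the misleading declaration.
    return Path(path).read_bytes().decode(encoding)


def fix_declared_encoding(xml_text, encoding="UTF-8"):
    """Rewrite the encoding named in the XML declaration, if there is one."""
    return re.sub(
        r'(<\?xml[^>]*?encoding=["\'])[^"\']+(["\'])',
        lambda m: m.group(1) + encoding + m.group(2),
        xml_text,
        count=1,
    )


# Hypothetical sample standing in for a downloaded OCR file: the declaration
# claims ISO-8859-1 while the bytes on disk are UTF-8.
sample = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<page>Théâtre de la Gaîté</page>\n'

with tempfile.TemporaryDirectory() as tmp:
    ocr_path = Path(tmp) / "sample_ocr.xml"
    ocr_path.write_bytes(sample.encode("utf-8"))

    text = read_ocr_xml(ocr_path)        # forced UTF-8: accents are intact
    fixed = fix_declared_encoding(text)  # declaration now matches the bytes
    # What a reader gets when the declaration is trusted instead:
    wrongly_decoded = ocr_path.read_bytes().decode("iso-8859-1")

print(text)
print(wrongly_decoded)
```

Printing both versions makes the difference visible: the second one shows the typical mojibake in place of the accented characters.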

Data extraction tools

  • Gallica wrapper to download documents, images and OCR from Gallica directly from Python.
  • Transkribus to make your own OCR. It also offers a REST API if you prefer.
  • VGG Image annotator (VIA) to annotate your documents.
  • OCR and HTR available for documents that do not have it. (Caveat for HTR: the data series should be uniform and needs to be annotated by hand.)
  • dhSegment for segmenting visually distinct parts of documents.

Databases/APIs

Sketches

First student name | Second student name | Third student name | Sketch 1 | Sketch 2 | Sketch 3
Guilhem Sicard | Todor Manev | - | Sketch of Love2Late | Sketch of The 1897 directory of businesses |
Giacomo Alliata | Andrea Scalisi | - | Sketch of Influencers of the past | Sketch of Artwork origins in Paris |
Leonore Guillain | Haeeun Kim | Liamarcia Bifano | Sketch of Humans of Paris 1900 | Sketch of ImmoSearch Paris 1900 | Sketch of Job Search Paris 1900
Arthur Parmentier | Bertil Wicht | - | Sketch of Book editing in Paris 1847 | Sketch of Trip advisor Paris | Sketch of Bike from Paris
Robin Szymczak | Cédric Tomasini | | Sketch of Virtual Louvre | Sketch of Diving into the Opera | Sketch of Article in context


Projects 2018

  • (1) Choose a Map on Gallica https://gallica.bnf.fr/accueil/?mode=desktop
  • (2) Extract the maximum information out of it (train a segmenter, train a handwriting recognition system)
  • (3) Export this information to other places on the Web or build a website with specific services

Projects websites

Shortest-Path Route Extraction From City Map

The goal of this project is to give people an idea of what it might have been like to find one's way around cities of the past, by building a navigation tool not unlike Google Maps or similar services - except it is for the past. The application will create and display the shortest path between user-selected start and end points, using information extracted from an old city map. The finished application will be made available via a web-interface.

On a technical level, the approach is based on extracting the road network from the map, representing it as an undirected planar graph, then applying Dijkstra's algorithm to solve the routing problem.
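
The routing step can be illustrated with a minimal sketch of Dijkstra's algorithm on an undirected weighted graph. The node names and street lengths below are invented for the example, and the helpers add_road and shortest_path are hypothetical; the extraction of the road network from the map is assumed to have happened already.

```python
import heapq


def add_road(graph, a, b, length):
    """Add an undirected edge (a street segment) between two intersections."""
    graph.setdefault(a, {})[b] = length
    graph.setdefault(b, {})[a] = length


def shortest_path(graph, start, goal):
    """Return (path, total_length) using Dijkstra's algorithm."""
    best = {start: 0.0}      # shortest known distance to each node
    previous = {}            # predecessor of each node on its best path
    queue = [(0.0, start)]   # priority queue of (distance, node)
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, length in graph.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if goal not in best:
        return None, float("inf")
    # Walk back from the goal to the start to rebuild the path.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


# Invented toy network: four intersections and their street lengths.
roads = {}
add_road(roads, "A", "B", 2.0)
add_road(roads, "B", "C", 2.0)
add_road(roads, "A", "C", 5.0)
add_road(roads, "C", "D", 1.0)

route, length = shortest_path(roads, "A", "D")
print(route, length)
```

In a real application the edge weights would come from the geometry of the extracted street segments, and the start and end points chosen by the user would first be snapped to the nearest graph node.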

Members: Jonathan and Florian

Train schedules

This project aims to observe how one could travel between France, Switzerland and the North of Italy back in the middle of the 19th century. From the journey paths, schedules and prices, one will realize how easy it is now to travel through Europe and how big the impact of the evolution of both the railway network and the technology has been. These data will be extracted from a document from 1858 and presented in a CFF-like website so that one can put oneself in the shoes of a railway user of the 1850s.

Members: Anna Fredrikson and Olivier Dietrich

Paris 1909 TripAdvisor

The project intends to recreate the cultural geography of Paris in the Belle Époque, immersing a user into the world of cabarets, balls, theatres, the universe described by Zola and Proust, and painted by Renoir and Toulouse-Lautrec. In order to give a new perspective on how this legendary world was actually structured and perceived, we are going to digitize the authentic Plan des plaisirs et attractions de Paris created in 1909 and augment it with the pieces of evidence and descriptions from the Guide des plaisirs à Paris: timetables, advice and guidance on what to wear, what to say, and where to go.

Members: Alina, Maryam, and Paola

A century in Beijing

Using this map by France's Service Géographique de l'Armée from 1900 we will follow the evolution of the urban landscape of the central part of China's capital over the last hundred years.

The planned goals

  • To align maps from different time periods and see how the landscape changed. The town's straightforward rectangular planning will allow us to make matches more easily.
  • The map has a rich legend with toponymic information in French and the dated French system of transliteration of Mandarin. The plan is to extract and match these place names with their modern counterparts.
  • To add old pictures of significant buildings that no longer exist, if they can be found.

Members: Jimin and Anton

Coal supply in the German Empire

Main ideas

  • To study the coal supply and demand of the German Empire for the year 1881.
  • Interactive visualization of Germany's main coal production and consumption centers.
  • Dynamic visualization of coal transport flows according to the different mining basins and transport routes.
  • Differentiating the production and consumption centers from the transport hubs.
  • Creation of a website to present the results.

Members: Axel Matthey and Rémi Petitpierre

Paris Metropolitan, an evolution

Definition of the project

This group will analyze the evolution of the Paris Metropolitan system from its inception. The group will look at two maps of the planning of the metro, from the definition of the routes to the addition of stations; a first map from 1908 of the actual metro after its opening in 1900; a second map from 1915, with already visible impacts of the First World War; and a third map from 1950, a more contemporary look at the metro as we know it today. The goal is to analyze how areas of major cultural attractions evolved around, or hand in hand with, the metro stations and the overall Paris metro system (basically answering the chicken-and-egg question), and how the metro was affected by catastrophic events such as wars.

Selected maps

The different maps selected for the project are the following:

Plan de Paris, avec le tracé du chemin de fer métropolitain (projet de l'administration) et les différentes lignes d'omnibus et de tramways, 1882
Plan de Paris [indiquant les lignes projetées du chemin de fer métropolitain en souterrain, tranchée, Viaduc, 1895
Paris, chemin de fer métropolitain ; lignes en exploitation, 1908
Paris Nouveau plan de Paris avec toutes les lignes du Métropolitain et du Nord-Sud, 1915
Paris. Plan d'ensemble par arrondissements. Métropolitain : [vers 1950]

Members: Evgeniy Chervonenko and Valentine Bernasconi

Projects 2017

All the projects are pieces of a larger puzzle. The goal is to experiment with a new approach to knowledge production and negotiation based on a platform halfway between Wikipedia and Twitter.

The platform is called ClioWire.

ClioWire: Platform management and development

This group will manage the experimental platform of the course. They will have to run the platform and develop additional features for processing and presenting the pulses. The initial code base is Mastodon.

The group will write bots for rewriting pulses and progressively converging towards articulation/datafication of the pulses.

Knowledge required: Python, JavaScript, basic Linux administration.

Resp. Vincent and Orlin

- Albane - Cédric
Platform management and development : State of art and Bibliography

Platform management and development : methodology

Platform management and development : Quantitative analysis of performance

GitHub page of the project : [1]

Secondary sources

The goal is to extract from a collection of 3000 scanned books about Venice all the sentences containing at least two named entities and to transform them into pulses. This should constitute a de facto set of relevant information drawn from a large base of Venetian documents.

Resp. Giovanni / Matteo

- Hakim - Marion

Named Entity Recognition

GitHub page of the project : [2]

Primary sources

This group will look for named entities in digitized manuscripts and post pulses about these mentions.

  • The group will use Wordspotting methods based on a commercial algorithm. During the project, the group will have to set up a dedicated pipeline for indexing and searching the documents digitized in the Venice Time Machine project and other primary sources using the software component provided.
  • The group will have to search for lists of names or regular expressions. A method based on a predefined list will be compared with a recursive method based on the results provided by the Wordspotting component.
  • Two types of pulses will be produced: (a) "Mention of Francesco Raspi in document X" (b) "Francesco Raspi and Battista Nanni linked (document Y)"
  • Creating a simple web front end to test the Wordspotting algorithm would help assess the quality of the method.
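
The predefined-list method and the two pulse types can be sketched as follows. This is only an illustration: it searches a plain-text transcription with regular expressions instead of calling the Wordspotting component (whose interface is not described here), and the document identifier, the sample text and the helpers find_mentions and make_pulses are assumptions.

```python
import re
from itertools import combinations


def find_mentions(text, names):
    """Return the names from a predefined list that occur in the text."""
    found = []
    for name in names:
        # Allow any amount of whitespace between the parts of a name.
        parts = [re.escape(part) for part in name.split()]
        pattern = r"\b" + r"\s+".join(parts) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found


def make_pulses(document_id, text, names):
    """Build 'mention' pulses and 'linked' pulses for one document."""
    mentioned = find_mentions(text, names)
    pulses = [f"Mention of {name} in document {document_id}" for name in mentioned]
    # Two people named in the same document are reported as linked.
    for first, second in combinations(mentioned, 2):
        pulses.append(f"{first} and {second} linked (document {document_id})")
    return pulses


# Hypothetical name list and transcription, for illustration only.
name_list = ["Francesco Raspi", "Battista Nanni", "Marco Contarini"]
transcription = "Item, francesco  raspi owes Battista Nanni ten ducats."

pulses = make_pulses("Y", transcription, name_list)
for pulse in pulses:
    print(pulse)
```

The recursive method mentioned above would feed newly found names back into name_list and repeat the search; that loop is left out of this sketch.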

Supervisor : Sofia

Skills : Java, simple Linux administration

- Raphael - Mathieu

Primary sources

Image banks

The goal is to transform the metadata of CINI which have been OCRed into pulses. One challenge is to deal with OCR errors and possible disambiguation.
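
One possible way to cope with OCR errors is to match each OCRed value against a list of known names with approximate string matching. The sketch below uses difflib from the Python standard library; the authority list, the cutoff of 0.8 and the helper names are assumptions chosen for the example, and the approach is only one option among several.

```python
import difflib


def normalise(value):
    """Lower-case a value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def resolve(ocr_value, authority, cutoff=0.8):
    """Map an OCRed value to the closest authority entry.

    Returns (best_match, other_candidates). best_match is None when nothing
    is similar enough; other_candidates flags ambiguous cases for review.
    """
    by_key = {normalise(name): name for name in authority}
    matches = difflib.get_close_matches(
        normalise(ocr_value), list(by_key), n=2, cutoff=cutoff
    )
    if not matches:
        return None, []
    return by_key[matches[0]], [by_key[m] for m in matches[1:]]


# Hypothetical authority list and OCR output, for illustration only.
artists = ["Tiziano Vecellio", "Jacopo Tintoretto", "Paolo Veronese"]

best, others = resolve("Tiziano  Vecelli0", artists)
print(best, others)
print(resolve("zzzz qqqq", artists))
```

Values with no match, or with more than one close candidate, are the ones that need manual disambiguation before being posted as pulses.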

Supervision: Lia

Newspaper, Wikipedia, Semantic Web

The goal is to find all the sentences in a large newspaper archive that contain at least two named entities. These sentences should be posted as pulses.

The named entity detection has already been done. The only challenge is to retrieve the corresponding sentences in the digitized transcriptions.
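
Retrieving the sentences can be sketched as follows, assuming the existing detection provides each entity as a pair of character offsets into the transcription (an assumption, since the actual data format is not described here). The sentence splitter is deliberately naive and would break on abbreviations; the sample text and helper names are invented for the example.

```python
import re


def split_sentences(text):
    """Return (start, end) character spans of sentences (naive splitter)."""
    spans = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        spans.append((start, match.end()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def sentences_with_entities(text, entities, minimum=2):
    """Keep the sentences that contain at least `minimum` entities.

    `entities` is a list of (start, end) character offsets into `text`.
    """
    selected = []
    for s_start, s_end in split_sentences(text):
        inside = [text[a:b] for a, b in entities if s_start <= a and b <= s_end]
        if len(inside) >= minimum:
            selected.append((text[s_start:s_end].strip(), inside))
    return selected


def offsets_of(text, surface):
    """Helper for the example: offsets of the first occurrence of a string."""
    start = text.index(surface)
    return (start, start + len(surface))


# Invented transcription and entity offsets, for illustration only.
article = (
    "The council met yesterday. "
    "Henri Dupont met Louis Favre in Lausanne. "
    "It rained all day."
)
entity_offsets = [
    offsets_of(article, name)
    for name in ("Henri Dupont", "Louis Favre", "Lausanne")
]

candidates = sentences_with_entities(article, entity_offsets)
for sentence, names in candidates:
    print(sentence, names)
```

Each selected sentence could then be posted as one pulse, with the entity strings kept alongside it for later linking.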

In addition, this group should look for ways to import elements of knowledge at scale from other sources (DBpedia, RDF databases).

Resp. Maud

Skills: Python or Java

- Laurene and Santiago


Newspaper, Wikipedia, Semantic Web : State of art and Bibliography