Rolandi Librettos

From FDHwiki
Revision as of 11:52, 20 November 2020 by Aurel.mader (talk | contribs)
Jump to navigation Jump to search

Introduction

The Fondo Ulderico Rolandi is one of the greatest collections of librettos (def. Text booklet of an opera) in the world. Consisting of around 32’000 thousand librettos, spanning a time period from 16th to 20th century.


Project Planning

The draft of the project and the tasks for each week are assigned below:

Weekly working plan
Timeframe Task Completion
Week 4
07.10 Evaluating which APIs to use (IIIF)
Write a scraper to scrape IIIF manifests from the Libretto website
Week 5
14.10 Processing of images: apply Tessaract OCR
Extraction of dates and cleaned the dataset to create initial DataFrame
Week 6
21.10 Design and develop initial structure for the visualization (using dates data)
Running a sanity check on the initial DataFrame by hand
Matching list of cities extracted from OCR using search techniques
Week 7
28.10 Remove irrelevant backgrounds of images
Extract age and gender from images
Design data model
Extract tags, names, birth and death years out of metadata
Week 8
04.11 Get coordinates for each city and translation of city names
Extracted additional metadata (opera title, maestro) from the title of Libretto
Setting up map and slider in the visualization and order by year
Week 9
11.11 Adding metadata information in visualization by having information pane
Checking in with the Cini Foundation
Preparing the Wiki outline and the midterm presentation
Week 10
18.11 Compiling a list of musical theatres ⬜️
Getting better recall and precision on the city information
Identifying composers and getting a performer's information
Extracting corresponding information for the MediaWiki API for entities (theatres etc.)
Week 11
25.11 Integrate visualization's zoom functionality with the data pipeline to see intra-level info ⬜️
Linking similar entities together (which directors performed the same play in different cities?)
Week 12
02.12 Serving the website and do performance metrics for our data analysis ⬜️
Communicate and get feedback from the Cini Foundation
Continuously working on the report and the presentation
Week 13
09.12 Finishing off the project website and work, do a presentation on our results ⬜️


Just to show how to add images
Just to show how to add images

Historical Source

Methodology

Collecting data

Metadata extraction

Visualization

Quality assessment

Overall pipeline

Basic features extraction

Efficiency of algorithms

Results

Website

Links