Rolandi Librettos
Revision as of 11:52, 20 November 2020 by Aurel.mader
Introduction
The Fondo Ulderico Rolandi (https://www.cini.it/en/collezioni/archives/theatre/ulderico-rolandi) is one of the largest collections of librettos (the text booklets of operas) in the world. It consists of around 32,000 librettos spanning the 16th to the 20th century.
Project Planning
The project plan and the tasks assigned to each week are listed below:
Timeframe | Task | Completion |
---|---|---|
Week 4 (07.10) | Evaluate which APIs to use (IIIF); write a scraper to collect the IIIF manifests from the libretto website | ✅ |
Week 5 (14.10) | Process the images with Tesseract OCR; extract dates and clean the dataset to create the initial DataFrame | ✅ |
Week 6 (21.10) | Design and develop the initial structure of the visualization (using the date data); manually sanity-check the initial DataFrame; match the list of cities extracted by OCR using search techniques | ✅ |
Week 7 (28.10) | Remove irrelevant image backgrounds; extract age and gender from images; design the data model; extract tags, names, and birth and death years from the metadata | ✅ |
Week 8 (04.11) | Get coordinates for each city and translate city names; extract additional metadata (opera title, maestro) from the libretto titles; set up the map and the slider in the visualization, ordered by year | ✅ |
Week 9 (11.11) | Add metadata to the visualization through an information pane; check in with the Cini Foundation; prepare the wiki outline and the midterm presentation | ✅ |
Week 10 (18.11) | Compile a list of musical theatres; improve recall and precision of the city information; identify composers and retrieve performers' information; extract the corresponding information for entities (theatres, etc.) from the MediaWiki API | ⬜️ |
Week 11 (25.11) | Integrate the visualization's zoom functionality with the data pipeline to show intra-level information; link similar entities (which directors performed the same play in different cities?) | ⬜️ |
Week 12 (02.12) | Serve the website and compute performance metrics for our data analysis; communicate with and get feedback from the Cini Foundation; keep working on the report and the presentation | ⬜️ |
Week 13 (09.12) | Finish the project website and the remaining work; present our results | ⬜️ |
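For the Week 4 scraping task, a IIIF manifest can be turned into a list of page-image URLs with only the standard library. This is a minimal sketch, assuming the libretto website serves manifests in the IIIF Presentation API 2.x layout (`sequences`, then `canvases`, then `images`, then `resource`) and that image services follow the IIIF Image API 2.x URL syntax. The function names are hypothetical, and the project's actual scraper may work differently.

```python
import json
from urllib.request import urlopen


def fetch_manifest(url: str) -> dict:
    """Download and parse a IIIF manifest (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def image_urls(manifest: dict) -> list:
    """Return one image URL per painted image in a IIIF Presentation 2.x manifest.

    Assumes the 2.x layout: sequences -> canvases -> images -> resource.
    Presentation 3.0 manifests nest canvases under "items" instead.
    """
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for annotation in canvas.get("images", []):
                resource = annotation.get("resource", {})
                service = resource.get("service")
                if isinstance(service, dict) and "@id" in service:
                    # Image API 2.x request: full region, full size, no rotation, default quality.
                    base = service["@id"].rstrip("/")
                    urls.append(base + "/full/full/0/default.jpg")
                elif "@id" in resource:
                    # No image service: fall back to the resource's own URL.
                    urls.append(resource["@id"])
    return urls
```

`fetch_manifest` needs network access, while `image_urls` works on any already-parsed manifest dictionary and can therefore be tested offline. Preferring the image service over the resource's own `@id` makes it easy to request a different size later, for instance `full/800,/0/default.jpg` for smaller OCR inputs.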
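The Week 5 steps (Tesseract OCR, then date extraction) could be sketched as follows. This assumes `pytesseract`, Pillow, and Tesseract's Italian language data (`ita`) are installed. It uses a hypothetical heuristic that treats any four-digit number from 1500 to 1999 as a candidate year, which matches the collection's 16th to 20th century span. Dates printed in Roman numerals, or digits garbled by OCR, would not be caught.

```python
import re

# Hypothetical heuristic: four-digit years from 1500 to 1999 (16th to 20th century).
YEAR_PATTERN = re.compile(r"\b(1[5-9]\d{2})\b")


def ocr_page(image_path: str, lang: str = "ita") -> str:
    """Run Tesseract on one page image and return the recognized text."""
    # Imported lazily so the year extraction below works without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def extract_years(text: str) -> list:
    """Return every candidate year found in the OCR text, in reading order."""
    return [int(year) for year in YEAR_PATTERN.findall(text)]
```

Deciding which candidate is the performance year (for example, the first one on the title page) is a separate choice, and a manual sanity check like the one planned for Week 6 is a natural place to validate it.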
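The Week 6 city matching only mentions "search techniques" without naming one. A simple option, swapped in here purely for illustration, is fuzzy string matching with Python's standard `difflib`, which tolerates small OCR errors. The reference list `KNOWN_CITIES` and the 0.8 cutoff are hypothetical; a real run would use a full gazetteer and a cutoff tuned on hand-checked data.

```python
from difflib import get_close_matches

# Hypothetical reference list; in practice this would come from a gazetteer.
KNOWN_CITIES = ["Venezia", "Milano", "Napoli", "Roma", "Bologna", "Firenze", "Torino"]


def match_cities(ocr_tokens, known_cities=KNOWN_CITIES, cutoff=0.8):
    """Map OCR tokens to known city names, skipping tokens with no close match."""
    # Compare in lowercase, but return the canonical spelling.
    lookup = {city.lower(): city for city in known_cities}
    candidates_pool = list(lookup)
    matches = []
    for token in ocr_tokens:
        best = get_close_matches(token.lower(), candidates_pool, n=1, cutoff=cutoff)
        if best:
            matches.append(lookup[best[0]])
    return matches
```

For example, `match_cities(["Venezla", "teatro", "MILANO"])` maps the OCR misreading `Venezla` to `Venezia`, keeps `Milano`, and drops `teatro`. Raising the cutoff trades recall for precision, which is the balance targeted in Week 10.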