Revision as of 07:43, 4 October 2017
Welcome to the wiki of the course Foundation of Digital Humanities (DH-405).
Contact
Professor: Frédéric Kaplan
Assistants: Vincent Buntinx and Lia Costiner
Rooms: Wednesday (CMN1113) and Friday (CM1104)
Links
Summary
This course gives an introduction to the fundamental concepts and methods of the Digital Humanities, both from a theoretical and an applied point of view. The course introduces the Digital Humanities circle of processing and interpretation, from data acquisition to new understandings and services. The first part of the course presents the technical pipelines for digitising, analysing and modelling written documents (printed and handwritten), maps, photographs, and 3D objects and environments. The second part details the principles of the most important algorithms for document processing (layout analysis, deep learning methods), knowledge modelling (semantic web, ontologies, graph databases), and generative models and simulation (rule-based inference, deep learning based generation). The third part focuses on platform management from the points of view of data, users and bots. Students will practise the skills they learn by directly analysing and interpreting cultural datasets from ongoing large-scale research projects (Venice Time Machine, Swiss newspaper archives).
Plan
Introduction
Week 1 : Structural tensions in Digital Humanities
20.09 (2h) Introduction to the course and Digital Humanities, structure of the course. Introduction to Framapad with a simple exercise. Principle of collective note taking and its use in the course. State of the Digital Humanities at EPFL, in Switzerland and in Europe. Structuring tensions 1: Digital Humanities, Digital Studies, Humanities Computing and Studies about Digital Culture. Digital Humanism vs. Digital Humanities. Why digital methods tend to dissolve traditional disciplinary frontiers. A focus on practice. Translation issues.
22.09 (2h) (a) Structuring tensions 2: Big Data Digital Humanities vs. Small Data Digital Humanities. The 3 circles. Exercise on the relationships between elements in the Digital Culture schema. (2h) Practical session: Introduction to MediaWiki. Objectives: learning the basic syntax of MediaWiki, getting a first experience of collaborative editing, learning to write from a neutral point of view. Creation of articles by each student, followed by peer review by another student (enriching, completing references). Each student picks a DH person and a DH concept and writes a wiki page for each (30 min + 30 min). Each student then chooses another person and another concept among the ones already covered and enriches them with complementary information and references (20 min + 20 min).
- DH historical figures: Roberto Busa, H.G. Wells, Paul Otlet, Emmanuel Le Roy Ladurie, Aby Warburg, Otto Bettmann, Tim Berners-Lee, Jimmy Wales, Elisée Reclus, Albert Kahn, Jules Maciet.
- DH concepts and related: Distant Reading, Regulated Representations, Pattern, Culturomics, Ubiquitous Scholarship, Gamification, Thick Mapping, Design fiction, New Aesthetics, Skeuomorphism, Digital Aura, Digital Heritage, Attention Economy, Folksonomy, Linguistic Capitalism, Open Access, Redocumentation, Open Hardware, Attention backbone, Opinion Mining, Topic Modelling, Gazetteer, Uberisation, Crowdsifting, Copyleft, Onboarding.
(25.09 2pm : Experiment with Digital Art History interface INN116)
Week 2 : Patrimonial capitalism and common goods
27.09 (1h) Introduction to the DH circle linking the digitisation of sources, their processing, their analysis, visualisation and the creation of societal value (insight, culture), leading ultimately to the digitisation of new sources. Presentation of some sustainable DH circles (genealogy, image banks). Patrimonial capitalism and the risk of monopolistic companies. Parallelism with the race for sequencing the Human Genome. Introduction to the Time Machine FET Flagship and the mutualised infrastructure approach. (1h) General presentation of the Time Machine pipeline at the Datasquare / ArtLab pavilion.
29.09 (4h) Forum ArtTech (Rolex Learning Center). Mining Big Data of the Past. Patrimonial capitalism and business opportunities. Examples of FamilySearch, MyHeritage, Corbis.
Part I : Pipelines
Week 3: Digitisation of Documents
04.10 (2h) The Digitisation Process and Pipelines. What is a document? What is a digital image? Exercise on book scanner typologies. Document digitisation as a problem of conversion of dimensions. Digitisation as logistic optimisation. Alienation. Digitisation on demand. Fedorov's notion of optimal experiment.
06.10 (2h) Pipeline for Written documents (Printed and Handwritten). Scanning techniques for books and documents. Principles of Transcription. Transcription tools. Canvas concept. One canvas multiple images, one image multiple canvases. Short introduction to IIIF. Named Entities, Semantic modelling, Topic and Document modelling. (2h) Presentation of the projects. Presentation of the main databases used in the course and of the ClioWire platform. Formation of the groups.
11.10 (2h) Pipeline for Maps. Vectorization. Alignment. Homologous points.
13.10 (4h) (a) QGIS Hands On. Exercise on Venetian cadaster (Bastien, Isabella).
18.10 (2h) Pipeline for Artwork photographs. Image banks and photoarchives. Scanning techniques for photographs. Segmentation. Feature detection. Detail search.
20.10 (4h) (a) Exercises with the Replica database and search engine (Lia, Isabella)
25.10 (2h) Pipeline for 3D spaces. Photogrammetry. Diachronic realignment. Multiscale indexation.
27.10 (4h) (a) Photogrammetric tutorial (Nils)
Part II : Algorithms
01.11 (2h) Algorithms for Document processing : Document analysis and Deep learning methods
03.11 (4h) (a) Machine vision tutorial (Benoit, Sofia). Introduction to Jupyter. Deep learning in practice. (b) Project development
08.11 (2h) Algorithms for Knowledge modelling : Semantic web, ontologies, graph database, homologous points, disambiguation.
10.11 (4h) (a) Exercise in semantic modelling and inference (Maud) (b) Project development
15.11 (2h) Algorithms for Generative models and simulation : Rule-based inference, Deep learning based generation. Discussion on new regimes of visibility.
17.11 (4h) (a) Exercise in deep-learning based generation (Benoit, Sofia) (b) Project development
Part III : Platform management
22.11 (2h) Data Management : Computing infrastructure, Data Management models, Sustainability. Apps. Management of uncertainty, incoherence and errors. Iconographic principle of precaution. Example of Wikipedia and Europeana. IIIF and DHCanvas (Orlin). Open Annotation Data Model. Shared Canvas.
24.11 (4h) (a) Oral presentation of the state of the project and the data processed (b) Preparation of deployment for testing phase
29.11 (2h) User Management : Representation, Rights, Traceability, Vandalism, Motivation, Negotiation spaces. Right to be forgotten.
01.12 (4h) Testing phase
06.12 (2h) Bot Management : Versioning. Open source repositories.
08.12 (4h) Testing phase and report writing
13.12 (2h) Report writing
15.12 (4h) Final project presentation
References
Key Figures
Identity map (Cardon)
Maps for Big Data Digital Humanities (Kaplan)
Semiotic Triangle (McCloud)
Infinite Canvas (McCloud)
Uncanny Valley (Mori)
Databases
(Page to be created indicating characteristics, quantity and copyright)
Le Temps Archives
Cini Photoarchive
Venice Time Machine documents
Scans of academic books and journals about Venice
Linked Book
Grading system
Wiki writing (10%)
Project design (20%)
Project implementation (20%)
Project testing (20%)
Oral presentations (30%)
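The five weights above sum to 100%. As a rough illustration of how they could combine into a final mark, here is a minimal sketch of a weighted average. The component labels, the sample grades and the 1 to 6 scale are assumptions made for the example; the page itself does not specify a scale or a rounding rule.

```python
# Weights copied from the list above; the dictionary keys are
# illustrative labels, not official names.
WEIGHTS = {
    "wiki_writing": 0.10,
    "project_design": 0.20,
    "project_implementation": 0.20,
    "project_testing": 0.20,
    "oral_presentations": 0.30,
}


def final_grade(grades):
    """Return the weighted average of component grades.

    All grades are assumed to be on the same scale.
    """
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)


# Hypothetical grades on an assumed 1 to 6 scale.
example = {
    "wiki_writing": 5.0,
    "project_design": 5.5,
    "project_implementation": 4.5,
    "project_testing": 5.0,
    "oral_presentations": 5.5,
}

print(round(final_grade(example), 2))
```

Since oral presentations carry the largest single weight, a half-point change there moves the final mark by 0.15, against 0.05 for the wiki writing.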