Projects: Difference between revisions

From FDHwiki

Revision as of 06:36, 18 September 2018

Current projects

TBD

Past projects

All the projects are pieces of a larger puzzle. The goal is to experiment with a new approach to knowledge production and negotiation, based on a platform that sits between Wikipedia and Twitter.

The platform is called ClioWire.

ClioWire: Platform management and development

This group will manage the experimental platform of the course. They will have to run the platform and develop additional features for processing and presenting the pulses. The initial code base is Mastodon.

The group will write bots that rewrite pulses and progressively converge towards articulation/datafication of the pulses.

Knowledge required: Python, JavaScript, basic Linux administration.
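
As an illustration of what such a bot might do at its simplest, the sketch below builds the HTTP request for posting one pulse. It assumes that ClioWire still exposes Mastodon's standard `POST /api/v1/statuses` endpoint with bearer-token authentication; the host name, the token and the function name `build_pulse_request` are hypothetical placeholders, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def build_pulse_request(base_url, access_token, text):
    """Build (but do not send) a request that would post `text` as a status.

    Assumption: the platform keeps Mastodon's REST endpoint for new statuses.
    """
    data = urllib.parse.urlencode({"status": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/statuses",
        data=data,
        headers={"Authorization": "Bearer " + access_token},
        method="POST",
    )


# Hypothetical host and token, for illustration only.
request = build_pulse_request(
    "https://cliowire.example",
    "HYPOTHETICAL_TOKEN",
    "Mention of Francesco Raspi in document X",
)
# A real bot would then call urllib.request.urlopen(request) to send the pulse.
```

Keeping request construction separate from sending makes the bot easy to test without touching the live platform.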

Resp. Vincent and Orlin

- Albane - Cédric
Platform management and development : State of art and Bibliography

Platform management and development : methodology

Platform management and development : Quantitative analysis of performance

GitHub page of the project: https://github.com/epflProjects/cliowire-bots

Secondary sources

The goal is to extract, from a collection of 3000 scanned books about Venice, all the sentences containing at least two named entities and to transform them into pulses. This should constitute a de facto set of relevant information drawn from a large base of Venetian documents.
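
The filtering step can be sketched as follows. This is not the project's actual pipeline: instead of a real Named Entity Recognition model, it assumes a hypothetical predefined list of entity names (`KNOWN_ENTITIES`) and a naive sentence splitter, and the sample text is invented for the example.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def entities_in(sentence, known_entities):
    """Return the known entity names that occur as whole words in the sentence."""
    return [
        name
        for name in known_entities
        if re.search(r"\b" + re.escape(name) + r"\b", sentence)
    ]


def sentences_to_pulses(text, known_entities, minimum=2):
    """Keep only the sentences that mention at least `minimum` known entities."""
    return [
        sentence
        for sentence in split_sentences(text)
        if len(entities_in(sentence, known_entities)) >= minimum
    ]


# Hypothetical entity list and invented sample text.
KNOWN_ENTITIES = ["Venice", "Francesco Raspi", "Battista Nanni"]
sample = (
    "Francesco Raspi met Battista Nanni in Venice. "
    "The weather was poor. "
    "Battista Nanni stayed at home."
)
pulses = sentences_to_pulses(sample, KNOWN_ENTITIES)
```

Only the first sample sentence is kept, since the other two mention fewer than two known entities.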

Resp. Giovanni / Matteo

- Hakim - Marion

Named Entity Recognition

GitHub page of the project: https://github.com/inverniz/FDH_SecondarySources

Primary sources

This group will look for named entities in digitized manuscripts and post pulses about these mentions.

  • The group will use Wordspotting methods based on a commercial algorithm. During the project, the group will have to set up a dedicated pipeline for indexing and searching the documents digitized in the Venice Time Machine project, and other primary sources, using the software component provided.
  • The group will have to search for lists of names or regular expressions. A method based on a predefined list will be compared with a recursive method based on the results provided by the Wordspotting components.
  • Two types of pulses will be produced: (a) "Mention of Francesco Raspi in document X" (b) "Francesco Raspi and Battista Nanni linked (document Y)"
  • Creating a simple web front end to test the Wordspotting algorithm would help assess the quality of the method.
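
The two pulse types can be sketched as below. The sketch does not use the Wordspotting component, which works on images: it assumes its results are already available as text, and it treats two names found in the same document as "linked", which is an assumption of this example. The pattern table `NAME_PATTERNS` and the function names are hypothetical.

```python
import re
from itertools import combinations


def find_names(transcription, name_patterns):
    """Return the names whose regular expression matches the transcription."""
    return [
        name
        for name, pattern in name_patterns.items()
        if re.search(pattern, transcription)
    ]


def mention_pulses(names, document_id):
    """Pulse type (a): one pulse per name found in the document."""
    return [f"Mention of {name} in document {document_id}" for name in names]


def link_pulses(names, document_id):
    """Pulse type (b): one pulse per pair of names found in the same document."""
    return [
        f"{first} and {second} linked (document {document_id})"
        for first, second in combinations(names, 2)
    ]


# Hypothetical predefined list; the patterns tolerate simple spelling variants.
NAME_PATTERNS = {
    "Francesco Raspi": r"Frances?co\s+Raspi",
    "Battista Nanni": r"Bat+ista\s+Nanni",
}

found = find_names("... Francesco Raspi ... Batista Nanni ...", NAME_PATTERNS)
mentions = mention_pulses(found, "Y")
links = link_pulses(found, "Y")
```

Here `mentions` holds one pulse per name and `links` holds a single pulse for the pair.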

Supervisor: Sofia

Skills: Java, simple Linux administration

- Raphael - Mathieu

Primary sources

Image banks

The goal is to transform the CINI metadata, which has been OCRed, into pulses. One challenge is dealing with OCR errors and possible disambiguation.
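
One possible way to absorb small OCR errors is fuzzy matching against a list of accepted names. The sketch below uses `difflib.get_close_matches` from the Python standard library; the authority list, the cutoff value and the function name `normalize_name` are assumptions made for the example, not part of the project.

```python
from difflib import get_close_matches


def normalize_name(ocr_value, authority_list, cutoff=0.8):
    """Map an OCRed string to its closest accepted name, or None if none is close enough."""
    matches = get_close_matches(ocr_value, authority_list, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical authority list of artist names.
AUTHORITY = ["Tiziano Vecellio", "Jacopo Tintoretto", "Paolo Veronese"]

corrected = normalize_name("Tizian0 Vecellio", AUTHORITY)  # digit 0 read instead of letter o
unmatched = normalize_name("Unknown", AUTHORITY)
```

Returning `None` rather than a forced guess leaves ambiguous values available for manual disambiguation.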

Supervision: Lia

Newspaper, Wikipedia, Semantic Web

The goal is to find all the sentences in a large newspaper archive that contain at least two named entities. These sentences should be posted as pulses.

The named entity detection has already been done. The only challenge is to retrieve the corresponding sentences in the digitized transcriptions.

In addition, this group should look for ways to import elements of knowledge in bulk from other sources (DBpedia, RDF databases).
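
A minimal sketch of the retrieval step follows. It assumes that the existing detection output gives the character offset of each entity in the transcription, which is a guess about the data format, and it uses a naive sentence splitter. The sample text is invented.

```python
import re


def sentence_spans(text):
    """Yield (start, end) character spans, splitting after '.', '!' or '?' plus whitespace."""
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        yield start, match.end()
        start = match.end()
    if start < len(text):
        yield start, len(text)


def sentences_with_entities(text, entity_offsets, minimum=2):
    """Return the sentences whose span contains at least `minimum` entity offsets."""
    results = []
    for start, end in sentence_spans(text):
        count = sum(1 for offset in entity_offsets if start <= offset < end)
        if count >= minimum:
            results.append(text[start:end].strip())
    return results


# Invented sample; offsets are computed here only to keep the example self-contained.
sample = "Francesco Raspi wrote to Battista Nanni. Nothing else happened. Raspi left."
offsets = [
    sample.index("Francesco Raspi"),
    sample.index("Battista Nanni"),
    sample.rindex("Raspi"),
]
selected = sentences_with_entities(sample, offsets)
```

Working with character spans avoids running entity detection again: each known offset is simply assigned to the sentence that contains it.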

Resp. Maud

Skills: Python or Java

- Laurene and Santiago


Newspaper, Wikipedia, Semantic Web : State of art and Bibliography