Projects: Difference between revisions

From FDHwiki

Revision as of 13:12, 28 September 2017

Platform management and development: ClioWire platform

On the basis of the existing code, develop an API for the other groups (posting pulses, searching pulses). Develop bots to rewrite pulses based on other sources.

The initial code base will be Mastodon.

Knowledge required: Python, JavaScript, basic Linux administration.
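
Since the initial code base is Mastodon, posting a pulse could look like posting a Mastodon status. The following is a minimal sketch, assuming the ClioWire instance keeps Mastodon's standard POST /api/v1/statuses endpoint with a bearer token; the instance URL, the token and the function name are hypothetical. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_post_pulse_request(base_url, token, text):
    """Build (but do not send) a request that posts one pulse as a Mastodon status."""
    # Mastodon accepts the status text as a form-encoded "status" field.
    body = urllib.parse.urlencode({"status": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/statuses",
        data=body,
        headers={"Authorization": "Bearer " + token},
        method="POST",
    )


# Hypothetical instance URL and access token, for illustration only.
request = build_post_pulse_request(
    "https://cliowire.example", "TOKEN", "Example pulse text"
)
```

Passing the request to urllib.request.urlopen would actually send it. A bot that rewrites pulses could reuse the same helper after building its new text.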

Decomposition in elementary units of the secondary sources

The goal is to extract, from a collection of 3000 scanned books, all the sentences containing at least two named entities and to transform them into pulses.

Decomposition in elementary units of primary sources

This group will look for named entities in digitized manuscripts and post pulses about these mentions. The group will use word-spotting methods.

Supervisor: Sofia

Skills: Java

Decomposition in elementary units of image banks

The goal is to transform the OCRed metadata of CINI into pulses. One challenge is dealing with OCR errors and possible disambiguation.

Supervision: Lia
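
One possible way to absorb OCR errors is to match each OCRed string against a list of known names with approximate string matching. The sketch below uses Python's standard difflib module; the list of names, the 0.8 similarity cutoff and the function name are assumptions made for illustration, not part of the project description.

```python
import difflib


def normalize_name(ocr_text, known_names, cutoff=0.8):
    """Return the closest known name for an OCRed string, or None if none is close enough."""
    # Compare case-insensitively, but return the name in its original spelling.
    lowered = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        ocr_text.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Hypothetical authority list; a real one would come from a curated source.
known = ["Tiziano Vecellio", "Jacopo Tintoretto", "Paolo Veronese"]
corrected = normalize_name("Tizian0 Vecelli0", known)
```

Strings with no sufficiently close match return None, so they can be set aside for manual disambiguation instead of being posted as pulses.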

Newspaper, Wikipedia, Semantic Web mining

The goal is to find all the sentences in a large newspaper archive that contain at least two named entities. These sentences should be posted as pulses.


Named entity detection has already been done. The only challenge is to retrieve the corresponding sentences in the digitized transcriptions.
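
As a rough illustration of the retrieval step, the sketch below assumes that the existing detection output gives the character offset of each entity mention in the transcription; the actual format of that output is not specified here. Sentences are split with a naive punctuation rule, which will break on abbreviations and OCR noise.

```python
import re


def sentence_spans(text):
    """Yield (start, end) character spans of sentences, using a naive punctuation rule."""
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        yield start, match.end()
        start = match.end()
    if start < len(text):
        yield start, len(text)


def sentences_with_entities(text, entity_offsets, minimum=2):
    """Return the sentences that contain at least `minimum` entity mentions."""
    results = []
    for start, end in sentence_spans(text):
        # Count the mentions whose start offset falls inside this sentence.
        count = sum(1 for offset in entity_offsets if start <= offset < end)
        if count >= minimum:
            results.append(text[start:end].strip())
    return results


# Made-up transcription and offsets, for illustration only.
text = "Maria met Giovanni in Venice. The weather was fine. Paolo stayed home."
offsets = [text.index(name) for name in ("Maria", "Giovanni", "Venice", "Paolo")]
selected = sentences_with_entities(text, offsets)
```

The sketch counts mentions, not distinct entities, so a sentence naming the same entity twice would also be selected; whether that is wanted is a design decision for the group.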

In addition, this group should look for ways to import knowledge elements on a large scale from other sources (DBpedia, RDF databases).
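
One way such an import might start is by turning RDF triples into short texts. The sketch below is an assumption-laden illustration: it handles only simple N-Triples lines (no escaped quotes, no blank nodes), the function names are hypothetical, and the sample line is illustrative and not taken from an actual DBpedia dump.

```python
import re

# Subject IRI, predicate IRI, then either an object IRI or a quoted literal.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)"\S*)\s*\.\s*$')


def local_name(iri):
    """Keep the part of an IRI after the last '/' or '#', with underscores as spaces."""
    return re.split(r"[/#]", iri)[-1].replace("_", " ")


def triple_to_pulse(line):
    """Turn one simple N-Triples line into a short text, or None if it does not parse."""
    match = TRIPLE.match(line.strip())
    if match is None:
        return None
    subject, predicate, object_iri, literal = match.groups()
    value = local_name(object_iri) if object_iri is not None else literal
    return f"{local_name(subject)} {local_name(predicate)} {value}"


# Illustrative line in DBpedia's IRI style.
line = (
    "<http://dbpedia.org/resource/Marco_Polo> "
    "<http://dbpedia.org/ontology/birthPlace> "
    "<http://dbpedia.org/resource/Venice> ."
)
pulse = triple_to_pulse(line)
```

The raw predicate name is kept as is in the output; mapping predicates to readable phrases would be a separate step.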

Resp. Maud

Skills: Python or Java