Revision as of 18:26, 5 October 2017

Presentation

ClioWire is the platform developed by the EPFL Digital Humanities master students and the members of the EPFL Digital Humanities Lab (DHLAB). The platform is a hybrid of Twitter and Wikipedia. Its ambition is to curate simple, open bricks of historical knowledge, shaped as short texts. It is meant to rely not on a predefined ontology but on collectively negotiated conventions and on reformatting by bots. This is what works well on Wikipedia and in our Digital Humanities bachelor course. It is also meant to offer easy onboarding, i.e. a smooth process by which one gets accustomed to a new system or platform, even for beginners and people with only very basic IT skills. This is what works well on Twitter. One goal is to offer simple and robust long-term archiving for bricks of historical knowledge.

ClioWire curates Pulses as the basic bricks of historical knowledge. A Pulse is a kind of minimal information unit encoding a relationship between entities and sources. Pulses typically encode what is usually referred to as events. Pulses are artificial constructions that can be continuously remolded and combined in order to create larger pieces of knowledge. We suggest modeling our historical knowledge as an immense collection of Pulses.

Main characteristics

Their main characteristics are the following:

Pulses are short textual sequences of up to 140 characters (like tweets); this constraint may be dropped.
The syntax of Pulses follows some (evolving) conventions but no strict rules (like tweets).
Pulses can be written by humans or by machines, and read by humans or machines (like tweets).
Pulses have a first author but can be modified and rewritten by anyone. They are fundamentally considered a common good (unlike tweets).
Sequences of Pulses are fully versioned, and the history of their modifications can be visualized (unlike tweets).
Pulses can be attached to geographical coordinates, temporal coordinates, sources (URLs) and document segments.
Each Pulse has a unique universal ID (like a tweet) that can serve as a reference in other Pulses and on the web.
Pulses can be tagged as dubious, or with any other classification characterizing the trust one can have in them. The tagging itself can be partly done by bots (like for tweets).
In sum, tweets are a particular kind of Pulse, but Pulses are more general than tweets.
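The characteristics above can be sketched as a small data model. This is a minimal Python sketch under our own assumptions, not ClioWire's actual implementation. The class and field names (`Pulse`, `Revision`, `edit`, `MAX_LENGTH`) are hypothetical. IDs are random UUIDs, and versioning is modeled as a simple list of revisions per Pulse.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical limit mirroring the 140-character constraint (which may be dropped).
MAX_LENGTH = 140


@dataclass(frozen=True)
class Revision:
    author: str  # a human user or a bot
    text: str


@dataclass
class Pulse:
    first_author: str
    text: str
    # Unique universal ID, here simply a random UUID (an assumption).
    pulse_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    coordinates: Optional[tuple[float, float]] = None  # (latitude, longitude)
    date: Optional[str] = None  # temporal coordinate, e.g. "1808"
    sources: list[str] = field(default_factory=list)  # URLs or document segments
    tags: set[str] = field(default_factory=set)  # trust tags such as "dubious"
    history: list[Revision] = field(default_factory=list)

    def __post_init__(self) -> None:
        _check_length(self.text)
        # The creation is recorded as the first revision, by the first author.
        self.history.append(Revision(self.first_author, self.text))

    def edit(self, author: str, new_text: str) -> None:
        """Anyone may rewrite a Pulse; every change is kept in the history."""
        _check_length(new_text)
        self.text = new_text
        self.history.append(Revision(author, new_text))


def _check_length(text: str) -> None:
    if len(text) > MAX_LENGTH:
        raise ValueError(f"Pulse longer than {MAX_LENGTH} characters")
```

Keeping the first author separate from the history reflects the idea that a Pulse has an original author but is a common good. Every later rewrite, by a person or a bot, is appended to the history rather than overwriting it.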

The system managing Pulses supports the following operations:

Inserting massive numbers of Pulses based on existing sources. If a bot imports Pulses massively, it will be their first author.
More generally, applying bot operations to Pulses at a massive scale (like on Wikipedia)
Retrieving all the Pulses mentioning an entity (person, place) and visualizing them as timelines, maps or both
Retrieving all the interventions of a given user (i.e. creation or editing of Pulses)
Retrieving all the Pulses mentioning a given source (URL)
Visualizing the editing history of a Pulse
Mapping the worldwide activity on Pulses
Computing the list of the most active users and the most edited Pulses (like on Wikipedia)
More things in the same spirit …
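Several of these operations reduce to simple queries over the Pulses and their revision histories. The Python sketch below rests on hypothetical assumptions. Entities are written as hashtags and sources as URLs. Each Pulse stores its history as a list of (author, text) pairs. A real system would use indexes rather than scanning every Pulse.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


@dataclass
class Pulse:
    pulse_id: str
    history: list[tuple[str, str]]  # (author, text) pairs, oldest first

    @property
    def text(self) -> str:
        return self.history[-1][1]  # the current version


def pulses_mentioning(pulses: list[Pulse], entity: str) -> list[Pulse]:
    """All Pulses whose current text tags the entity, e.g. #FrancescoRaspi."""
    return [p for p in pulses if entity in HASHTAG.findall(p.text)]


def interventions_of(pulses: list[Pulse], user: str) -> list[tuple[str, str]]:
    """(pulse_id, "created" or "edited") for every revision made by the user."""
    return [
        (p.pulse_id, "created" if i == 0 else "edited")
        for p in pulses
        for i, (author, _) in enumerate(p.history)
        if author == user
    ]


def pulses_citing(pulses: list[Pulse], url: str) -> list[Pulse]:
    """All Pulses whose current text contains the given source URL."""
    return [p for p in pulses if url in URL.findall(p.text)]


def most_edited(pulses: list[Pulse], n: int = 10) -> list[Pulse]:
    return sorted(pulses, key=lambda p: len(p.history), reverse=True)[:n]


def most_active_users(pulses: list[Pulse], n: int = 10) -> list[tuple[str, int]]:
    counts = Counter(author for p in pulses for author, _ in p.history)
    return counts.most_common(n)
```

Because a bot that imports Pulses is recorded as their first author, the same `interventions_of` query also lists everything a given bot has created.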

Examples

Coding events

It is possible to use a standardized vocabulary, but this is not compulsory. For instance, someone may simply write in natural language, without any source or conventional naming:

Francesco Raspi lived on the Rialto Bridge.

However, such a Pulse may be tagged as unreliable. It is better to automatically produce a Pulse like the following one:

Francesco Raspi is the owner of parcel 514 in the Napoleonic Cadaster of 1808.

Or better:

#FrancescoRaspi owned #Nap1808Parcel514 in #1808

Even better:

PropertyMention #FrancescoRaspi #Nap1808Parcel514 in http://…

This Pulse will typically be attached to the document segment where the information appears (in the Sommarioni) and to the corresponding geometry (or point) in the geohistorical atlas.
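Because the structured form follows a regular convention, bots can read it back. The sketch below assumes a hypothetical grammar, `Predicate #Entity1 #Entity2 ... [in <source>]`, inferred from the examples on this page. The actual conventions are collectively negotiated and evolving, and the names `StructuredPulse` and `parse_structured` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredPulse:
    predicate: str
    entities: list[str]
    source: Optional[str] = None


def parse_structured(text: str) -> StructuredPulse:
    """Parse 'Predicate #Entity ... [in <source>]' (a hypothetical convention)."""
    tokens = text.split()
    if not tokens or tokens[0].startswith("#"):
        raise ValueError("expected a predicate as the first token")
    predicate, rest = tokens[0], tokens[1:]
    source = None
    # An optional trailing "in <source>" names the source of the statement.
    if len(rest) >= 2 and rest[-2] == "in":
        source, rest = rest[-1], rest[:-2]
    if not rest or not all(t.startswith("#") and len(t) > 1 for t in rest):
        raise ValueError("expected one or more #Entity tokens")
    return StructuredPulse(predicate, [t[1:] for t in rest], source)
```

The looser form `#FrancescoRaspi owned #Nap1808Parcel514 in #1808` is rejected by this parser because it does not start with a predicate. This illustrates why the predicate-first form is easier for bots to process.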

Coding knowledge

Pulses can also be used to code knowledge that is not grounded in space and time, like:

Francesco Raspi is the son of Domenico Raspi

Or better:

SonOf #FrancescoRaspi #DomenicoRaspi in http://…
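Rewriting the natural-language Pulse into the structured one is the kind of reformatting a bot could perform. The following is a minimal sketch that assumes names are sequences of capitalized words and that hashtags are formed by removing spaces, as in #FrancescoRaspi. The pattern and function names are hypothetical. The bot does not invent a source, so the `in http://…` part is not generated.

```python
from __future__ import annotations

import re
from typing import Optional

# A name is one or more capitalized words, e.g. "Francesco Raspi" (an assumption).
NAME = r"[A-Z][\w']*(?: [A-Z][\w']*)*"
SON_OF = re.compile(rf"\s*(?P<child>{NAME}) is the son of (?P<parent>{NAME})\s*\.?\s*")


def to_hashtag(name: str) -> str:
    """'Francesco Raspi' -> '#FrancescoRaspi'."""
    return "#" + "".join(name.split())


def reformat_son_of(text: str) -> Optional[str]:
    """Rewrite 'X is the son of Y' as 'SonOf #X #Y'; return None if it does not fit."""
    m = SON_OF.fullmatch(text)
    if m is None:
        return None
    return f"SonOf {to_hashtag(m['child'])} {to_hashtag(m['parent'])}"
```

Returning `None` instead of guessing leaves Pulses that do not fit the pattern untouched, so a human or another bot can handle them.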

Attributing IDs

Pulses can be used to give a unique ID (possibly generated automatically) to document segments, artwork images and other entities that do not typically receive a conventional name:

IDassignment #ImprintingCini434 http://…
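Automatically generated IDs can be as simple as a prefix followed by a counter. The sketch below shows one hypothetical way to mint IDs shaped like #ImprintingCini434 and to emit the corresponding IDassignment Pulse. The class name, the per-prefix counter and the starting value are assumptions. A shared service would also need to persist the counters so that IDs stay unique.

```python
from __future__ import annotations

import itertools
from collections import defaultdict


class IdMinter:
    """Hypothetical helper minting IDs of the form <Prefix><number>."""

    def __init__(self, start: int = 1) -> None:
        # One independent counter per prefix, each starting at `start`.
        self._counters = defaultdict(lambda: itertools.count(start))

    def mint(self, prefix: str) -> str:
        return f"{prefix}{next(self._counters[prefix])}"


def id_assignment_pulse(entity_id: str, target: str) -> str:
    """Build a Pulse following the 'IDassignment #<ID> <URL>' example."""
    return f"IDassignment #{entity_id} {target}"
```

For example, `IdMinter(start=434).mint("ImprintingCini")` returns `"ImprintingCini434"`, and the next call on the same minter returns `"ImprintingCini435"`.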

Coding inferred facts

Pulses can be used to code inferred facts, like:

Francesco Raspi is the grandson of Giovanni Raspi (inferred based on FictilisID1, Fictilis2 and Rule3)
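Keeping the provenance of an inferred fact explicit could make it possible to re-check the fact when one of its premises or its rule is edited. This is a minimal sketch. It assumes, hypothetically, that an inferred fact records the IDs of its premise Pulses and of the applied rule, and renders them in the parenthesized form used above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InferredFact:
    statement: str
    premises: tuple[str, ...]  # IDs of the Pulses the fact was derived from
    rule: str                  # ID of the rule that was applied

    def to_pulse_text(self) -> str:
        """Render as '<statement> (inferred based on A, B and Rule)'."""
        ids = [*self.premises, self.rule]
        listed = ", ".join(ids[:-1]) + " and " + ids[-1] if len(ids) > 1 else ids[0]
        return f"{self.statement} (inferred based on {listed})"

    def depends_on(self, pulse_id: str) -> bool:
        """True if editing this Pulse or rule should trigger a re-check."""
        return pulse_id in self.premises or pulse_id == self.rule
```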

Encoding Rules

Inference rules may also be coded as Pulses (using a convention for variables):

R: If ($A is the son of $B) and ($B is the son of $C) then ($A is the grandson of $C)
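Rules written with $variables can be applied mechanically by matching their premises against existing Pulses. The following is a naive forward-chaining sketch in Python. It assumes that facts are plain sentences of the same form as the rule's clauses and that the rule text follows the `R: If (...) and (...) then (...)` layout shown. The function names are hypothetical. Trying every combination of facts is only viable for small collections.

```python
from __future__ import annotations

import re
from itertools import product
from typing import Optional

VAR = re.compile(r"\$(\w+)")
RULE = re.compile(r"R:\s*If\s+(?P<body>.+?)\s+then\s+\((?P<head>[^()]*)\)\s*")


def parse_rule(rule: str) -> tuple[list[str], str]:
    """Split a rule into its premise clauses and its conclusion."""
    m = RULE.fullmatch(rule)
    if m is None:
        raise ValueError("not a rule of the form 'R: If (...) and (...) then (...)'")
    premises = re.findall(r"\(([^()]*)\)", m["body"])
    return premises, m["head"]


def match(pattern: str, fact: str) -> Optional[dict[str, str]]:
    """Match one clause such as '$A is the son of $B' against a fact."""
    names = VAR.findall(pattern)
    # VAR.split alternates literal text (even indices) and variable names (odd).
    regex = "".join(
        re.escape(part) if i % 2 == 0 else "(.+?)"
        for i, part in enumerate(VAR.split(pattern))
    )
    m = re.fullmatch(regex, fact)
    if m is None:
        return None
    binding: dict[str, str] = {}
    for name, value in zip(names, m.groups()):
        if binding.setdefault(name, value) != value:
            return None  # the same variable bound to two different values
    return binding


def apply_rule(rule: str, facts: list[str]) -> set[str]:
    """Derive every instance of the rule's conclusion from the given facts."""
    premises, head = parse_rule(rule)
    derived: set[str] = set()
    for combo in product(facts, repeat=len(premises)):
        binding: dict[str, str] = {}
        for pattern, fact in zip(premises, combo):
            b = match(pattern, fact)
            if b is None or any(binding.get(k, v) != v for k, v in b.items()):
                break  # clause does not match, or bindings conflict
            binding.update(b)
        else:
            derived.add(VAR.sub(lambda v: binding[v.group(1)], head))
    return derived
```

Given the facts "Francesco Raspi is the son of Domenico Raspi" and "Domenico Raspi is the son of Giovanni Raspi", `apply_rule` derives "Francesco Raspi is the grandson of Giovanni Raspi". The shared variable $B forces the two clauses to agree on Domenico Raspi.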
