ClioWire: Difference between revisions
Revision as of 08:16, 15 December 2017
Presentation
ClioWire is a platform developed by the EPFL Digital Humanities master's students and members of the EPFL Digital Humanities Lab (DHLAB). The platform is a hybrid of Twitter and Wikipedia. Its ambition is to curate simple, open bricks of historical knowledge, shaped as short texts. It is meant to rely on no predefined ontology, but on collectively negotiated conventions and reformatting by bots. This is what works well on Wikipedia and in our Digital Humanities bachelor course. It is also meant to offer easy onboarding, i.e. the process through which one gets accustomed to a new system or platform, even for beginners and people with only very basic IT skills. This is what works well on Twitter. One goal is to offer simple and robust long-term archiving for bricks of historical knowledge.
ClioWire curates Pulses as the basic bricks of historical knowledge. They constitute a kind of minimal information unit coding relationships between entities and sources. They typically encode what are usually referred to as events. Pulses are artificial constructions that can be continuously remolded and combined in order to create larger pieces of knowledge. We suggest modeling our historical knowledge as an immense collection of Pulses.
Main characteristics
The main characteristics of Pulses are the following:
- Pulses are 140-character-long textual sequences (like tweets); this constraint may be dropped.
- The syntax of Pulses follows some (evolving) conventions but no strict rules (like tweets).
- Pulses can be written by humans or by machines, and read by humans or machines (like tweets).
- Pulses have a first author but can be modified and rewritten by anyone. They are fundamentally considered a common good (unlike tweets).
- Sequences of Pulses are fully versioned. The history of their modifications can be visualized (unlike tweets).
- Pulses can be attached to geographical coordinates, temporal coordinates, sources (URLs) and document segments.
- Each Pulse has a unique universal ID (like a tweet) that can serve as a reference in other Pulses and on the web.
- Pulses can be tagged as dubious, or with any other classification characterizing the trust one can have in them. The tagging itself can be partly done by bots (as for tweets).
- In sum, tweets are a particular kind of Pulse, but Pulses are more general than tweets.
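The characteristics listed above can be sketched as a small data structure. This is only an illustrative model, not the ClioWire schema: the class name `Pulse`, its field names and the `edit` method are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import uuid

MAX_LENGTH = 140  # the page notes that this constraint may be dropped


@dataclass
class Pulse:
    """Hypothetical in-memory model of a Pulse (field names are assumptions)."""
    text: str
    first_author: str
    # Unique ID that other Pulses or web pages could reference.
    pulse_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    coordinates: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    sources: List[str] = field(default_factory=list)   # source URLs
    flags: List[str] = field(default_factory=list)     # e.g. "dubious"
    # Full version history as (editor, text) pairs, oldest first.
    history: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.text) > MAX_LENGTH:
            raise ValueError("Pulse text exceeds %d characters" % MAX_LENGTH)
        self.history.append((self.first_author, self.text))

    def edit(self, editor: str, new_text: str) -> None:
        """Anyone may rewrite a Pulse; every version is kept."""
        if len(new_text) > MAX_LENGTH:
            raise ValueError("Pulse text exceeds %d characters" % MAX_LENGTH)
        self.text = new_text
        self.history.append((editor, new_text))
```

Keeping the first author separate from the list of editors mirrors the idea that a Pulse has one creator but is a common good afterwards.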
The management system for Pulses supports the following operations:
- Inserting massive numbers of Pulses based on existing sources. If a bot imports Pulses massively, it will be their first author.
- More generally, applying bot operations at a massive scale on Pulses (as on Wikipedia)
- Retrieving all the Pulses mentioning an entity (person, place) and visualizing them as timelines, maps or both
- Retrieving all the interventions of a given user (i.e. creation or editing of Pulses)
- Retrieving all the Pulses mentioning a given source (URL)
- Visualizing the editing history of a sequence of Pulses
- Mapping the worldwide activity of Pulses
- Computing the list of the most active users and the most edited Pulses (as on Wikipedia)
- More things in the same spirit …
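As an illustration of the retrieval operations above, here is a minimal sketch of an index from hashtags to Pulses. It assumes Pulses are plain strings held in a list, and that a hashtag is `#` followed by letters, digits or underscores; the function names are hypothetical and do not describe the actual ClioWire implementation.

```python
import re
from collections import defaultdict

# Assumed hashtag shape: '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return every hashtag found in a Pulse, in order of appearance."""
    return HASHTAG.findall(text)


def build_index(pulses):
    """Map each hashtag to the positions of the Pulses that mention it."""
    index = defaultdict(list)
    for position, text in enumerate(pulses):
        for tag in set(extract_hashtags(text)):
            index[tag].append(position)
    return index
```

With such an index, retrieving all Pulses that mention `#Venice` is a single dictionary lookup, which can then feed a timeline or map view.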
Self-organization of semantic conventions
State of the art in self-organizing conventions and emergent semantics
This section is under construction. Below are some (quite old) references.
Emergent Semantics for the Semantic Web
https://link.springer.com/chapter/10.1007/978-3-540-24571-1_2
Aberer :
Ouksel, M., Ahmed, I.: Ontologies are not the panacea in data integration: A flexible coordinator for context construction. Journal of Distributed and Parallel Databases 7(1) (1999)
Ouksel, M.: A Framework for a Scalable Agent Architecture of Cooperating Heterogeneous Knowledge Sources. Springer, Heidelberg (1999)
McCool, R., Guha, R.V.: Tap, building the semantic web
Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.M.: Self-organization and identification of web communities. IEEE Computer 35(3), 66–70 (2002)
Hauswirth, M., Datta, A., Aberer, K.: Efficient, self-contained handling of identity in peer-to-peer systems
Ren, Wei; Beard, R. W.; Atkins, E. M. (8–10 June 2005). "A survey of consensus problems in multi-agent coordination". 2005 American Control Conference: 1859–1864. doi:10.1109/ACC.2005.1470239.
Mathematical and multi-agent models for self-organizing vocabularies
Simple models of distributed coordination https://infoscience.epfl.ch/record/187558/files/cs-kaplan05.pdf
Rigid semantization vs smooth semantization
Contrasting the pros and cons of two approaches to the semantization process.
The hashtag is a pivotal symbol
The use of hashtags on Twitter was introduced in 2007, inspired by the IRC practice of channels.
An Oral History of the Hashtag: https://www.wired.com/2017/05/oral-history-hashtag/
Attribution ID for entities and predicates
Negotiation syntax
Examples of negotiated conventions
Below are the results of a small-scale negotiation among the first 8 ClioWire developers.
Coding Pulses
Someone may simply write in natural language, without any source or conventional naming
- Francesco Raspi lived on the Rialto Bridge.
However, this Pulse may be tagged as unreliable. It is better to produce a Pulse automatically.
- Francesco Raspi is the owner of parcel 514 in the Napoleonic Cadaster of 1808.
In order to increase the possible articulation between Pulses, hashtags can be used. As on Twitter, there are no predefined hashtags, but communities can negotiate common ways to articulate the content of Pulses. A community may decide to use very generic hashtags in order to increase the potential for generic processing. The list below reflects the current practice and ongoing negotiation in the community.
Entities
Each entity (place, person, book, segment, volume, etc.) is associated with a unique hashtag.
Example: #MarinoZorzi is an entity
Predicate #Eq
Predicate #Eq defines a commutative equivalence between two entities. This means that one can be replaced by the other. It can be used to disambiguate mentions.
Syntax: (Entity or URL) #Eq (Entity or URL)
- Ex: #MarinoZorzi #Eq #MarinoZorzi34
- Ex: #MarinoZorzi #Eq http://..
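Because #Eq is commutative, a set of #Eq Pulses partitions entities into groups of interchangeable names. One possible way to resolve these groups is a union-find structure, sketched below. The class `Equivalences`, the function `parse_eq` and the idea of chaining equivalences transitively are assumptions made for this example, not part of the negotiated convention.

```python
def parse_eq(pulse):
    """Return the two sides of '(Entity or URL) #Eq (Entity or URL)', or None."""
    tokens = pulse.split()
    if len(tokens) == 3 and tokens[1] == "#Eq":
        return tokens[0], tokens[2]
    return None


class Equivalences:
    """Union-find over entity hashtags and URLs (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        """Return the representative of the group containing `name`."""
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            # Path halving: point each visited node at its grandparent.
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def union(self, left, right):
        """Merge the groups of `left` and `right`."""
        root_left, root_right = self.find(left), self.find(right)
        if root_left != root_right:
            self.parent[root_right] = root_left

    def same(self, left, right):
        return self.find(left) == self.find(right)
```

A bot could feed every parsed #Eq Pulse into `union` and then use `same` to decide whether two mentions refer to one entity.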
Predicate #In
Predicate #In states that one entity is a subpart of, or is included in, another.
Syntax: Entity #In Entity
- Ex: #Venice #In #Italy
- Ex: #PietyandPatronage_p3 #In #PietyandPatronage
- Ex: #YoungFilmmakers #In #Filmmaker
- Ex: #T_FrenchRevolution #In #T_XVIIIcentury
The syntax entity #In entity is for the moment preferred to the syntax #In entity entity, but this is subject to negotiation.
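If #In is read as transitive (an assumption, since the convention does not state it), the containers of an entity can be collected by following #In Pulses upward. The sketch below uses the infix form entity #In entity; the function names and the extra Pulse `#Italy #In #Europe` in the usage line are hypothetical.

```python
def parse_in(pulse):
    """Return (part, whole) for a Pulse of the form 'entity #In entity'."""
    tokens = pulse.split()
    if len(tokens) == 3 and tokens[1] == "#In":
        return tokens[0], tokens[2]
    return None


def containers(entity, pulses):
    """Return every entity that directly or indirectly contains `entity`."""
    parents = {}
    for pulse in pulses:
        pair = parse_in(pulse)
        if pair:
            parents.setdefault(pair[0], set()).add(pair[1])
    seen, stack = set(), [entity]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen
```

For instance, given `#Venice #In #Italy` and a hypothetical `#Italy #In #Europe`, `containers("#Venice", pulses)` would return both `#Italy` and `#Europe`.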
Predicate #Mention
The hashtag #Mention is used to indicate the presence of an entity in a document. In transcription processes, the resulting transcription is created as an entity. This entity will be disambiguated later on with the #Eq hashtag.
Syntax: #Mention #Entity #In #Volume #Source URL (opt #Prob_score)
- Ex: #Mention #Francesco_554eacdd_6734_4c91_9b9a_5e40869403e7_000490_942_5513_363_139 #In #554eacdd_6734_4c91_9b9a_5e40869403e7 #Source https://images.center/iiif/554eacdd-6734-4c91-9b9a-5e40869403e7-000490/942,5513,363,139/full/0/default.jpg
- Ex: #Mention entity entity entity #In Volume
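The #Mention syntax can be read token by token: one or more entities, then an optional #In volume and an optional #Source URL. The parser below is a minimal sketch under that reading. The function name and the returned dictionary keys are assumptions, and since the format of the optional probability score is not specified, any remaining tokens are simply kept unparsed.

```python
def parse_mention(pulse):
    """Split a #Mention Pulse into entities, volume, source and leftovers."""
    tokens = pulse.split()
    if not tokens or tokens[0] != "#Mention":
        return None
    result = {"entities": [], "volume": None, "source": None, "extra": []}
    i = 1
    # Everything before the first #In or #Source is a mentioned entity.
    while i < len(tokens) and tokens[i] not in ("#In", "#Source"):
        result["entities"].append(tokens[i])
        i += 1
    while i < len(tokens):
        if tokens[i] == "#In" and i + 1 < len(tokens):
            result["volume"] = tokens[i + 1]
            i += 2
        elif tokens[i] == "#Source" and i + 1 < len(tokens):
            result["source"] = tokens[i + 1]
            i += 2
        else:
            # e.g. an optional probability score, whose format is left open
            result["extra"].append(tokens[i])
            i += 1
    return result
```

Accepting several entities before #In also covers the multi-entity form of #Mention shown in the last example.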
Predicate #Tag
Syntax:
- Ex: #Tag #JeanLouisNicollier #realisateur #T_1998 URL
Predicate #Copresence
Syntax: #Copresence #entity #entity #In #volume/#entity
- Ex: #Copresence #FrancescoRaspi #BaptistaNanni #In #PietyandPatronage_p3
#Copresence could be replaced by the general syntax of #Mention with several entities.
Predicate #Creator
Syntax: #Creator #entity #entity
- Ex: #Creator #PietyandPatronage #WannaGoffen
Temporal Entity #T_
- Ex: #T_1998
- Ex: #T_1998_2000
- Ex: #T_FirstCentury
- Ex: #T_FrenchRevolution
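The numeric forms of temporal entities lend themselves to automatic processing, for example to place Pulses on a timeline. Here is a minimal sketch that converts `#T_1998` and `#T_1998_2000` into a (start year, end year) pair. The function name is hypothetical, and named periods such as `#T_FrenchRevolution` are deliberately left unresolved, since mapping them to years would need an external convention.

```python
import re

# Assumed numeric forms: #T_<year> or #T_<start year>_<end year>.
YEARS = re.compile(r"^#T_(\d{1,4})(?:_(\d{1,4}))?$")


def parse_temporal(tag):
    """Return (start_year, end_year) for numeric #T_ tags, else None."""
    match = YEARS.match(tag)
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return (start, end)
```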
Condition #When
- Ex: #AdamSmith #In #Economists #When #T_1753
Condition #Where
- Ex: #AdamSmith #In #Economists #Where #London
Condition #Source
- Ex: #AdamSmith #In #Economists #Source #p14_LeTemps
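The three conditions #When, #Where and #Source all follow the same pattern: a condition hashtag followed by one value, appended to a core statement. A minimal sketch of separating the two parts is shown below; the function name and the return shape are assumptions for illustration only.

```python
CONDITIONS = ("#When", "#Where", "#Source")


def split_conditions(pulse):
    """Separate the core statement of a Pulse from its trailing conditions."""
    tokens = pulse.split()
    statement, conditions = [], {}
    i = 0
    while i < len(tokens):
        if tokens[i] in CONDITIONS and i + 1 < len(tokens):
            # A condition hashtag consumes exactly one following token.
            conditions[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            statement.append(tokens[i])
            i += 1
    return statement, conditions
```

Treating conditions uniformly means a new one could be added to `CONDITIONS` once the community agrees on it, without changing the parsing logic.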
Coding knowledge
Pulses can also be used to code knowledge that is not grounded in space and time, such as:
- Francesco Raspi is the son of Domenico Raspi
Or better:
- SonOf #FrancescoRaspi #DomenicoRaspi in http://…
Attributing ID
Document segments, artwork images and other entities that do not typically receive a conventional name can receive a unique ID (possibly generated automatically) through Pulses. This tends to be done using the #eq predicate.
- #ImprintingCini434 #eq http://…
Coding inferred facts
Pulses can be used to code inferred facts, such as:
- Francesco Raspi is the grandson of Giovanni Raspi (inferred based on Pulse1, Pulse2 and Pulse3 and Rule3)
Encoding Rules
Inference rules may also be coded with Pulses (using a convention for variables)
- R: If ($A is the son of $B) and ($B is the son of $C) then ($A is the grandson of $C)
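To make the rule above concrete, here is a minimal sketch of applying it to a set of "son of" facts. It assumes the facts have already been extracted from Pulses as (son, father) pairs; the function name is hypothetical, and the premise that Domenico Raspi is the son of Giovanni Raspi in the usage line is assumed for illustration.

```python
def infer_grandsons(son_of):
    """Apply: ($A son of $B) and ($B son of $C) => ($A grandson of $C).

    `son_of` is a set of (son, father) pairs; the result is a list of
    (grandson, grandfather) pairs in a deterministic order.
    """
    inferred = []
    for a, b in sorted(son_of):
        for b2, c in sorted(son_of):
            if b == b2:  # the shared variable $B links the two premises
                inferred.append((a, c))
    return inferred
```

For example, `infer_grandsons({("#FrancescoRaspi", "#DomenicoRaspi"), ("#DomenicoRaspi", "#GiovanniRaspi")})` yields the pair for Francesco Raspi and Giovanni Raspi, which could then be published as an inferred Pulse citing its premises.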