ClioWire

From FDHwiki
Latest revision as of 10:59, 15 December 2017

Presentation

ClioWire is the platform developed by the EPFL Digital Humanities master students and the members of the EPFL Digital Humanities Lab (DHLAB). The platform is a hybrid of Twitter and Wikipedia. Its ambition is to curate simple, open bricks of historical knowledge, shaped as short texts. It is meant to rely not on a predefined ontology but on collectively negotiated conventions, reformatting by bots and visualization by dedicated apps. This is what works well on Wikipedia and in our Digital Humanities bachelor course. It is also meant to offer easy onboarding, i.e. the process by which one gets accustomed to a new system or platform (even for beginners and people with only very basic IT skills). This is what works well on Twitter. One goal is to offer simple and robust long-term archiving for bricks of historical knowledge.

ClioWire curates Pulses as the basic bricks of historical knowledge. A Pulse is a minimal information unit coding a relationship between entities and sources; Pulses typically encode what is usually referred to as events. Pulses are artificial constructions that can be continuously remolded and combined to create larger pieces of knowledge. We suggest modeling our historical knowledge as an immense collection of Pulses.

ClioWire is constructed around the reinterpretation of three existing concepts: the hashtag, the bot and the front-end. The hashtag is much more than a convenient way of tagging content; it is a fundamental mechanism for coordinating semantics in extremely large distributed systems. Bots are much more than funny users posting strange things on Twitter or helpers on Wikipedia; they are a new way of conceiving software engineering for large-scale projects. Front-ends are not merely interfaces to a back-end they control; they can also operate as interactive tools on data they do not control. The combination of these three concepts creates a novel technological ecosystem for knowledge management and production.

ClioWire is constructed around the idea of smooth semantization, in contrast with rigid and formal models, which in practice are difficult to get adopted in large distributed systems.

Main characteristics

The main characteristics of Pulses are the following:

  • Pulses are textual sequences of at most 140 characters (like tweets); this constraint may be dropped
  • The syntax of Pulses follows some (evolving) conventions but no strict rules (like tweets)
  • Pulses can be written by humans or by machines, and read by humans or machines (like tweets)
  • Pulses have a first author but can be modified and rewritten by anyone. They are fundamentally considered a common good (unlike tweets)
  • Sequences of Pulses are fully versioned. The history of their modifications can be visualized (unlike tweets)
  • Pulses can be attached to geographical coordinates, temporal coordinates, sources (URLs) and document segments
  • Each Pulse has a unique universal ID (like a tweet) that can serve as a reference in other Pulses and on the web
  • Pulses can be tagged as dubious, or with any other classification characterizing the trust one can place in them. The tagging itself can be partly done by bots (like for tweets)
  • In sum, tweets are a particular kind of Pulse, but Pulses are more general than tweets
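To make the list above more concrete, here is a minimal sketch of a versioned Pulse, assuming an in-memory Python representation. The class `Pulse`, its fields and the `edit`/`history` methods are hypothetical names, not ClioWire's actual data model; the 140-character check mirrors the tweet-like constraint and can be removed if that constraint is dropped.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_LENGTH = 140  # tweet-like limit; the text notes this constraint may be dropped


@dataclass
class Pulse:
    """Hypothetical Pulse record: universal ID, first author, full revision history."""
    first_author: str
    text: str
    pulse_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # (editor, text) pairs; the first entry is the creation by the first author
    revisions: List[Tuple[str, str]] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        self._check_length(self.text)
        self.revisions.append((self.first_author, self.text))

    @staticmethod
    def _check_length(text: str) -> None:
        if len(text) > MAX_LENGTH:
            raise ValueError(f"Pulse exceeds {MAX_LENGTH} characters")

    def edit(self, editor: str, new_text: str) -> None:
        """Anyone (human or bot) may rewrite a Pulse; earlier versions are kept."""
        self._check_length(new_text)
        self.text = new_text
        self.revisions.append((editor, new_text))

    def history(self) -> List[Tuple[str, str]]:
        """Return a copy of the revision history, oldest first."""
        return list(self.revisions)
```

Keeping every (editor, text) pair, rather than overwriting the text, is what makes the history of modifications visualizable, while the first author is preserved whoever edits afterwards.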

The system managing Pulses supports the following operations:

  • Inserting large numbers of Pulses based on existing sources. If a bot imports Pulses massively, it becomes their first author.
  • More generally, applying bot operations to Pulses at massive scale (like for Wikipedia)
  • Retrieving all the Pulses mentioning an entity (person, place) and visualizing them as timelines, maps or both
  • Retrieving all the interventions of a given user (i.e. creation or editing of Pulses)
  • Retrieving all the Pulses mentioning a given source (URL)
  • Visualizing the editing history of a sequence of Pulses
  • Mapping the worldwide activity of Pulses
  • Computing the list of the most active users and the most edited Pulses (like in Wikipedia)
  • More things in the same spirit …
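Several of the operations above amount to maintaining indexes over Pulses. The following sketch assumes Pulses are plain strings identified by an ID and keeps everything in memory; `PulseIndex` and its methods are hypothetical, and a real deployment would need persistent storage.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Hashtags start the text or follow whitespace; predicates (#In, #Eq, ...) are indexed too.
HASHTAG = re.compile(r"(?<!\S)#\w+")
URL = re.compile(r"https?://\S+")


class PulseIndex:
    """Hypothetical in-memory index answering a few of the retrieval operations."""

    def __init__(self) -> None:
        self.texts: Dict[str, str] = {}
        self.by_tag: Dict[str, Set[str]] = defaultdict(set)
        self.by_source: Dict[str, Set[str]] = defaultdict(set)
        self.by_user: Dict[str, List[str]] = defaultdict(list)
        self.edit_counts: Counter = Counter()

    def record(self, pulse_id: str, user: str, text: str) -> None:
        """Register the creation or an edit of a Pulse by a user (human or bot)."""
        old = self.texts.get(pulse_id)
        if old is not None:  # un-index the previous version of this Pulse
            for tag in HASHTAG.findall(old):
                self.by_tag[tag].discard(pulse_id)
            for url in URL.findall(old):
                self.by_source[url].discard(pulse_id)
        self.texts[pulse_id] = text
        self.by_user[user].append(pulse_id)
        self.edit_counts[pulse_id] += 1
        for tag in HASHTAG.findall(text):
            self.by_tag[tag].add(pulse_id)
        for url in URL.findall(text):
            self.by_source[url].add(pulse_id)

    def mentioning(self, tag: str) -> Set[str]:
        return set(self.by_tag.get(tag, set()))

    def citing(self, url: str) -> Set[str]:
        return set(self.by_source.get(url, set()))

    def interventions(self, user: str) -> List[str]:
        return list(self.by_user.get(user, []))

    def most_edited(self, n: int = 3) -> List[str]:
        return [pid for pid, _ in self.edit_counts.most_common(n)]
```

Note that predicates such as #In are indexed like entities, since syntactically both are just hashtags.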

Self-organization of semantic conventions

State of the art in self-organizing conventions and emergent semantics

This section is under construction. Below are some (quite old) references.

Emergent Semantics for the Semantic Web

https://link.springer.com/chapter/10.1007/978-3-540-24571-1_2

Aberer:

https://s3.amazonaws.com/academia.edu.documents/30724738/WDAS2002.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1513327302&Signature=%2F%2FjnWs2HOqYPTQqp7kAICPiU64U%3D&response-content-disposition=inline%3B%20filename%3DAn_overview_on_peer-to-peer_information.pdf

Ouksel, M., Ahmed, I.: Ontologies are not the panacea in data integration: A flexible coordinator for context construction. Journal of Distributed and Parallel Databases 7(1) (1999)

Ouksel, M.: A Framework for a Scalable Agent Architecture of Cooperating Heterogeneous Knowledge Sources. Springer, Heidelberg (1999)

McCool, R., Guha, R.V.: Tap, building the semantic web

Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.M.: Self-organization and identification of web communities. IEEE Computer 35(3), 66–70 (2002)

Hauswirth, M., Datta, A., Aberer, K.: Efficient, self-contained handling of identity in peer-to-peer systems

Ren, Wei; Beard, R. W.; Atkins, E. M. (8–10 June 2005). "A survey of consensus problems in multi-agent coordination". 2005 American Control Conference: 1859–1864. doi:10.1109/ACC.2005.1470239.

Mathematical and Multi-agent models for Self-organizing vocabularies

Simple models of distributed coordination https://infoscience.epfl.ch/record/187558/files/cs-kaplan05.pdf

Folksonomy and emergent classification systems

Rigid semantization vs smooth semantization

Contrasting the pros and cons of two approaches to the semantization process.

The hashtag is a pivotal symbol

Origins and chronology

The US pound sign, number sign or hash symbol "#" is often used in information technology to mark a special meaning. A hashtag may contain letters, digits, and underscores.
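The rule that a hashtag may contain letters, digits and underscores translates directly into a regular expression. In this sketch (the function name `extract_hashtags` is hypothetical), Python's `\w` matches Unicode letters, digits and the underscore; requiring the "#" to start the text or follow whitespace is an added assumption that keeps URL fragments out.

```python
import re
from typing import List

# "#" followed by letters, digits or underscores. The lookbehind requires the "#"
# to start the text or follow whitespace, so "page#top" in a URL is ignored.
HASHTAG = re.compile(r"(?<!\S)#(\w+)")


def extract_hashtags(text: str) -> List[str]:
    """Return the hashtag bodies (without '#') in order of appearance."""
    return HASHTAG.findall(text)
```

A hyphen ends a hashtag, so #hash-tag yields only hash.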

1970: The number sign was used to denote immediate addressing mode in the assembly language of the PDP-11.

1978: Brian Kernighan and Dennis Ritchie used # in the C programming language to mark special keywords that had to be processed first by the C preprocessor.

https://en.wikipedia.org/wiki/The_C_Programming_Language

1986: In the SGML standard, ISO 8879:1986, # is a reserved name indicator (rni) that precedes keyword syntactic literals, e.g., the primitive content token #PCDATA, used for parsed character data.

1988: Around this time, the pound sign was adopted within IRC networks to label groups and topics.

2007: Introduction of hashtag use on Twitter, inspired by the IRC practice of channels.

2009: Twitter starts hyperlinking all hashtags in tweets.

2010: Instagram is launched and starts using hashtags.

2010: Introduction of "Trending Topics" on the Twitter front page.

2010: The Arab Spring and the European anti-austerity movement both use hashtags to brand their actions.

2011: Occupy Wall Street uses hashtags for just-in-time information about protest activity.

Pragmatics of the hashtag in social media

Hashtags have evolved beyond their search function to convey contextual information, from #NIPS2017 to #soproud, recreating context despite asynchrony.


No one owns or administers a tag channel. A channel is created the first time someone posts a status with a channel tag. http://factoryjoe.com/blog/2007/08/25/groups-for-twitter-or-a-proposal-for-twitter-tag-channels/


Eventually, enough people were using them that in 2009 Twitter decided to embrace them.

The hashtag "was created organically by Twitter users as a way to categorize messages."

"Twitter facilitates one-to-many, asynchronous communication, and so tweeters are unlikely to be able to assume that they share contextual assumptions with all or any of their audience. By allowing tweeters to make their intended contextual assumptions accessible to a wide range of readers, hashtags facilitate the use of an informal, casual style, even in the unpredictable and largely anonymous discourse context of Twitter."

"The length restriction and the need to abbreviate and omit elements of the messages mean that the tweeter has to depend on her readers to be able to reconstruct the full intended message from the non-standard, abbreviated forms."

Tweeters are, therefore, writing for an “imagined audience” (Marwick and boyd, 2010; Brake, 2012; Litt, 2012).

As boyd (2010:46) notes, the discourse context does not persist, and “what sticks around may lose its essence when consumed outside of the context in which it was created”.

"The hashtag, I suggest, has been appropriated to act as a highlighting device. It allows the tweeter to make certain contextual assumptions highly accessible, and thus guides the hearer to the intended overall interpretation in the most efficient and economical manner."

From The pragmatics of hashtags: Inference and conversational style on Twitter / http://www.sciencedirect.com/science/article/pii/S037821661500096X



The hashtag was not created for Twitter. The hashtag was created for the internet.

Because of its widespread use, the word "hashtag" was added to the Oxford English Dictionary in June 2014. https://www.theregister.co.uk/2014/06/13/hashtag_added_to_the_oed/

Ref:

An Oral History of the Hashtag. https://www.wired.com/2017/05/oral-history-hashtag/

The pragmatics of hashtags: Inference and conversational style on Twitter. http://www.sciencedirect.com/science/article/pii/S037821661500096X

Ferrara, K., Brunner, H., Whittemore, G.: Interactive written discourse as an emergent register. http://journals.sagepub.com/doi/10.1177/0741088391008001002

boyd, d.: Social network sites as networked publics: affordances, dynamics and implications. In: Papacharissi, Z. (Ed.), Networked Self: Identity, Community and Culture on Social Network Sites, pp. 39-58. Routledge, Abingdon (2010). https://www.danah.org/papers/2010/SNSasNetworkedPublics.pdf

Messina, C.: Groups for Twitter; or a proposal for Twitter tag channels (2007). Available from: http://factoryjoe.com/blog/2007/08/25/groups-for-twitter-or-a-proposal-for-twitter-tag-channels/

MacArthur, A.: The History of Hashtags: Shedding Some Light on the History of Hashtags and How We’ve Come to Use Them (n.d.). Available from: http://twitter.about.com/od/Twitter-Hashtags/a/The-History-Of-Hashtags.htm

Hashtag, from lingua franca to universal language

  • A way to highlight content, increasing the signal-to-noise ratio
  • A way to disambiguate by giving a universal ID
  • A way to compress complex context using ready-made chunks

A hashtag is a way of indicating that a word is part of a universal language.

Hashtags as highlighting devices: in their original function, hashtags enabled readers to find related content amid the noise (Page, 2014).

Hashtags can act as “ready-made chunks or schemas describing often-encountered sequences of actions or events”: a form of linguistic compression of context.


Nobody owns hashtags. This has both good and bad consequences.

The transition from normal language to universal language

In one-to-many “broadcast communication”, such as Twitter, the message is addressed to “whoever finds it relevant” (Sperber and Wilson, 1986/95:158).

Ref:

https://www.hashtags.org/

Hashtag and machine-readable ontologies

Retweeting as versioning

Versioning in Wikipedia

Attribution ID for entities and predicates

Negotiation syntax

Bot oriented architecture

History of the concept of bot

Bots on Twitter

Bots on Wikipedia

Machine2Machine

Relevance constraint (Sperber and Wilson) in Machine2Machine communication. Could we observe a similar optimization effect?

Interstitial Medium

App/Shelves/Front-end

Back-end independent front-ends

Search engines

Visualization

Catalogs

Network

Examples of bots

Examples of negotiated conventions and bot usage

Below are the results of a small-scale negotiation among the first 8 ClioWire developers.


Coding Pulses

Someone may simply write in natural language, without any source or conventional naming:

  • Francesco Raspi lived on the Rialto Bridge.

However, such a Pulse may be tagged as unreliable. It is better to produce Pulses automatically from sources, for instance:

  • Francesco Raspi is the owner of parcels 514 in the Napoleonic Cadaster of 1808.

In order to increase possible articulation between Pulses, hashtags can be used. As on Twitter, there are no predefined hashtags, but communities can negotiate common ways to articulate the content of Pulses. A community may decide to use very generic hashtags in order to increase the potential for generic processing. The list below reflects current practice and ongoing negotiation in the community.

Entities

Each entity (place, person, book, segment, volume, etc.) is associated with a unique hashtag.

Example: #MarinoZorzi is an entity

Predicate #Eq

Predicate #Eq defines a symmetric (commutative) equivalence between two entities: either one can be replaced by the other. It can be used to disambiguate mentions.

Syntax: (Entity or URL) #Eq (Entity or URL)

  • Ex: #MarinoZorzi #Eq #MarinoZorzi34
  • Ex: #MarinoZorzi #Eq http://..
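Because #Eq states that either entity can replace the other, a bot can group all identifiers declared equivalent. The sketch below uses a union-find (disjoint-set) structure; it further assumes that #Eq is transitive and only handles the three-token form shown above. `EqResolver` is a hypothetical name, and example.org URLs are placeholders.

```python
from typing import Dict, Iterable


class EqResolver:
    """Groups identifiers declared equivalent by '#Eq' Pulses (union-find sketch)."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # path compression: point every visited node directly at the root
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # keep the lexicographically smaller id as representative (arbitrary choice)
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra

    def load(self, pulses: Iterable[str]) -> None:
        """Read 'X #Eq Y' Pulses; any other Pulse is ignored."""
        for pulse in pulses:
            tokens = pulse.split()
            if len(tokens) == 3 and tokens[1] == "#Eq":
                self.union(tokens[0], tokens[2])

    def same(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)
```

Choosing the lexicographically smallest identifier as the representative is arbitrary; a community might instead negotiate which form is canonical.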

Predicate #In

Predicate #In states that one entity is a subpart of, or is included in, another.

Syntax: Entity #In Entity (alternatively: #In Entity Entity)

  • Ex: #Venice #In #Italy
  • Ex: #PietyandPatronage_p3 #In #PietyandPatronage
  • Ex: #YoungFilmmakers #In #Filmmaker
  • Ex: #T_FrenchRevolution #In #T_XVIIIcentury

The syntax entity #In entity is for the moment preferred to the syntax #In entity entity, but this is subject to negotiation.
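Since both syntaxes are still under negotiation, a tolerant parser can accept both. This sketch (hypothetical function names; #Veneto is an illustrative entity) builds the direct containment relation and then follows it transitively. Treating #In as transitive is an assumption, which may not hold when #In mixes part-of uses (#Venice #In #Italy) with category uses (#YoungFilmmakers #In #Filmmaker).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_in(pulses: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each entity to its direct containers, accepting both negotiated syntaxes."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for pulse in pulses:
        t = pulse.split()
        if len(t) == 3 and t[1] == "#In":    # preferred: Entity #In Entity
            parents[t[0]].add(t[2])
        elif len(t) == 3 and t[0] == "#In":  # alternative: #In Entity Entity
            parents[t[1]].add(t[2])
    return parents


def containers(entity: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All entities that transitively include `entity` (depth-first, cycle-safe)."""
    seen: Set[str] = set()
    stack = list(parents.get(entity, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen
```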

Predicate #Mention

The hashtag #Mention is used to indicate the presence of an entity in a document. In transcription processes, the resulting transcription is created as an entity. This entity will later be disambiguated with the #Eq predicate.

Syntax: #Mention #Entity #In #Volume #Source URL (optional: #Prob_score)

  • Ex: #Mention entity entity entity #In Volume
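The #Mention syntax is regular enough to be parsed mechanically. The sketch below assumes whitespace-separated tokens and that the optional probability score is a single token beginning with #Prob_, whose suffix is kept uninterpreted since the convention does not fix its format. `Mention` and `parse_mention` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    entities: List[str]
    volume: str
    source: Optional[str] = None
    prob: Optional[str] = None  # suffix of an optional "#Prob_..." token, uninterpreted


def parse_mention(pulse: str) -> Mention:
    """Parse '#Mention #E1 [#E2 ...] #In #Volume [#Source URL] [#Prob_...]'."""
    tokens = pulse.split()
    if not tokens or tokens[0] != "#Mention" or "#In" not in tokens:
        raise ValueError("not a #Mention Pulse")
    cut = tokens.index("#In")
    entities, rest = tokens[1:cut], tokens[cut + 1:]
    if not entities or not rest:
        raise ValueError("a #Mention needs at least one entity and a volume")
    m = Mention(entities=entities, volume=rest[0])
    i = 1
    while i < len(rest):
        if rest[i] == "#Source" and i + 1 < len(rest):
            m.source = rest[i + 1]
            i += 2
        elif rest[i].startswith("#Prob_"):
            m.prob = rest[i][len("#Prob_"):]
            i += 1
        else:
            raise ValueError(f"unexpected token {rest[i]!r}")
    return m
```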

Predicate #Tag

Syntax:

  • Ex: #Tag #JeanLouisNicollier #realisateur #T_1998 URL

Predicate #Copresence

Syntax: #Copresence #entity #entity #In #volume/#entity

  • Ex: #Copresence #FrancescoRaspi #BaptistaNanni #In #PietyandPatronage_p3

Copresence could be replaced by the general syntax of #Mention with several entities.
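If the community adopts that replacement, a reformatting bot could rewrite existing Pulses mechanically. This is a minimal sketch with a hypothetical function name; it leaves any Pulse that is not a well-formed #Copresence unchanged, so it can be mapped over a whole collection.

```python
def copresence_to_mention(pulse: str) -> str:
    """Rewrite '#Copresence #A #B ... #In #X' as '#Mention #A #B ... #In #X'.

    Other Pulses are returned unchanged. Whitespace is normalized to single
    spaces in rewritten Pulses.
    """
    tokens = pulse.split()
    # require at least one entity between the keyword and '#In'
    if tokens and tokens[0] == "#Copresence" and "#In" in tokens[2:]:
        return " ".join(["#Mention"] + tokens[1:])
    return pulse
```

For example, `[copresence_to_mention(p) for p in pulses]` would produce the reformatted collection.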

Predicate #Creator

Syntax: #Creator #entity #entity

  • Ex: #Creator #PietyandPatronage #WannaGoffen

Temporal Entity #T_

  • Ex: #T_1998
  • Ex: #T_1998_2000
  • Ex: #T_FirstCentury
  • Ex: #T_FrenchRevolution
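Numeric temporal entities can be turned into year intervals automatically, while named periods need further Pulses to be resolved. A sketch, assuming years are written with one to four digits and ranges as #T_start_end (the function `year_span` is hypothetical):

```python
import re
from typing import Optional, Tuple

YEAR = re.compile(r"#T_(\d{1,4})(?:_(\d{1,4}))?")


def year_span(tag: str) -> Optional[Tuple[int, int]]:
    """Map '#T_1998' to (1998, 1998) and '#T_1998_2000' to (1998, 2000).

    Named periods such as '#T_FrenchRevolution' or '#T_FirstCentury' return None:
    resolving them would need further Pulses, which this sketch does not attempt.
    """
    m = YEAR.fullmatch(tag)
    if m is None:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    return (start, end)
```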

Condition #When

  • Ex: #AdamSmith #In #Economists #When #T_1753


Condition #Where

  • Ex: #AdamSmith #In #Economists #Where #London

Condition #Source

  • Ex: #AdamSmith #In #Economists #Source #p14_LeTemps
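The three conditions share a keyword-plus-value shape, so a bot can separate them from the core statement. This sketch assumes each keyword is followed by exactly one token and that several conditions may be combined in one Pulse, which the examples above do not show. `split_conditions` is a hypothetical name.

```python
from typing import Dict, List, Tuple

CONDITIONS = ("#When", "#Where", "#Source")


def split_conditions(pulse: str) -> Tuple[List[str], Dict[str, str]]:
    """Separate the core statement of a Pulse from its condition keywords."""
    tokens = pulse.split()
    core: List[str] = []
    conditions: Dict[str, str] = {}
    i = 0
    while i < len(tokens):
        if tokens[i] in CONDITIONS and i + 1 < len(tokens):
            conditions[tokens[i]] = tokens[i + 1]  # keyword takes the next token
            i += 2
        else:
            core.append(tokens[i])
            i += 1
    return core, conditions
```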

Coding knowledge

Pulses can also be used to code knowledge that is not grounded in space and time, such as:

  • Francesco Raspi is the son of Domenico Raspi

Or better:

  • SonOf #FrancescoRaspi #DomenicoRaspi in http://…

Attributing ID

Through Pulses, document segments, artwork images and other entities that do not typically receive a conventional name can receive a unique ID (possibly generated automatically). This is typically done using the #Eq predicate.
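One way to generate such IDs automatically is to derive them from a stable identifier, such as the URL of the segment or image, and publish the binding as an #Eq Pulse. In this sketch the function name, the Seg_ prefix, the SHA-1 digest and the ten-character truncation are all arbitrary assumptions; truncation trades readability for a small risk of collisions.

```python
import hashlib


def segment_id_pulse(url: str, prefix: str = "Seg") -> str:
    """Return an '#Eq' Pulse binding a generated hashtag ID to a segment URL.

    The ID is derived from a SHA-1 digest of the URL, so re-running the bot
    on the same URL yields the same ID.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:10]
    return f"#{prefix}_{digest} #Eq {url}"
```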

Coding inferred facts

Pulses can be used to code inferred facts, such as:

  • Francesco Raspi is the grandson of Giovanni Raspi (inferred based on Pulse1, Pulse2 and Pulse3 and Rule3)

Encoding Rules

Inference rules may also be coded with Pulses (using a convention for variables):

  • R: If ($A is the son of $B) and ($B is the son of $C) then ($A is the grandson of $C)
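Rules of this kind can be applied by a bot that matches premises against existing Pulses and records which Pulses supported each inferred fact. The sketch below assumes facts have been normalized to (predicate, subject, object) triples, such as SonOf #FrancescoRaspi #DomenicoRaspi, and follows the $ convention for variables; `match`, `apply_rule` and the predicate GrandsonOf are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, ...]   # (predicate, subject, object)
Binding = Dict[str, str]


def match(pattern: Fact, fact: Fact, binding: Binding) -> Optional[Binding]:
    """Unify a pattern such as ('SonOf', '$A', '$B') with a ground fact."""
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("$"):
            if result.setdefault(p, f) != f:  # variable already bound to another value
                return None
        elif p != f:
            return None
    return result


def apply_rule(premises: List[Fact], conclusion: Fact,
               facts: Dict[str, Fact]) -> List[Tuple[Fact, List[str]]]:
    """Derive every conclusion of the rule, with the IDs of the Pulses used."""
    states: List[Tuple[Binding, List[str]]] = [({}, [])]
    for premise in premises:
        next_states = []
        for binding, used in states:
            for pid, fact in facts.items():
                extended = match(premise, fact, binding)
                if extended is not None:
                    next_states.append((extended, used + [pid]))
        states = next_states
    return [(tuple(b.get(t, t) for t in conclusion), used) for b, used in states]
```

Returning the supporting Pulse IDs alongside each conclusion is what allows an inferred Pulse to state what it was based on, as in the grandson example above.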