What if Twitter wasn't the fastest one…

We recently ran a comparative analysis of propagation dynamics on Twitter and on FriendFeed. As our case study we chose the news about the rescue operations after the San José mining accident, which left 33 men trapped 700 metres (2,300 ft) below ground for 69 days. (I’m not describing the research itself here: we have a paper under submission, so stay tuned.)
As a side product of this research we were able to monitor the audience exposed to the miners’ rescue news on both FriendFeed and Twitter. We could therefore observe how fast a specific piece of news spread through the two networks and, since the news is the same, we can assume that different propagation speeds are related to the different propagation mechanisms at work in the two systems.
Even considering the huge difference in absolute numbers (Twitter has many more users), the FriendFeed propagation curve (Esposti FF, i.e. users exposed on FriendFeed) is steeper and less linear than the Twitter curve (Esposti TW).


This seems to suggest that propagation based largely on the interactions of the people you follow is faster than propagation based mostly on explicit re-sharing (retweets).

As a side effect of a study we recently completed on the online propagation of news about the rescue of the Chilean miners trapped in the San José mine, we were able to measure how fast a news item propagates within the FriendFeed and Twitter networks. Since the starting news item was the same, we can hypothesise that the differences observed are due to the propagation mechanisms of the two systems. Even taking into account the huge difference in absolute numbers, the FriendFeed propagation curve looks much steeper and less linear than the Twitter one.

This difference seems to indicate that a propagation mechanism based on the interactions of one’s contacts (as on FriendFeed), rather than on explicit propagation practices, is more effective in terms of speed.

Building your Twitter replynet with SQLite

Data preparation is one of the most important and interesting parts of every research activity. Recently I’ve really enjoyed Axel’s posts about how to extract conversation networks out of Twapperkeeper archives. Axel uses awk, which is a very powerful and flexible tool for manipulating csv files. Obviously there are several ways to manipulate a csv (or any kind of data) in order to extract a network file readable by SNA software such as Gephi. We at SIGSNA have always used a database approach. This is probably due to Matteo’s deep expertise with databases, but I must admit that it seems to have quite a few advantages.
With this post, the first of a more methodologically oriented series, I’ll show how to extract a conversation network (aka reply network) out of a Twapperkeeper archive using SQLite. Please note that this will be a non-technical post, intended for a non-technical audience.
SQLite is a cross-platform embedded relational database management system.
(On OS X you can start it simply by typing sqlite3 in your command shell.)
First we have to create a table suitable for importing a Twapperkeeper archive:
create table tablename (text varchar,
to_user_id int,
from_user varchar,
id int,
from_user_id int,
iso_language_code varchar,
source varchar,
profile_image_url varchar,
geo_type float,
geo_coordinates_0 float,
geo_coordinates_1 float,
created_at timestamp,
time long);

Once we’ve created the table we just have to import the .csv file into it:
.separator ","
.import filename.csv tablename
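For readers who would rather script this step, the same create-and-import sequence can be sketched with Python's built-in sqlite3 and csv modules. This is only a minimal sketch under a few assumptions: the column order is the one listed above, the csv has no header row, and the function name load_archive is made up for illustration.

```python
import csv
import sqlite3

# Column order taken from the create table statement above.
COLUMNS = [
    "text", "to_user_id", "from_user", "id", "from_user_id",
    "iso_language_code", "source", "profile_image_url", "geo_type",
    "geo_coordinates_0", "geo_coordinates_1", "created_at", "time",
]


def load_archive(conn, csv_lines, table="tablename"):
    """Create the table if needed and load every well-formed csv row.

    csv_lines is any iterable of text lines (for example an open file).
    load_archive is a hypothetical helper, not part of any library.
    Returns the number of rows inserted.
    """
    conn.execute(f"create table if not exists {table} ({', '.join(COLUMNS)})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    # Skip rows that do not have exactly one value per column.
    rows = [row for row in csv.reader(csv_lines) if len(row) == len(COLUMNS)]
    conn.executemany(f"insert into {table} values ({placeholders})", rows)
    conn.commit()
    return len(rows)
```

With a real archive one would call something like load_archive(sqlite3.connect("archive.db"), open("filename.csv", newline="", encoding="utf-8")), where both file names are placeholders.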

Once we’ve imported our data we have the full advantages of a powerful database, and creating our conversation network is really simple.
Let’s create a csv file with a line for every @reply in our data, containing the user ID of the sender and the user ID of the receiver.
First we have to set a file as the output of our query:
.output nameofyournetwork.csv
then we can run the query:
select from_user_id,to_user_id from tablename where text like '@%' and trim(to_user_id) <> '';
after that we can set the output back to normal (on screen) mode:
.output stdout
And that’s all: you’ll find a csv file of the conversation network in your working directory.
Of course, once you’ve imported your data into a database you can perform much more complex queries, e.g. filter the replies according to the declared language:
select from_user_id,to_user_id from tablename where text like '@%' and iso_language_code = 'it';
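The queries above return one row per @reply. As a possible next step, here is a hedged sketch (again using Python's sqlite3 module) that groups those rows into weighted edges, with an optional language filter. The function name reply_edges and the default table name are assumptions made for this example.

```python
import sqlite3


def reply_edges(conn, table="tablename", language=None):
    """Return (sender_id, receiver_id, n_replies) tuples, heaviest first.

    reply_edges is a hypothetical helper. It keeps only tweets that start
    with '@' and have a non-empty to_user_id, then counts the replies
    exchanged by each (sender, receiver) pair.
    """
    query = (
        f"select from_user_id, to_user_id, count(*) from {table} "
        "where text like '@%' and trim(to_user_id) <> ''"
    )
    params = []
    if language is not None:
        # Optional filter on the declared language, e.g. 'it'.
        query += " and iso_language_code = ?"
        params.append(language)
    query += " group by from_user_id, to_user_id order by count(*) desc"
    return conn.execute(query, params).fetchall()
```

The reply count can then be used as an edge weight when the file is loaded into Gephi.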
Obviously this approach produces a reply network with user IDs instead of usernames. This can be fine during the research phases, but we may prefer readable usernames for our final visualisations.
We’re looking into this problem and we’ll see how to solve it with some awk scripting in our next post.

FriendFeed Under A Microscope

A group of students from the School of Information and the School of Public Health of the University of Michigan have done some very nice visualisation work using our latest dataset. We were in touch with Guangming Lang for some time to explain the data structure and share a few insights, and then they did the whole job.
The visualisations focus on six questions: Where do FriendFeed users come from? What are the sources for entries on FriendFeed? How engaged are FriendFeed users? How engaged are FriendFeed users as time passes? What is most talked about in FriendFeed? and How do the top 90 most followed users gain followers?
Well, enjoy this very nice work.
A group of students from the School of Information and the School of Public Health of the University of Michigan has just published some visualisations based on our FriendFeed dataset. It all started some time ago, when Guangming Lang contacted us asking about the structure of our data, which had just been downloaded, and some time later here is the result of their work.
The visualisations answer some very interesting questions:
– Where do FriendFeed users come from?
– What are the sources for entries on FriendFeed?
– How engaged are FriendFeed users? How engaged are FriendFeed users as time passes?
– What is most talked about in FriendFeed?
– How do the top 90 most followed users gain followers?
In short, a really excellent piece of work!

When followers are not enough

We have just gathered a brand new dataset of FriendFeed data (you can download it in the Data section, where it’s named 2010-a dataset). Since it is considerably larger than our previous dataset, we decided to test a few more hypotheses on information propagation in SNSs. A key idea about the ability to spread information online is that being well connected is central to any propagation strategy. This can be roughly summarised as: the more followers you have, the more people you can inform. We’ve already challenged this assumption before and we wanted to test it more deeply.
We therefore analysed the relationship between the actual number of followers and the average audience of every user. We defined the average audience as the average number of users exposed to the messages sent by a specific user during our sample period. Because of the technical structure of FriendFeed, users who are able to start the most engaging discussions have a better chance of reaching an actual audience larger than their list of followers.
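To make the definition concrete, here is a small Python sketch of one way the average audience could be computed. It is an illustration under stated assumptions, not the procedure used in the study: it assumes we already have a list of (author, message_id, exposed_user) records, and it reads "average" as the mean number of distinct exposed users per message. The function name average_audience is hypothetical.

```python
from collections import defaultdict


def average_audience(exposures):
    """Mean number of distinct users exposed to each message of an author.

    exposures: iterable of (author, message_id, exposed_user) tuples.
    This input format is an assumption made for the example. Messages that
    nobody was exposed to never appear in the input, so they are not
    counted in the mean.
    """
    audience = defaultdict(set)
    for author, message_id, user in exposures:
        # A set ignores repeated exposures of the same user to one message.
        audience[(author, message_id)].add(user)

    sizes = defaultdict(list)
    for (author, _message_id), users in audience.items():
        sizes[author].append(len(users))

    return {author: sum(s) / len(s) for author, s in sizes.items()}
```

Comparing these values with each author's follower count gives the kind of followers versus audience comparison discussed in this post.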

Followers /Avg Users

As the graph shows (it includes only the top 20 users by number of followers), there can be a huge difference between a user’s followers and the actual audience that user can engage. Interestingly, the user with the largest average audience ranks only 18th by number of followers.
As we said before, when we’re dealing with social phenomena and user engagement (as happens in online propagation), followers are not enough.
Starting from the latest FriendFeed dataset we acquired, we began testing some hypotheses about how to define the communicative capacity of users within this kind of network. One of the most common assumptions (more so in the past than now) concerns the number of followers: this value is often treated as an indicator of a user’s communicative capacity. Put bluntly, the idea is that someone who is in contact with many other people is able to reach a large mass of users.
To verify this assumption we decided to look at the relationship between the number of followers and the average audience of users. By average audience we mean the number of users who were exposed to the messages posted by a specific user during our sample period (two months: August-September 2010).
Followers /Avg Users

Given the nature of FriendFeed, the audience tends to grow beyond the direct followers the more a user is able to start discussions that propagate and involve friends of friends, and so on.
As the image shows (it plots followers against average audience for the 20 users with the most followers in the Italian FriendFeed network, public accounts only), a high number of followers does not necessarily mean a high average audience. On the contrary, the user who, in absolute terms, reaches the largest audience on average ranks only eighteenth by number of followers.
In short, once again, when we talk about social networks numbers can easily be misleading.

To whom are you speaking? Egonetwork over time

The latest version of Gephi introduced some very nice features. It is now possible to work with dynamic networks, whose evolution can easily be observed. Working with dynamic networks is crucial when you are dealing with social networks, like those found on microblogging sites, that show a high level of variability: social connections change quickly over time and, even when a connection does not disappear, the use made of it can vary a lot from one moment to the next. Observing such a phenomenon can be difficult with static SNA, but with a dynamic perspective it becomes quite simple.

The movie shows how the ego-network of my FriendFeed user changed over the period Aug.-Sept. 2010. The ego-node represents my user and the other nodes are all the users I’ve interacted with (on FriendFeed). Nodes with a higher level of interaction are drawn closer to the ego-node, while users with a low level of interaction are pushed away from it.
The video spans two months, with the data resolution set at 10 days (this creates 5 different configurations of the network), and it clearly shows how the closest nodes change even in such a short amount of time.
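The idea of slicing interaction data into fixed-length windows can be sketched in a few lines of Python. This is only an illustration: the (timestamp, other_user) input format and the function name interaction_windows are assumptions, and the sketch says nothing about how Gephi handles dynamic networks internally.

```python
from collections import Counter
from datetime import datetime, timedelta


def interaction_windows(interactions, start, days=10):
    """Count interactions with each contact inside fixed-length windows.

    interactions: iterable of (timestamp, other_user) pairs, a hypothetical
    format for the ego's interactions. Returns a dict mapping the window
    index (0 for the first `days` days after `start`, 1 for the next, ...)
    to a Counter of interactions per contact in that window.
    """
    width = timedelta(days=days)
    windows = {}
    for timestamp, other_user in interactions:
        index = (timestamp - start) // width  # integer window number
        windows.setdefault(index, Counter())[other_user] += 1
    return windows
```

Within each window, contacts with higher counts would be the ones drawn closest to the ego-node.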
Even if this was intended to be just a demo of the new opportunities offered by Gephi, it provides some insight into how ego-networks evolve over time. This evolution can be due to endogenous or exogenous factors, but it seems to be quicker than one might expect.
The latest version of Gephi finally allows the visualisation of dynamic networks. What you find here is a short visualisation of how the active ego-network of a specific user (in this case my FriendFeed user) changed over August-September 2010.

The nodes are positioned, relative to the central node, so as to represent the level of interaction: the closest nodes are those with the highest level of interaction. As you can see, the closest nodes (the ones I interacted with most in that period) change very frequently, forcing the network to readjust accordingly.
Besides demonstrating some of the possibilities offered by the new version of Gephi, this short visualisation helps us understand that the contacts we interact with, although certainly fewer than the whole set of possible connections, are a dynamic reality in constant evolution. The reasons for this evolution can be very diverse, from factors endogenous to the network (interesting discussions) to exogenous ones (external events we want to discuss with some of our contacts).

About Twitter and the (missing) conversation

Sysomos recently released some data showing that only 29% of tweets actually produce a reaction (6% retweets and 23% replies). These data were of course followed by a series of comments about the social side of Twitter versus its broadcasting nature.
The idea behind this reasoning is that if people do not chat on Twitter, the social dimension is lost in favour of an endless series of individual messages addressed to the masses. Faced with this reasoning, however, it is perhaps worth reflecting on the data from the Sysomos study.

Mapping FriendFeed Network: switching the perspective

Recently we posted a visualisation of the Italian network of FriendFeed. That map offered an interesting, general perspective on how a complex network can be visualised. Obviously, when we’re dealing with social networks or microblogging sites we’re dealing with a complex network emerging from many different ego-networks. What does the same network (Italian FriendFeed users) look like if we observe it from the inside?
Starting from the same dataset we used for the previous visualisation, we generated an ego-network using one of the minor nodes of the global visualisation as the central user. We chose the user lucamondini who, at that time, had a rather small network: 150 following and 200 followers. The idea was to see what the network looked like when the observer was one of the peripheral nodes.
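For readers curious about what generating an ego-network involves, here is a minimal Python sketch. It assumes the data is available as a list of directed (follower, followed) pairs and uses the common one-step definition: the ego, everyone directly connected to it in either direction, and all the edges among those users. The function name ego_network is hypothetical and this is not necessarily the exact procedure we followed.

```python
def ego_network(edges, ego):
    """Return (nodes, edges) of the one-step ego-network around `ego`.

    edges: iterable of directed (follower, followed) pairs, an assumed
    input format. The result keeps the ego, its direct contacts, and
    every edge whose two endpoints both belong to that set.
    """
    edges = list(edges)
    nodes = {ego}
    for source, target in edges:
        if source == ego:
            nodes.add(target)   # someone the ego follows
        elif target == ego:
            nodes.add(source)   # someone who follows the ego
    kept = [(s, t) for s, t in edges if s in nodes and t in nodes]
    return nodes, kept
```

The resulting edge list can be exported and laid out in Gephi like any other network.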

An explanation of node sizes and colours can be found here. What’s interesting is that, switching the perspective to this user’s point of view, the overall scenario changes. It is still possible to find major and minor nodes, but they are not necessarily the same nodes as in the global map. Of course, users who are very popular within the whole network tend to be quite popular within the local ego-network too, but their specific size is different. Moving our observation to nodes that are even more peripheral (we chose the user magicabula, with 21 followers and 33 following, who had so few followers that this user wasn’t included in the global visualisation) shows a small network of highly connected users, where some users are heavily connected in the global map too and others have a large authority only within this local perspective.
When we’re dealing with social networks and information propagation we must keep in mind that the scenario might look really different when it is observed from the far periphery of the network.

Visualising Italian Friendfeed Network

Recently we posted on FriendFeed a visualisation of Italian FriendFeed users extracted from the data we collected in 2009. Since the map started an interesting debate (you can read it here, in Italian), we thought we would write a short post explaining how the map was made and what its limits and possibilities are.

The map is based on a network of 8024 nodes and 244542 edges (the map shows only the nodes with more than 147 followers, but the statistical values have been computed on the whole network).
We collected the data in September 2009 starting from all the public messages posted on FriendFeed (you can read more about this in our SBP10 paper).
We processed the network with Gephi; the map shows the indegree value as node size and the betweenness centrality value as node colour.
The final result is rather interesting: on one side it shows a group of huge nodes with many followers, but at the same time it shows that there is no simple correlation between the number of followers and the betweenness centrality value. Since the BC value is often used to identify relevant nodes or hidden hubs, this can be read as: the quality of your connections matters more than their number.
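To show what these two measures actually count, here is a small self-contained Python sketch. We computed the values with Gephi, so this is only an illustration: it implements indegree and an unnormalised betweenness centrality for a directed, unweighted graph using Brandes' algorithm, and the function names are made up for the example.

```python
from collections import defaultdict, deque


def indegree(edges):
    """Number of incoming edges per node, for (source, target) pairs."""
    counts = defaultdict(int)
    for source, target in set(edges):
        counts[source] += 0  # make sure nodes with no incoming edge appear
        counts[target] += 1
    return dict(counts)


def betweenness(edges):
    """Unnormalised betweenness centrality (directed, unweighted graph).

    For every node, sums the fraction of shortest paths between other
    pairs of nodes that pass through it (Brandes' algorithm).
    """
    adjacency = defaultdict(list)
    nodes = set()
    for source, target in set(edges):  # ignore duplicate edges
        adjacency[source].append(target)
        nodes.update((source, target))

    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        preds = defaultdict(list)
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score
```

In a toy graph where three users follow a "hub" that follows nobody, the hub has the highest indegree but a betweenness of zero, which is the kind of mismatch between size and colour visible in the map.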
Nevertheless, a final remark has to be made. Metrics like betweenness centrality work really well in traditional networks, but they fail to grasp the new conversational nature of the FriendFeed network (the same could be said of Twitter). On FriendFeed, conversations often exist outside the network defined by the following/follower structure. When I come across a message originally produced outside my network only because a friend of mine comments on it, what emerges is an actual network based more on social behaviour than on the underlying set of connections.
We need new social metrics.

Mapping the Italian FriendFeed Network

We recently published on FriendFeed a visualisation of the user network based on our data from last year. Since the map (which you can see below) started an interesting discussion, we thought it appropriate to provide some details on how it was made and on what this kind of visualisation can and cannot tell us.

The map is based on a network of 8024 users, identified as Italian, with 244542 directed edges. The map actually displays only the users with more than 147 followers (but the statistics were computed on the whole network).
The network was collected in September 2009 starting from the conversations produced by public accounts (which explains why users with private accounts are missing from the map). For more details on the data extraction process see this paper.
The network was processed with Gephi (http://gephi.org) and shows incoming links (in-degree) as node size and the betweenness centrality value as colour intensity.
The resulting map is definitely interesting because it shows on one side, as expected, a set of heavily followed users (very large nodes) and a much larger number of minor, variously connected users. At the same time, however, the map shows a lack of correlation between node size (the number of followers) and colour intensity (betweenness centrality). Betweenness centrality is a measure often used to identify hubs or particularly significant nodes within a network: it measures how often a specific node lies on the “shortest path” between two nodes of the network.
From this point of view we can see that the number of connections is not, on its own, a guarantee of being a central hub in the network (even though, obviously, it helps).
In conclusion, however, a brief remark is needed.
Metrics like the ones used here are standard in Social Network Analysis and work well for describing traditional networks. Networks like FriendFeed (but the same holds for Twitter) have, however, some specific ways of propagating information that make these metrics only partially descriptive. When on FriendFeed the comment or like mechanism brings me into contact with information posted by contacts I do not follow directly (and this happens all the time), a connection is created that our map, built from explicit followers and following, cannot display and that a value like betweenness centrality does not take into account.
In other words, traditional metrics work very well at describing the network as long as the network stands still; when the network comes alive and social relations come into play, we need new metrics.
We are working on these new metrics and hope to be able to present something soon.

Twitter @reply Networks on UK General Elections #UKGE2010

A few days ago Axel Bruns and the people from “Mapping online publics” posted a very interesting article about mapping the Australian election by following #ausvote tweets. The idea behind it was good and simple: by mapping all the messages containing the conventional reply symbol (@username) one can map the conversational network surrounding a specific topic (defined by the #hashtag). Of course this method has some limitations (clearly explained by Axel); nevertheless it can be used to produce a rough map of the conversational network.
Since some time ago we downloaded (using Twapperkeeper, the same service used by Axel) all the tweets with the hashtag #ukge2010 (the “official” hashtag of the UK general election), we decided to do the same analysis on UK tweets.
So we built the @replies network and, using Gephi, computed the indegree and the betweenness centrality of the nodes. Following Axel, we also excluded from the visualisation any users who received fewer than 100 @replies.
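The filtering step can be sketched in a few lines of Python. This is an illustration under assumptions rather than the exact procedure we ran in Gephi: it assumes the network is a list with one (sender, receiver) pair per @reply, and the function name filter_by_replies is made up.

```python
from collections import Counter


def filter_by_replies(edges, min_replies=100):
    """Drop users who received fewer than `min_replies` @replies.

    edges: iterable of (sender, receiver) pairs, one per @reply (an
    assumed input format). Only edges whose sender and receiver both
    reach the threshold are kept.
    """
    edges = list(edges)
    received = Counter(receiver for _sender, receiver in edges)
    keep = {user for user, count in received.items() if count >= min_replies}
    return [(s, r) for s, r in edges if s in keep and r in keep]
```

The threshold of 100 follows the text above; it only affects which users are drawn, and the measures can still be computed on the unfiltered network first.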
Here you can see the result:

The size of the nodes represents the indegree value while the colour represents the betweenness centrality. This is the table showing the top values:

What can easily be noticed is that the most important nodes within the conversational network are not the official Twitter accounts of political parties or politicians. Bloggers, journalists and consultants like @carlmaxim or @bengoldacre get more direct replies than the official Twitter accounts of politicians (like @nick_clegg) or of political parties (like @UKLabour).
This may be due to the way those users use Twitter: @nick_clegg or @UKLabour are probably not perceived as someone you would reply to or address directly.
At the same time it is important to highlight that there is no clear correlation between indegree and betweenness centrality: users like @bloggerheads have a high betweenness centrality value but a very low indegree. This may well be due to different user behaviour (the outdegree value matters in how betweenness centrality is defined), but at the same time I think that betweenness centrality, even if it is a standard measure in SNA, is unable to capture the real complexity of a conversation network connected through a Twitter #hashtag (but we’ll come back to this point very soon).