September 2010

Mapping the FriendFeed Network: switching the perspective

We recently posted a visualisation of the Italian FriendFeed network. That map offered an interesting, general perspective on how a complex network can be visualised. Obviously, when we deal with social networks or microblogging sites, we are dealing with a complex network that emerges from many different ego networks. What does the same network (Italian FriendFeed users) look like when we observe it from the inside?
Starting from the same dataset we used for the previous visualisation, we generated an ego network centred on one of the minor nodes of the global visualisation. We chose the user lucamondini who, at that time, had a rather small network: 150 following and 200 followers. The idea was to see what the network looked like when the observer was one of the peripheral nodes.
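A minimal sketch of how an ego network can be cut out of a follower graph follows. It assumes the graph is stored as a list of directed (follower, followed) pairs and takes the ego network to be the subgraph induced by the ego plus every user it follows or is followed by (radius 1). The post does not say which radius or tool was used, so the representation, the function name `ego_network` and the toy edges are all hypothetical.

```python
def ego_network(edges, ego):
    """Return (members, sub_edges) for the radius-1 ego network of `ego`.

    Hypothetical representation: `edges` is a list of (follower, followed)
    pairs, so an edge u -> v means "u follows v".
    """
    members = {ego}
    for follower, followed in edges:
        if follower == ego:
            members.add(followed)  # someone the ego follows
        elif followed == ego:
            members.add(follower)  # someone who follows the ego
    # Induced subgraph: keep only edges with both endpoints in the ego network.
    sub_edges = [(u, v) for u, v in edges if u in members and v in members]
    return members, sub_edges


# Toy data (not from the FriendFeed dataset).
edges = [("ego", "a"), ("b", "ego"), ("a", "b"), ("c", "a"), ("c", "d")]
members, sub_edges = ego_network(edges, "ego")
print(sorted(members))  # ['a', 'b', 'ego']
print(sub_edges)        # [('ego', 'a'), ('b', 'ego'), ('a', 'b')]
```

Users such as "c", who follow members but are not connected to the ego, stay outside the ego network even though they are part of the global graph.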

An explanation of node sizes and colours can be found here. What is interesting is that when we switch to this user's point of view, the overall scenario changes. Although it is still possible to find major and minor nodes, they are not necessarily the same nodes as in the global map. Of course, users who are very popular within the whole network also tend to be quite popular within the local ego network, but their relative size is different. Moving our observation to even more peripheral nodes (we chose the user magicabula, with 21 followers and 33 following; since this user has so few followers, they were not included in the global visualisation) reveals a small network of highly connected users: some of them are heavily connected in the global map as well, while others have significant authority only within this local perspective.
When we deal with social networks and information propagation, we must keep in mind that the scenario may look very different when it is observed from the far periphery of the network.
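The change in node size can be illustrated by computing each member's indegree twice: once on the global follower graph and once on the ego subgraph only. The sketch below uses a tiny made-up edge list (not FriendFeed data) in the same hypothetical (follower, followed) representation; the helper `indegree` is an illustration, not the tool used to draw the maps.

```python
from collections import Counter


def indegree(edges):
    """Count incoming edges (followers) per user; edges are (follower, followed)."""
    return Counter(followed for _, followed in edges)


# Toy global graph: "star" is followed by several users outside the ego network.
global_edges = [("x", "star"), ("y", "star"), ("z", "star"), ("ego", "star"),
                ("ego", "local"), ("star", "local"), ("friend", "local"),
                ("ego", "friend"), ("friend", "ego")]
# Ego network of "ego": the users it follows or is followed by, plus itself.
members = {"ego", "star", "local", "friend"}
local_edges = [(u, v) for u, v in global_edges if u in members and v in members]

g, l = indegree(global_edges), indegree(local_edges)
for user in sorted(members):
    print(user, g[user], l[user])  # user, global indegree, local indegree
# "star" is the largest node globally (4 followers) but has only 1 follower
# inside the ego network, where "local" (3 followers) becomes the largest.
```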

Visualising the Italian FriendFeed Network

We recently posted on FriendFeed a visualisation of Italian FriendFeed users, extracted from the data we collected in 2009. Since the map started an interesting debate (you can read it here, in Italian), we decided to write a short post explaining how the map was made and what its limits and possibilities are.

The map is based on a network of 8024 nodes and 244542 edges. The map shows only the nodes with more than 147 followers, but the statistics were computed on the whole network.
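The distinction between the network used for the statistics and the subset drawn on the map can be expressed in a few lines: metrics are computed on every node, and only afterwards is the node list filtered for display. This is a sketch assuming a (follower, followed) edge list; the threshold of 147 followers comes from the post, while the function name `nodes_to_display` and the toy data are illustrative.

```python
from collections import Counter

FOLLOWER_THRESHOLD = 147  # the map shows only nodes with more than 147 followers


def nodes_to_display(edges, threshold=FOLLOWER_THRESHOLD):
    """Compute indegree on the whole network, then keep nodes above threshold.

    `edges` is a list of (follower, followed) pairs. Returns (full_stats, shown):
    full_stats has an entry for every node, shown is the filtered, sorted list.
    """
    indeg = Counter(followed for _, followed in edges)
    nodes = {u for edge in edges for u in edge}
    full_stats = {n: indeg[n] for n in nodes}  # statistics for ALL nodes
    shown = sorted(n for n, d in full_stats.items() if d > threshold)
    return full_stats, shown


# Toy example: "popular" has 200 followers, "quiet" has 1.
edges = [(f"user{i}", "popular") for i in range(200)] + [("popular", "quiet")]
stats, shown = nodes_to_display(edges)
print(len(stats), shown)  # 202 ['popular']
```

Filtering after the computation matters: removing nodes first would change the indegree and, even more, the betweenness values of the nodes that remain.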
We collected the data in September 2009, starting from all the public messages posted on FriendFeed (you can read more about this in our SBP10 paper).
We processed the network with Gephi; the map shows the indegree value as node size and the betweenness centrality value as node colour.
The final result is rather interesting: on one side it shows a group of huge nodes with many followers, but at the same time it shows that there is no simple correlation between the number of followers and the betweenness centrality (BC) value. Since BC is often used to identify relevant nodes or hidden hubs, this can be read as a sign that the quality of your connections matters more than their number.
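One way to put a number on "no simple correlation" is a rank correlation between indegree and betweenness centrality across nodes. The post does not say such a coefficient was computed; the sketch below is an assumed approach using Spearman's rho (the Pearson correlation of average ranks, which handles ties), applied to made-up values rather than the FriendFeed data.

```python
def average_ranks(values):
    """Assign ranks 1..n to values, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (raises ZeroDivisionError) if either list is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up values for five users (not taken from the FriendFeed dataset).
followers = [120, 15, 300, 40, 8]
bc_values = [0.02, 0.30, 0.01, 0.25, 0.00]
print(spearman(followers, bc_values))  # 0.0
```

A value near 0 means that knowing a node's rank by followers tells you little about its rank by betweenness; values near 1 or -1 would indicate a strong monotonic relation.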
Nevertheless, a final remark is needed. Metrics like betweenness centrality work really well in traditional networks, but they fail to grasp the new conversational nature of the FriendFeed network (and the same could be said about Twitter). On FriendFeed, conversations often take place outside the network defined by the following/follower structure. When I come into contact with a message originally produced outside my network only because a friend of mine commented on it, what emerges is an actual network based more on social behaviour than on the underlying set of connections.
We need new social metrics.
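The gap between the follow graph and the conversational network can be made explicit by comparing two edge sets: the explicit follow edges, and the "exposure" pairs created when a user sees an entry because someone they follow commented on or liked it. The sketch assumes hypothetical record formats (follow pairs and (actor, author) interaction pairs); it only illustrates the idea and is not the new metric the authors mention.

```python
def exposures_outside_follow_graph(follows, interactions):
    """Find (viewer, author) pairs where the viewer reached the author only via a friend.

    follows: set of (follower, followed) pairs.
    interactions: list of (actor, author) pairs, meaning `actor` commented on or
    liked an entry written by `author` (hypothetical record format).
    Every follower of `actor` is assumed to see that entry in their feed; the
    pair is kept when the viewer does not already follow the author directly.
    """
    followers_of = {}
    for follower, followed in follows:
        followers_of.setdefault(followed, set()).add(follower)
    hidden = set()
    for actor, author in interactions:
        for viewer in followers_of.get(actor, ()):
            if viewer != author and (viewer, author) not in follows:
                hidden.add((viewer, author))
    return hidden


follows = {("me", "friend"), ("friend", "stranger")}
interactions = [("friend", "stranger")]  # friend comments on stranger's entry
print(exposures_outside_follow_graph(follows, interactions))  # {('me', 'stranger')}
```

Pairs like ("me", "stranger") are real channels of information flow, yet they never appear in a map built only from followers and following.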

Mapping the Italian FriendFeed Network

We recently published on FriendFeed a visualisation of the user network based on our data from last year. Since the map (which you can see below) sparked an interesting discussion, we thought it would be useful to provide some details on how it was made and on what indications and limits this kind of visualisation can offer.

The map is based on a network of 8024 users, identified as Italian, with 244542 directed edges. The map actually displays only the users with more than 147 followers (but the statistics were computed on the whole network).
The network was collected in September 2009, starting from the conversations produced by public accounts (which explains why users with private accounts are absent from the map). For more details on the data extraction process, see this paper.
The network was processed with Gephi (http://gephi.org); the map shows incoming links (In Degree) as node size and the Betweenness Centrality value as colour intensity.
The resulting map is very interesting because, on the one hand, it shows, as expected, a set of highly followed users (very large nodes) and a much larger number of minor users with varying degrees of connection. At the same time, however, the map shows no correlation between node size (the number of followers) and colour intensity (the Betweenness Centrality). Betweenness Centrality is a measure often used to identify hubs or particularly significant nodes within a network: it measures how often a specific node lies on the "shortest path" between two nodes of the network.
From this point of view, we can see that the number of connections alone does not guarantee being a central hub in the network (although, obviously, it helps).
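The shortest-path definition of Betweenness Centrality can be turned into code with Brandes' algorithm, a standard way to compute it. The sketch below handles an unweighted directed graph stored as a hypothetical successor-list dictionary and returns unnormalised scores; it is not Gephi's own code, and Gephi's normalisation options may give different absolute values. The toy graph is made up to show why a node followed by many users but following nobody scores zero: a directed shortest path can only pass through a node that has outgoing edges.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted, directed graphs (unnormalised).

    adj maps EVERY node to a list of its successors (edges u -> v).
    For each node v, returns the sum over ordered pairs (s, t), s != v != t,
    of the fraction of shortest s-t paths that pass through v.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


adj = {
    "a": ["bridge", "popular"],
    "b": ["bridge", "popular"],
    "c": ["popular"],
    "bridge": ["d"],
    "d": [],
    "popular": [],  # 3 followers, follows nobody
}
bc = betweenness(adj)
print(bc["popular"], bc["bridge"])  # 0.0 2.0
```

Here "popular" has more followers than "bridge", yet only "bridge" sits on shortest paths (a to d, b to d), which is the pattern the map shows at scale.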
In conclusion, however, a brief consideration is necessary.
Metrics like the ones used here are standard in Social Network Analysis and work well for describing traditional networks. Networks like FriendFeed (and the same holds for Twitter), however, have some specific features in the way information propagates that make these metrics only partially descriptive of them. When, on FriendFeed, the comment or like mechanism brings me into contact with information posted by people I do not follow directly (and this happens all the time), a connection is created that our map, built from explicit followers and following, cannot visualise, and that a value such as Betweenness Centrality does not take into account.
In other words, traditional metrics work very well for describing the network as long as the network is static; when the network comes alive and social relations come into play, we need new metrics.
We are working on these new metrics and hope to present something soon.

Twitter @reply Networks on UK General Elections #UKGE2010

A few days ago Axel Bruns and the people from “Mapping Online Publics” posted a very interesting article about mapping the Australian election by following #ausvote tweets. The idea behind it was simple and effective: by mapping all the messages containing the conventional reply symbol (@username), one can map the conversational network surrounding a specific topic (defined by the #hashtag). Of course this method has some limitations (clearly explained by Axel); nevertheless it can be used to produce a rough map of the conversational network.
Some time ago we downloaded (using Twapperkeeper, the same service Axel used) all the tweets with the hashtag #ukge2010 (the “official” hashtag for the UK general election), so we decided to run the same analysis on the UK tweets.
We extracted the @reply network and used Gephi to compute the indegree and the betweenness centrality of the nodes. Following Axel, we also excluded from the visualisation any users who received fewer than 100 @replies.
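The core idea of turning every @username in a tweet into a directed edge from the author to the addressed user can be sketched as follows. The (author, text) tweet format, the function name `reply_edges` and the sample tweets are hypothetical. The username pattern (1 to 15 letters, digits or underscores) follows Twitter's username rules; the original method may have handled retweets and other edge cases differently.

```python
import re
from collections import Counter

# Assumed username syntax: 1-15 letters, digits or underscores, not preceded by
# a word character (so e-mail addresses such as bob@example.com are skipped).
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def reply_edges(tweets):
    """Build weighted @reply edges (author -> mentioned user) from (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            target = target.lower()  # usernames are case-insensitive
            if target != author.lower():  # ignore self-mentions
                edges[(author.lower(), target)] += 1
    return edges


tweets = [
    ("alice", "@bob agreed, see #ukge2010"),
    ("carol", "@Bob @alice what about the debate? #ukge2010"),
    ("bob", "mail me at bob@example.com #ukge2010"),
]
print(reply_edges(tweets))
```

Each tweet in the hashtag archive contributes one edge per mentioned user; tweets without any @username add nothing to the conversational network.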
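Counting the @replies each user received (their weighted indegree) and applying the 100-reply cut-off used for the visualisation could look like this. The input is assumed to be a mapping from (author, target) pairs to reply counts; the threshold comes from the post, while the function names and the toy numbers are illustrative.

```python
from collections import Counter

MIN_REPLIES = 100  # users receiving fewer @replies are left out of the map


def replies_received(edges):
    """Weighted indegree: total @replies each user received.

    `edges` maps (author, target) pairs to the number of @replies sent.
    """
    received = Counter()
    for (_, target), count in edges.items():
        received[target] += count
    return received


def visible_users(edges, threshold=MIN_REPLIES):
    """Users kept in the visualisation: those with at least `threshold` @replies."""
    return sorted(u for u, n in replies_received(edges).items() if n >= threshold)


edges = {("a", "pundit"): 60, ("b", "pundit"): 45, ("a", "party"): 30}
print(visible_users(edges))  # ['pundit']
```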
Here you can see the result:

The size of the nodes represents the indegree value while the colour represents the betweenness centrality. This is the table showing the top values:

What is easy to notice is that the most important nodes within the conversational network are not the official Twitter accounts of political parties or politicians. Bloggers, journalists and consultants such as @carlmaxim or @bengoldacre get more direct replies than the official accounts of politicians (such as @nick_clegg) or of political parties (such as @UKLabour).
This may be due to the way those users use Twitter: @nick_clegg or @UKLabour are probably not perceived as someone you would reply to or address directly.
At the same time, it is important to highlight that there is no clear correlation between indegree and betweenness centrality: users like @bloggerheads have a high betweenness centrality value but a very low indegree. This is certainly partly due to differences in user behaviour (in a directed network outdegree matters for betweenness centrality, since a node with no outgoing edges cannot lie on any shortest path between two other nodes), but I also think that betweenness centrality, even though it is a standard SNA measure, cannot capture the real complexity of a conversational network connected through a Twitter #hashtag (we will come back to this point very soon).

SIGSNA goes HPC

We have recently been awarded High Performance Computing resources by CINECA, the Italian consortium for high performance computing. We submitted an application for a class C project (test and development), and we can now use up to 20,000 CPU hours on the CINECA SP6 system.
This opens a brand new scenario for our research, with many interesting perspectives. The networks we work with can hardly be handled on ordinary personal computers, and even when it is possible, the computation takes hours or even days. We can now explore a new set of possibilities that have so far been beyond our computational power. This may also shift the focus of our research somewhat, adding specific aspects of high performance computing and network theory.