Conversation retrieval from Social Media

Next week Matteo is going to Dublin for the annual European Conference on Information Retrieval (ECIR), where we are presenting a demo of our conversation retrieval system for Social Media and Social Network Sites. Adding a social dimension to traditional online search has been around for some time, but we are following a different approach. So far, social search has used what we could call an ego-centric approach: informational objects around the web are somehow recommended by your online contacts.
We are doing something different. We are moving from online search aimed at retrieving information toward what we call conversational search. This means that the object of our search is no longer a single piece of information but a set of messages and users that can be described (and ranked) according to many social aspects.
Therefore, some of the ranking criteria that can be used are:
– Text relevance
– User centrality (e.g., degree, PageRank, audience)
– Message popularity (e.g., retweets, likes, sharing)
– Timeliness (i.e., distance from a given timestamp)
– Length (i.e., number of messages)
– Density (i.e., emotions and interest)
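The list above names the criteria but not how they are combined. The sketch below is a minimal illustration that assumes each criterion has already been computed for every conversation. It also assumes the ranking is a weighted sum of min-max normalised scores. That combination rule is our assumption and not necessarily how our system works. The `Conversation` fields, `min_max`, `rank` and the weight names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical per-conversation features; how each one is measured is an assumption.
    text_relevance: float   # text-matching score against the query
    user_centrality: float  # e.g. mean degree or PageRank of the participants
    popularity: float       # e.g. retweets + likes + shares
    timeliness: float       # distance (seconds) from a reference timestamp
    length: int             # number of messages
    density: float          # hypothetical proxy for emotions/interest


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank(conversations, weights):
    """Sort conversations by a weighted sum of normalised criteria, highest first.

    weights: dict from criterion name (keys of `columns` below) to weight.
    """
    columns = {
        "text_relevance": min_max([c.text_relevance for c in conversations]),
        "user_centrality": min_max([c.user_centrality for c in conversations]),
        "popularity": min_max([c.popularity for c in conversations]),
        # A smaller distance from the timestamp is better, so invert the scale.
        "timeliness": [1.0 - x for x in min_max([c.timeliness for c in conversations])],
        "length": min_max([c.length for c in conversations]),
        "density": min_max([c.density for c in conversations]),
    }
    scores = [
        sum(w * columns[name][i] for name, w in weights.items())
        for i in range(len(conversations))
    ]
    order = sorted(range(len(conversations)), key=lambda i: scores[i], reverse=True)
    return [conversations[i] for i in order]
```

Changing the weights gives different ranking profiles. For example, `{"popularity": 1.0}` and `{"density": 1.0}` loosely mirror the popularity-based and density-based configurations used in the evaluation.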
We ran a blind comparison between our system (tuned with different ranking parameters) and Google on a set of FriendFeed conversation searches. Users were asked to judge, according to their personal interest, a set of FriendFeed conversations about a specific topic. The chart shows the results: Google is in green, and the other two series are our system with popularity-based ranking (purple) and density-based ranking (blue). Higher values mean a better judgement of the results returned by the search system.

SIGSNA TwitterGet: a Twitter downloading tool

When we started our research on Twitter propagation, we, like many others, used the well-known Twapperkeeper service to save and download tweets. Later we moved to an ad hoc solution, highly customised for our specific needs and infrastructure.
Recently, due to a change in Twitter’s policy, Twapperkeeper removed its export and download features, leaving many researchers without an important research tool.
We therefore decided to share a simplified version of our system that should run properly on many systems. You can download it from the tools page. The attached manual should explain everything you need to know to install and use the system.
Please note that this is an alpha version, so please send us your feedback and requests for new features.
We really hope this will help the many researchers looking for a (relatively) simple-to-install alternative to Twapperkeeper.

What if Twitter wasn't the fastest one…

We have recently carried out a comparative analysis of propagation dynamics on Twitter and on FriendFeed. As our case study we chose the news about the rescue operations after the San José mining accident, which left 33 men trapped 700 metres (2,300 ft) below ground for 69 days. (I’m not describing the research itself here because we have a paper under submission, so stay tuned for more about it.)
As a by-product of this research we were able to monitor the audience exposed to the miners’ rescue news on both FriendFeed and Twitter. We could therefore observe how fast a specific piece of news spread through the two networks. Since we were observing the same news, we can assume that different propagation speeds are related to the different propagation mechanisms at work in the two systems.
Even considering the huge difference in absolute numbers (Twitter has many more users), the FriendFeed propagation line (labelled “Esposti FF”, i.e. users exposed on FriendFeed) is steeper and less linear than Twitter’s line (“Esposti TW”).


This seems to suggest that propagation based largely on the interactions of the people you follow is faster than propagation based mostly on explicit re-sharing practices (retweets).
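The two networks differ hugely in absolute size. Comparing the steepness of their curves is easier once each curve is normalised by its final audience. The sketch below assumes a hypothetical input for each network: a mapping from each exposed user to the time, in hours since the news broke, at which that user was first exposed. The function names are ours and do not come from the study.

```python
import math
from bisect import bisect_right


def exposure_curve(first_exposure):
    """Sorted first-exposure times from a mapping user -> hours since the news broke."""
    return sorted(first_exposure.values())


def share_exposed_at(curve, t):
    """Fraction of the final audience that had been exposed by time t."""
    return bisect_right(curve, t) / len(curve)


def time_to_share(curve, share):
    """Earliest time by which at least `share` (0 < share <= 1) of the final audience was exposed."""
    needed = max(1, math.ceil(share * len(curve)))  # number of users needed to reach the share
    return curve[needed - 1]
```

A smaller `time_to_share(curve, 0.5)` means a network reached half of its eventual audience sooner. This is one simple way to quantify “steeper” independently of absolute numbers.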

As a side effect of research we recently completed on the online propagation of news about the rescue of the Chilean miners trapped in the San José mine, we were able to measure how fast a news item spread within the FriendFeed and Twitter networks. Since the starting news item was the same, we can hypothesise that the observed differences in propagation are due to the propagation mechanisms of the two systems. Even taking the huge difference in numbers into account, the FriendFeed propagation curve appears much steeper and less linear than Twitter’s.

This difference would seem to indicate that a propagation mechanism based on the interactions of one’s contacts (as happens on FriendFeed), rather than on explicit propagation practices, is more effective in terms of speed.

FriendFeed Under A Microscope

A group of students from the School of Information and the School of Public Health of the University of Michigan has done some very nice visualisation work using our latest dataset. We were in touch with Guangming Lang for some time to explain the data structure and share a few insights, and then they did the whole job.
The visualisations focus on six questions: Where do FriendFeed users come from? What are the sources for entries on FriendFeed? How engaged are FriendFeed users? How engaged are FriendFeed users as time passes? What is most talked about in FriendFeed? And how do the top 90 most followed users gain followers?
Well, enjoy this very nice work.
A group of students from the School of Information and the School of Public Health of the University of Michigan has just published some visualisations based on our FriendFeed dataset. It all started some time ago when Guangming Lang contacted us asking for information about the structure of the data he had just downloaded. Some time later, here is the result of their work.
The visualisations answer some very interesting questions:
– Where do FriendFeed users come from?
– What are the sources for entries on FriendFeed?
– How engaged are FriendFeed users? How engaged are FriendFeed users as time passes?
– What is most talked about in FriendFeed?
– How do the top 90 most followed users gain followers?
All in all, a really excellent piece of work!

When followers are not enough

We have just gathered a brand new FriendFeed dataset (you can download it from the Data section; it’s named “2010-a dataset”). Since it is considerably larger than our previous one, we decided to test a few more hypotheses about information propagation in SNSs. A recurring idea in discussions about the ability to spread information online is that being well connected is the key element of any propagation strategy. Roughly summarised: the more followers you have, the more people you can inform. We have challenged this assumption before, and we wanted to test it more thoroughly.
We therefore analysed the relationship between each user’s actual number of followers and their average audience. We defined the average audience as the average number of users exposed to the messages sent by a specific user during our sample period. Because of the way FriendFeed works, discussions that spread can reach friends of friends. As a result, users who start the most engaging discussions have a greater chance of reaching an actual audience larger than their list of followers.
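The definition of average audience can be written down directly. This is a minimal sketch under two loudly labelled assumptions. First, it assumes a hypothetical message format: pairs of (author, set of exposed users). Second, it assumes a simplified FriendFeed-style exposure rule: an entry reaches the author’s followers plus the followers of everyone who commented on it or liked it. The real exposure rules of the platform and of our analysis may differ. `exposed_users` and `average_audience` are illustrative names only.

```python
from collections import defaultdict


def exposed_users(author, interactors, followers):
    """Hypothetical exposure rule: the author's followers plus the followers
    of everyone who commented on or liked the entry.

    followers: dict user -> set of that user's followers.
    """
    exposed = set(followers.get(author, ()))
    for user in interactors:
        exposed |= set(followers.get(user, ()))
    exposed.discard(author)  # the author does not count as their own audience
    return exposed


def average_audience(messages):
    """messages: iterable of (author, exposed_users) pairs.

    Returns author -> mean number of distinct users exposed per message.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for author, exposed in messages:
        totals[author] += len(set(exposed))
        counts[author] += 1
    return {author: totals[author] / counts[author] for author in totals}
```

With this rule, a user with few followers can still have a large average audience if the people who interact with their entries have many followers themselves.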

[Chart: Followers / Avg Users (average audience), top 20 users by followers]

As the chart shows (it includes only the top 20 users by number of followers), there can be a huge difference between the number of followers and the actual audience a user can engage. Interestingly, the user with the largest average audience ranks only 18th by number of followers.
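The check behind the “ranked only 18th” observation can be sketched in a few lines. The sketch assumes a hypothetical `stats` mapping from user to (followers, average audience). It keeps the top 20 users by followers, as the chart does. It then reports the follower rank of the user with the largest average audience.

```python
def follower_rank_of_top_audience(stats, top_n=20):
    """stats: user -> (followers, avg_audience), a hypothetical input format.

    Keeps the top_n users by followers (as in the chart) and returns
    (1-based follower rank, user) for the one with the largest average audience.
    """
    by_followers = sorted(stats, key=lambda u: stats[u][0], reverse=True)[:top_n]
    best = max(by_followers, key=lambda u: stats[u][1])
    return by_followers.index(best) + 1, best
```

On real data, a result far from rank 1 is exactly the kind of mismatch between followers and audience described above.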
As we said before, when we’re dealing with social phenomena and user engagement (as in online propagation), followers are not enough.
Starting from the latest FriendFeed dataset we acquired, we began testing some hypotheses about how to define users’ communicative capacity within this kind of network. One of the assumptions made most often (more in the past than today) concerns the number of followers: this value is often treated as an indicator of a user’s communicative capacity. Put bluntly, the idea is that if a person is connected with many other people, they can reach a significant mass of users.
To test this assumption we decided to observe the relationship between the number of followers and users’ average audience. By average audience we mean the number of users who were exposed to the messages posted by a specific user during our sample period (2 months: August to September 2010).
[Chart: Followers / Avg Users (average audience), top 20 users by followers]

Given the nature of FriendFeed, the audience tends to grow beyond the direct followers the more a user is able to start discussions that spread and involve friends of friends, and so on.
As the image shows (it plots followers against average audience for the 20 users with the most followers in the Italian FriendFeed network, public accounts only), a high number of followers does not necessarily mean a high average audience. On the contrary, the user who reaches the largest average audience in absolute terms ranks only eighteenth when we count followers.
In short, once again, when we talk about social networks, numbers can easily be deceiving.

IR11: Sustainability, Participation, Action

Last week I was at the IR11 conference (the annual conference of the Association of Internet Researchers), where I presented SIGSNA’s research on information propagation in FriendFeed, starting from the analysis of the news of Mike Bongiorno’s death.
IR11 was a great experience, and AoIR is probably one of the best academic groups in the area of internet studies. The overall quality of the conference was really high (special thanks to Torill, the program chair, for her great work), and I had the chance to meet many great people doing incredible research.
Among them, I would especially like to point out the Mapping Online Publics project (at the Queensland University of Technology) and the Retro-V project (at the University of Washington). Check out what these guys are doing!

To whom are you speaking? Egonetwork over time

The latest version of Gephi introduced some very nice features. It is now possible to work with dynamic networks and easily observe their evolution. Working with dynamic networks is crucial for social networks that show a high level of variability, such as those found on microblogging sites. In these networks, social connections change quickly over time. Even when a connection does not disappear, the way it is used can vary greatly from one period to the next. Observing such a phenomenon can be difficult with static SNA, but with a dynamic perspective it becomes quite simple.

The video shows how the ego-network of my FriendFeed user changed during August and September 2010. The ego node represents my user, and the other nodes are all the users I interacted with on FriendFeed. Nodes with a higher level of interaction are drawn closer to the ego node, while users with a low level of interaction are pushed away from it.
The video spans two months with the data resolution set at 10 days (this creates 5 different configurations of the network), and it clearly shows how the closest nodes change even over such a short period.
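The time slicing can be sketched as follows. The ego’s interactions are grouped into consecutive 10-day windows, and within each window the alters are ranked by number of interactions. According to the description above, the most-interacted alters are the nodes drawn nearest the ego. The sketch does not reproduce Gephi’s layout; it only shows which alters would be closest in each window. The input format, a list of (alter, day) pairs with one pair per interaction, is a hypothetical assumption.

```python
from collections import Counter, defaultdict
from datetime import date


def interaction_windows(interactions, start, days=10):
    """Group the ego's interactions into consecutive windows of `days` days.

    interactions: iterable of (alter, day) pairs, one per interaction (hypothetical format).
    Returns window index -> Counter(alter -> number of interactions).
    """
    windows = defaultdict(Counter)
    for alter, day in interactions:
        windows[(day - start).days // days][alter] += 1
    return dict(windows)


def closest_alters(windows, k=3):
    """The k most-interacted alters per window, i.e. the nodes drawn nearest the ego."""
    return {i: [alter for alter, _ in counts.most_common(k)]
            for i, counts in sorted(windows.items())}


if __name__ == "__main__":
    log = [("anna", date(2010, 8, 2)), ("anna", date(2010, 8, 3)),
           ("bruno", date(2010, 8, 14)), ("carla", date(2010, 8, 15)),
           ("carla", date(2010, 8, 16))]
    print(closest_alters(interaction_windows(log, date(2010, 8, 1)), k=1))
```

If the closest alters differ from window to window, the ego-network is reshaping itself in the way the video shows.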
Even though this was intended just as a demo of the new opportunities offered by Gephi, it provides some insight into how ego-networks evolve over time. This evolution can be driven by endogenous or exogenous factors, but it seems to be quicker than one might expect.

The latest version of Gephi finally makes it possible to visualise dynamic networks. What you find here is a short visualisation of how the active ego-network of a specific user (in this case my FriendFeed user) changed over August and September 2010.

The nodes are positioned relative to the central node so as to represent the level of interaction: the closest nodes are those with the highest level of interaction. As you can see, the closest nodes (that is, the nodes I interacted with most in that period) change very frequently, forcing the network to readjust accordingly.
Besides demonstrating some of the possibilities offered by the new version of Gephi, this short visualisation helps us understand that the contacts we interact with are a dynamic reality in constant evolution, even though they are certainly fewer than the full set of possible connections. The reasons for this evolution can vary widely. Some are endogenous to the network (interesting discussions), while others are exogenous (external events we want to discuss with certain contacts).

Mapping FriendFeed Network: switching the perspective

Recently we posted a visualisation of the Italian FriendFeed network. That map offered an interesting, general perspective on how a complex network can be visualised. Of course, social networks and microblogging sites are complex networks that emerge from many different ego-networks. What does the same network (Italian FriendFeed users) look like if we observe it from the inside?
Starting from the same dataset we used for the previous visualisation, we generated an ego-network centred on one of the minor nodes of the global visualisation. We chose the user lucamondini who, at that time, had a rather small network: 150 following and 200 followers. The idea was to see what the network looked like when the observer was one of the peripheral nodes.

An explanation of node sizes and colours can be found here. What’s interesting is that the overall scenario changes when we switch to this user’s point of view. It is always possible to find major and minor nodes, but they are not necessarily the same nodes as in the global map. Of course, users who are very popular in the whole network also seem to be quite popular within the local ego-network, but their relative size is different. We then moved our observation to an even more peripheral node: the user magicabula, with 21 followers and 33 following. Because this user has so few followers, they were not included in the global visualisation. This ego-network shows a small network of highly connected users. Some of them are heavily connected in the global map too, while others have significant authority only within this local perspective.
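The post does not say exactly how the ego-networks were extracted. The sketch below assumes the dataset is a set of directed (follower, followed) pairs. It defines the ego-network as the subgraph induced by the ego, the users the ego follows and the ego’s followers, which is a common definition; our actual procedure may differ. It then compares each node’s in-degree inside the ego-network with its global in-degree, on the further assumption that node size reflects in-degree. Both function names are hypothetical.

```python
from collections import defaultdict


def ego_network(edges, ego):
    """edges: set of (follower, followed) pairs.

    Returns the edges of the subgraph induced by the ego, the users it follows
    and its followers.
    """
    nodes = {ego}
    for src, dst in edges:
        if src == ego:
            nodes.add(dst)   # someone the ego follows
        elif dst == ego:
            nodes.add(src)   # one of the ego's followers
    return {(s, d) for s, d in edges if s in nodes and d in nodes}


def in_degree(edges):
    """Number of incoming edges (followers) per node within the given edge set."""
    degree = defaultdict(int)
    for _, dst in edges:
        degree[dst] += 1
    return dict(degree)
```

A node can have a large global in-degree but a modest one inside a peripheral user’s ego-network, or the reverse. Comparing `in_degree(edges)` with `in_degree(ego_network(edges, ego))` makes that change of perspective explicit.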
When we’re dealing with social networks and information propagation, we must keep in mind that the scenario might look very different when observed from the far periphery of the network.