SIGSNA TwitterGet: a Twitter downloading tool

When we started our research on Twitter propagation we, like many others, used the well-known Twapperkeeper service to save and download Tweets. Later we moved to an ad hoc solution, highly customised for our specific needs and infrastructure.
Recently, due to a change in Twitter policy, Twapperkeeper removed its export and download capability, leaving many researchers without an important research tool.
We therefore decided to share a simplified version of our system that should run properly on many systems. You can download it from the tools page. The attached manual should explain everything you need to know to install and use the system.
Please note that this is an alpha version, so please send us feedback or requests for new functions.
We really hope this will help the many researchers looking for a (relatively) simple-to-install alternative to Twapperkeeper.

#rifarelitalia dynamic ReTweet Chain


“Per Rifare l’Italia” was a conference held in the Italian lower chamber (Camera dei Deputati) on Feb. 2, 2011, organised by Telecom Italia – Working Capital, about strategies to bring a higher level of innovation to Italy (the hashtag sounds like “rebuilding Italy”). The keynote speaker was Edmund S. Phelps (winner of the 2006 Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel). You can read the full transcript of his speech here.
Following the official hashtag, we made this short animation, which clearly shows the dynamic evolution of the ReTweets related to the event. It is interesting to point out that while there was a good level of activity during the conference (mostly due to a number of bloggers live-tweeting the event), very few interactions happened during the afternoon.
From a technical point of view, the video (made with Gephi) shows only the ReTweet chains of the messages with the official #rifarelitalia hashtag. Both the size and the colour of each node indicate its degree centrality.
[English Version]
“Per Rifare l’Italia”, an event organised on 2 February 2011 by Telecom Italia at the Camera dei Deputati, featured Edmund S. Phelps, Nobel laureate in economics. The visualisation we present here, based on the tweets containing the official hashtag #rifarelitalia, shows the propagation dynamics of the messages during the event (decidedly high) and, at the same time, how most of the online propagation tends to die out once the event ends.
From a technical point of view, the video, made with Gephi, shows the ReTweet network over one-hour time intervals (the nodes are positioned by applying a Force Atlas layout). Both the size and the colour intensity of the nodes are calculated from the degree centrality of the node itself.

What if Twitter wasn't the fastest one…

We have recently done some comparative analysis of Twitter and FriendFeed propagation dynamics. We chose, as our case study, the news related to the rescue operations of the San José mining accident (which left 33 men trapped 700 metres (2,300 ft) below ground for 69 days) [I’m not describing the research now, we have a paper under submission, so stay tuned for more about the research itself].
As a side product of this research we had the opportunity to monitor the audience exposed to the miners’ rescue news on both FriendFeed and Twitter. We were therefore able to observe how fast a specific piece of news spread through both networks and, since we are observing the same news, we can assume that different propagation speeds are related to the different propagation mechanisms at work in the two systems.
Even considering the huge difference in absolute numbers (Twitter has a larger number of users), the FriendFeed propagation line (Esposti FF) is steeper and shows a less linear progression than Twitter’s line (Esposti TW).


This seems to suggest that propagation based largely on the interactions of the people you follow is faster than propagation based mostly on explicit re-sharing practices (ReTweets).

As a side effect of a study we recently completed on the online propagation of the news about the rescue of the Chilean miners trapped in the San José mine, we were able to measure how fast a news item propagates within the FriendFeed and Twitter networks. Since the starting news item was the same, we can hypothesise that the observed differences in propagation are attributable to the propagation mechanisms of the two systems. Even taking into account the huge difference in numbers, the FriendFeed propagation curve appears much steeper and less linear than Twitter’s.

This difference would seem to indicate that a propagation mechanism based on the interactions of one’s contacts (as happens on FriendFeed), rather than on explicit propagation practices, is more effective in terms of speed.

Building your Twitter replynet with SQLite

Data preparation is one of the most important and interesting parts of every research activity. Recently I’ve really enjoyed Axel’s posts about how to extract conversation networks from Twapperkeeper archives. Axel uses awk, which is a very powerful and flexible tool for manipulating csv files. Obviously there are several ways to manipulate a csv (or any kind of data) in order to extract a network file readable by SNA software such as Gephi. We at SIGSNA have always used a database approach. This is probably due to Matteo’s considerable expertise with databases, but I must admit that it seems to have quite a few advantages.
With this post, the first of a more methodologically oriented series, I’ll show how to extract a conversation network (aka Reply Network) from a Twapperkeeper archive using SQLite. Please note that this will be a non-technical post, intended for a non-technical audience.
SQLite is a cross-platform embedded relational database management system.
(On OS X you can start it simply by typing sqlite3 in your command shell.)
First we have to create a table suitable for importing a Twapperkeeper archive:
create table tablename (text varchar,
to_user_id int,
from_user varchar,
id int,
from_user_id int,
iso_language_code varchar,
source varchar,
profile_image_url varchar,
geo_type float,
geo_coordinates_0 float,
geo_coordinates_1 float,
created_at timestamp,
time long);

Once we’ve created the table, we just have to import the .csv file into it:
.separator ","
.import filename.csv tablename
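The same import can also be scripted. The following is a minimal sketch using Python's standard sqlite3 and csv modules, which has the side benefit of coping with tweets that contain commas inside quoted fields. The function name import_archive, the skipping of malformed rows, and the assumption that the csv has no header row and lists its columns in the same order as the table are illustrative choices, not something prescribed by Twapperkeeper.

```python
import csv
import sqlite3

# Column names and declared types, in the same order as the table above.
COLUMNS = [
    ("text", "varchar"), ("to_user_id", "int"), ("from_user", "varchar"),
    ("id", "int"), ("from_user_id", "int"), ("iso_language_code", "varchar"),
    ("source", "varchar"), ("profile_image_url", "varchar"),
    ("geo_type", "float"), ("geo_coordinates_0", "float"),
    ("geo_coordinates_1", "float"), ("created_at", "timestamp"),
    ("time", "long"),
]


def import_archive(conn, csv_path, table="tablename"):
    """Create the table if needed and load the csv rows into it.

    Assumption: the file has no header row and its columns follow COLUMNS.
    Rows with a different number of fields are skipped.
    """
    definition = ", ".join(f"{name} {kind}" for name, kind in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({definition})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = (row for row in csv.reader(handle) if len(row) == len(COLUMNS))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
```

A typical call would be import_archive(sqlite3.connect("archive.db"), "filename.csv"), where archive.db is a hypothetical database file name.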

Once we’ve imported our data we have all the advantages of a powerful database, and creating our conversation network is really simple.
Let’s create a csv file with one line for every @reply in our data, containing the user ID of the sender and the user ID of the receiver.
First we have to set a file as the output of our query:
.output nameofyournetwork.csv
then we can run the query:
select from_user_id,to_user_id from tablename where text like '@%' and to_user_id <> ' ';
after that we can set the output back to normal (on screen) mode:
.output stdout
And that’s all: you’ll find a csv file of the conversation network in your working directory.
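For those who prefer a script, the extraction step could look roughly like the sketch below. It assumes the data already sits in a table called tablename with the columns described above. The function name export_reply_network is hypothetical, and the use of TRIM to drop both empty and blank receivers is an assumption about how missing values appear in the archive. The Source and Target header names are used because Gephi's spreadsheet import recognises them for edge lists.

```python
import csv
import sqlite3


def export_reply_network(conn, out_path, table="tablename"):
    """Write one sender,receiver line per @reply and return the edge count."""
    query = (
        f"SELECT from_user_id, to_user_id FROM {table} "
        # Keep tweets that start with '@' and have a non-blank receiver.
        "WHERE text LIKE '@%' AND TRIM(to_user_id) <> ''"
    )
    edges = conn.execute(query).fetchall()
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target"])  # header row for the edge list
        writer.writerows(edges)
    return len(edges)
```

The resulting file holds the same information as the csv produced by the query above, plus a header row.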
Of course, once you’ve imported your data into a database you can perform much more complex queries, e.g. filtering the replies by declared language:
select from_user_id,to_user_id from tablename where text like '@%' and iso_language_code ='it';
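As one more illustration of what a database makes easy, the sketch below counts how many times each sender replied to each receiver, optionally restricted to one declared language. It is only an example of a slightly richer query under the same assumptions as before (a table named tablename with the columns listed above). The function name weighted_replies and the weight column are hypothetical names introduced here.

```python
import sqlite3


def weighted_replies(conn, language=None, table="tablename"):
    """Return (sender, receiver, weight) rows, heaviest pairs first."""
    query = (
        f"SELECT from_user_id, to_user_id, COUNT(*) AS weight FROM {table} "
        "WHERE text LIKE '@%' AND TRIM(to_user_id) <> '' "
    )
    params = []
    if language is not None:
        # Optional filter on the declared language, e.g. 'it'.
        query += "AND iso_language_code = ? "
        params.append(language)
    query += (
        "GROUP BY from_user_id, to_user_id "
        "ORDER BY weight DESC, from_user_id, to_user_id"
    )
    return conn.execute(query, params).fetchall()
```

Calling weighted_replies(conn, language="it") would return only the pairs found in tweets declared as Italian, each with the number of replies exchanged in that direction.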
Obviously this approach produces a reply network with user IDs instead of usernames. This can be fine during the research phases, but we may prefer readable usernames for our final visualisations.
We’re looking into this problem and we’ll see how to solve it with some awk scripting in our next post.