Tweprints Update

June 18, 2009

Back in April I launched a new project called arXiv on Twitter, or just ‘Tweprints’. This website collects the tweets that mention papers from the arXiv website (a pre-print server for scientific papers) and organises and presents them for the reader. My hope is that Tweprints will eventually begin to display the most talked-about scientific papers using the largest open collection of online papers available (arXiv) and the most prolific and popular open social networking tool (Twitter).

So far the results are interesting and, as we have now passed the 500-tweet mark, I thought it would be nice to report on some facts and figures.

Overview

The site is generally working well and I am pleased with it. We currently have 516 tweets (and this number has changed several times during my writing of this blog post!), which averages out at about 8 tweets a day since the site started. This number fluctuates quite a bit though, as the daily tweets graph below indicates.

The site will need to run for several months more before any really hard numbers can be drawn from the data. At the moment the odd popular paper can completely skew all the statistics. However, in general, astronomy and general physics papers do best. Those papers which are accessible to a non-expert reader seem to do well, which makes perfect sense.

Usage

The website receives about 200 visitors each day, with a lull at the weekends and a peak on Fridays. Conversely, the tweets themselves are predominantly collected on Tuesdays and Wednesdays, in line with statistics recently released about Twitter use in general [1].

[Figures: Cumulative Tweets; Daily Tweets]

These two graphs plot the tweets over time. The first shows the cumulative number of tweets over time. The second shows the tweets each day since April 16th. The best fit to the cumulative graph is actually a slow, exponential growth, which is in keeping with the number of tweets being produced in general by Twitter. The daily graph indicates the same, although it is interesting to see the weekend lulls clearly marked out. In the future this chart can have other information overlaid such as news stories – you can see the Iran elections spike at the far right.
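For anyone who wants to try a similar fit, here is a minimal sketch of fitting an exponential of the form N(t) = A·exp(k·t) to cumulative counts, using ordinary least squares on the logarithm of the counts. The sample numbers are invented purely for illustration and are not the Tweprints data, and `fit_exponential` is a hypothetical helper name, not something from the site's code.

```python
import math

def fit_exponential(days, counts):
    """Fit counts ~ A * exp(k * day) by linear least squares on log(counts)."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    # Slope and intercept of the straight line through (day, log(count)).
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    k = sxy / sxx
    A = math.exp(mean_y - k * mean_x)
    return A, k

# Invented illustrative data that follows 50 * exp(0.04 * day) exactly.
days = [0, 10, 20, 30, 40, 50, 60]
counts = [50 * math.exp(0.04 * d) for d in days]

A, k = fit_exponential(days, counts)
doubling_time = math.log(2) / k  # days for the cumulative total to double
```

Fitting in log space is the simplest approach, but it weights early, small counts more heavily than a direct non-linear fit would, so the two can give slightly different growth rates on noisy data.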

Categories

The most-tweeted broad subject areas are Computer Science (29%), Astrophysics (28%) and Physics (14%). This is probably to be expected given the way that arXiv is used within different academic fields [2].

The two most tweeted sub-topics are ‘Instrumentation and Methods for Astrophysics’ and ‘Cosmology and Extragalactic Astrophysics’. Together they account for about 35% of the tweets and they each contribute roughly half of that total. After that ‘Physics and Society’ is very popular with 13% of the total tweets. ‘Cryptography and Security’ and ‘Applications’ (of Statistics) make up 11% and 13% respectively but it should be noted that most of those tweets are for only one, very popular, paper in each case.

Word Cloud

A curious sideline I set up is a word cloud of the words found in tweeted paper titles. I would like to know what words hook people in and get them to read a paper. So far this has shown limited effect but should grow more useful with time. A similar cloud can be created for the abstracts of the papers and for the tweets themselves.
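The counting behind a word cloud like this can be sketched in a few lines. This is only an assumption about how such a count might be done, not the site's actual code: the stop-word list is deliberately tiny, `title_word_counts` is a hypothetical name, and the two sample titles are the ones mentioned elsewhere in this post.

```python
import re
from collections import Counter

# Assumed, deliberately short stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "in", "of", "and", "for", "on", "to"}

def title_word_counts(titles):
    """Count how often each non-stop-word appears across paper titles."""
    counts = Counter()
    for title in titles:
        # Lower-case the title and keep runs of letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

titles = [
    "Benford's Law anomalies in the 2009 Iranian presidential election",
    "Hiding Information in Retransmissions",
]
counts = title_word_counts(titles)
```

The resulting counts can then be mapped to font sizes; the same function would work unchanged on abstracts or on the text of the tweets.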

[Figure: Word Cloud]

Interesting Papers

A couple of papers have really rocketed through the ranks. Recently a paper described a statistical result suggesting that there was vote rigging in the Iranian elections. ‘Benford’s Law anomalies in the 2009 Iranian presidential election’ rose from nowhere to become the most tweeted arXiv paper since April in just two days, with 35 tweets at the time of writing.

‘Hiding Information in Retransmissions’ is a Computer Science paper that has shown a slower but steadier climb through the popularity ranks. It has now dropped out of the weekly top-ten but remains in the monthly and all-time charts.

Mavens and Connectors

The Twitter users producing the most tweets have stayed roughly constant over the past 2 months. In order, the users contributing more than 10 arXiv tweets since April are currently @CharmQGP (48), @sarahkendrew (25), @heptwit (20), @orbitingfrog (15; me), @astroclif (14), @astrodicticum (11) and @cmetzner (11).

Interestingly, of these, @heptwit and @astroclif have only 7 and 2 followers respectively on Twitter. This leads me to think that I should begin measuring the ‘reach’ of tweets as well as the number of tweets. For example, a paper tweeted by @CharmQGP will be seen by potentially 401 people. If it were then retweeted by @sarahkendrew it could reach up to 653! Compare this with a paper tweeted by @heptwit and @astroclif, which can only be seen by 8 people. The reach of a tweet is a new metric that I can easily begin adding into the site.
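A first pass at the reach metric could look something like the sketch below. It assumes reach is simply the sum of the follower counts of everyone who tweeted or retweeted a paper, which is an upper bound because followers can overlap. The figure of 252 followers for @sarahkendrew is inferred from the 401 and 653 quoted above rather than stated directly, and `reach` is a hypothetical function name.

```python
# Follower counts taken from the post; 252 is inferred as 653 - 401.
FOLLOWERS = {
    "CharmQGP": 401,
    "sarahkendrew": 252,
    "heptwit": 7,
    "astroclif": 2,
}

def reach(users, followers=FOLLOWERS):
    """Upper bound on how many people could see a paper.

    Sums the follower counts of every distinct user who tweeted or
    retweeted it. Shared followers are counted once per user they follow,
    so the true audience may be smaller.
    """
    return sum(followers[u] for u in set(users))

single = reach(["CharmQGP"])                      # one tweet
retweeted = reach(["CharmQGP", "sarahkendrew"])   # tweet plus a retweet
```

Using `set(users)` means a user who tweets the same paper twice is only counted once.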

In a similar vein, it is curious to measure the retweeting potential of users. We can reorder the top contributors listed above in order of their retweeting power – that is to say that we can place them in order of how many times people retweet the papers that they mention. Now the list looks different: @sarahkendrew (16), @orbitingfrog (13), @CharmQGP (3), @cmetzner (2), @heptwit (1), @astroclif (0) and @astrodicticum (0). This can only be taken as a tentative result since we are only 2 months into this data set and retweeting is far less common than simply tweeting in general. However it points out another metric that may be worth measuring.
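The reordering described here is just a sort on a different key. As a small sketch using the numbers quoted in this post (the variable names are my own):

```python
# Tweet counts, in the order of the top-contributor list.
tweets = {
    "CharmQGP": 48, "sarahkendrew": 25, "heptwit": 20, "orbitingfrog": 15,
    "astroclif": 14, "astrodicticum": 11, "cmetzner": 11,
}

# Number of times each user's tweeted papers were retweeted.
retweets = {
    "sarahkendrew": 16, "orbitingfrog": 13, "CharmQGP": 3, "cmetzner": 2,
    "heptwit": 1, "astroclif": 0, "astrodicticum": 0,
}

# Sort users by retweet count, highest first. Python's sort is stable,
# so users with equal counts keep their order from the tweet-count list.
by_retweets = sorted(tweets, key=lambda user: retweets[user], reverse=True)
```

With so few retweets in the data so far, the ordering among the lower entries could easily change from week to week.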

@tweprints

As suggested many times by many people, there is now a @tweprints Twitter feed that announces popular papers as they reach a threshold number of mentions. This threshold is changeable but currently sits at 5 tweets. I am open to suggestions for the kinds of things this Twitter feed should announce. At the moment it is a very sparse feed.
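The threshold rule can be sketched roughly as follows. This is an illustration of the idea rather than the feed's real implementation: the paper identifiers and counts are made up, and `papers_to_announce` is a hypothetical name.

```python
THRESHOLD = 5  # current value mentioned in the post; meant to be adjustable

def papers_to_announce(mention_counts, already_announced, threshold=THRESHOLD):
    """Return papers that have reached the threshold and not yet been announced."""
    return sorted(
        paper
        for paper, mentions in mention_counts.items()
        if mentions >= threshold and paper not in already_announced
    )

# Made-up identifiers and counts, for illustration only.
mention_counts = {"paper-a": 7, "paper-b": 4, "paper-c": 5}
announced = {"paper-a"}

new_papers = papers_to_announce(mention_counts, announced)
```

Keeping track of what has already been announced stops the feed from repeating a paper every time it picks up another mention.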

Future Plans

The new metrics I mentioned above will find their way onto the site soon, as will an expanded statistics page. I also plan to add XML feeds for the data and, hopefully, an RSS feed of the current top-ten papers. All of this is of course only going to get done in between real work on my thesis. Sheesh.

There is a large and valuable database building up behind this project and if anyone has any novel uses for it, I would be interested to hear from you. I will update again in the future when there is something worth noting. In the meantime: keep tweeting your arXiv papers!

[1 – Sysomos Twitter Report] [2 – SarahAskew] [Link to Tweprints]
