Archives For open science


Executable papers are a cool idea in research [1]. You take a study, write it up as a paper and bundle together all your code, scripts and analysis in such a way that other people can take the ‘paper’ and run it themselves. This has three main attractive features, as I see it:

  1. It provides transparency for other researchers and allows everyone to run through your working to follow along step-by-step.
  2. It allows your peers to give you detailed feedback and ideas for improvements – or to make the improvements themselves.
  3. It allows others to take your work and try it out on their own data.

The main problem is that these don’t really exist ‘in the wild’, and where they do, they’re in bespoke formats, even if they’re open source. The IPython Notebook is a great way of doing something very much like an executable paper, for example. Another would be to bundle up a virtual machine and share a disk image. Executable papers would allow rapid-turnaround science to happen. For example, let’s imagine that you create a study and use some current data to form a theory or model. You do an analysis and create an executable paper. You store that paper in a library and the library periodically reruns the study when new data become available [2]. The library might be a university library server, or maybe it’s something like the arXiv, ePrints, or GitHub.
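As a minimal sketch of what the library’s rerun step might look like – all names here are hypothetical, not any real archive’s API – the ‘paper’ could be stored alongside a digest of the data it last ran on, and re-executed only when that digest changes:

```python
import hashlib

def rerun_if_updated(data, last_digest, analysis):
    """Rerun an executable paper's analysis only when the data change.

    `data` is the raw bytes of the downloaded dataset, `last_digest` is
    the digest recorded at the previous run (None on the first run), and
    `analysis` is the paper's bundled analysis function.
    """
    digest = hashlib.sha256(data).hexdigest()
    if digest == last_digest:
        return digest, None            # nothing new: skip the rerun
    return digest, analysis(data)      # new data: re-execute the paper
```

A real library server would fetch the data from an archive on a schedule and publish the new result somewhere citable; this only shows the change-detection core.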

This is roughly what happens in some very competitive fields of science already – only with humans. Researchers write papers using simulated data, and the instant they can access the anticipated real data they import it, run their code and publish. With observations of the Cosmic Microwave Background (CMB), several competing groups are waiting to work on the data – and new data come out very rarely. In fact, the day after the Planck CMB data were released last year, there was a flurry of papers submitted to the arXiv. Those who got in early had likely pre-written much of the work and simply ran their code as soon as they had downloaded and parsed the newly published data.

If executable papers could be left alone to scan the literature for new, useful data then they could also look for new results from each other. A set of executable papers could work together, without planning, to create new hypotheses and new understanding of the world. Whilst one paper crunches new environmental data, processing it into a catalogue, another could use the new catalogue to update climate change models and even automatically publish significant changes or new potential impacts for the economy.
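The kind of unplanned cooperation described above only requires the papers to agree on a data format. A toy sketch – the ‘papers’, numbers and tolerance below are entirely made up for illustration – might chain like this:

```python
def catalogue_paper(raw_readings):
    """Hypothetical paper A: crunch raw data into a summary catalogue."""
    return {"mean": sum(raw_readings) / len(raw_readings),
            "n": len(raw_readings)}

def model_paper(catalogue, baseline, tolerance=0.5):
    """Hypothetical paper B: compare the new catalogue against a baseline
    model and decide whether the change warrants republishing."""
    delta = catalogue["mean"] - baseline
    return {"delta": delta, "republish": abs(delta) > tolerance}

# Paper B consumes paper A's output with no coordination between authors
# beyond the shared catalogue format:
decision = model_paper(catalogue_paper([1.0, 2.0, 3.0]), baseline=1.0)
```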

It should be possible to make predictions in executable papers and have them automatically check for certain observational data and republish updated results. One can imagine a topical astronomy example in which the BICEP2 results would be automatically checked against any released Planck data, with new publications created when statistical tests are met. Someone should do this if they haven’t already. In this way, papers could continue to further, or verify, our understanding long after publication.
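As an illustration of that automatic-checking step – the threshold and values below are invented, not real BICEP2 or Planck numbers – a stored prediction could be compared against newly released measurements with a simple significance test:

```python
def tension(measured, predicted, sigma):
    """How many standard deviations separate measurement and prediction."""
    return abs(measured - predicted) / sigma

def should_republish(measured, predicted, sigma, threshold=3.0):
    """Trigger an updated publication once the statistical test is met,
    i.e. once the tension crosses the chosen significance threshold."""
    return tension(measured, predicted, sigma) >= threshold
```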

SKA Rendering (Wikimedia Commons)

This is high-frequency science [3], akin to high-frequency trading, and it seems like an interesting approach to some upcoming data-flow issues in science. The Large Hadron Collider (LHC), the Large Synoptic Survey Telescope (LSST) and the Square Kilometre Array (SKA) are all huge scientific instruments set to explore new parts of the universe, gathering huge volumes of data to be analysed.

Even the deployment of Zooniverse-scale citizen science cannot get around the fact that instruments like the SKA will create volumes of data that we don’t know what to do with, at a pace we’ve never seen before. I wonder if executable papers, set to scour the SKA servers for new data, could alleviate part of the issue by automatically searching for theorised trends. The papers would be sourced by the whole community, and peer-reviewed as is done today, effectively crowdsourcing the hypotheses through publications. This cloud of interconnected, virtual researchers would continuously generate analyses that could be verified by a second peer-review process, since one would expect a great deal of nonsense in such a setup.

When this came up at a meeting the other day, Kevin Page (OeRC) remarked that we might just be describing sensors. In a way he’s right – but these are software sensors, built on the platform and infrastructure of the scientific community. They’re more like advanced tools; a set of ghost researchers, left to think about an idea in perpetuity, in service of the community that created them.

I’ve no idea if I’m describing anything real here – or if it’s just a way of partially automating the process of science. The idea stuck with me and I found myself writing about it to flesh it out – thus this blog post – and wondering how to code something like it. Maybe you have a notion too. If so, get in touch!


[1] But not a new one really. It did come up again at a recent Social Machines meeting though, hence this post.
[2] David De Roure outlined this idea quite casually in a meeting the other day; I’ve no idea if it’s his or just something he’s heard a lot and thought was quite cool.
[3] This phrasing isn’t mine, but as soon as I heard it, I loved it. The whole room got chatting about this very quickly so provenance was lost I’m afraid.

Science Blogging

September 3, 2011

I’ve been at Science Online 2011 in London this weekend. One hot topic of conversation during Day One was science blogging and how it relates to science publishing in the form of journals.

There was much hand-wringing yesterday, during a panel discussion on the ‘Arsenic Life’ story (see these links), where science bloggers seemed exasperated by the fact that what they write in blogs is not linked with the research they discuss. After all, they write some great stuff, and it would be great if anyone reading the paper could read their dire warnings about the reliability of the conclusions*. At moments, some on the panel and in the audience even seemed to come close to suggesting that their blog posts should be placed on a level with peer-reviewed publications.

The ‘Arsenic Life’ tale is one where several things that could go wrong did go wrong. The results were over-hyped, the scientists were unresponsive to criticism and the peer-review system broke down. However, the vast majority of scientific results are reported pretty well and without such catastrophic failures of the system. Did blogging help? Yes, and it may have been instrumental in bringing the issues to light. But the scientists who wrote about the story on their blogs did so in a journalistic act, not a scientific one. They are free to publish rebuttal papers and get a peer-reviewed response into the literature in due course. I’d be keen to know if anyone is doing this.

I think what many science bloggers forget is that they represent the very thin end of the science-blogging wedge. There are many research scientists out there writing blogs about their own work and that of others. They share that part of the blogosphere with many more science bloggers who are non-scientists. Many interested amateurs and members of the public are writing about science, and some are doing a very good job of it. (Some researchers do a bad job of it too, by the way.)

Whilst I agree with many that we should move to a more open and transparent publication process in academia, I don’t believe that blogging should be part of it – certainly not in its current form. Blogging represents a free and liberating way to share ideas and thoughts. It is unencumbered by regulation, and this is exactly why I think many scientists enjoy it and find it useful. Perhaps one tantalising aspect of science blogging is that it feels like scientific lab reporting to many people – but it isn’t. You may choose to write your blog with all the rigour and finesse of a publishable work, but it is still a blog. I suppose it comes down to trust and verifiability.

One can imagine ways to legitimise and promote blogging into a state closer to the academic model (without turning it into journalism). Something more akin to social networking than peer-review seems like a good idea. I would point anyone thinking about these ideas to the Research Blogging network which is collecting blogs about peer-reviewed research. Perhaps a blog journal, with editors and peer-review would be viable – does such a thing exist?   

Science blogging is growing but the credibility of the few should not be used to elevate the many non-professional science blogs to a recognised, academic status. Science bloggers are doing something great: they are providing insight into the way science works and telling a more narrative story about their results and their field of work.

Another of yesterday’s sessions was about storytelling (it was hosted by @BoraZ and @mistersugar) – one emerging theme was that more scientists need to tell stories to help engage people, and this is a crucial point. The science bloggers are acting outside of the scientific process and telling their own stories. This is a great thing to do, and it doesn’t need to be incorporated into that process. It is great because it is distinct and unrestrained. I say let the science blogging continue on all sides, and in all forms, and leave it separate from the process of peer-review and publication. There is no need to further muddy those waters.

*The idea of using trackbacks to allow bloggers to connect with things they discuss is not new, and you can in fact trackback to papers (e.g. on the arXiv) if you blog about them.