Archives For Open Science

Since 2008 I have been running .Astronomy, a meeting/hackathon/unconference that aims to be better than normal meetings and to foster new ideas and collaborations. It’s a playground for astro geeks: more specific than a general hack day, but way more freeform than a normal astronomy meeting. Through .Astronomy we have grown into an amazing community.

I know people who have gotten jobs because of .Astronomy, changed careers because of .Astronomy – or even left astronomy because of .Astronomy (in a good way!). We have evolved into an interesting group, with a culture and way of thinking that we take back to our ‘real’ jobs after each event.

In short: it works. Now I’d like to work out how to spread the idea into more academic fields. We’re looking for people in other research areas, such as economics, maths, chemistry, medicine and more.

Adler Planetarium

I have funding from the Alfred P. Sloan Foundation to bring a handful of non-astronomers to this year’s .Astronomy, in Chicago at the amazing Adler Planetarium (December 8–10). The aim is to meet up at the end and discuss whether you think it could work in your own field, and what you’d need to make that happen. If you’re a researcher who isn’t an astronomer and this sounds great, then that could be you! We have funding to pay for flights, hotels and expenses. It will be a lot of fun – and despite the astronomy focus of the event, I think most researchers with a bit of tech experience would get a lot out of it.

If you’re interested then fill out the short form at http://bit.ly/dotastromulti or email me on rob@dotastronomy.com for more information. We are following a formal selection process, but we’re doing it very quickly and will decide by Nov 7th, to allow enough time ahead of the event to make travel plans and such. So don’t delay – do it now!

If you don’t think you’re the right person for this, then maybe you know who could be. If so, let them know and send them to http://dotastronomy.com/about/astronomy-6-multidisciplinary-program/ for more information.

The latest issue of Astronomy & Geophysics includes an article by yours truly about the GitHub/.Astronomy Hack Day at the UK’s National Astronomy Meeting in Portsmouth earlier this year.

The projects resulting from hack days are often prototypes, or proof-of-concept ideas that are meant to grow and expand later. Often they are simply written up and shared online for someone else to take on if they wish. This ethos of sharing and openness was evident at the NAM hack day, when people would periodically stand up and shout to the room, asking for anyone with skills in a particular area, or access to specific hardware.

Take a look here: http://astrogeo.oxfordjournals.org/content/55/4/4.15.full?keytype=ref&ijkey=kkvGWSg3ABbIy5S

Martian Nyan Cat

Executable papers are a cool idea in research [1]. You take a study, write it up as a paper, and bundle together all your code, scripts and analysis in such a way that other people can take the ‘paper’ and run it themselves. This has three main attractive features, as I see it:

  1. It provides transparency for other researchers and allows everyone to run through your working to follow along step-by-step.
  2. It allows your peers to give you detailed feedback and ideas for improvements – or to make the improvements themselves.
  3. It allows others to take your work and try it out on their own data.

The main problem is that executable papers don’t really exist ‘in the wild’, and where they do, they’re in bespoke formats, even if open source. The IPython Notebook, for example, is a great way of doing something very much like an executable paper. Another way would be to bundle up a virtual machine and share the disk image. Executable papers would allow rapid-turnaround science to happen. For example, imagine that you use some current data to form a theory or model, do the analysis, and create an executable paper. You store that paper in a library, and the library periodically reruns the study when new data become available [2]. The library might be a university library server, or maybe something like the arXiv, ePrints, or GitHub.
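To make the re-run idea concrete, here is a minimal sketch of what such a library-side loop might look like, in Python. The `fetch_latest_dataset`, `run_analysis` and `publish` callables are hypothetical placeholders for whatever interface a hosting library would actually expose:

```python
import time
import hashlib

POLL_INTERVAL = 24 * 60 * 60  # check for new data once a day

def dataset_fingerprint(data: bytes) -> str:
    """Hash the raw data so we can tell when something new has arrived."""
    return hashlib.sha256(data).hexdigest()

def rerun_when_updated(fetch_latest_dataset, run_analysis, publish):
    """Periodically re-execute a 'paper' whenever its input data change.

    The three arguments are placeholders for services the hosting
    library (arXiv, ePrints, GitHub, ...) would need to provide.
    """
    last_seen = None
    while True:
        data = fetch_latest_dataset()
        fingerprint = dataset_fingerprint(data)
        if fingerprint != last_seen:
            results = run_analysis(data)  # the paper's bundled code
            publish(results)              # push an updated version
            last_seen = fingerprint
        time.sleep(POLL_INTERVAL)
```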

This is roughly what happens in some very competitive fields of science already – only with humans. Researchers write papers using simulated data, and the instant they can access the anticipated real data they import, run and publish. With observations of the Cosmic Microwave Background (CMB), several competing groups are waiting to work on the data – and new data come out very rarely. In fact, the day after the Planck CMB data were released last year, there was a flurry of papers submitted to the arXiv. Those who got in early had likely pre-written much of the work and simply ran their code as soon as they had downloaded and parsed the newly published data.

If executable papers could be left alone to scan the literature for new, useful data then they could also look for new results from each other. A set of executable papers could work together, without planning, to create new hypotheses and new understanding of the world. Whilst one paper crunches new environmental data, processing it into a catalogue, another could use the new catalogue to update climate change models and even automatically publish significant changes or new potential impacts for the economy.

It should be possible to make predictions in executable papers, have them automatically check for certain observational data, and automatically republish updated results. One can imagine a topical astronomy example where the BICEP2 results would be automatically checked against any released Planck data, with new publications created when statistical tests are met. Someone should do this if they haven’t already. In this way, papers could continue to further, or verify, our understanding long after publication.
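As a rough illustration, the ‘statistical test’ gating republication could be as simple as a significance check of a paper’s prediction against a newly released measurement. Everything here is a hypothetical placeholder – the threshold, the test and the inputs:

```python
from scipy import stats

P_THRESHOLD = 0.003  # roughly a two-sided 3-sigma cut; an arbitrary choice

def should_republish(predicted, observed, observed_err):
    """Decide whether new data warrant an updated publication.

    Here 'warrant' means the observation is in significant tension
    with the prediction; one could equally test for confirmation.
    """
    z = abs(observed - predicted) / observed_err
    p_value = 2.0 * (1.0 - stats.norm.cdf(z))
    return p_value < P_THRESHOLD
```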

SKA Rendering (Wikimedia Commons)

This is high-frequency science [3], akin to high-frequency trading, and it seems like an interesting approach to some upcoming data-flow issues in science. The Large Hadron Collider (LHC), the Large Synoptic Survey Telescope (LSST), and the Square Kilometre Array (SKA) are all huge scientific instruments set to explore new parts of the universe, gathering huge volumes of data to be analysed.

Even the deployment of Zooniverse-scale citizen science cannot get around the fact that instruments like the SKA will create volumes of data that we don’t know what to do with, at a pace we’ve never seen before. I wonder if executable papers, set to scour the SKA servers for new data, could alleviate part of the issue by automatically searching for theorised trends. The papers would be sourced by the whole community and peer-reviewed as is done today, effectively crowdsourcing the hypotheses through publications. This cloud of interconnected, virtual researchers would continuously generate analyses that could be verified by some second peer-review process, since one would expect a great deal of nonsense in such a setup.

When this came up at a meeting the other day, Kevin Page (OeRC) remarked that we might just be describing sensors. In a way he’s right – but these are software sensors, built on the platform and infrastructure of the scientific community. They’re more like advanced tools; a set of ghost researchers, left to think about an idea in perpetuity, in service of the community that created them.

I’ve no idea if I’m describing anything real here, or if it’s just a way of partially automating the process of science. The idea stuck with me and I found myself writing about it to flesh it out – thus this blog post – and wondering how to code something like it. Maybe you have a notion too. If so, get in touch!

———-

[1] But not a new one really. It did come up again at a recent Social Machines meeting though, hence this post.
[2] David De Roure outlined this idea quite casually in a meeting the other day. I’ve no idea if it’s his or just something he’s heard a lot and thought was quite cool.
[3] This phrasing isn’t mine, but as soon as I heard it, I loved it. The whole room got chatting about this very quickly so provenance was lost I’m afraid.

A new Milky Way Project paper was published to the arXiv last week. The paper presents Brut, an algorithm trained to identify bubbles in infrared images of the Galaxy.


Brut uses the catalogue of bubbles identified by more than 35,000 citizen scientists in the original Milky Way Project. These bubbles are used as a training set to allow Brut to discover the characteristics of bubbles in images from the Spitzer Space Telescope. This training data gives Brut the ability to identify bubbles just as well as expert astronomers!

The paper then shows how Brut can be used to re-assess the bubbles in the Milky Way Project catalog itself, and it finds that more than 10% of the objects in this catalog are really non-bubble interlopers. Furthermore, Brut is able to discover bubbles missed by previous searches too, usually ones that were hard to see because they are near bright sources.

At first it might seem that Brut removes the need for the Milky Way Project – but the truth is exactly the opposite. This new paper demonstrates a wonderful synergy that can exist between citizen scientists, professional scientists, and machine learning. As the Milky Way Project example shows, citizens can identify patterns that machines cannot detect without training, and machine learning algorithms can use citizen science projects as input training sets, creating amazing new opportunities to speed up the pace of discovery. A hybrid model of machine learning combined with crowdsourced training data from citizen scientists can not only classify large quantities of data, but also address the weaknesses of each approach deployed alone.
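Brut itself is open source (the GitHub link is below), but as a loose sketch of the hybrid pattern – citizen-science labels used to train a random forest – something like the following would do. The feature extraction here is a deliberately naive placeholder, not what Brut actually does:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def extract_features(cutouts):
    """Placeholder: flatten each image cutout into a feature vector.

    Brut's real pipeline is far more sophisticated; see the paper.
    """
    return np.array([np.asarray(c).ravel() for c in cutouts])

def train_bubble_classifier(cutouts, citizen_labels):
    """Train a classifier on bubble/non-bubble labels from volunteers."""
    X = extract_features(cutouts)
    y = np.asarray(citizen_labels)
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    # Cross-validation gives a rough measure of agreement with the
    # crowdsourced training set before fitting on everything
    scores = cross_val_score(clf, X, y, cv=5)
    clf.fit(X, y)
    return clf, scores.mean()
```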

We’re really happy with this paper, and extremely grateful to Chris Beaumont (the study’s lead author) for his insights into machine learning and the way it can be successfully applied to the Milky Way Project. We will be using a version of Brut for our upcoming analysis of the new Milky Way Project classifications. It may also have implications for other Zooniverse projects.

If you’d like to read the full paper, it is freely available online at the arXiv – and Brut can be found on GitHub.

[Cross-posted on the Milky Way Project blog]

For better or worse, citizen science has become a fashionable term, but what is it and why do people like it? Citizen science is a big component of a larger movement of public participation and engagement. There are makers and hackers everywhere, and participation in science feels like it is increasing in general. This is great, and means citizen science is of growing importance.

zooniverse_logo_wide

I work at the Zooniverse. We have a community of more than 850,000 people, who have taken part in more than 20 citizen science projects over the years. You can see the current batch at zooniverse.org. Last year the Zooniverse received more than 50 years of human effort (and that wasn’t our highest year so far) and our sites span a wide range of scientific subjects. People seem to really like them. They’re well designed and thought through. They aim to produce real results and, slowly but surely, they are doing just that (see zooniverse.org/publications).

At a recent event on citizen science in education, hosted by the British Science Association, I was invited to speak about what citizen science is. This was actually quite difficult. There are projects out there that call themselves citizen science but which I would instinctively say were not – and probably vice versa. For example, I don’t think that downloading a screensaver to process someone else’s data is citizen science. Recently I’ve found myself debating the particulars with people. If I had to try to define it, I’d say that

Citizen science is a contribution by the public to research, actively undertaken and requiring thoughtful action.

The Zooniverse is about breaking down tasks into understandable components that anyone can perform. We rarely abstract the problem and always try to keep context in frame. You can know (if you want to) that you’re classifying galaxies or cancer cells or ancient papyri, and you can also know why. Citizen science projects often involve non-professionals taking part in one or more of the following:

  • Crowdsourcing
  • Mass-participation
  • Data collection (the only one we don’t do yet)
  • Data analysis

Fold.it and Eyewire are both excellent examples of crowdsourced data analysis (much like the Zooniverse) and eBird is a great crowdsourced data collection project. The new Randomise Me site allows you to set up a mass-participation data collection project. In all these cases, people know what they’re taking part in. The Blackawton Bees paper is a perfect example of citizen science that wasn’t based on mass-participation or crowdsourcing – but involved both data collection and analysis (by kids!). All are fabulous examples of citizen science.

At the very least, citizen science has to involve ‘citizens’ or volunteers. Over the past few years at the Zooniverse we’ve learned a lot about our volunteers, and why they take part and give up their time. We’ve learned that, above all, people want to make a contribution to science. I think it’s easy to understand why people want to make a meaningful donation of their time, and I think it’s heartening that this is the case. We have learned that on the web, participation is more unequal than in the least equal societies in the real world, with the distribution of effort in our own projects being comparable to projects like Wikipedia or Twitter and countless others. This means that most users do a little and some users do staggering amounts – and that this is fine online. We have also learned that scale is relevant. Sometimes you need 500 people, sometimes you need 500,000. You should know which before you embark on your project.
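If you want to quantify that inequality for your own project, the Gini coefficient used for income inequality works just as well on per-volunteer contribution counts. A back-of-the-envelope sketch (the example numbers are made up):

```python
import numpy as np

def gini(counts):
    """Gini coefficient of per-volunteer contribution counts.

    0 means everyone contributes equally; 1 means one person does
    everything. National income Ginis top out around ~0.6, so crowd
    projects routinely exceed the least equal real-world societies.
    """
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    shares = np.cumsum(x) / x.sum()  # Lorenz curve ordinates
    return (n + 1 - 2 * shares.sum()) / n

# A made-up, heavy-tailed distribution of effort across 1,000 volunteers
print(gini([1] * 900 + [10] * 90 + [1000] * 10))  # ~0.9
```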

The aim of citizen science ought to be to undertake research and discovery. That is surely wrapped up in its definition as a subset of science. It is not outreach or education – which our sites are often confused with in academia. The goals of outreach and education are to inform and teach, and in many cases citizen science can be used as a tool to do so. That intersection fascinates me, but I’m not an educator and I’m starting to think that only educators are able to successfully bring this stuff into the classroom, lecture theatre or tutorial. But that’s another post altogether [1].

Above all we’ve learned that you don’t just launch projects and cross your fingers; it’s 2013: that time has passed. The web is a sophisticated place and an awesome citizen science site can go far and do a lot of good work. Sadly it is also possible for a site to attract a lot of attention (and clicks) but never do anything useful at all. Of paramount importance is the concept of authenticity. Genuine participation in science is essential in an era when such a thing is possible. Our mantra at the Zooniverse is that we should never waste people’s time. Now that it has been convincingly shown that the public can contribute to research via the web, it is incumbent on new web-based projects to keep the bar raised and the standard high.

We are at the beginning of a citizen science renaissance online. After hundreds of years as the purview of bug-collectors and bird-watchers (all very important work, I hasten to add), we are finally able to tap into the cognitive surplus [2] and attempt truly distributed research. I’m looking forward to seeing how it can be taken to the next level – and hopefully to being a part of it.

[1] or just take a look at Zoo Teach as an example of facilitating educators rather than asking them to use your own materials.

[2] FWIW I actually prefer Shirky’s ‘Here Comes Everybody’.

ESA’s Planck mission reported results today showing the Cosmic Microwave Background (CMB, see below) in greater detail than ever before.


Planck achieves this amazing view of the earliest light in the Universe by cleverly combining and cross-matching data across 9 different frequency bands, ranging from 30 to 857 GHz. In this way they can remove foreground emission and effectively strip away the content of the whole Universe, to reveal the faint CMB that lies behind it. It’s amazing work.
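Planck’s actual component separation is far more sophisticated, but the basic trick can be illustrated with a toy ‘internal linear combination’: weight the frequency maps so that a signal that is the same in every band (the CMB, in thermodynamic units) passes through unchanged while the variance contributed by foregrounds is minimised. A hypothetical sketch, assuming the maps arrive as an (n_bands, n_pixels) array:

```python
import numpy as np

def internal_linear_combination(maps):
    """Toy ILC foreground removal.

    maps: (n_bands, n_pixels) array in CMB thermodynamic units, so the
    CMB contributes identically to every band.
    """
    cov = np.cov(maps)            # band-by-band covariance
    cov_inv = np.linalg.inv(cov)
    ones = np.ones(maps.shape[0])
    # Minimum-variance weights that sum to 1, preserving the CMB
    weights = cov_inv @ ones / (ones @ cov_inv @ ones)
    return weights @ maps         # foreground-reduced CMB estimate
```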

To accompany the announcement, Planck have released a Chromoscope-based version of their full data set here. This site shows all 9 bands (plus a composite image and the visible sky for reference) and lets you slide between them, exploring the different structures found at different wavelengths.


You can rearrange the different bands and turn on useful markers like constellations and known microwave sky features. It’s just great!

There is also an option to view the data in terms of the content – or components – of the Universe. You can see that version here. You can switch between these views using the options box on the left-hand side.

In this version of the site you’re able to see the different structures that contribute to the overall Planck sky image. This is how you can really start to understand what Planck is seeing and how we need to ‘extract’ the foreground emission from the data. In this view you can look at the dust, the emission purely from Carbon Monoxide (a common molecule at these wavelengths), the CMB itself and the low-frequency emission from elsewhere (such as astronomical radio sources).


Cardiff’s Chris North has put this site together (you can find him on Twitter @chrisenorth) and it was Chris, along with Stuart Lowe and me, who first put Chromoscope together many moons ago. I can’t take much credit for Chromoscope really, but it’s fantastic to see it put to use here.

This is the wonderful blend of open science and public engagement that I love, and that astronomy is getting better at in general. What Planck are doing here is making the data freely available in a form that is digestible to the enthusiastic non-specialist.

This sort of ‘outreach’ is enabled by the modern web’s ability to make beautiful websites relatively painless to build and cheap to host. It’s also possible because we have people, like Chris North, who know about both the science and the web. Being comfortable on the Internet and ‘getting’ the web are so important today for anyone who wants to engage people with data and science.

So, go explore! You can zoom right in on the data and even do so in 9 frequencies. There is a lot to come from Planck – as scientists get to work pumping out papers using these data – so this site will be a handy reference in the future. It’s also awesome: did I mention that?

[URL: http://astrog80.astro.cf.ac.uk/Planck/Chromoscope/]

The Bones of the Milky Way