If you’ve been following the recent series of posts about my data mining, then a) I apologise and b) it just got better!
The short story is that research in astrophysics is generally made available online and is entirely available, in digital form, all the way back to the beginning of the refereed journals on the topic in 1827. I have downloaded a lot of the data and have been mining it for my own interest (mostly on the bus).
This week I expanded the database, so that it now includes the five main journals for astronomy: MNRAS, ApJ, A&A, AJ and PASP. If you think I’m missing something important, please tweet me. I also decided it was time to grab the data regarding authorship, meaning I now have the list of authors, and their affiliations, for each paper since 1827.
Incidentally, the 1827 papers from MNRAS include Charles Babbage amongst the authors, discussing log tables. My favourite paper title from that first year is Observations of eclipses of Jupiter’s satellites by Col. Beaufoy’s telescope. There are a lot of Colonels, Majors and other ranks listed in those days.
Another notable fact from that time is that just about every paper was written by one person. People worked alone, and the society was a chance to gather and share findings. This is in stark contrast to today, when astronomers generally work in groups and publish together.
Authorship in Astronomy
The above plot shows the average number of authors per paper since 1827. You can see the trend is not subtle. Around 1960 the value begins climbing very quickly, and accelerates. Here’s the same plot on a log scale, also showing the maximum number of authors on any paper from that year - another indicator of group sizes in general.
The size of astronomical collaborations is growing fast. In 2011 a group of 770 people co-authored a paper, Search for Gravitational Wave Bursts from Six Magnetars, in ApJ. The same collaboration published the 668-author Searches for Gravitational Waves from Known Pulsars with Science Run 5 LIGO Data a year earlier. One has to question the concept of ‘authorship’ when considered in this way, and also the value of citations for these authorships.
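For anyone wanting to reproduce this kind of plot, here is a minimal sketch in Python. It assumes each paper has been reduced to a (year, list of author names) pair. The function name and the sample records are made up for illustration and are not the code or data behind the plots.

```python
from collections import defaultdict

def authors_per_year(papers):
    """Return {year: (mean_authors, max_authors)} for (year, authors) pairs."""
    counts = defaultdict(list)
    for year, authors in papers:
        counts[year].append(len(authors))
    # Mean and maximum author count for each year, in chronological order.
    return {y: (sum(c) / len(c), max(c)) for y, c in sorted(counts.items())}

# Placeholder records for illustration only, not real data.
sample = [
    (1827, ["Author A."]),
    (1827, ["Author B."]),
    (2011, ["Author C.", "Author D.", "Author E."]),
    (2011, ["Author F."]),
]
stats = authors_per_year(sample)
```

Plotting the first value of each pair against year gives the average curve, and the second gives the maximum, which is the one that benefits from a log scale.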
In case you were wondering, the large group of co-authors in 1857 is due to an occultation of Jupiter by the Moon that year. The event was observed from all over the UK, and coordinated by the Astronomer Royal into one large paper.
A better way to understand the changing way we publish might be the plot above. Here we see the percentage of total papers written by 1, 2, 3, 4 or more authors. You can see that single-author papers dominated for most of the 20th Century. Around 1960 we see the decline begin, as 2- and 3-author papers begin to become a significant chunk of the whole. In 1978, 2-author papers become more prevalent than single-author papers.
In the 1990s single-authorships continue to decline, and multiple-authorships in general are in the ascendancy. The distribution flattens out, and by 2012 2-, 3- and 4-author papers each make up about 15% of the literature (single-authorships are down to 6%), and the largest contribution now comes from papers with 5–9 authors. Groups of 10 or more are clearly on the rise too.
If we plot the same chart but in terms of citations, rather than just publications, we get the above. The trends are much the same, but the overall influence of single-author papers declines harder, and slightly faster, after the 1960s. Notably, papers with 5 or more authors appear to be cited more often, relative to their publication rate. Perhaps this reflects the fact that big surveys and cutting-edge instrumentation require putting a lot of heads together, and that such efforts are beneficial to the community.
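The breakdown described above can be sketched roughly as follows. This is only an illustration: the bin labels follow the groupings mentioned in the text (1, 2, 3, 4, 5–9, 10 or more), while the function names and sample numbers are invented.

```python
from collections import Counter, defaultdict

BINS = ["1", "2", "3", "4", "5-9", "10+"]

def author_bin(n):
    """Map an author count to one of the bins used in the chart."""
    if n < 1:
        raise ValueError("a paper needs at least one author")
    if n <= 4:
        return str(n)
    return "5-9" if n <= 9 else "10+"

def share_by_bin(papers):
    """papers: iterable of (year, n_authors). Returns {year: {bin: percent}}."""
    per_year = defaultdict(Counter)
    for year, n in papers:
        per_year[year][author_bin(n)] += 1
    shares = {}
    for year, counter in per_year.items():
        total = sum(counter.values())
        shares[year] = {b: 100.0 * counter[b] / total for b in BINS}
    return shares

# Invented counts for illustration only.
sample = [(1950, 1), (1950, 1), (1950, 1), (1950, 2),
          (2012, 1), (2012, 3), (2012, 7), (2012, 12)]
shares = share_by_bin(sample)
```

Each year's percentages sum to 100, so the result can be fed straight into a stacked area or line chart.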
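One way to put a number on "cited more often, relative to their publication rate" is to divide each group's share of citations by its share of papers. The sketch below assumes each paper is a (number of authors, citation count) pair and that the sample has at least one citation in total. The function name, the two-way grouping and the numbers are all hypothetical.

```python
from collections import defaultdict

def citation_to_paper_ratio(papers, bin_of):
    """papers: iterable of (n_authors, citations).

    Returns {bin: citation_share / paper_share}. A value above 1 means the
    bin collects more citations than its share of papers would suggest.
    """
    n_papers = defaultdict(int)
    n_cites = defaultdict(int)
    for n_authors, cites in papers:
        b = bin_of(n_authors)
        n_papers[b] += 1
        n_cites[b] += cites
    total_papers = sum(n_papers.values())
    total_cites = sum(n_cites.values())
    return {
        b: (n_cites[b] / total_cites) / (n_papers[b] / total_papers)
        for b in n_papers
    }

# Invented numbers for illustration only.
sample = [(1, 10), (1, 10), (6, 40), (8, 20)]
ratios = citation_to_paper_ratio(sample, lambda n: "5+" if n >= 5 else "1-4")
```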
If we take all the names on the papers, de-duplicate, and count them up we get a crude measure of the population of working research astronomers. It’s crude because it doesn’t take into account the fact that multiple people can have the same name, nor does it notice changes in spelling or initials. So at present the code doesn’t know that Simpson R.J. and Simpson R. might be the same person. I am also not using affiliation information at this time, because the purpose here is just to get a feel for the trends. It would take a lot longer to collect everyone up and cluster all their various names together.
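To make the crudeness concrete, here is a small sketch of the exact-string count described above, next to a looser key (surname plus first initial) that would merge Simpson R.J. with Simpson R. The looser key is purely a hypothetical alternative, not something the analysis here uses, and it assumes names come as "Surname Initials". It also over-merges, since different people who share a surname and first initial collapse into one.

```python
def crude_population(papers):
    """papers: iterable of (year, [names]). Returns {year: distinct name strings}."""
    names = {}
    for year, authors in papers:
        names.setdefault(year, set()).update(a.strip() for a in authors)
    return {y: len(s) for y, s in names.items()}

def loose_key(name):
    """Hypothetical looser key: 'Simpson R.J.' -> 'simpson r'."""
    surname, _, initials = name.strip().partition(" ")
    return f"{surname.lower()} {initials[:1].lower()}".strip()

# Placeholder records for illustration only.
sample = [
    (2012, ["Simpson R.J.", "Author A."]),
    (2012, ["Simpson R.", "Author A."]),
]
exact = crude_population(sample)  # treats the two Simpson spellings as two people
loose = len({loose_key(a) for _, authors in sample for a in authors})
```

With this sample the exact count sees three people and the looser key sees two, which shows how much the choice of key can move the population estimate.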
So the population of the research community also changes around 1960 - which is no surprise really as this is when publishing in general begins to boom (see my first post on all of this) and when MNRAS, ApJ and A&A all begin the trend of publishing more year-on-year. So let’s compare this to the number of papers to make it more meaningful.
Here we see that people begin to outpace papers in the 1960s, meaning what exactly? Well I suppose it must be related to the first plot, in that we’re publishing in larger groups. It may reflect the fact that as we get more technical as a field, and more specialised, it takes more people to write the same number of papers? This seems like a reasonable idea.
Here we see the ratio of papers to people, expressed as papers per member of the research population. This is similar to the first plot, but accounts for people publishing on more than one paper.
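As a rough sketch of that ratio: count the papers in each year, count the distinct names appearing on them, and divide. The function name and sample records below are invented, and the names are matched as exact strings, so the same caveats about name matching apply.

```python
def papers_per_person(papers):
    """papers: iterable of (year, [names]). Returns {year: papers / distinct names}."""
    n_papers = {}
    people = {}
    for year, authors in papers:
        n_papers[year] = n_papers.get(year, 0) + 1
        people.setdefault(year, set()).update(authors)
    # Skip any year with no named authors to avoid dividing by zero.
    return {y: n_papers[y] / len(people[y]) for y in n_papers if people[y]}

# Placeholder records for illustration only.
sample = [
    (1950, ["A"]), (1950, ["A"]), (1950, ["B"]),
    (2012, ["A", "B", "C"]), (2012, ["C", "D", "E", "F"]),
]
ratio = papers_per_person(sample)
```

In this toy sample the 1950 value is 1.5 papers per person and the 2012 value is one third, because someone appearing on several papers in a year is only counted once in the denominator.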
With more papers being published, and more people taking part, I had always assumed that people published more work collectively, and that the communications network allowed expertise to be deployed where it was useful. However, it seems that we need more people to achieve the same amount of work that we did in the 1950s. This doesn’t feel right, and I’ve replotted it a few times, but it seems to be the case.