An Open Access partisan's view of "Electronic Publication and the Narrowing of Science and Scholarship"

There's been a good deal of online chatter about this recent Science article that discusses the effects of online access on scholarship -- see, e.g., discussions here and here and blog entries noted therein.  The report is not available without paying a toll or subscription, but the abstract is freely visible:

Online journals promise to serve more information to more dispersed audiences and are more efficiently searched and recalled. But because they are used differently than print -- scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse -- electronically available journals may portend an ironic change for science. Using a database of 34 million articles, their citations (1945 to 2005), and online availability (1998 to 2005), I show that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. The forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. Searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.
This seems thoroughly counter-intuitive to me, since I find a good deal more information by direct search now that I can do it online, and browsing has never played a significant role in my literature searching.  (And remember, I'm old -- I started out using Index Medicus!)  Who has time to browse probably-irrelevant journals and tables of contents on the off chance that something might be useful?  I'm far more likely to stumble across things I'd never have otherwise found when I'm relying on a variety of relevance-based search algorithms (PubMed's Related Articles, Google Scholar, NextBio, etc.).

For anyone who thinks that "forced browsing of print archives" makes a lick of sense: we'll pick a topic, then you spend a day or two browsing in meatspace, and I'll spend an hour searching online.  Who do you think is likely to come up with the best (most useful, most comprehensive) set of references?

Moreover, the article's conclusions seem to be based on a couple of unspoken assumptions with which I don't agree.

The first is that citing more and older references is somehow better -- that bit about "anchor[ing] findings deeply into past and present scholarship".  I don't buy it.  Anyone who wants to read deeply into the past of a field can follow the citation trail back from more recent references, and there's no point cluttering up every paper with every single reference back to Aristotle.  As you go further back you find more errors, mistaken models, gaps in information, technical difficulties that were overcome in later work, and so on -- and that's how it's supposed to work.  I'm not saying that it's not worth reading way back in the archives, or that you don't sometimes find overlooked ideas or observations there, but I am saying that it's not something you want to spend most of your time doing.

Secondly, let's take the author at his word:

I show that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles.
OK, suppose you do show that -- it's only a bad thing if you assume that the authors who are citing fewer and more recent articles are somehow ignorant of the earlier work.  They're not: as I said, later work builds on earlier.  Evans makes no attempt to demonstrate that there is a break in the citation trail -- that these authors who are citing fewer and more recent articles are in any way missing something relevant.  Rather, I'd say they're simply citing what they need to get their point across, and leaving readers who want to cast a wider net to do that for themselves (which, of course, they can do much more rapidly and thoroughly now that they can do it online).

If that means citing fewer articles now than researchers tended to cite 20 years ago, it probably has more to do with changes in the culture of science than with the electronic availability of research papers.  For instance, I think it far more likely -- to exaggerate, for the purposes of illustration, in the opposite direction to Evans -- that earlier authors, unable to rapidly and comprehensively scan the literature, cited everything they could get their hands on, padding their bibliographies well beyond anything useful in an attempt to lend weight to their arguments.

It's potentially worrisome if more citations are going to fewer journals, but once again I see no more reason to attribute that to increasing online availability than to attribute it to the sharply rising cost of scientific journals in any form.  It's well documented that as journal prices have continued to rise, researchers and institutions have had to cut back on the number of subscriptions they take.  It is not difficult to imagine that "long tail" and "preferential attachment" phenomena (see, for instance, Evans' own references 14 - 18, reproduced below) would drive the concentration of likely subscriptions towards a pool of "must have" journals.  Indeed, publishers actively promote the concept of such a pool and compete strongly to be seen to be part of it.

Finally, and to me most importantly, Evans seems to gloss over the question of what proportion of the online archives is freely available, and what effect that has on the phenomenon he is attempting to model.  Here's the crux of what he does say (fair use! fair use!):

[Figure: Evans' Figure 2C (Evansfig2C.JPG)]

I've rearranged the figure so that what were left, middle and right panels are now top, center and bottom panels; in all graphs the abscissae are "Years of journal issues online" and the ordinates are "Herfindahl citation concentration", which is explained as follows:

A concentration of 1 indicates that every citation to [a given] journal [or subfield] in a given year is to a single article; a concentration just less than 1 suggests a high proportion of citations pointing to just a few articles; and a concentration approaching zero implies that citations reach out evenly to a large number of articles.
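For concreteness, here is a minimal sketch of how such a concentration can be computed. It assumes the standard Herfindahl index (the sum of squared citation shares), which matches the description quoted above; Evans' exact calculation is not reproduced here, and the function name and toy citation lists are purely illustrative.

```python
from collections import Counter


def herfindahl(cited_articles):
    """Sum of squared citation shares (standard Herfindahl index).

    cited_articles: iterable with one entry per citation, naming the
    article that was cited. Hypothetical helper, not Evans' code.
    """
    counts = Counter(cited_articles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one citation")
    return sum((n / total) ** 2 for n in counts.values())


# Toy data: every citation goes to a single article.
single = herfindahl(["a"] * 10)

# Toy data: nine of ten citations go to one article (about 0.82).
skewed = herfindahl(["a"] * 9 + ["b"])

# Toy data: 100 citations spread evenly over 100 articles (about 0.01).
even = herfindahl(f"paper{i}" for i in range(100))

print(single, skewed, even)
```

The three toy cases line up with the three regimes in the quoted explanation: exactly 1, just under 1, and approaching zero.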
Here's Evans' interpretation of those data:
Figure 2C illustrates the concurrent influence of commercial and free online provision on the concentration of citations to particular articles and journals. The left panel shows that the number of years of commercial availability appears to significantly increase concentration of citations to fewer articles within a journal. If an additional 10 years of journal issues were to go online via any commercial source, the model predicts that its citation concentration would rise from 0.088 to 0.105, an increase of nearly 20%. Free electronic availability had a slight negative effect on the concentration of articles cited within journals, but it had a marginally positive effect on the concentration of articles cited within subfields (middle panel) and appeared to substantially drive up the concentration of citations to central journals within subfields (right panel). Commercial provision had a consistent positive effect on citation concentration in both articles and journals. The collective similarity between commercial and free access for all models discussed suggests that online access -- whatever its source -- reshapes knowledge discovery and use in the same way.
Wait, what?  Let me unpack that with a rewrite from my point of view:
The number of years of commercial availability appears to significantly increase concentration of citations to fewer articles within a journal, whereas free electronic availability had a negative effect on the concentration of articles cited within journals. If an additional 10 years of journal issues were to go online via any commercial source, the model predicts that its citation concentration would rise from 0.088 to 0.105, an increase of nearly 20%. In contrast, if an additional 10 years of journal issues were to go online via any free source, the model predicts that its citation concentration would drop from 0.088 to just under 0.08 [I had to estimate this by eye, since the data are not available], a decrease of around 10%. Similarly, free electronic availability had only a marginally positive effect on the concentration of articles cited within subfields. Only when considering concentration to journals within a subfield did free availability cause a substantial increase, and even then this effect was considerably less than that driven by commercial availability, which had a consistent positive effect on citation concentration in both articles and journals.
In other words, I take issue with the final sentence of the paragraph I quoted: commercial and free access do not show "collective similarity".  On one of three measures they have the opposite effect, and on the other two measures commercial access has by far the stronger effect.
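The percentages in those two paragraphs are easy to check. This is only a sketch of the arithmetic: 0.088 and 0.105 are the figures Evans reports, while 0.080 stands in for my by-eye estimate of "just under 0.08" and is an assumption, not a published value.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Evans' reported prediction for ten more years of commercial availability.
commercial = percent_change(0.088, 0.105)

# Eyeballed value for ten more years of free availability (an assumption).
free = percent_change(0.088, 0.080)

print(f"commercial: {commercial:+.1f}%")  # roughly +19%, "nearly 20%"
print(f"free:       {free:+.1f}%")        # roughly -9%, "around 10%"
```

Since the true free-access value is a little below 0.08, the real decrease would be slightly larger than this estimate.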

What this suggests to me is that the driving force in Evans' suggested "narrow[ing of] the range of findings and ideas built upon" is not online access per se but in fact commercial access, with its attendant question of who can afford to read what.  Evans' own data indicate that if the online access in question is free of charge, the apparent narrowing effect is significantly reduced or even reversed.  Moreover, the commercially available corpus is and has always been much larger than the freely available body of knowledge (for instance, DOAJ currently lists around 3500 journals, approximately 10-15% of the total number of scholarly journals).  This indicates that if all of the online access that went into Evans' model had been free all along, the anti-narrowing effect of Open Access would be considerably amplified.

In fact, the comparison between print and online access is barely even possible when considering Open Access information.  The same considerations of cost -- who can afford to read what -- apply to commercial print and online publications, but free online information has essentially no print ancestor or equivalent.  Few if any scholarly journals were ever free in print, so there's a huge difference between conversion from commercial print to commercial online on the one hand, and from commercial print to Open Access on the other.

Indeed, I would suggest that if the entire body of scholarly literature were Openly available, so that every researcher could read everything they could find and programmers were free to build search algorithms over a comprehensive database to help the researchers do that finding, then in fact the opposite effect would obtain.  Perhaps it's true that the more commercial online access you have, the less widely a researcher's literature search net is cast, but as I mentioned above I see no reason to attribute that more to the mode of access than to its cost.

In support of this assertion, consider the expanding body of literature on the Open Access "citation advantage" -- studies which show that the likelihood of a given paper being cited is increased by up to several hundred percent if the paper is OA rather than commercially available.  There is some controversy over that literature, but it stands in direct contrast to the idea that online access of any kind tends to narrow citation reach.

There are more data in Evans' paper that speak to the free-vs-commercial issue, and some of those data show free access having a stronger "narrowing" effect than commercial access.  I'd go through them in detail, but I am probably already pushing the limits of fair use so I'll have to refer you to the published article -- in particular, Figure 2 panels A and B.  My response is much the same: the apparent effect suffers from a loading in "favour" of commercial access, because of the wildly disparate sizes of the two different bodies of online literature.



-----
refs 14-18 from Evans, JA Science 321:395, 2008:

A. L. Barabási, R. Albert, Science 286, 509 (1999).
R. K. Merton, Science 159, 56 (1968).
D. J. de Solla Price, Science 149, 510 (1965).
H. A. Simon, Biometrika 42, 425 (1955).
M. J. Salganik, P. S. Dodds, D. J. Watts, Science 311, 854 (2006).

Updates 080720:

1. I linked to the FriendFeed discussions but meant to emphasize one point from them: in one of those conversations, Lars Juhl Jensen points out that the single biggest change is information volume:

I cannot help but wonder if this has anything to do with electronic publication, or if it is simply an effect of sheer volume. If researchers have to search through ten times as many articles (because of the exponential growth of the literature), is it really surprising that they don't make it as far back into the past as they used to do?
This is related to, though stronger than, my point about changes in the culture of research.

2. Bora reminded me of another conflicting study by Arthur Eger, this one showing that "a larger [online] content offering coincides with a dramatic increase in Full Text Article requests, and an increase in Full Text Article requests, after about 2 years, coincides with increased article publication". This is not necessarily inconsistent with Evans' claims, especially since the Eger study also showed that the effect of increasing backfile availability is "modest", but I would like to see those increased Full Text requests broken down by date of publication...

3. Tom Wilson doesn't necessarily agree with my (rather blithe?) assertion that researchers are indeed aware of preceding work:

would it were true that authors are not ignorant of earlier work. In my experience as an Editor and a PhD supervisor, I am continually amazed at the extent to which authors and students are unaware of pre-WWW work. It seems that if the work was done before 1995 it is assumed to have no relevance to the present day. In many cases, of course, that will be true and in some cases the research record is a record of building upon earlier work. In the case of many subfields in information science, however, it isn't the case. A great deal of work was done in the 1970s, which is now completely ignored. Researchers rediscover wheels again and again, when a search of the earlier literature would have revealed that what they think of as novel, was novel 50 years ago!
I think this points up my own biases, in that when I think of research I tend only to think of wet lab science, molecular biology in particular since that's what I do for a living. There are many other fields of research! It strikes me that if molecular biologists do in fact reinvent wheels less often than other disciplines, it is perhaps because our online records go back a long way: PubMed reaches back to 1966, and has some coverage all the way back to 1951. Since molecular biology can fairly be said to have come of age as a discipline in 1953, this suggests two things: that Evans may be more right than I think for disciplines outside my own, and that if those disciplines could digitize their archives efficiently it might go a long way towards solving the problem. In other words, the answer to the narrowing effect of online access on scholarship may be to broaden and deepen online access.


Comments

See also:

Are Online and Free Online Access Broadening or Narrowing Research?
http://openaccess.eprints.org/index.php?/archives/443-guid.html

Excerpts: Before OA, researchers cited what they could afford to access, and that was not necessarily all the best work, so they could not be optimally selective for quality, importance and relevance...

When everything becomes accessible, researchers can be more selective and can cite only what is most relevant, important and of high quality. (It has been true all along that about 80-90% of citations go to the top 10-20% of articles. Now that the top 10-20% (along with everything else in astrophysics), is accessible to everyone, everyone can cite it, and cull out the less relevant or important 80-90%...

Are online and free online access broadening or narrowing research? They are broadening it by making all of it accessible to all researchers, focusing it on the best rather than merely the accessible, and accelerating it.

Posted by: Stevan Harnad on August 4, 2008 06:38 PM
