April 2009 Archive
Wednesday, 29 April
Did Facebook screw up, or did I?
I am usually pretty careful about permissions for Facebook apps, and I did not notice any opt-out for the spamming of my friends. I honestly think this thing has its privacy settings wrong (deliberately?) and does not give the usual options, just goes ahead and spams your friend list. Sorry to anyone who got a "notification" from me -- either I got sloppy or, as I suspect, this thing is viral.
Tuesday, 28 April
Perpetuating an OA myth
Maxine at Nautilus posted a slightly shortened version of this letter to Nature from Raf Aerts; what caught my eye was the rearing of a familiar ugly head (emphasis mine): ...the [global recession] may also be affecting the publication output of research institutions in a more subtle way. It could be boosting the traditional reader-pays publication model for scientific journals at the expense of the author-pays, or open-access, model. This myth, that OA is synonymous with author-pays, is a toll-access publisher's delight. It simply is not true. See here for detail; briefly:
Raf goes on to say: ...few peer-reviewed open-access journals have so far had a high impact factor in their field, except for a small number such as those published by the Public Library of Science and BioMed Central. They are therefore struggling to emerge and to attract the most prestigious research findings. I don't see why we should assume that anything will "deteriorate" if OA journals switch to new funding models, or that OA journals will have a harder time 'emerging' if they move to a model that is actually closer to the old, familiar toll-access model. After all, there already exist a wide variety of ways in which OA publications pay the bills: advertising, endowments, philanthropy, institutional subsidies, memberships, priced editions and more. In particular, hybrid journals (which is what JoVE has become) are popular with toll-access publishers as a way to establish a foothold in OA territory. Inter alia, Elsevier, Springer and Wiley all publish hybrid journals, and between them, those three account for more than 40% of the worldwide science/tech/medicine publishing market -- so the hybrid model is pretty well established. There's more to say about authors' willingness and/or ability to pay, too. Firstly, it's almost never the author who pays, but the funding body paying for that author's research. At the moment, this can translate into using up precious grant money when there's a need to pay author-side fees, but with 77 funder, institutional and departmental OA mandates in place and more on the way, it seems reasonable to suppose that more and more of the mandating bodies will underwrite more and more of the costs of publishing. For example, HHMI has institutional agreements/memberships with BMC, Springer and Elsevier, and BMC's page of funder policies shows that a majority of UK funders either make additional funds available or allow publication charges to be treated as an indirect cost. 
Many OA journals also waive or reduce their fees on application; for instance, here are the PLoS (scroll down) and BMC policies. Finally, remember that the Kaufman-Wills study showed that 75% of the toll-access journals surveyed charged author-side fees (page charges, colour charges, reprint charges, etc.) in addition to their subscription charges. So when there are author-side fees involved, I'd like to know how those charged by OA journals (in return for which the work is freely available to everyone, forever) compare with those charged by toll-access journals (in return for which authors often cannot retrieve their own work, and anyone who wants to read it must pay another fee).
Sunday, 26 April
Caste in America (or: hell in a handbasket, yes indeedy)
I don't spend much time writing about politics any more -- my mental health just can't take it. But, data! Via 3QuarksDaily: the Office of Management and Budget has a blog, to which director Peter Orszag posted an entry on "The Case for Reform in Education and Health Care". He describes a talk he gave to the Association of American Universities, and makes his slides available as a pdf. From those slides: Whether you even start college depends as much on your family's income as on your ability (insofar as math scores are a decent proxy for such ability). For instance, if you're an average student (middle third math scores) you are about twice as likely to go to college if your family earned in the highest bracket, relative to your chances if your family earned in the lowest bracket. Similarly, if you're in one of the two lowest income brackets, you can roughly double your chances of going to college by getting your math score up from middle to highest third. ![]() If you do start college, whether or not you graduate also has a lot to do with family income: almost half of the students from the lowest income background do not finish college, whereas the noncompletion rate drops to less than 25% in the highest family income bracket. ![]() There is a vicious circle in operation: relative to a high school education, a college education returns a premium of over 400%, making you that much more likely to contribute to your children's success, as shown above. (The ordinate shows the log of the ratio between the return to a college education versus the return to a high school education: 10^0.6 is about 4.) ![]() The vicious circle encompasses more than just school. If you have money, you're more likely to be insured and to have more formal education; both factors make you much more likely to take part in routine health screens, which in turn makes you more likely to stay healthy, which in turn keeps your earning potential up, and so on. 
![]() In a similar vein, Ryan Avent adds this figure from the Pew Charitable Trusts' Economic Mobility Project, which shows that you're more likely to wind up in the top earning quintile if your parents were in that demographic but you didn't go to college, than if you did go to college but your parents were in the bottom quintile (click the image at right for a popup or go here): The rich get richer, the poor get the picture, but Garrett was wrong about one thing: when you're down so low, that's right where the bombs are most likely to land. Here's a little Vonnegut to take us to the news at the top of the hour: America is the wealthiest nation on Earth, but its people are mainly poor, and poor Americans are urged to hate themselves. To quote the American humorist Kin Hubbard, "It ain't no disgrace to be poor, but it might as well be." It is in fact a crime for an American to be poor, even though America is a nation of poor. Every other nation has folk traditions of men who were poor but extremely wise and virtuous, and therefore more estimable than anyone with power and gold. No such tales are told by the American poor. They mock themselves and glorify their betters. The meanest eating or drinking establishment, owned by a man who is himself poor, is very likely to have a sign on its wall asking this cruel question: "If you're so smart, why ain't you rich?" [...]
word cloud CVs for dummies
Pierre and Pawel both did amazing things with word clouds for their CVs, using all kinds of Here's what Wordle made of the resulting list: ![]() It's not horrible, though I can already see things I forgot to put in, ![]() (Wordle settings for both versions: language: remove numbers, leave as spelled, remove common English words; font: Telephoto, layout: straighter edges, horizontal; color: Wordly, a little variance.)
Tuesday, 21 April
out of season
I've been meaning to post more verse -- my own, and other people's. Blame Jason for reminding me of this one, even though it's the wrong end of the season (and I'm nowhere near a campus these days):
Autumn Song for Alec (How you doin' Alec? I hope you're OK.)
Saturday, 18 April
That bloody video.
First of all, that's five minutes you'll never get back. Five minutes isn't much, but when you only have 30 or 60 minutes a day to spend online -- as, e.g., I did in my last job -- you resent every stolen second. This is why I hate, with a fierce and curmudgeonly hate, multimedia without transcripts or text versions. Secondly, here's the content -- in a form you can use at your own pace without needing pause and fast forward buttons:
When you see it like that, not zooming out at you with a soundtrack and a bunch of twee effects, it becomes obvious that there's nothing much there, and what there is, is rather disjointed and incoherent. Many of the factoids look shaky to me, and there are only a couple of references or sources provided (why not provide the others?). I'm not going to bother with a fisking, but here are some obvious eyebrow-raisers:
Update, written after all of the above: It's important to note that although the version discussed above is the only one I'd ever seen before today, it is actually the third version on YouTube and was "remixed" by Sony BMG in August 2008. The original was made by Karl Fisch in August 2006; Scott McLeod's version dates to January 2007 (this was the first one to make it to YouTube and was responsible for the first viral wave); Jeff Brenman created a SlideShare version a couple of months later, and the official version 2.0 was made in consultation with XPLANE in June 2007. In fairness to Fisch (sounds like a PETA chant), many of the shortcomings of the version that so annoyed me must be laid at the feet of the anonymous Sony drone responsible for the "remix". Not only did Fisch provide a text version and a list of his sources with version 1.0, but version 2.0 does a better job than the Sony version of acknowledging the sources in the course of the presentation and even comes with its own wiki, mentioned in the presentation. Version 2.0 is also considerably more coherent and much nicer to look at, and does a (somewhat) better job of avoiding the "eek, brown people!" tone. (Fisch says in a couple of places that he and McLeod, in response to criticism, consciously worked to reduce that "us vs them" feeling, and points out here that he views it as largely an unforeseen side-effect of some of the changes between his original PowerPoint version, made for his immediate colleagues, and the first YouTube version.) Finally, kudos for choosing a Creative Commons license (even though I don't like copyleft): although the Sony version leaves this out, all versions are CC-BY-NC-SA (source files are available on the wiki). In my opinion it's a damn shame that the Sony version took off (at the time of writing, there are two copies on YouTube with 4,458,229 and 29,828 views, respectively). 
If you come across someone talking about that version, do everyone a favour and point them to version 2.0.
Scholarly (scientific) journals vs total serials: % price increase 1990-2009
Following on from this post, I manually extracted historical data for average scholarly journal prices in a dozen broad disciplines from the Library Journal Annual Periodicals Price Surveys by Lee Van Orsdel and Kathleen Born, and compared these with three datasets from the earlier post: ARL libraries' median total serials expenditures (ARL all serials), Abridged Index Medicus average journal price (AIM) and the consumer price index (CPI): ![]() My concern with the AIM dataset was that it was too small and specialized to support broad conclusions, but it turns out that the AIM data sit somewhere in the middle of the disciplines analysed. Astronomy is closest to the ARL all serials median, with math and computer science not much worse; general science is the worst offender, with engineering and technology, chemistry and food science not far behind. From 1990 to 2008, total price increases ranged from 238% (astronomy) to 537% (general science); that's 3.7 and 8.3 times the increase in the CPI, respectively. This dataset covers an average of around 3600 journals from 2005-2009, 3255 from 1997-2001 and 2655 from 1989-1990. I think this represents good evidence that historical price data for total serials, even though it shows a rate of increase far greater than that of the CPI, masks an even greater rate of increase among scholarly (scientific) journals. It's difficult to look at that graph and believe that scholarly publishers are playing fair, particularly when one remembers that online publishing, with its attendant cost reductions, came of age during the same period of time. The Van Orsdel/Born surveys include a number of other scholarly disciplines (art, architecture, business, history, language, law, music, etc etc). If I have the time I'll work those up as well, to provide as broad a picture as possible. I should also include numbers of titles in each discipline, to give some idea of total influence. 
For instance: although general science (around 60 or 70 titles) shows the greatest increase, it likely contributes far less to the serials crisis than health sciences (more than 1500 titles). (The data are available in this Excel spreadsheet.)
Friday, 17 April
Some wishes come true.
A while back, I posted about my discovery (new to me, though not new to many others) that the serials crisis should probably be called something like the "scholarly journals crisis". The term "serials" includes a wide range of publications, most of which are not peer-reviewed scholarly journals -- newspapers, government reports issued in series, yearbooks, magazines and more. Only about 1/10 of the serials in Ulrich's directory are peer-reviewed. The average scholarly journal costs around 10 times as much as the average serial, and while the cost of the scholarly literature continues to climb, median serial unit costs at ARL libraries have actually been falling for the last seven or eight years (Fig 1 below). It therefore appears that scholarly journals are the driving force behind the serials crisis. At the time, I wished that I had some specific data to show the difference between scholarly and average serials -- hence the title of this post: via medinfo, I learned that EBSCO Information Services has released a brief report (pdf!) on the price history of well regarded clinical journals, using 117 titles from the NLM's Abridged Index Medicus (AIM). This is a curated list of biomed journals "of immediate interest to the practicing physician" and can be searched on PubMed as a subset limit named "core clinical journals". As a reminder, here's that graph; it's from the ARL stats report from 2004-5 and the reason it's famous is the way that "Serials Expenditures" outstrips the Consumer Price Index (CPI) and other measures: ![]() Here's a comparison of that data with the price history of the AIM journals; the line labeled "expser/ARL libraries all serials" shows the 1990-2005 subset of the "Serials Expenditures" data from Fig 1, and "EBSCO/core clinical journals" shows the AIM data: ![]() Data labels (ARL data from here):
This is exactly what I wished for, hard evidence of the difference between scholarly and average serials; and what that evidence strongly indicates is that price increases in scholarly journals are driving the serials crisis. Scholarly journals far outstrip total serials in terms of annual price increase, even though the latter shows a much more rapid increase than the CPI. In contrast, library salary expenditure follows the CPI closely, and median serial unit cost (all serials) has been dropping slowly since 2000. Frankly, I'm tempted to name this the Big Fat Ripoff Graph. Between 1990 and 2008, the CPI increased by about 65%, whereas over the same period the average price of an AIM journal increased by 415%, a 6.4-fold difference. I've seen publishers try to defend the "total serials expenditures" vs CPI discrepancy by pointing out that journals are proliferating -- indeed, the "serials purchased" curve is headed upwards at an increasing rate, particularly over the last five years or so. But that defense is no good against the BFR Graph, on which the most damning curve shows average journal prices. I've also seen comments to the effect that if mean or median serial unit costs are dropping, publishers must be offering increasing value for money even if they are charging more in total. That might be true of the set of "all serials publishers", but it's apparent from the BFR Graph that scholarly journal publishers can make no such claim. It must be remembered, of course, that we are only looking at a little over a hundred clinical journals here, a small and discipline specific subset. Nonetheless, the result is so striking that I think it is a considerable inducement to the gathering of more data. Since it seems my wishes for more work are coming true, I'll make another: now I want price history data for other, larger journal subsets in other scholarly disciplines. I wonder what the BFR Graph looks like for those datasets? (P.S. 
If you want the numbers I used, or to check my work, the spreadsheet is here.)
Wednesday, 15 April
What's wrong with copyleft?
This FriendFeed thread regarding the Wikipedia licensing vote has stirred up an old hornet's nest of issues surrounding copyleft and noncommercial clauses in Open licenses. As I said in the thread, I get most of my ideas on this topic from David Wiley, and have posted about those ideas before. Herewith another attempt to organize and clarify my thoughts, as much for my own benefit as anything:
2a. Copyleft prevents future copyright lockup by requiring that all downstream (reworked or remixed) works be similarly licensed.
3. Although copyleft and NC clauses achieve their own immediate goals, widespread license incompatibility means that they often (perhaps usually) defeat part of the larger purpose of Open licensing. The use case where this is most prominent is remix, since reuse and redistribution of individual copylefted or NC-licensed works or their derivatives is usually just a matter of retaining the original license. But multiple works can only be recombined into new works if their respective licenses are compatible -- otherwise, there's no licensing option for the remix that doesn't violate the licensing terms of at least one of the ingredients. Not only that, but if any of the works in the mix carries a copyleft license, that license takes over the entire remix and everything downstream of it, thus propagating the incompatibility problem.
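The remix problem described in point 3 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a statement of what any license legally requires: each license is reduced to the restrictive elements it carries (SA for copyleft/ShareAlike, NC for NonCommercial), attribution is ignored, and the `LICENSE_ELEMENTS` table and `remix_elements` function are hypothetical names made up for illustration.

```python
from typing import FrozenSet, Iterable, Optional

# Toy model: each license is reduced to the restrictive elements it carries.
# "SA" = ShareAlike (copyleft), "NC" = NonCommercial. Attribution is ignored.
# This table is an illustrative assumption, not legal advice.
LICENSE_ELEMENTS = {
    "PD": frozenset(),
    "BY": frozenset(),
    "BY-SA": frozenset({"SA"}),
    "BY-NC": frozenset({"NC"}),
    "BY-NC-SA": frozenset({"NC", "SA"}),
}


def remix_elements(ingredients: Iterable[str]) -> Optional[FrozenSet[str]]:
    """Return the elements a remix must carry, or None if no license fits."""
    element_sets = [LICENSE_ELEMENTS[name] for name in ingredients]
    # Restrictions accumulate: the remix inherits every element of every ingredient.
    required = frozenset().union(*element_sets)
    # Copyleft: a ShareAlike ingredient insists that the remix carry exactly
    # its own elements, so any extra inherited element makes the mix impossible.
    for elements in element_sets:
        if "SA" in elements and elements != required:
            return None
    return required


if __name__ == "__main__":
    for mix in (["BY", "BY-SA"], ["BY-NC", "BY-NC-SA"], ["BY-SA", "BY-NC"]):
        print(mix, "->", remix_elements(mix))
```

In this model a BY work mixed with a BY-SA work can only be released as BY-SA (the copyleft license "takes over"), whereas BY-SA and BY-NC cannot be combined at all, because the ShareAlike ingredient forbids adding the NC restriction that the other ingredient demands. Public domain material, carrying no elements, mixes with everything.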
You may distribute, publicly display, publicly perform, or publicly digitally perform a Derivative Work only under: (i) the terms of this License; (ii) a later version of this License with the same License Elements as this License; (iii) either the Creative Commons (Unported) license or a Creative Commons jurisdiction license (either this or a later license version) that contains the same License Elements as this License (e.g. Attribution-ShareAlike 3.0 (Unported)); (iv) a Creative Commons Compatible License.
where (iv) is defined as a license that is listed at http://creativecommons.org/compatiblelicenses
Sadly, the cupboard remains bare so far: Please note that to date, Creative Commons has not approved any licenses for compatibility; however, we are hopeful that we may be able to do so in the future. If you would like to discuss the possible compatibility of your license with a Creative Commons license, please email us at info@creativecommons.org.
I am personally persuaded that the Public Domain is the best way out of the copyleft trap, which is why I use CCZero for everything I make. ![]() Restrictive (NC, SA) versions currently account for around 80% of worldwide CC licence uptake. Once you start factoring in the dozens and dozens of other Open/Free licenses out there, it only gets worse. The FSF and OSI maintain lists of licenses and compatibilities (here and here, respectively), and Wikipedia includes a couple of fairly extensive comparison tables. Speaking of Wikipedia, the world's favourite online encyclopaedia is currently released under the GNU Free Documentation License, which is not compatible with any CC license except Public Domain, though it does allow transition to CC-BY-SA. If the current vote on that transition is "yes", that will be a step forward -- but it will still leave Wikipedia with the compatibility problems shown in the figure above. Exploration of compatibility issues with all the other Free/Open licenses is left as an exercise, etc. 
* from here and here; green indicates compatibility, light green indicates possible compatibility -- some disagreement between sources.
Monday, 13 April
Anniversary of sorts
This question from Antony Williams on FriendFeed: Is PubChem Data Open or not? There are many discussions saying that PubChem data are Open but I see PubChem as a host and the disclaimer does not say "open": http://tinyurl.com/e78as reminded me that it's almost a year to the day since Egon Willighagen asked a similar question about PubMed Central content: I was wondering about this section in the CC license of much of the PMC content, such as our paper on userscripts (section 4a of the CC-BY 2.0): These are both very good questions, and I still don't have an answer for Egon's even after a year. I'm reluctant to go pestering John Wilbanks with every CC-related question I come across, so I'm reposting in the hope that someone will be able to save John from me.
Lazy reporter, no donut.
Dennis Carter in an eCampus News article about NPG's Scitable: Scitable's January launch came as elite universities across the United States are embracing open-access formats--making research articles available for free online. This marks an abrupt departure from the traditional model of printing research articles in academic journals, which can cost campuses as much as $20,000 annually, open-access experts say.
So, is it the traditional model that can cost campuses up to $20K/yr, or academic journals, each of which can cost etc? It's only obvious that what is meant is $20K/yr per journal subscription if you already know that libraries spend millions of dollars per year on serials. I'd expect a publication that wants you to register to read its content to bother making that content accurate and unambiguous.
Someone else is fooling around with numbers.
Via Peter Suber, I came across this editorial in the Journal of Vision: Measuring the impact of scientific articles is of interest to authors and readers, as well as to tenure and promotion committees, grant proposal review committees, and officials involved in the funding of science. The number of citations by other articles is at present the gold standard for evaluation of the impact of an individual scientific article. Online journals offer another measure of impact: the number of unique downloads of an article (by unique downloads we mean the first download of the PDF of an article by a particular individual). Since May 2007, Journal of Vision has published download counts for each individual article.
The author goes on to compare download vs citation (counts and rates, and downloads or citations over time). It's a pretty good analysis of an important topic, but something vital is missing: Where are the data? Can I have them? What can I do with them?
In fact, the data are approximately available here. Why "approximately"? Well, I can get a range of predigested overviews: DemandFactor (roughly, downloads/day/first 1000 days) Top 20, total downloads Top 20 and article distributions by DemandFactor and total downloads. I can also get the download information for any given article -- one article at a time, and once again predigested in the form of a graph from which I have to guesstrapolate if I want raw, re-useable data. This is disappointing, for both general and specific reasons. It's always disappointing to see data locked away in a graph or a pdf or some similar digital or paper oubliette, there to languish un(re)used. It's also disappointing to see a journal getting way out ahead of the curve on something as important and valuable as download metrics (is there another journal besides J Vis that provides this information, even predigested?), and then missing an opportunity to continue to innovate by providing real Open Data. 
It's also disappointing in this specific instance, because I have a question: why is Figure 1 plotted on a log scale and, more importantly, was the correlation coefficient calculated from log-transformed data? I could understand showing the log scale for aesthetic reasons, but I can't think of a reason to take logs of that kind of data -- and doing so can alter the apparent correlation. For instance, remember Fig 1 from this post? Here it is again, together with a plot of log-transformed data, both shown on natural and log scales: I could answer my own question quickly and easily if I could get my hands on the underlying data -- which leads me right back to one of the primary general arguments for Open Data. If I, statistical ignoramus and newcomer to these sorts of analyses, have questions after a brief skim through the paper, what questions might a better equipped and more thorough reader have? It's simply not possible to know -- the only way to find out is to make the data openly available! I realise it's not possible for journals to demand Open Data from their authors -- that's what funder-level mandates are for, though there's much discussion still to be had regarding whether Open Data mandates would be a good idea. Nonetheless, when journals publish analyses of their own data, it would be great to see them leading the way by providing unrestricted access to that data.
Saturday, 04 April
Why don't we share data? Not for the reasons Steven Wiley thinks we don't.
Via Peter Suber, I came across an editorial about data sharing in The Scientist. I disagree with the author, PNNL's Steven Wiley, on a number of points: Despite the appeal of making all biological data accessible, there are enormous hurdles that currently make it impractical. For one, sharing all data requires that we agree on a set of standards. This is perhaps reasonable for large-scale automated technologies, such as microarrays, but the logistics of converting every western blot, ELISA, and protein assay into a structured and accessible data format would be a nightmare -- and probably not worth the effort. Wiley is making two mistakes here: setting the perfect against the good, and vastly underestimating human ingenuity. Standards are inarguably required for automated sharing and essential for the sharing of ALL data, but that doesn't mean that sharing SOME data, with evolving standards or even without any standards, has no utility. My pet example is the long standing practice of supporting scientific claims with the phrase "data not shown" in peer-reviewed papers, something I think should no longer be allowed. All scientific claims should be supported by data. "Data not shown" belongs to the print era, when space was limited and distribution relied on physical reproduction and transport. This is the era of the online supplement, to which no such restrictions apply. Reasonable people might contend that I am stretching the concept of "data sharing" to cover my pet peeve there, but I chose the example deliberately as an edge case: there is, to me, clear utility in that kind of data sharing, even though it involves no standards, only some data, and only eyeball-by-eyeball access (whereas I myself frequently argue that the greater part of the value of Open distribution probably lies in the long term, in machine-to-machine access). 
I argue that more sharing, using -- despite their current flaws -- evolving standards, is likely to yield significant dividends well before reaching the eventual goal of sharing all data using universal standards. This leads me to the second mistake. It seems odd to me to insist that because standards are difficult to develop and implement, the bulk of such work is futile. The key is the phrase "currently... impractical". The whole concept of the internet was probably considered "currently impractical" by a great many people, until someone went and built it. There are plenty of people still willing to pronounce Free/Open Source software "currently impractical", even as they (perhaps unwittingly) rely on it every time they go online or send email. Then-existing hurdles at various times surely made business on the internet "currently impractical", and banking on the internet "currently impractical", and -- need I go on? Moreover, I am not the only one who disagrees about the value of creating standards for difficult-to-share data. If you think western blots would be a nightmare, how about biodiversity data -- like, say, museum specimens? How about anthropometric data, exchangeable biomaterials, neuroscience data, electron micrographs, magnetic resonance images or microscopy images? The MIBBI project has dozens of other examples, the Open Biomedical Ontologies Foundry is working on dozens more, and Bioformats.org might offer a lightweight solution to some of the same problems. (In re: Wiley's specific examples: I was easily able to find efforts underway to enable sharing of gel electrophoresis data, protein affinity reagents and molecular interaction experiments; and I can't imagine ELISA data being much harder to share than microarray information -- surely MIAME, for instance, could readily be adapted if it wouldn't already serve? I'm not sure what kind of protein assay Wiley has in mind.) 
I cannot begin to imagine how to build semantic and exchange standards for those kinds of data, but I'm not about to bet against the people currently trying to do so; nor do I believe that, once built, their systems will prove to have been "not worth the effort". As I mentioned, reasonable people might disagree about various points above. But Wiley goes on to say: Unfortunately, most experimental data is obtained ad hoc to answer specific questions and can rarely be used for other purposes. which is just plain wrong. Much of the rationale for data sharing, the engine of much of its promise, is the simple observation that you cannot know what someone else will do with your data, particularly when they have access to lots of other people's data to go with it. Re-use beyond the scope of the original author's imagination is a primary impetus for data sharing, and innovative examples abound; for instance, just take a look at Tony Hirst's blog. (If there is a dearth of examples from biomedical research, I'd call that an argument in favor of more, not less, data sharing.) "Can rarely be used" is an empirical claim, and those should be backed by data -- and I can think of only one way to get the relevant data in this case. Wiley continues: Good experimental design usually requires that we change only one variable at a time. There is some hope of controlling experimental conditions within our own labs so that the only significantly changing parameter will be our experimental perturbation. However, at another location, scientists might inadvertently do the same experiment under different conditions, making it difficult if not impossible to compare and integrate the results. Experimental results are supposed to provide useful information about the world of sense-perception. If a result cannot be repeated by different hands in a different lab, then it is probably not telling us what we think it is telling us about the way the world works. 
If, on the other hand, a particular result does mean what we think it means about the underlying system, then we should be able to design different experiments to be carried out with different hands, conditions, equipment etc., and obtain data that supports the same conclusions. That's what we call a robust result, and standard practice is to aim for robust results. Regarding integration and comparison of results from different conditions -- just what does meta-analysis mean, if not exactly that? As an example, if you were to knock Pin1 down in HeLa cells, you'd block their growth, but Pin1 knockout mice survive just fine. Comparison of those results is not only possible, but extremely interesting, and is the way we learned that mice have an active Pin1 isoform, Pin1L, which is present but potentially inactive in humans. I think that variation in conditions between labs is a good reason to build finer-grained semantic structures, but no reason at all to throw up our hands and give up on linked data. Wiley goes on to give as his sole concrete example the lack of uptake into published papers of data from the Alliance for Cell (sic) Signaling. It's actually the Alliance for Cellular Signaling1; their website lists 20 publications, NextBio finds 35 and Google Scholar (which covers a lot more than peer-reviewed papers) finds 440. Scholarly papers are a somewhat limited measure of research impact, but that's not at first glance an impressive showing. Consider, though, that the AfCS was established in the late 1990's, which puts it well ahead of its time, and then compare the first, second and ongoing third decades of the undisputed poster child of data sharing2: There's more to Wiley's choice of example, though: In my own case, I am interested in the EGF receptor and receptor tyrosine kinases. This aspect of cell signaling was not covered in their dataset, and thus it is of no interest to me. 
I wish I had a dollar for every time I'd heard an argument against some new idea that boils down to: "I can't figure this out, or find a use for it myself; therefore it's no good and will never be any use to anyone". I'm sure there's a pithy Latin name for this particular logical fallacy. Wiley continues in, as it turns out, a similar vein: "And soon, discussions about the importance of sharing may become moot, since the rapid pace of technology development is likely to eliminate much of the perceived need for sharing primary experimental data. High throughput analytical technologies, such as proteomics and deep sequencing, can yield data of extremely high quality and can produce more data in a single run than was previously obtained from years of work. It will thus become more practical for research groups to generate their own integrated sets of data than try to stitch together disparate information from multiple sources." And just what does the PNNL's Biomolecular Systems Initiative (of which Wiley is director) do? By a strange coincidence, this: "advancing our high-resolution, high-throughput technologies by exploiting PNNL's strengths in instrument development and automation and applying these technologies to solve large-scale biological problems...." The remainder of this post is left as an exercise for the reader; be sure to cover the question of how less well-heeled institutions are supposed to carry out work in proteomics and deep sequencing and so on, and don't forget to ask for evidence showing that it is not important to share data even between such high-fliers, since presumably they can extract every last conceivable piece of useful information from their own data...
[2] graph from here
Wednesday, 01 April
Fooling around with numbers, part 5b.
I've already assigned part 6 to a particular analysis in an effort to get me to actually do that work, but I felt that I just had to include this (via John Wilbanks) in the series: [graph not preserved] I'm just sayin'. (I may have to get that graph as a tattoo.) P.S. Never mind the date, this is not a trick; I hate online April Fool jokes with the fiery power of a thousand burning suns.