The Semantic Web: a long and somewhat convoluted definition.
This[1] is an attempt to define and explain the semantic web for a lay audience, though it should be remembered that I am a member of that audience myself...

It is a commonplace that we are drowning in information, and nowhere is this "information overload" more apparent than in scientific research. The National Library of Medicine's literature database, PubMed, is searched more than 60 million times a month and contains almost 19 million records from more than 5300 journals -- still only a fraction of the approximately 15,000 active, refereed, scientific journals listed in Ulrich's Periodicals Directory[2]. GenBank, the world's foremost repository of nucleic acid sequence information, contains roughly 100 billion bases in 100 million sequence records, and is growing exponentially: it currently adds more than 50,000 records per day. Unlike PubMed and GenBank, which are cross-disciplinary databases, the Nucleic Acids Research Molecular Biology Database Collection is a carefully curated list of high-value specialist resources; it currently lists 1170 distinct, largely non-overlapping databases. I could go on, but you get the point[3].

As things stand, researchers talk to researchers and use computers to facilitate that conversation; what we need is for computers to be able to talk to computers. To cope with (literally) inhuman volumes of data, we need that data to start making sense to machines, so that they can do something no human brain can do: process all of it. We need to make it possible for machines to transfer richly interconnected data among themselves, mix and remix it, generate new connections, filter it, process it, transform it, and output the results to formats and interfaces that make sense to human brains -- substrates on which we can carry out the sorts of synthetic, creative thinking that computers cannot do. We need a man-machine partnership in which both partners can do what they do best, and that means we need the semantic web.
The semantic web is the outcome of processes and frameworks with which computers can manipulate data in a way that makes it accessible to human brains. It is built on the standards and metadata -- information about data -- that are required for automated data exchange and processing, which in turn is required to create machine-generated, human-scale summaries, skeletons, outlines and other representations of, and interfaces with, the entire knowledge corpus.

Here's an example. Human brains have no trouble processing the following data:

Another reason for opening access to research. Wilbanks J. BMJ. 333:1306-8 (2006).

To you, that's a reference; but to a computer, it's just a string of text. What a computer needs is information (metadata) about each substring:

Title: Another reason for opening access to research.
Author: Wilbanks J
Journal: BMJ
Volume: 333
Pages: 1306-8
Year: 2006

Now the computer "knows" which letters identify John, which constitute the title of the article, and so on. If you set the standards up properly, it even "knows" that Wilbanks is the surname and J the first initial, and so on into ever finer-grained properties. Now imagine you had, oh, say, about 19 million such records. A human brain cannot do anything useful with such a database, but a computer can -- which is exactly why we can ask PubMed human-scale questions like "how many papers did J Wilbanks publish between 2000 and 2009?" or "show me all the papers with 'access to research' in the title".

Now multiply that -- the ability to ask human-scale questions of a mass of information far too large for any human brain to absorb or process -- by thousands of different types of information (text, gene sequences, chemical formulae, microarray results, etc.), millions of individual records within each data type, recorded in thousands of journals and databases, produced by hundreds of thousands of laboratories, libraries and garage hackers. Imagine what we could learn if we could query all of that information on a human scale.
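For the technically curious, the citation example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how PubMed actually works: the regular expression handles only the one reference layout quoted above, every function name is hypothetical, and the record list holds just that single real reference. The `to_triples` helper re-expresses a few fields as subject-predicate-object statements using Dublin Core term URIs, the kind of shared vocabulary semantic web standards such as RDF rely on; the subject identifier passed to it is a made-up placeholder.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    """One bibliographic record, with each substring labelled by what it means."""
    title: str
    surname: str
    initials: str
    journal: str
    volume: int
    first_page: int
    last_page: int
    year: int


# Assumed layout (matches only the example reference in the text):
#   "<Title>. <Surname> <Initials>. <Journal>. <Volume>:<First>-<Last> (<Year>)."
CITATION_PATTERN = re.compile(
    r"^(?P<title>.+?)\.\s+"
    r"(?P<surname>[A-Z][\w'-]+)\s+(?P<initials>[A-Z]+)\.\s+"
    r"(?P<journal>[^.]+)\.\s+"
    r"(?P<volume>\d+):(?P<first>\d+)-(?P<last>\d+)\s+"
    r"\((?P<year>\d{4})\)\.?$"
)

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def expand_last_page(first: str, last: str) -> int:
    """Expand an abbreviated page range, e.g. "1306-8" ends on page 1308."""
    if len(last) >= len(first):
        return int(last)  # already a full page number
    return int(first[: len(first) - len(last)] + last)


def parse_citation(text: str) -> Citation:
    """Turn a citation string into labelled fields (hypothetical helper)."""
    match = CITATION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised citation format: {text!r}")
    g = match.groupdict()
    return Citation(
        title=g["title"],
        surname=g["surname"],
        initials=g["initials"],
        journal=g["journal"].strip(),
        volume=int(g["volume"]),
        first_page=int(g["first"]),
        last_page=expand_last_page(g["first"], g["last"]),
        year=int(g["year"]),
    )


def count_by_author(records, surname, initials, start_year, end_year):
    """'How many papers did <author> publish between start_year and end_year?'"""
    return sum(
        1
        for r in records
        if r.surname == surname
        and r.initials == initials
        and start_year <= r.year <= end_year
    )


def titles_containing(records, phrase):
    """'Show me all the papers with <phrase> in the title' (case-insensitive)."""
    phrase = phrase.lower()
    return [r.title for r in records if phrase in r.title.lower()]


def to_triples(citation, subject):
    """Express a few fields as (subject, predicate, object) statements."""
    return [
        (subject, DCTERMS + "title", citation.title),
        (subject, DCTERMS + "creator", f"{citation.surname} {citation.initials}"),
        (subject, DCTERMS + "date", str(citation.year)),
    ]


if __name__ == "__main__":
    ref = parse_citation(
        "Another reason for opening access to research. "
        "Wilbanks J. BMJ. 333:1306-8 (2006)."
    )
    records = [ref]  # PubMed would hold roughly 19 million of these
    print(count_by_author(records, "Wilbanks", "J", 2000, 2009))
    print(titles_containing(records, "access to research"))
    print(to_triples(ref, "urn:example:citation-1"))  # placeholder subject
```

The pattern-matching step is only there to make the point visible; real systems would rather receive records already labelled by their creators than recover the labels from free text. Once each substring is labelled, the two "human-scale" questions reduce to one-line filters, and they run just as well over millions of records as over one.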
There: that's a glimpse of the potential power of the semantic web.

-------------
[2] Tickboxes = active, refereed, scholarly/academic; search = LC Classification Number for [Q* OR R* OR S* OR T* OR U* OR V*]
[3] In fact, I'm always on the lookout for more good examples of the "data deluge" and the rapid progress of science and tech; post 'em (in comments) if you got 'em.

Comments

Thanks for that, Ralf. Sorry your comment spent so long in spam limbo there! (I'd put you on a whitelist, but MT doesn't have one. Also, you'd find a way to make mischief if I did that.)
Re footnote 3, there's an example of big, and increasing, data at
http://www.bordercountiesadvertizer.co.uk/news/Knockin-telescope-in-stargazing-experiment.5234253.jp
"The e-MERLIN fibre network will carry as much data as the rest of the UK internet combined, enabling astronomers to see in a single day what would have previously taken us three years of observations."
More on e-MERLIN here:
http://news.bbc.co.uk/2/hi/science/nature/7828174.stm
and at Wikipedia, of course:
http://en.wikipedia.org/wiki/MERLIN
Ralf