The Semantic Web: a long and somewhat convoluted definition.

This[1] is an attempt to define and explain the semantic web for a lay audience, though it should be remembered that I am a member of that audience myself...

It is a commonplace that we are drowning in information, and nowhere is this "information overload" more apparent than in scientific research. The National Library of Medicine's literature database, PubMed, is searched more than 60 million times a month and contains almost 19 million records from more than 5300 journals -- still only a fraction of the approximately 15,000 active, refereed, scientific journals listed in Ulrich's Periodicals Directory[2]. GenBank, the world's foremost repository of nucleic acid sequence information, contains roughly 100 billion bases in 100 million sequence records, and is growing exponentially, currently by more than 50,000 records per day. Unlike PubMed and GenBank, which are cross-disciplinary databases, the Nucleic Acids Research Molecular Biology Database Collection is a carefully curated list of high-value specialist resources; it currently lists 1170 distinct, largely non-overlapping databases. I could go on, but you get the point[3].

As things stand, researchers talk to researchers and use computers to facilitate that conversation; what we need is for computers to be able to talk to computers. To cope with (literally) inhuman volumes of data, we need that data to start making sense to machines, so that they can do something no human brain can do: process all of it. We need to make it possible for machines to transfer richly interconnected data among themselves, mix and remix it, generate new connections, filter it, process it, transform it, and output the results to formats and interfaces that make sense to human brains -- substrates on which we can carry out the sorts of synthetic, creative thinking that computers cannot do.

We need a man-machine partnership in which both partners can do what they do best, and that means we need the semantic web.

The semantic web is the outcome of processes and frameworks with which computers can manipulate data in a way that makes it accessible to human brains. It is built on the standards and metadata -- information about data -- that are required for automated data exchange and processing, which in turn is required to create machine-generated, human-scale summaries, skeletons, outlines and other representations of, and interfaces with, the entire knowledge corpus.

Here's an example. Human brains have no trouble processing the following data:

Another reason for opening access to research. Wilbanks J. BMJ. 333:1306-8 (2006).

To you, that's a reference; but to a computer, it's just a string of text. What a computer needs is information (metadata) about each substring:

Title: Another reason for opening access to research.
Author: Wilbanks, J
Journal: British Medical Journal
Volume: 333
Pages: 1306-8
Date: 2006

Now the computer "knows" which letters identify the author, which constitute the title of the article, and so on. If you set the standards up properly, it even "knows" that Wilbanks is the surname and J the first initial, and so on into ever finer-grained properties.
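
For the programmers in the audience, here is a rough sketch of what "knowing" the parts of that reference might look like in Python. The class and field names (Author, Reference, cite and so on) are my own inventions for illustration, not any real metadata standard; the point is only that each substring gets a label, and that the surname and initial can be labelled separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    # Hypothetical fields: the finer-grained split of "Wilbanks J"
    surname: str
    initials: str


@dataclass(frozen=True)
class Reference:
    # Hypothetical field names, one per labelled substring
    title: str
    author: Author
    journal: str
    volume: int   # the 333 in "333:1306-8"
    pages: str
    year: int


ref = Reference(
    title="Another reason for opening access to research.",
    author=Author(surname="Wilbanks", initials="J"),
    journal="BMJ",
    volume=333,
    pages="1306-8",
    year=2006,
)


def cite(r: Reference) -> str:
    """Rebuild the human-readable string from the labelled parts."""
    return (
        f"{r.title} {r.author.surname} {r.author.initials}. "
        f"{r.journal}. {r.volume}:{r.pages} ({r.year})."
    )


print(cite(ref))
print(ref.author.surname)
```

Going from the labelled record back to the string is trivial; going the other way, from the bare string to the labels, is the hard part, which is why the metadata matters.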

Now imagine you had, oh, say, about 19 million such records. A human brain cannot do anything useful with such a database, but a computer can -- which is exactly why we can ask PubMed human-scale questions like "how many papers did J Wilbanks publish between 2000 and 2009?", or "show me all the papers with 'access to research' in the title".
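
As a toy illustration (and emphatically not how PubMed itself works), those two questions can be sketched in Python against a tiny in-memory list of labelled records. The dictionary keys and function names are assumptions of mine, and the second and third records are obvious made-up placeholders, there only so the filters have something to reject.

```python
records = [
    {"title": "Another reason for opening access to research.",
     "surname": "Wilbanks", "initials": "J", "year": 2006},
    # The two entries below are made-up placeholders, not real papers.
    {"title": "Placeholder title one",
     "surname": "Placeholder", "initials": "A", "year": 2003},
    {"title": "Placeholder title two",
     "surname": "Placeholder", "initials": "B", "year": 1999},
]


def papers_by(records, surname, initials, first_year, last_year):
    """Records by the given author published in [first_year, last_year]."""
    return [
        r for r in records
        if r["surname"] == surname
        and r["initials"] == initials
        and first_year <= r["year"] <= last_year
    ]


def title_contains(records, phrase):
    """Records whose title contains the phrase, ignoring case."""
    needle = phrase.lower()
    return [r for r in records if needle in r["title"].lower()]


# "How many papers did J Wilbanks publish between 2000 and 2009?"
count = len(papers_by(records, "Wilbanks", "J", 2000, 2009))

# "Show me all the papers with 'access to research' in the title."
matches = title_contains(records, "access to research")

print(count)
print([r["title"] for r in matches])
```

Neither function could be written at all if each record were just an undifferentiated string; the labels are what make the questions answerable, whether there are three records or 19 million.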

Now multiply that -- the ability to ask human-scale questions of a mass of information far too large for any human brain to absorb or process -- by thousands of different types of information (text, gene sequences, chemical formulae, microarray results, etc.), millions of individual records within each data type, recorded in thousands of journals and databases, produced by hundreds of thousands of laboratories, libraries and garage hackers. Imagine what we could learn if we could query all of that information on a human scale.

There: that's a glimpse of the potential power of the semantic web.

-------------
[1] This entry started life as an early draft of a letter in support of John Wilbanks' application for a TED fellowship. We didn't get enough signatures in time, so it was never even sent. My apologies to those people who did sign on; if John re-applies I'll try again, with better planning!

[2] tickboxes = active, refereed, scholarly/academic; search = LC Classification Number for [Q* OR R* OR S* OR T* OR U* OR V*]

[3] In fact, I'm always on the lookout for more good examples of the "data deluge" and the rapid progress of science and tech; post 'em (in comments) if you got 'em.


Comments

Re footnote 3 there's an example of big, and increasing, data at
http://www.bordercountiesadvertizer.co.uk/news/Knockin-telescope-in-stargazing-experiment.5234253.jp

"The e-MERLIN fibre network will carry as much data as the rest of the UK internet combined, enabling astronomers to see in a single day what would have previously taken us three years of observations."

more on e-Merlin here
http://news.bbc.co.uk/2/hi/science/nature/7828174.stm
and at wikipedia of course
http://en.wikipedia.org/wiki/MERLIN

Ralf

Comment number: 019556   Posted by: Ralf Muhlberger on May 11, 2009 10:08 PM from IP: 130.102.79.48

Thanks for that Ralf. Sorry your comment spent so long in spam limbo there! (I'd put you on a white list but MT doesn't have one. Also, you'd find a way to make mischief if I did that.)

Comment number: 019559   Posted by: bill on May 13, 2009 10:23 PM from IP: 97.120.196.95


CC0
To the extent possible under law, I have waived all copyright and related or neighboring rights to this weblog. This work is published from the United States. Further information.

