
Sunday, October 21, 2012

Introduction to Semantic Web


Web 3.0 — the Semantic Web — is what folks are calling the third major wave of the Web. Interestingly, Tim Berners-Lee, the principal inventor of the Web itself, doesn't much favor the idea of versioning the Web, and he views the Semantic Web as more aligned with his original vision anyway — which means that we're actually just now seeing the evolution of a Web he was thinking about almost 20 years ago. Nova Spivack, an entrepreneur and Web visionary, has a compelling chart, similar to the one shown in Figure 1-1, that he uses to describe the Web 3.0 phenomenon.

You can see the clear progression of technology from the Personal Computing era, to the Web 1.0 era of pages and documents, to the Web 2.0 era of social networking, and on to the Web 3.0 era of the Semantic Web and data networking (Pollock 11).
The modern origins of the Semantic Web can be traced to Netscape and the Defense Departments of the United States and Europe. In 1998, Tim Bray and Ramanathan Guha built an XML-based metadata language called MCF (Meta Content Framework) to help Netscape describe content ratings of Web pages.
Soon thereafter, the World Wide Web Consortium (W3C) looked to create a general-purpose metadata language called RDF (Resource Description Framework). This new language was largely based on the original MCF specification by Guha and Bray (Pollock 14).

Tim Berners-Lee has a two-part vision for the future of the Web. The first part is to make the Web a more collaborative medium. The second part is to make the Web understandable, and thus processable, by machines (Daconta et al. 1).

To achieve this semantic understanding, "The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call the Semantic Web – a web of data that can be processed directly or indirectly by machines" (Berners-Lee, Weaving the Web 1999).

Most of today’s Web content is suitable for human consumption. Typical uses of the Web today involve people’s seeking and making use of information, searching for and getting in touch with other people, reviewing catalogs of online stores and ordering products by filling out forms, and viewing adult material.
Keyword-based search engines such as Yahoo! and Google are the main tools for using today's Web. It is clear that the Web would not have become the huge success it is were it not for search engines. However, there are serious problems associated with their use:
• High recall, low precision.
• Low or no recall.
• Results are highly sensitive to vocabulary.
• Results are single Web pages.
An alternative approach is to represent Web content in a form that is more easily machine processable and to use intelligent techniques to take advantage of these representations. We refer to this plan of revolutionizing the Web as the Semantic Web initiative. It is important to understand that the Semantic Web will not be a new global information highway parallel to the existing World Wide Web; instead it will gradually evolve out of the existing Web (Antoniou et al. 3).
The Semantic Web is a web that is able to describe things in a way that computers can understand (www.w3schools.com).
The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Berners-Lee, Hendler, and Lassila 3).
Important technologies for developing the Semantic Web are already in place: the eXtensible Markup Language (XML), the Resource Description Framework (RDF), and ontologies.
XML lets everyone create their own tags—hidden labels such as <zip code> or <alma mater> that annotate Web pages or sections of text on a page. XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean (Berners-Lee et al. 5).
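To make this concrete, here is a minimal sketch in Python using the standard library's `xml.etree` module. The tag names and values are made up for illustration (and since spaces are not legal in real XML tag names, <zip code> becomes <zipcode>):

```python
import xml.etree.ElementTree as ET

# Build a small document with user-defined tags (illustrative names only).
person = ET.Element("person")
ET.SubElement(person, "zipcode").text = "10001"
ET.SubElement(person, "almaMater").text = "MIT"

xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)
# <person><zipcode>10001</zipcode><almaMater>MIT</almaMater></person>
```

Note that the structure is fully machine readable, but nothing here tells a machine what "almaMater" means — supplying that meaning is left to RDF and ontologies.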
A URI is simply a Web identifier, like the strings starting with "http:" or "ftp:" that you often find on the World Wide Web. Anyone can create a URI, and ownership of URIs is clearly delegated, so they form an ideal base technology on which to build a global Web (Berners-Lee et al. 5).
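A quick sketch of that anatomy, using Python's standard `urllib.parse` (the URIs below are hypothetical — the domain part is what delegates ownership):

```python
from urllib.parse import urlparse

# Two made-up URIs; the netloc (domain) identifies who owns the name.
uri_a = "http://example.org/people#alice"
uri_b = "ftp://archive.example.net/datasets/2004"

for uri in (uri_a, uri_b):
    parts = urlparse(uri)
    print(parts.scheme, parts.netloc, parts.path, parts.fragment)
```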

The Resource Description Framework (RDF) is a W3C standard for describing Web resources, such as the title, author, modification date, content, and copyright information of a Web page. RDF identifies things using Web identifiers (URIs) and describes resources with properties and property values. The combination of a resource, a property, and a property value forms a statement, known as the subject, predicate, and object of the statement (www.w3schools.com).

The OWL Web Ontology Language is designed for use by applications that need to process the content of information rather than just presenting information to humans.
OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema by providing additional vocabulary along with formal semantics. OWL has three sublanguages: in order of decreasing expressiveness, they are OWL Full, OWL DL, and OWL Lite (www.ibm.com).
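The subject–predicate–object model described above can be sketched with plain Python tuples — a toy stand-in for a real triple store such as rdflib, with made-up URIs:

```python
# RDF statements as (subject, predicate, object) tuples; URIs are made up.
EX = "http://example.org/"
triples = [
    (EX + "page1", EX + "title",  "Introduction to the Semantic Web"),
    (EX + "page1", EX + "author", EX + "alice"),
    (EX + "alice", EX + "name",   "Alice"),
]

def objects(subject, predicate):
    """All object values matching a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(EX + "page1", EX + "author"))
# ['http://example.org/alice']
```

Because subjects and objects are shared URIs, statements link together into a graph: the object of the second triple (alice) is the subject of the third.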
The development of the Semantic Web proceeds in steps, each step building a layer on top of another.
Figure 1.3 shows the “layer cake” of the Semantic Web (due to Tim Berners-Lee), which describes the main layers of the Semantic Web design and vision. At the bottom we find XML, a language that lets one write structured Web documents with a user-defined vocabulary. XML is particularly suitable for sending documents across the Web. RDF is a basic data model, like the entity-relationship model, for writing simple statements about Web objects (resources). The RDF data model does not rely on XML, but RDF has an XML-based syntax. Therefore, in figure 1.3, it is located on top of the XML layer.
RDF Schema provides modeling primitives for organizing Web objects into hierarchies. Key primitives are classes and properties, subclass and subproperty relationships, and domain and range restrictions. RDF Schema is based on RDF.
RDF Schema can be viewed as a primitive language for writing ontologies. But there is a need for more powerful ontology languages that expand RDF Schema and allow the representation of more complex relationships between Web objects. The Logic layer is used to enhance the ontology language further and to allow the writing of application-specific declarative knowledge.
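The kind of hierarchy RDF Schema describes can be illustrated with a small sketch (class names are invented): given direct subclass links, an RDFS-aware processor can infer all transitive superclasses of a class.

```python
# A toy RDF Schema-style class hierarchy: each class maps to its
# direct superclass (class names are made up for illustration).
subclass_of = {
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Mammal": "Animal",
}

def superclasses(cls):
    """All transitive superclasses of cls, as RDFS entailment would infer."""
    result = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result

print(superclasses("Dog"))
# ['Mammal', 'Animal']
```

This is the sense in which RDFS is "primitive": it can express hierarchies and simple entailments like the one above, but not the richer constraints (cardinality, disjointness, etc.) that motivate OWL.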
The Proof layer involves the actual deductive process as well as the representation of proofs in Web languages (from lower levels) and proof validation. Finally, the Trust layer will emerge through the use of digital signatures and other kinds of knowledge, based on recommendations by trusted agents or on rating and certification agencies and consumer bodies. Sometimes “Web of Trust” is used to indicate that trust will be organized in the same distributed and chaotic way as the WWW itself. 

Being located at the top of the pyramid, trust is a high-level and crucial concept: the Web will only achieve its full potential when users have trust in its operations (security) and in the quality of information provided. This classical layer stack is currently being debated. Figure 1.4 shows an alternative layer stack that takes recent developments into account. The main differences, compared to the stack in figure 1.3, are the following:
• The ontology layer is instantiated with two alternatives: the current standard Web ontology language, OWL, and a rule-based language. Thus an alternative stream in the development of the Semantic Web appears.
• DLP (Description Logic Programs) is the intersection of OWL and Horn logic, and serves as a common foundation.
The Semantic Web architecture is currently being debated and may be subject to refinements and modifications in the future (Antoniou et al. 17).
