

What's in a URI? : Page 3

Have you ever wondered about the syntax of a web resource name? Take a look at the semantic web through one of its lowest-level specifications.

Breaking Down Resource Names
How then can such a simple specification cause so much confusion? Naming standards have been known to evoke passion, ire, venom, and feuds! The ones now available are a consequence of the communities that formed around them as well as the technologies that brought them into use. These schemes emerged out of the Internet Engineering Task Force (IETF), one of the first organizations to manage Internet-oriented standards. There are many other naming schemes in existence, but the focus here is on this set of specifications because they are the most relevant to the web. Here is a quick breakdown.

Uniform Resource Identifier (URI)
URI is the über specification that covers all of the other forms. A URI is a sequence of characters (Latin letters, digits, and a few punctuation marks) that conforms to a generic syntax. Because URIs cover both the naming and the locating aspects of identification, it is usually correct (and certainly proper) to refer to all of these identifiers as URIs. The URI breaks down this way:

scheme ":" hierarchical part [ "?" query ] [ "#" fragment ]
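RFC 3986, the current URI specification, includes a non-validating regular expression (Appendix B) that splits any URI reference into these components. The Python sketch below wraps that pattern in a small helper; the name split_uri and the sample URI are illustrative, not part of any library.

```python
import re

# Non-validating pattern from RFC 3986, Appendix B. It never fails to match;
# it only cuts a string apart at the ":", "//", "?" and "#" delimiters.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    """Split a URI into the generic components (None when a part is absent)."""
    m = URI_SPLIT.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),      # always a string, possibly empty
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    example = "http://www.example.com/docs/uri.html?lang=en#syntax"
    for name, value in split_uri(example).items():
        print(f"{name:>9}: {value}")
```

Because the pattern only looks for delimiters, it accepts strings that are not valid URIs; checking each component is a separate step.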

The scheme represents the first segmentation of all possible name spaces. It is used to constrain the rest of the URI within a particular context. An ftp scheme means something quite different from a mailto or http scheme.
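A scheme has a narrow syntax of its own: a letter followed by letters, digits, "+", "-" or ".", compared without regard to case. The sketch below checks that syntax and then chooses an interpretation based on the scheme. The HANDLERS table and the scheme_of helper are hypothetical; they exist only to show the scheme acting as the first branch point.

```python
import re

# RFC 3986, section 3.1: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

# Hypothetical dispatch table: the scheme decides how the rest is read.
HANDLERS = {
    "http": "fetch a document over HTTP",
    "ftp": "open an FTP session",
    "mailto": "compose an email message",
}


def scheme_of(uri: str) -> str:
    """Return the scheme in canonical (lowercase) form, or raise ValueError."""
    candidate, sep, _ = uri.partition(":")
    if not sep or not SCHEME.fullmatch(candidate):
        raise ValueError(f"no valid scheme in {uri!r}")
    return candidate.lower()  # schemes compare case-insensitively


if __name__ == "__main__":
    for uri in ("HTTP://www.example.com/", "ftp://ftp.somecompany.com",
                "mailto:info@somecompany.com"):
        scheme = scheme_of(uri)
        print(scheme, "->", HANDLERS.get(scheme, "unknown scheme"))
```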

The hierarchical part can begin with an authority segment (introduced by "//") that indicates who governs the remaining portions of the name. The authority is usually an organization's registered DNS name, but it is optional. The rest of the hierarchical part is treated as a path to the resource being identified.
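Python's standard urllib.parse.urlsplit separates the authority (which it calls netloc) from the path and exposes the host and port inside it. The describe helper and the example URIs below are illustrative assumptions. Note that the URN has no authority at all, so its entire name ends up in the path.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Report the authority pieces and path segments of a URI."""
    parts = urlsplit(uri)
    return {
        "authority": parts.netloc or None,  # "" means no authority was given
        "host": parts.hostname,             # lowercased; None without an authority
        "port": parts.port,                 # int, or None when not written out
        "path_segments": [s for s in parts.path.split("/") if s],
    }


if __name__ == "__main__":
    print(describe("http://www.example.com:8080/reports/2009/summary"))
    print(describe("urn:isbn:0451450523"))  # whole name lands in the path
```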

The query portion is optional and usually provides a way to specify nonhierarchical constraints on a URI.
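The URI specification does not say what goes inside a query; the name=value pairs joined by "&" seen on most web sites are a convention inherited from HTML forms. Assuming that convention, urllib.parse.parse_qs turns a query into a mapping in which the order of the pairs no longer matters, which is one way to see why the query sits outside the hierarchy. The query_params helper is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def query_params(uri: str) -> dict:
    """Decode a name=value&... query (an HTML-form convention, not a URI rule)."""
    # parse_qs decodes "+" and %XX escapes and collects repeated names in lists.
    return parse_qs(urlsplit(uri).query)


if __name__ == "__main__":
    a = query_params("http://www.example.com/search?q=semantic+web&lang=en")
    b = query_params("http://www.example.com/search?lang=en&q=semantic+web")
    print(a)       # {'q': ['semantic web'], 'lang': ['en']}
    print(a == b)  # True: under this convention the pair order is irrelevant
```

The two URIs are still different strings; only this particular interpretation of their queries treats them as equal.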

The fragment is typically used to make an indirect reference to a secondary resource through a primary resource, but its other interpretations have been the subject of tremendous debate.
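A fragment is not sent to the server in an HTTP request; the client retrieves the primary resource and then resolves the fragment against it. The standard urllib.parse.urldefrag performs that split, as in this sketch (split_fragment is a hypothetical wrapper that reports a missing fragment as None).

```python
from urllib.parse import urldefrag


def split_fragment(uri: str):
    """Separate the primary resource from the fragment that points into it."""
    primary, fragment = urldefrag(uri)
    return primary, (fragment or None)  # urldefrag returns "" when there is no "#"


if __name__ == "__main__":
    print(split_fragment("http://www.example.com/spec.html#section-2"))
    # ('http://www.example.com/spec.html', 'section-2')
```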

Uniform Resource Locator (URL)
URLs are what everyone (including your parents) knows about naming and finding things on the Internet. Arguably, most nontechnical people know only the acronym and think it means "web page address." That is only one kind of URL, however. Other examples include mailto:info@somecompany.com and ftp://ftp.somecompany.com. Each identifies a particular thing (an email address and an FTP site, respectively) and also contains enough information to locate it. Because URLs serve both identification and addressing, it is legitimate (and generally preferred) to call them URIs in the general case.
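To make the "enough information to locate" point concrete, the hypothetical locate helper below extracts what a client would need to reach each example: a mailbox for mailto, and a host and port for ftp (falling back to FTP's well-known default port, 21, when none is written). A URN, which names a thing without saying where it is, is rejected.

```python
from urllib.parse import urlsplit

# Well-known default ports for schemes that address a network host.
DEFAULT_PORTS = {"ftp": 21, "http": 80}


def locate(url: str) -> tuple:
    """Hypothetical helper: what a client needs in order to reach the resource."""
    parts = urlsplit(url)
    if parts.scheme == "mailto":
        # The path is the mailbox; delivery is left to the mail system.
        return ("mailbox", parts.path)
    if parts.scheme in DEFAULT_PORTS:
        port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
        return ("host", parts.hostname, port)
    # Schemes this sketch does not know, and URNs (names without locations),
    # are rejected.
    raise ValueError(f"no location information in {url!r}")


if __name__ == "__main__":
    print(locate("mailto:info@somecompany.com"))  # ('mailbox', 'info@somecompany.com')
    print(locate("ftp://ftp.somecompany.com"))    # ('host', 'ftp.somecompany.com', 21)
```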

There is a difficulty lurking just below the surface, however. URLs are dangerous as identifiers unless great care is taken to keep them from changing or disappearing. Companies and organizations cease operation, get bought, reorganize, restructure their web sites, lay off people, and so on. Any of these events can leave the structure of a URL invalid or meaningless. There are many guidelines for extending the lifetime of a URL, and rewrite/redirect rules can locate a resource after a change, but it is almost inevitable that many URLs will break over time.
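Redirect rules are normally written in a web server's own configuration language. The sketch below models them instead as a hypothetical Python list of old-to-new path prefixes and follows them until no rule applies, stopping on loops or overly long chains. It only helps while someone keeps maintaining the rules and the domain itself survives.

```python
# Hypothetical rules a site might keep after reorganizing its pages:
# each pair maps an old path prefix to the prefix that replaced it.
REDIRECTS = [
    ("/products/", "/catalog/"),
    ("/catalog/old-widgets/", "/catalog/widgets/"),
]


def resolve(path: str, rules=REDIRECTS, max_hops: int = 10) -> str:
    """Apply prefix rewrites until none matches; fail on loops or long chains."""
    seen = {path}
    for _ in range(max_hops):
        for old, new in rules:
            if path.startswith(old):
                path = new + path[len(old):]
                break
        else:
            return path  # no rule matched: this is the current location
        if path in seen:
            raise RuntimeError(f"redirect loop at {path!r}")
        seen.add(path)
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    print(resolve("/products/old-widgets/blue"))  # /catalog/widgets/blue
```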

A URI simply identifies a resource in the general sense. That resource might be addressable, or it might not. This distinction is what people mean by information and non-information resources. With this property, you can refer not just to the things that you find on the Internet, but also to concepts, people, and generic nouns. Scientific communities are interested in defining naming schemes for things like proteins and chemical structures. You can imagine similar efforts in other academic, government, and commercial sectors, allowing participants to refer to the nouns of their domain in their information systems and publications.

URLs are generally insufficient for referring to concepts that do not exist in some retrievable form. How do you know whether you are indicating the concept itself or a document describing the concept? This subtle point has been the subject of a long, drawn-out discussion among web standards participants that has lasted many years.

Uniform Resource Name (URN)
URNs were intended to solve the identity/address conflation problem by providing a type of identifier that was explicitly a name and not an address. Because URIs could serve either role and URLs were specifically addresses, some people felt there should be a scheme that was purely a name, since URLs are unlikely to outlive the concepts they indicate. In this form, the name could be permanent and would not be subject to the whims of where documents on the web happened to land.

The goal was to be able to name something even if there was no chance of it ever existing again. URNs were also designed as a means of encoding existing naming schemes under the urn prefix, provided those schemes could map to the general syntax. The isbn and info schemes are among the existing schemes that could be represented in the URN space.
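The URN syntax (currently specified in RFC 8141) is urn:NID:NSS, where the namespace identifier (NID) names a naming scheme and the namespace-specific string (NSS) is interpreted by that scheme. The sketch below is a simplified parser under that assumption, plus a hypothetical helper that carries an existing ISBN into the urn:isbn namespace. It ignores the optional components RFC 8141 allows after the NSS, and the ISBN shown is only an illustration.

```python
import re

# Simplified RFC 8141 syntax: "urn:" NID ":" NSS
#   NID: 2-32 letters, digits or hyphens, beginning and ending with a letter or digit
#   NSS: accepted here as any non-empty text without "?" or "#"; the optional
#        components RFC 8141 allows after the NSS are not handled.
URN = re.compile(r"urn:([A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):([^?#]+)",
                 re.IGNORECASE)


def parse_urn(name: str) -> tuple:
    """Return (namespace identifier, namespace-specific string)."""
    m = URN.fullmatch(name)
    if m is None:
        raise ValueError(f"not a URN: {name!r}")
    # The "urn" prefix and the NID are case-insensitive; the NSS is left as-is
    # because its rules belong to each namespace.
    return m.group(1).lower(), m.group(2)


def isbn_to_urn(isbn: str) -> str:
    """Hypothetical helper: carry an existing ISBN into the URN space."""
    return "urn:isbn:" + isbn


if __name__ == "__main__":
    print(parse_urn(isbn_to_urn("0451450523")))  # ('isbn', '0451450523')
```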
