Semantics and the Evolution of Specialized Languages


The semantic web, often called Web 3.0, is a collection of technologies that associate meaning with individual data elements, which are usually embedded in XHTML files. Giving each data element precise semantics allows collections of highly specialized languages to evolve quickly, and developers can combine sets of little languages designed to fit together precisely to build entire applications without writing procedural code. These systems are easier to maintain, and they empower business units to create and update their own applications and business rules. The growth of the semantic web and the progression of little languages will have a dramatic impact on the skills needed by software engineers and IT strategists (see the sidebar, “The Darwinism of Software Languages”).

This discussion provides an overview of several specialized, semantically precise languages that are evolving to fill specific niches in web application development. Some will already be familiar to web developers; others are newer and are being created to support semantic web technologies.

One of the first steps in any software project is gathering requirements. Traditionally, requirements have been gathered with techniques such as interviewing subject-matter experts (SMEs), writing lengthy requirements documents, or building rapid prototypes. These approaches share a common problem: either the SME isn’t familiar with the software development process, or the developer doesn’t understand the subtleties of the business. As a result, the delivered systems sometimes don’t match the needs of their users.

A more precise way to build applications lets SMEs and other nonprogrammers draw pictures using a palette of well-defined components. These pictures capture requirements by letting the SME set constraints on the components and rearrange them at any time. The components are precisely defined data elements taken from a central metadata registry, where the semantics of each registered item are usually defined by external parties. The structured diagrams are saved as, or transformed into, XML files that retain the mapping of each data element to the metadata registry. Examples include XML schema diagrams, workflow diagrams, and lists of business rules with simple if, then, and else sections. Anyone who has created email rules that automatically categorize incoming messages is already familiar with simple rule files.

This first stage is often called the requirements elicitation phase. In this stage you ask two questions: What language should be used to capture these requirements? And how do I specify what the SMEs want built without specifying how the system is built and deployed? The right tools or languages vary. To create a simple web form, for example, you might draw an XML schema that specifies the order of the fields and their cardinality (which fields are required and which fields repeat). As a family, these requirement-capture languages are called declarative languages.
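To make field order and cardinality concrete, here is a minimal Python sketch. It is not an XML Schema processor: the requirement is expressed purely as data, and one generic function checks any submission against it. The FORM_SPEC list, its field names, and the validate function are hypothetical; the (min, max) pairs play the role of XML Schema’s minOccurs and maxOccurs attributes.

```python
# Hypothetical declarative spec for a simple web form: field order plus
# cardinality as (name, min_occurs, max_occurs); None means "unbounded".
FORM_SPEC = [
    ("name", 1, 1),       # required, exactly once
    ("email", 1, None),   # required, may repeat
    ("phone", 0, 1),      # optional, at most once
]


def validate(fields, spec=FORM_SPEC):
    """Check an ordered list of submitted field names against the spec.

    Returns a list of error messages; an empty list means the form is valid.
    """
    errors = []
    pos = 0
    for name, min_occurs, max_occurs in spec:
        count = 0
        # Consume consecutive occurrences of this field, in spec order.
        while pos < len(fields) and fields[pos] == name:
            count += 1
            pos += 1
        if count < min_occurs:
            errors.append(f"{name}: expected at least {min_occurs}, got {count}")
        if max_occurs is not None and count > max_occurs:
            errors.append(f"{name}: expected at most {max_occurs}, got {count}")
    # Anything left over is out of order or not in the spec at all.
    if pos < len(fields):
        errors.append(f"unexpected field: {fields[pos]}")
    return errors


print(validate(["name", "email", "email"]))  # prints []
print(validate(["email"]))  # prints ['name: expected at least 1, got 0']
```

Changing the requirements means editing FORM_SPEC, not the validation code, which is the separation of what from how that declarative languages aim for.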

Declarative Language
The term declarative language describes little languages for gathering precise requirements. Although there is still some confusion around the term “declarative” (it is both an attribute of a language and a category in the taxonomy of computer languages), most people agree that these languages focus on the what, not the how: on the requirements, not the run-time implementation. By definition they are more abstract than a simple object class or function. The process of transforming a requirement into a functioning application (the how) can be deferred. This deferral is also known as late binding, and it has analogues in messaging and polymorphism.

The creation of small, specialized languages is not a new concept. Unix developed a way for small command-line tools such as sed, awk, sort, and uniq to be piped together, creating a chain of operations on a data stream without intermediate files. Some writers call these languages domain-specific languages (DSLs). Whatever the name, they share common properties:

  • They have small vocabularies, usually fewer than 100 data elements.
  • They favor pattern matching and transformation over explicit, complex if-then or looping logic.
  • They avoid creating new abstractions with private labels and unclear semantics; using a declarative language is usually a matter of combining elements whose semantics are publicly defined.
  • They avoid the creation of new functions, classes, or methods that have meaning only to the authors and their peers.
  • They focus on the placement and ordering of their elements in files.
  • International standards bodies can take a very long time to agree on their structure.
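The Unix pipeline idea can be sketched in Python with generators, assuming the goal is to mimic a shell command such as tr 'A-Z' 'a-z' | sort | uniq -c. Each stage below is a small, single-purpose filter, and the stages are chained without intermediate files. The function names are illustrative only and are not part of any library.

```python
from itertools import groupby


def read_lines(text):
    """Source stage: yield lines one at a time, like cat."""
    yield from text.splitlines()


def to_lower(lines):
    """Filter stage: lowercase each line, roughly like tr 'A-Z' 'a-z'."""
    for line in lines:
        yield line.lower()


def sort_lines(lines):
    """Filter stage like sort; it must read all input before emitting."""
    yield from sorted(lines)


def uniq_count(lines):
    """Filter stage like uniq -c: count runs of identical adjacent lines."""
    for line, run in groupby(lines):
        yield sum(1 for _ in run), line


def pipeline(source, *stages):
    """Chain stages like a shell pipe; no intermediate files are written."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


text = "apple\nPear\napple\npear\nfig"
result = list(pipeline(read_lines(text), to_lower, sort_lines, uniq_count))
print(result)  # prints [(2, 'apple'), (1, 'fig'), (2, 'pear')]
```

Each stage knows nothing about the others; the composition, not any one stage, expresses the overall task.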

Vital to these languages is a community process for defining the meaning of each data element. The larger the community that participates in defining a data element, the more widely tools vendors will adopt it. The better the tools, the lower the support and maintenance costs. And the more developers who become familiar with these standards, the more likely it is that future developers will already know the semantics of the languages and of the requirement-gathering systems.

Little Languages
Take a look at some examples of little languages in common use today, and see how they relate to Darwin’s finches:

Using a CSS rules engine for semantic markup. Some HTML developers have adopted a style of coding called semantic HTML or semantic markup. Semantic HTML keeps the meaning and structure of the content in the HTML body and moves the presentation aspects of the document into another language: Cascading Style Sheets (CSS). CSS is an interesting case study because, like Darwin’s finches, it evolved to meet a specialized need. CSS is a declarative language rather than a general-purpose language because it specifies a set of rules in an abstract way. Each CSS rule acts like an if statement: its selector is a pattern, and when the pattern matches an element in the HTML document, the rule’s properties are applied to that element. Here is a sample CSS file:

/* sample CSS */
body {
  font-family: Arial, Helvetica, sans-serif;
  font-size: 75%;
  margin: 0;
  padding: 0;
  width: 1000px;
}
h1 {
  color: blue;
  padding: 0 15px;
}

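The structure of this CSS file is a set of if-then rules. The first rule matches the body element. Inherited properties such as font-family and font-size flow down from body to every element inside it, while margin, padding, and width apply to the body box itself. The second rule matches only level-1 headings (h1), which inherit the body’s font family and add their own color and padding. Both rules set padding, but they target different elements, and padding is not inherited, so the h1 rule alone determines the padding of headings. Maintaining CSS files, and the entire web site, is much easier when you set defaults high in the document tree and override them only where needed.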
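The defaults-and-overrides behavior can be modeled in a few lines of Python. This is a deliberately simplified sketch that assumes only the two rules above, a small subset of inherited properties, and no browser default stylesheet (in a real browser, h1 also gets a larger default font-size, for example); it ignores selector specificity and unit conversion entirely. RULES, INHERITED, and computed_style are hypothetical names.

```python
# The two rules from the sample stylesheet, expressed as plain data.
RULES = {
    "body": {
        "font-family": "Arial, Helvetica, sans-serif",
        "font-size": "75%",
        "margin": "0",
        "padding": "0",
        "width": "1000px",
    },
    "h1": {"color": "blue", "padding": "0 15px"},
}

# Assumption: a small subset of the CSS properties that inherit by default.
# margin, padding, and width are not inherited in CSS.
INHERITED = {"color", "font-family", "font-size"}


def computed_style(path):
    """Return the properties that apply to the last element in `path`.

    `path` lists element names from the outermost element inward,
    e.g. ["body", "h1"]. Browser defaults and specificity are ignored.
    """
    style = {}
    for tag in path:
        # Moving from parent to child: only inheritable properties carry over.
        style = {k: v for k, v in style.items() if k in INHERITED}
        # The element's own declarations override anything it inherited.
        style.update(RULES.get(tag, {}))
    return style


heading = computed_style(["body", "h1"])
paragraph = computed_style(["body", "p"])
print(heading["padding"], heading["font-family"])
```

A paragraph inside body picks up only the font settings, while a heading also gets its own color and padding, which is the defaults-plus-overrides pattern in miniature.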

What is important about the CSS approach is not the syntax of the file (it could just as well be XML), but the fact that CSS leaves out many constructs that would make it harder to learn. Once you learn the pattern-matching rules, you can write a CSS file. The basic ways to match patterns are few, which means a web designer can focus on the rules in the CSS file without worrying about how the HTML is generated.

Like Darwin’s finches, CSS evolved in response to precise and consistent requirements. Web pages are trees of data, and you can apply rules to subsets of these trees regardless of the specific HTML tags used. When large numbers of people and organizations try to solve similar problems, small yet semantically precise languages evolve to fill a niche.

XML Schema for capturing document requirements. The XML Schema language is also frequently cited as a critical piece of declarative systems. XML schemas are used not only to validate documents transmitted between computers and organizations, but also as mini models in model-driven development (see the sidebar, “Business Requirement Capture”).

XForms for presenting forms. XForms is one of the newer W3C standards and is gaining widespread adoption. It was in development at the W3C for more than five years; the specifications were delayed by the need to fit precisely with XPath, XML Schema, and CSS. XForms appears to work best in development environments that use REST interfaces and native XML databases. The XForms specification contains only 21 data elements, and because XForms works with XML Schema, a small Extensible Stylesheet Language Transformations (XSLT) file can transform an XML schema directly into an XForms application.
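The schema-to-form idea can be sketched in Python, with the standard-library ElementTree module standing in for the XSLT file the text mentions; this substitution is an assumption made only to keep the example self-contained. The sample schema is hypothetical, and the function emits one XForms input control per simple element, using the XML Schema and XForms namespace URIs. A fuller generator would also map repeating elements (maxOccurs="unbounded") to xf:repeat and choose controls based on schema types.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XF = "http://www.w3.org/2002/xforms"
ET.register_namespace("xf", XF)  # serialize XForms elements with an xf: prefix

# Hypothetical schema for a contact form.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="email" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def schema_to_xforms(schema_text):
    """Emit an xf:group holding one xf:input per simple xs:element."""
    root = ET.fromstring(schema_text)
    group = ET.Element(f"{{{XF}}}group")
    for decl in root.iter(f"{{{XS}}}element"):
        # Elements with a complexType are containers, not input fields.
        if decl.find(f"{{{XS}}}complexType") is not None:
            continue
        name = decl.get("name")
        control = ET.SubElement(group, f"{{{XF}}}input", ref=name)
        label = ET.SubElement(control, f"{{{XF}}}label")
        label.text = name.capitalize()
    return group


form = schema_to_xforms(SCHEMA)
print(ET.tostring(form, encoding="unicode"))
```

Because the generator reads only the schema, adding a field to the schema adds a control to the form with no hand-written UI code.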

XForms and Asynchronous JavaScript and XML (AJAX) offer an interesting contrast that shows what happens when you use a procedural approach to add new features to existing systems. Many books have been written about technologies such as AJAX because, although useful, AJAX is complex to implement: a web programmer must define every detail of how updates are made to a web application. In contrast, with XForms the functionality of asynchronous updates is defined by a single XForms element: submit. The code needed for rich behavior is brief, concise, standardized, and abstract.

XSLT for transformation rules. When it comes to transforming XML with pattern matching, XSLT is king: it is the first language that comes to mind. XSLT is a widely adopted standard that is used in many software development environments and runs directly in most browsers. It can transform structured XML data into almost any other XML representation, as well as into HTML or plain text. XSLT relies on a series of templates whose match patterns act as rules: when a pattern matches and its rule fires, fragments of XML are placed in an output tree. The exact order in which the rules fire is determined by the structure of the incoming data.
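XSLT’s template-matching model can be imitated in Python to show how rules fire in an order driven by the input. The sketch below is an analogy, not XSLT: templates are keyed by element name only (real XSLT match patterns are full XPath patterns), the catalog and book vocabulary is hypothetical, and the fallback is only a rough stand-in for XSLT’s built-in rules.

```python
import xml.etree.ElementTree as ET


def apply_templates(elem, templates):
    """Process each child of `elem` in document order; collect output nodes."""
    return [node for child in elem for node in transform(child, templates)]


def transform(elem, templates):
    """Fire the template whose pattern (here simply a tag name) matches."""
    template = templates.get(elem.tag)
    if template is None:
        # Rough analogue of XSLT's built-in element rule: produce nothing
        # for this element itself, but keep processing its children.
        return apply_templates(elem, templates)
    return template(elem, templates)


def catalog_template(elem, templates):
    """<catalog> becomes an HTML <ul>; its children are processed in order."""
    ul = ET.Element("ul")
    ul.extend(apply_templates(elem, templates))
    return [ul]


def book_template(elem, templates):
    """<book> becomes an <li> built from its title and year attribute."""
    li = ET.Element("li")
    li.text = f"{elem.findtext('title')} ({elem.get('year')})"
    return [li]


TEMPLATES = {"catalog": catalog_template, "book": book_template}

source = ET.fromstring(
    "<catalog>"
    "<book year='2005'><title>Widgets</title></book>"
    "<book year='2007'><title>Gadgets</title></book>"
    "</catalog>"
)
output = transform(source, TEMPLATES)
html = ET.tostring(output[0], encoding="unicode")
print(html)  # prints <ul><li>Widgets (2005)</li><li>Gadgets (2007)</li></ul>
```

No template calls another by name; which rule fires next is decided entirely by what the input contains.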

XSLT is powerful in that simple transformations, such as adding, removing, or changing XML structures (even on large data sets), take just a few lines of XSLT code. Good XSLT programmers don’t create subroutines whose names must be chosen carefully to document their meaning for future developers; they rely on the fact that both the input and output formats have meaning to a wide audience. Here is a list of sample XSLT elements:

  • call-template
  • choose
  • element
  • for-each
  • if
  • import
  • output
  • param
  • sort
  • stylesheet
  • template
  • text
  • value-of
  • when
  • with-param

SQL for data selection from tables. Structured Query Language (SQL) began with a small vocabulary covering the basic operations on data in tables: SELECT, INSERT, UPDATE, and DELETE. Although many SQL dialects have been extended with features such as stored procedures, looping, and conditional expressions, the core of the language is still very small.
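The four core statements can be demonstrated with Python’s built-in sqlite3 module and an in-memory database. The table name and rows below are hypothetical; one CREATE TABLE statement sets things up, and everything after it uses only INSERT, UPDATE, DELETE, and SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE element (name TEXT PRIMARY KEY, definition TEXT)")

# INSERT: add rows (parameters are bound with ? placeholders).
conn.executemany(
    "INSERT INTO element (name, definition) VALUES (?, ?)",
    [
        ("PersonGivenName", "A person's first name"),
        ("PersonFamilyName", "A person's last name"),
        ("Obsolete", "To be removed"),
    ],
)

# UPDATE: change an existing row in place.
conn.execute(
    "UPDATE element SET definition = ? WHERE name = ?",
    ("The name shared by members of a family", "PersonFamilyName"),
)

# DELETE: remove a row.
conn.execute("DELETE FROM element WHERE name = ?", ("Obsolete",))

# SELECT: read back what remains, in a defined order.
rows = conn.execute("SELECT name, definition FROM element ORDER BY name").fetchall()
for name, definition in rows:
    print(name, "-", definition)
```

Each statement says what result is wanted; the database engine decides how to find and change the rows.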

XQuery, XML’s SQL. XQuery is a simple language with only five important keywords, plus header declarations and imports. XQuery can perform the same query operations as standard SQL (selection, joins, and so on), but it works on trees of data rather than tables. XQuery’s five main keywords are for, let, where, order by, and return, known collectively as a FLWOR (pronounced “flower”) expression. XQuery also has functions for selecting data from collections and documents.
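For readers more at home in a general-purpose language, here is a FLWOR expression (in a comment) next to a Python equivalent. The books.xml document and its structure are hypothetical, and the Python version uses ElementTree to spell out each FLWOR clause; it shows the shape of the query, not how an XQuery engine evaluates it.

```python
import xml.etree.ElementTree as ET

# Stand-in for a hypothetical books.xml document.
BOOKS = ET.fromstring(
    "<books>"
    "<book year='2007'><title>Gadgets</title></book>"
    "<book year='1999'><title>Sprockets</title></book>"
    "<book year='2005'><title>Widgets</title></book>"
    "</books>"
)

# Roughly equivalent XQuery:
#   for $b in doc("books.xml")//book
#   let $y := xs:integer($b/@year)
#   where $y > 2000
#   order by $b/title
#   return string($b/title)


def flwor(root):
    bound = []
    for b in root.iter("book"):                  # for
        y = int(b.get("year"))                   # let
        if y > 2000:                             # where
            bound.append((b.findtext("title"), b))
    bound.sort(key=lambda pair: pair[0])         # order by
    return [title for title, _ in bound]         # return


print(flwor(BOOKS))  # prints ['Gadgets', 'Widgets']
```

The XQuery version states the same five steps without any explicit list building or sorting code.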

XPath for data selection and functions. XSLT, XForms, and XQuery all depend on data-selection expressions written in the XPath language. Although XPath is a small language with just a few key concepts, it also has a rich library of functions for manipulating strings, numbers, and dates. Many declarative systems build on XPath data selection, and XPath is used fairly consistently across these languages. As a result, familiarity with XPath seems to be a vital skill in the world of declarative languages, even though it is not taught in most computer science curricula today. Lack of XPath knowledge remains a common reason people give for being reluctant to move toward declarative systems.
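Python’s standard-library ElementTree supports a limited subset of XPath, which is enough to illustrate path steps and predicates. The order document below is hypothetical. ElementTree does not implement XPath’s function library, so the final total is computed in Python; in XPath 1.0 the same value would be sum(//item/qty).

```python
import xml.etree.ElementTree as ET

# Hypothetical order document.
DOC = ET.fromstring(
    "<order>"
    "<item sku='A1'><qty>2</qty></item>"
    "<item sku='B7'><qty>5</qty></item>"
    "<item sku='C3'><qty>1</qty></item>"
    "</order>"
)

# Path step: all item children of the root element.
all_items = DOC.findall("item")

# Attribute predicate: the item whose sku attribute is 'B7'.
b7 = DOC.find("item[@sku='B7']")

# Child-value predicate: items anywhere below whose qty child is '1'.
single = [i.get("sku") for i in DOC.findall(".//item[qty='1']")]

# Positional predicate (1-based, as in XPath).
second = DOC.find("item[2]").get("sku")

# No XPath functions in ElementTree, so sum in Python instead.
total_qty = sum(int(q.text) for q in DOC.iterfind(".//qty"))

print(len(all_items), b7.findtext("qty"), single, second, total_qty)
```

The same few ideas (steps, predicates, positions) carry over to XSLT, XForms, and XQuery, which is why XPath fluency pays off across all of them.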

Getting the Picture
Resource Description Framework, the language of graphs. The discussion so far has covered only languages that manipulate XML tree structures. The semantic web requires more than trees in stand-alone documents: it depends on graphs and inference so that information from two independent web pages can be joined to create new information. Inference is a fundamental tenet of the semantic web; it allows graphs extracted from different web pages to be merged, much as SQL joins data from tables. Graphs are best represented as nodes and the connections among them. In Resource Description Framework (RDF), the nodes are resources, and the links among the nodes are properties.

This node-link-node arrangement is one of the main structures for representing graphs on the semantic web, and RDF is the language for this representation. Unfortunately, RDF has developed a reputation for being difficult to learn and difficult for humans to read. Much of this reputation stems from the fact that, for RDF joins to work, each node and arc must be identified by a URI (or URL), which is sometimes very long. Still, RDF and its extension, RDF Schema (RDFS), have a very small vocabulary of terms and are easy to generate with XSLT. RDF also has its own query language: SPARQL.
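The node-link-node idea, and why shared URIs matter, can be sketched in Python by treating each RDF triple as a tuple. All URIs are hypothetical example.org addresses, and the two sets stand in for graphs extracted from two independent web pages. A real application would use an RDF library and SPARQL; the equivalent SPARQL pattern is shown in a comment.

```python
# Each triple is (subject, property, object); all URIs are hypothetical.
EX = "http://example.org/"

# Graph extracted from one web page.
page_a = {(EX + "people/alice", EX + "vocab/worksFor", EX + "org/acme")}
# Graph extracted from an unrelated web page.
page_b = {(EX + "org/acme", EX + "vocab/locatedIn", EX + "place/springfield")}

# Both pages use the same URI for Acme, so merging is just set union.
graph = page_a | page_b


def objects(graph, subject, prop):
    """All objects linked from `subject` by property `prop`."""
    return {o for s, p, o in graph if s == subject and p == prop}


def join(graph, subject, first, second):
    """Follow two properties in a row, much like a two-table SQL join."""
    return {o2 for o1 in objects(graph, subject, first)
            for o2 in objects(graph, o1, second)}


# Equivalent SPARQL:
#   SELECT ?place WHERE {
#     <http://example.org/people/alice> <http://example.org/vocab/worksFor> ?org .
#     ?org <http://example.org/vocab/locatedIn> ?place .
#   }
where_alice_works = join(graph, EX + "people/alice",
                         EX + "vocab/worksFor", EX + "vocab/locatedIn")
print(where_alice_works)  # prints {'http://example.org/place/springfield'}
```

Neither page alone says where Alice works; the answer exists only after the graphs are merged on their shared URI.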

RDF is beyond the scope of this discussion, but keep in mind that it is an important tool for representing knowledge on the semantic web.

Simple Knowledge Organization System. Although most of the semantic web community agrees that RDF is critical to web-based search and inference, the jury is still out on exactly how higher-order logic, rules, and other systems will evolve. Standards such as the Web Ontology Language (OWL) have gained a great deal of acceptance in some communities, but there are still opportunities for small declarative vocabularies to be very useful on top of the XML and RDF infrastructure.

One newer standard that has been gaining popularity for storing structured business vocabularies is the Simple Knowledge Organization System (SKOS). Unlike W3C standards that have reached the Recommendation stage, SKOS was still a working draft at the time of writing.

SKOS represents structured, controlled vocabularies: thesauri, classification schemes, taxonomies, and subject-heading systems. It has wide support among corporate metadata librarians who classify and maintain data elements within large organizations. It differs from many XML standards because it assumes an RDF structure, which allows RDF inference and query technologies to be used. Table 1 lists the SKOS RDF classes and properties.

Table 1. SKOS RDF Classes and Properties

| Class or Property Name | Type | Purpose |
|---|---|---|
| Concept | Class | Declare that a resource is a concept or conceptual resource |
|  | Property, labeling: alternative and preferred lexical | Assign preferred or alternative lexical labels to resources |
| altSymbol, prefSymbol | Property, labeling: alternative and preferred symbolic | Assign preferred or alternative symbolic labels (images) to concepts |
| hiddenLabel | Property, labeling: hidden lexical | Assign hidden lexical labels to resources to make character strings accessible for text-based indexing and searching applications |
|  | Property, documentation | Add human-readable documentation to a concept’s description |
|  | Property, semantic relation | Declare semantic (or paradigmatic) broad, narrow, or associative relationships between concepts |
| ConceptScheme | Class, concept scheme | Declare that a resource is a concept scheme |
|  | Property, concept scheme | Declare that a concept is part of a concept scheme or declare a link between them |
|  | Property, concept scheme: subject indexing | Index information resources on the web by subject |
| Collection | Class, meaningful collection of concepts | Assign lexical labels to specific collections |
| member | Property, meaningful collection of concepts | With the Collection class, assign lexical labels to specific collections |
| OrderedCollection | Class, meaningful collection of concepts (subclass of Collection) | Define ordered collections of concepts |
| memberList | Property, meaningful collection of concepts | With the OrderedCollection class, define ordered collections of concepts |
| subjectIndicator | Property, published subject indicators | Declare links between concepts and human-readable documents that describe the content |

Source: W3C, SKOS Core Guide, W3C Working Draft, 2 November 2005
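A few SKOS properties can be exercised with plain Python triples. The sketch assumes the SKOS Core namespace URI (http://www.w3.org/2004/02/skos/core#) and uses the prefLabel, altLabel, and broader properties; the concept URIs are hypothetical. It follows broader links in application code rather than relying on an RDF reasoner.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/concepts/"  # hypothetical concept URIs

# A tiny concept scheme stored as (subject, property, object) triples.
TRIPLES = {
    (EX + "vehicle", SKOS + "prefLabel", "Vehicle"),
    (EX + "car", SKOS + "prefLabel", "Car"),
    (EX + "car", SKOS + "altLabel", "Automobile"),
    (EX + "car", SKOS + "broader", EX + "vehicle"),
    (EX + "sedan", SKOS + "prefLabel", "Sedan"),
    (EX + "sedan", SKOS + "broader", EX + "car"),
}


def pref_label(concept):
    """Return the concept's preferred label, or None if it has none."""
    return next((o for s, p, o in TRIPLES
                 if s == concept and p == SKOS + "prefLabel"), None)


def find_concept(label):
    """Look a concept up by its preferred or alternative label."""
    for s, p, o in TRIPLES:
        if p in (SKOS + "prefLabel", SKOS + "altLabel") and o == label:
            return s
    return None


def broader_chain(concept):
    """Follow broader links upward, collecting every broader concept once."""
    chain, frontier, seen = [], [concept], {concept}
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == SKOS + "broader" and o not in seen:
                seen.add(o)
                chain.append(o)
                frontier.append(o)
    return chain


print([pref_label(c) for c in broader_chain(find_concept("Sedan"))])  # prints ['Car', 'Vehicle']
```

Searching for the alternative label "Automobile" finds the same concept as "Car", which is the kind of vocabulary control metadata librarians use SKOS for.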

Declarative Patterns
After you work with many of these languages, some common patterns emerge. Following the format of design patterns, each of these recurring problems is given a name, so that declarative topics can be discussed efficiently and with precise semantics:

Semantic drawings: One of the main aspects of business-unit empowerment is giving staff outside the IT department the ability to capture and update system artifacts precisely. Putting graphical front ends on languages with small vocabularies is significantly easier than putting them on general-purpose programming languages, which excel at creating new abstractions. Once nonprogrammers learn a short list of symbols, such as a solid line for required elements and a dashed line for optional elements, they are empowered to maintain their own requirements. This ability gets the IT department out of the business of maintaining business logic.

List selection for nonprogrammers: The idea of narrowing the selection possibilities for nontechnical users comes up frequently in declarative systems. Data selection can follow many rules based on context, role, and function. Features such as Google Suggest demonstrate the power of using knowledge of previous users’ behavior to create a prioritized list of options while the current user is entering or selecting text.
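One simple way to prioritize such a list, sketched here under the assumption that past selections are logged, is to rank matching values by how often earlier users chose them. This is a frequency heuristic for illustration only; it does not describe how Google Suggest actually works. The log contents and the suggest function are hypothetical.

```python
from collections import Counter

# Hypothetical log of values earlier users picked for a "department" field.
PRIOR_SELECTIONS = ["Finance", "Facilities", "Finance", "Field Sales",
                    "Finance", "Facilities", "Legal"]


def suggest(prefix, history=PRIOR_SELECTIONS, limit=3):
    """Return up to `limit` prior values that start with `prefix`
    (case-insensitive), most frequently chosen first; ties are broken
    alphabetically."""
    counts = Counter(v for v in history if v.lower().startswith(prefix.lower()))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [value for value, _ in ranked[:limit]]


print(suggest("f"))   # prints ['Finance', 'Facilities', 'Field Sales']
print(suggest("fi"))  # prints ['Finance', 'Field Sales']
```

The user only ever picks from values the system already knows, which keeps data entry consistent without any validation code.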

Rules in forms: In addition to full drawing programs, such as those used for XML schemas and workflow diagrams, simple web forms can be used to create and maintain business rules. Setting up email rules is a good example. These forms constrain the choices users make, producing precise business rules that are easy for nonprogrammers to maintain. A rules engine may check a simple linear list of rules in sequence, or the rules may be expressed as complex decision trees. Either way, programmers need not be involved.
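A linear rule list of the kind an email-rules form produces might be evaluated as below. The rule fields, operator names, and folder actions are hypothetical; the point is that the rules are plain data a form could edit, while the small evaluator never changes. The first matching rule wins, and a default action applies when nothing matches.

```python
# Each rule: (field, operator, value, action). A rules form would build these
# from drop-down lists, so only known fields and operators can appear.
RULES = [
    ("from", "ends_with", "@example.com", "move:Work"),
    ("subject", "contains", "invoice", "move:Finance"),
    ("subject", "contains", "unsubscribe", "move:Newsletters"),
]

# The fixed vocabulary of operators the form offers (case-insensitive).
OPERATORS = {
    "contains": lambda actual, expected: expected.lower() in actual.lower(),
    "ends_with": lambda actual, expected: actual.lower().endswith(expected.lower()),
    "equals": lambda actual, expected: actual.lower() == expected.lower(),
}


def classify(message, rules=RULES, default="move:Inbox"):
    """Check rules in order; the first match wins."""
    for field, op, expected, action in rules:
        if OPERATORS[op](message.get(field, ""), expected):
            return action
    return default


print(classify({"from": "x@other.org", "subject": "Your INVOICE"}))  # prints move:Finance
```

Reordering, adding, or deleting entries in RULES changes the behavior without touching classify, which is what lets nonprogrammers own the rules.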

No replacement for data stewardship: If users don’t understand, or cannot communicate, the meaning of data elements, few tools will help. As the common expression goes, “a fool with a tool is still a fool.” Data stewardship must be taught carefully to each business unit, and data-steward champions must be identified. Data governance is still one of the most active areas of information systems research and development.

Semantics is king: Once you have precise data element definitions for your internally controlled vocabularies, the metadata registries that store these vocabularies can also hold other useful information. Application developers can draw on this store to avoid duplicating business logic. A phrase heard frequently in the semantics community is “a little semantics goes a long way,” meaning that small, incremental semantic steps can yield large gains in business productivity.

Getting Up to Speed
Most software development training today omits declarative languages in favor of mainstream procedural languages, and so misses the substantial benefits of externally defined semantics. However, this gap in training need not be permanent for computer science graduates.

The W3C recently organized the first of a series of workshops, the “Workshop on Declarative Models of Distributed Web Applications.” The workshop’s main goal was to reduce both the development and the maintenance costs of web applications. Hopefully, these workshops and their successors will form a basis for future web development training.

A small number of communities are also starting to build fully declarative web applications and are seeing strong benefits. Many people in the XForms and XQuery communities are adopting a more declarative style and promoting a full suite of declarative approaches to web development challenges. XML pipelines, document workflow, and rules engines are perhaps the next candidates for integration into the declarative stack.

Given how rapidly specialized languages are evolving, software developers can certainly expect to see more little languages in the near future. When approaching business problems, ask yourself whether you want to spend your career updating business rules or empowering business units to update them themselves. If you are in the empowerment camp, look carefully at declarative languages.

When asked directly, IT managers admit that empowerment is a good thing, but they fear losing control. Many IT strategies do not address the underlying technologies and skills required to make the leap from procedural to declarative systems.

As a developer or IT strategist, consider the impact of the semantic web infrastructure’s growth and its potential role in empowering business units and improving overall IT productivity. If you’re interested in leading the movement toward declarative systems, let this discussion be a road map to an evolution in software development languages that could very well transform the industry.


About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

