The Web Hypertext Application Technology Working Group (WHATWG) is formalizing a specification known as HTML5 or Web Applications 1.0 that should resolve some of the ambiguities and disconnects that have emerged with HTML and related web technologies.
This article discusses some of the features and functionality of the proposed HTML5 specification.
HTML5 is a specification being formalized by the Web Hypertext Application Technology Working Group (WHATWG) that defines a language, along with concrete syntaxes and APIs, for describing documents and applications. The WHATWG specification incorporates the existing HTML 4.01 and XHTML 1.0 features and also introduces new items, including:
- new layout elements
- programming changes to the Document Object Model (DOM)
- updated Web Forms
- server-sent DOM events
- dynamic graphics capabilities
WHATWG is a growing community of browser vendors, web developers, and other parties interested in the design and implementation of the next generation of HTML and related technologies. WHATWG’s primary concern is to enable authors to write and deploy applications over the World Wide Web.
WHATWG seeks to invigorate web application development by extending HTML so that it is suitable for expressing the semantics of now-ubiquitous applications such as forum sites, auction sites, search engines, and online shopping sites. WHATWG plans to facilitate such development in two ways: by defining an abstract language for describing documents and applications, and by defining APIs for interacting with in-memory representations of instances of the abstract language.
The specification also defines two concrete syntaxes for this abstract language:
- HTML5: This syntax defines a custom parsing model inspired by SGML. It uses the familiar text/html MIME type. HTML5 syntax is compatible with legacy web browsers and is the recommended format.
- XHTML5: This syntax is based on XML and contains elements from the HTML namespace. It uses the MIME type application/xml or application/xhtml+xml. This syntax is discouraged since XML has a much stricter syntax than the proposed HTML5 syntax.
The diagram in Figure 1 illustrates HTML5’s parsing model.
Figure 1. HTML5 Parsing Model: The diagram shows how HTML5 tokenizes incoming content into a tree structure, then creates a DOM and executes any script in the document.
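The stages named in the Figure 1 caption can be illustrated with a deliberately tiny sketch. This is not the HTML5 parsing algorithm: it ignores attributes, comments, DOCTYPEs, void elements, and all error recovery, and it only collects script text after the tree is built rather than executing scripts during parsing. The function names (tokenize, buildTree, collectScripts) and the node shape are assumptions made for illustration.

```javascript
// Stage 1: split the source into start tags, end tags, and text.
// Assumption: tags have no attributes and text contains no "<".
function tokenize(source) {
  const tokens = [];
  const pattern = /<\/([a-zA-Z][a-zA-Z0-9]*)\s*>|<([a-zA-Z][a-zA-Z0-9]*)\s*>|([^<]+)/g;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1]) tokens.push({ type: "endTag", name: match[1].toLowerCase() });
    else if (match[2]) tokens.push({ type: "startTag", name: match[2].toLowerCase() });
    else tokens.push({ type: "text", data: match[3] });
  }
  return tokens;
}

// Stage 2: turn the token stream into a tree using a stack of open elements.
function buildTree(tokens) {
  const root = { name: "#document", children: [] };
  const stack = [root];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type === "startTag") {
      const node = { name: token.name, children: [] };
      parent.children.push(node);
      stack.push(node);
    } else if (token.type === "endTag") {
      // Pop only when the end tag matches the current open element.
      // A real parser has detailed recovery rules for mismatches.
      if (stack.length > 1 && parent.name === token.name) stack.pop();
    } else {
      parent.children.push({ name: "#text", data: token.data });
    }
  }
  return root;
}

// Stage 3 (simplified): find the script text a user agent would run.
function collectScripts(node, found = []) {
  if (node.name === "script") {
    found.push(node.children.map((child) => child.data || "").join(""));
  }
  for (const child of node.children || []) collectScripts(child, found);
  return found;
}

const tree = buildTree(
  tokenize("<html><body><p>Hello</p><script>run()</script></body></html>")
);
console.log(JSON.stringify(tree, null, 2));
console.log(collectScripts(tree));
```

The stack of open elements is the one idea this toy shares with real tree construction; everything else is simplified.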
Implementations are encouraged to support both XHTML5 and HTML5, although they’re free to support only one.
The Need for HTML5
Apple, Mozilla, and Opera became increasingly concerned about the W3C’s direction (or lack of direction) with XHTML, its lack of interest in HTML, and its perceived disregard for the needs of web application developers. In response, these organizations took it upon themselves to address these concerns.
Markup for documents on the World Wide Web has always been some incarnation of HTML. Although it was originally designed as a language for semantically describing scientific documents, HTML was adopted for general use and was rapidly extended during the 1990s. It’s now used to describe most documents transmitted across the web.
HTML worked well for publishing static web pages. But many modern web documents aren’t individual static pages at all; instead, they’re partial pages or one page among many that, collectively, compose a web application. The current HTML specification inadequately addresses the entire area of web applications (session-oriented conversations between web clients and web server components). The WHATWG specification is an attempt to correct this situation and, at the same time, update the HTML specifications to address other issues that have annoyed web developers over the last few years.
Currently Apple, Mozilla, and Opera are the only browser vendors offering support for the development of HTML5. However, HTML5 is being developed with “IE compatibility in mind.”
HTML5’s Relationship to HTML 4.x and XHTML
The WHATWG specification is intended to replace HTML4, XHTML1, and the DOM Level 2 HTML specification, serving as the new version of all three.
One primary goal for HTML5 is to improve both HTML and XHTML, while still keeping structure and syntax as simple as possible. WHATWG has facilitated this by specifying just three fundamental requirements that an HTML5-compliant document must meet:
- DOCTYPE requirement: HTML5 uses the declaration <!DOCTYPE html>, which user agents should interpret as a signal to operate in “Standards Mode” (see http://www.whatwg.org/specs/web-apps/current-work/#the-doctype).
- MIME type requirement: The WHATWG specification states that all documents sent as “text/html” are HTML5, as with HTML 4.01 and XHTML 1.0.
- Well-formed document requirement: HTML5 defines a well-formed document according to the requirements specified in section 4 Appendix C of the XHTML 1.0 specification.
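As a rough illustration of the first two requirements, the sketch below checks whether a document starts with the HTML5 DOCTYPE and whether a Content-Type header value names text/html. The function names are hypothetical, and the checks are simplified assumptions: the DOCTYPE test ignores comments or a byte-order mark that may precede the declaration, and neither function says anything about well-formedness.

```javascript
// Returns true when the source begins with <!DOCTYPE html> (matched
// case-insensitively, allowing leading whitespace).
function hasHtml5Doctype(source) {
  return /^\s*<!doctype\s+html\s*>/i.test(source);
}

// Returns true when a Content-Type header value names text/html,
// ignoring parameters such as charset.
function isTextHtml(contentType) {
  return contentType.split(";")[0].trim().toLowerCase() === "text/html";
}

const page = "<!DOCTYPE html>\n<html><head><title>Demo</title></head><body></body></html>";
console.log(hasHtml5Doctype(page));
console.log(isTextHtml("text/html; charset=utf-8"));
```

A check like this is only a quick sanity test; a user agent decides its rendering mode using the full rules in the specification.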
The WHATWG specification defines some interesting new elements:
- section: a generic grouping of content in a document or application
- article: a section of a page that consists of a composition that forms an independent part of a document, page, or site, such as a forum post, newspaper article, etc.
- aside: a section of a page that consists of content that is slightly related to the content around the element, but could be considered separate from that content, such as a sidebar
- dialog: a conversation involving an explicit talker/speaker represented by a dt element and a discourse represented by a dd element
- footer: represents the footer for the section to which it applies and contains information such as the author, copyright data, and related links
- header: represents the header of a section used to denote summaries, outlines, etc.
- nav: a section of a page that links to other pages or to parts within the page
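To make the roles of these elements more concrete, here is a small sketch that assembles markup for a forum post using article, header, section, aside, nav, and footer. The helper names (escapeHtml, renderForumPost), the input fields, and the sample content are all assumptions for illustration; the nesting shown is one plausible arrangement, not a layout prescribed by the specification.

```javascript
// Escape text so it can be placed safely inside element content or attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: renders one forum post with the new elements.
function renderForumPost({ title, author, body, related }) {
  const links = related
    .map((link) => `<a href="${escapeHtml(link.href)}">${escapeHtml(link.label)}</a>`)
    .join(" ");
  return [
    "<article>",
    `  <header><h1>${escapeHtml(title)}</h1></header>`,
    `  <section><p>${escapeHtml(body)}</p></section>`,
    "  <aside><p>Sidebar content related to the post goes here.</p></aside>",
    `  <nav>${links}</nav>`,
    `  <footer>Posted by ${escapeHtml(author)}</footer>`,
    "</article>",
  ].join("\n");
}

console.log(
  renderForumPost({
    title: "Trying out the new elements",
    author: "Ann",
    body: "The article element wraps an independent piece of content.",
    related: [{ href: "/posts/2", label: "Next post" }],
  })
);
```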
The WHATWG specification defines the following categories for HTML elements:
- Metadata elements: used to represent metadata in a document’s head element. This includes elements such as title, base, and link.
- Sectioning elements: used to divide a page into sections. This includes elements such as body, section, nav, and article.
- Block-level elements: used for structural grouping of page content. This includes elements such as blockquote, section, p, and div.
- Strictly inline-level content: text, embedded content, and elements that annotate text without introducing structural grouping. This includes elements such as a, meter, and img.
- Structured inline-level elements: block-level elements that can also be used as inline-level content, such as ol, blockquote, and table.
- Interactive elements: elements that can be activated by a user through a device such as a mouse or keyboard. This includes elements such as a, button, and radio input elements.
- Form control elements
- Miscellaneous elements
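One point the category list implies is that an element can belong to more than one category. The sketch below records only the example elements named in the list above (it is not the specification's full categorization, and the key names are made up for illustration) and looks up every category a given element appears in.

```javascript
// Example elements per category, taken from the list above.
// "radio input elements" is omitted because it is not a single tag name.
const elementCategories = {
  metadata: ["title", "base", "link"],
  sectioning: ["body", "section", "nav", "article"],
  blockLevel: ["blockquote", "section", "p", "div"],
  strictlyInline: ["a", "meter", "img"],
  structuredInline: ["ol", "blockquote", "table"],
  interactive: ["a", "button"],
};

// Returns the names of all categories that list the given element.
function categoriesOf(tagName) {
  const name = tagName.toLowerCase();
  return Object.keys(elementCategories).filter((category) =>
    elementCategories[category].includes(name)
  );
}

console.log(categoriesOf("section"));    // [ 'sectioning', 'blockLevel' ]
console.log(categoriesOf("blockquote")); // [ 'blockLevel', 'structuredInline' ]
console.log(categoriesOf("a"));          // [ 'strictlyInline', 'interactive' ]
```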
The following existing HTML4 elements are not defined in HTML5:
- acronym (use abbr instead)
- applet (use object instead)
- noscript (only in XHTML)
For more information, check out the complete cross-referenced list of HTML5 tags defined by the WHATWG specification.
UI Widgets/Components of HTML5
HTML5 introduces new layout components, including a canvas element, a calendar control, an address card, a datagrid, and progress meters.
The canvas element embodies a bitmap canvas, which you can use to perform dynamic drawing tasks, such as rendering pie charts, graphs, and other graphical items.
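As a rough sketch of the kind of drawing the canvas element supports (this is an illustration, not the article's own listing), the code below computes pie-slice angles and fills each slice through the canvas 2D context. The function names, the sample data, the colors, and the element id "chart" are assumptions; the browser portion expects a page containing a canvas element with that id and is skipped when no document is available.

```javascript
// Convert raw values into pie-slice start/end angles in radians.
function pieSlices(values) {
  const total = values.reduce((sum, value) => sum + value, 0);
  let start = 0;
  return values.map((value) => {
    const end = start + (value / total) * 2 * Math.PI;
    const slice = { start, end };
    start = end;
    return slice;
  });
}

// Fill one wedge per value on a 2D canvas context.
function drawPie(ctx, values, colors, centerX, centerY, radius) {
  pieSlices(values).forEach((slice, index) => {
    ctx.beginPath();
    ctx.moveTo(centerX, centerY);
    ctx.arc(centerX, centerY, radius, slice.start, slice.end);
    ctx.closePath();
    ctx.fillStyle = colors[index % colors.length];
    ctx.fill();
  });
}

// Browser-only part: assumes <canvas id="chart" width="200" height="200"></canvas>.
if (typeof document !== "undefined") {
  const canvas = document.getElementById("chart");
  if (canvas && canvas.getContext) {
    drawPie(canvas.getContext("2d"), [30, 50, 20], ["#c33", "#3c3", "#33c"], 100, 100, 80);
  }
}
```

Keeping the angle arithmetic in its own function means it can be checked without a browser, while drawPie only issues drawing calls.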
HTML 5 Example
Figure 2 illustrates the canvas example as it appears in Firefox.
Planned Modifications to the DOM
In addition to element changes, the WHATWG specification introduces features to the DOM that are intended to simplify authoring web-based applications. The specification defines features present in the DOM as “DOM5 HTML.” DOM5 HTML consists of DOM Core Document nodes and DOM Core Element nodes, along with text nodes and other content.
All Document objects found in components that implement the WHATWG specification must also implement a new HTMLDocument interface, along with any document-level interface of any other namespaces found in the document that a given user agent (UA) supports.
The nodes representing HTML elements in the DOM must implement, and expose to scripts, the interfaces listed for them in the relevant sections of the WHATWG specification.
You can find a comprehensive list of requirements here.
Planned API Changes in HTML5
For documents in the HTML namespace, and for HTML elements in HTML documents, certain APIs defined in DOM3 Core become case-insensitive or case-changing, as sometimes defined in DOM3 Core, and as summarized or required below.
- Element.tagName, Node.nodeName, and Node.localName: These attributes will return tag names in all uppercase, regardless of the case with which they were created.
- Document.createElement(): This method will convert the argument to lowercase before creating the element. Also, the element created will be in the HTML namespace.
- Element.setAttributeNode(): An attribute node will be converted to lowercase before it is set on an HTML element.
- Element.setAttribute(): An attribute will be converted to lowercase before it is set on an HTML element.
- Document.getElementsByTagName() and Element.getElementsByTagName(): These methods will perform case-insensitive comparisons when looking at HTML elements, and case-sensitive comparisons for non-HTML elements.
- Document.renameNode(): If the namespace specified in the new name is the HTML namespace, then the new name will be converted to lowercase before renaming is performed.
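The case rules in this list can be mimicked with a small stand-in class. The sketch below is a toy model, not the real DOM: HtmlElementModel and its storedName field are hypothetical, it covers only the createElement, tagName, setAttribute, and getElementsByTagName rules, and it assumes every element is an HTML element in an HTML document.

```javascript
// Toy stand-in for an HTML element in an HTML document. It only mirrors
// the case rules described in the list above.
class HtmlElementModel {
  constructor(name) {
    // Document.createElement() lowercases its argument before creating the element.
    this.storedName = name.toLowerCase();
    this.attributes = new Map();
    this.children = [];
  }

  // Element.tagName reports the name in uppercase, however it was created.
  get tagName() {
    return this.storedName.toUpperCase();
  }

  // Element.setAttribute() lowercases the attribute name before storing it.
  setAttribute(name, value) {
    this.attributes.set(name.toLowerCase(), String(value));
  }

  getAttribute(name) {
    const key = name.toLowerCase();
    return this.attributes.has(key) ? this.attributes.get(key) : null;
  }

  appendChild(child) {
    this.children.push(child);
    return child;
  }

  // getElementsByTagName() compares names case-insensitively for HTML elements.
  getElementsByTagName(name) {
    const wanted = name.toLowerCase();
    const matches = [];
    for (const child of this.children) {
      if (child.storedName === wanted) matches.push(child);
      matches.push(...child.getElementsByTagName(name));
    }
    return matches;
  }
}

const list = new HtmlElementModel("UL");
const item = list.appendChild(new HtmlElementModel("Li"));
item.setAttribute("CLASS", "first");

console.log(list.tagName);                           // "UL"
console.log(item.getAttribute("class"));             // "first"
console.log(list.getElementsByTagName("LI").length); // 1
```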
HTML 5 vs. Proprietary Technologies
Developing web applications with HTML5 allows a developer to take advantage of the ubiquitous presence of HTML and XHTML. However, there are a number of web-application technologies that are available today or are soon to be released. Among these are Java, .NET, and Adobe’s Apollo.
Java and .NET are comprehensive application platforms for developing many types of applications. Web applications are a small segment of the capabilities of each. With Java and .NET, additional frameworks such as Struts, Spring, and DotNetNuke are used to build web/enterprise applications.
The primary advantage of using proprietary technologies over HTML5 is the additional power they gain by not having to conform to standards. The disadvantage is the reliance on an additional runtime component that must be installed before the applications will run.
Planned Completion Date for HTML5
The details are still being worked out. Different parts of the specification are at different maturity levels. The plan is to indicate the maturity level on a per-section basis. Some sections are already relatively stable and there are implementations that are already quite close to completion, and those features can be used today (e.g., canvas). However, according to the current version of the specification, HTML5 won’t be completely finished, with test suite and interoperability implemented, for 15 years! Yes, that’s correct: 15 years!
You can follow the official Web Applications 1.0 specification published by the Web Hypertext Application Technology Working Group (WHATWG) to keep up with the maturity level of the various sections.
The WHATWG community publishes a FAQ, blogs, Wiki, forums, etc.
Simon Pieters’ page lists all the HTML5 elements and attributes.
Berea Street publishes information about HTML 5.
Anne van Kesteren maintains a blog on W3C, WHATWG, HTML, CSS, DOM, XML, HTTP and more.
Final Thoughts and Predictions
The specification being formalized by the Web Hypertext Application Technology Working Group (WHATWG) known as HTML5 or Web Applications 1.0 will hopefully resolve some of the problems that have arisen from the use of HTML for such things as shopping sites, auction sites, and others. This is a huge undertaking with a far-off deadline (15 years). These two issues might prove to be major stumbling blocks for WHATWG.
One item in WHATWG’s favor is that backwards compatibility is reportedly a primary concern. Therefore, migration from HTML4 or XHTML1 to HTML5 should in most cases be straightforward.
On the other hand, one item working against WHATWG is the announcement made by the W3C in October of 2006 that it would charter a new HTML Working Group to incrementally improve HTML. Some of these improvements include extending HTML forms to become a superset of HTML and a subset of XForms.
One very aggressive milestone for W3C’s charter is that it should reach the recommendation level by 2008 and complete by end-of-year 2010. This might create enough FUD to divide followers of the WHATWG specification. To help combine the two efforts, the HTML Working Group has proposed a call for a formal relationship with WHATWG. This will prove interesting as it develops.