RDFa in DocBook
The DocBook sample points at a DTD called DocbookRDFa.dtd (shown in Listing 1), which does three things:
- It references RDFaAttributes.mod, a small file I created by copying the declarations of the special RDFa attributes from Appendix A of RDFa in XHTML: Syntax and Processing, which has a DTD for XHTML+RDFa. (The RDFaAttributes.mod file and all the others mentioned in this article are included in the attached code download.)
- It redefines the DocBook db.common.attributes parameter entity, which defines various pieces of metadata that can be added to nearly any DocBook element, to include the href attribute, the attributes declared in RDFaAttributes.mod, and the namespace declarations needed for the sample document.
- It references a copy of the DocBook 5.0 DTD.
A document pointing at this DTD, like the one shown in Listing 2, can be a valid DocBook document and still store the extra attributes necessary to embed RDF triples.
|Author's Note: Besides being a DocBook document, Listing 2 differs from the Primer's examples in one other way. The Primer mentions the mythical "Bob" and "Alice" several times. As the first paragraph of the dbrdfasample.xml DocBook version of the Primer says, it adds a number after each use of these names (for example, Bob1 and Bob2) to make it easier for you to see exactly which metadata triples get extracted from where in the document.|
When adding RDFa attributes to HTML, you can add them to nearly any element. However, as you'll see in the RDFa Primer, the span element is the most popular, being HTML's most flexible element. DocBook's phrase element is similarly flexible; using it to store RDFa attributes lets you add triples nearly anywhere in a DocBook document. Still, developers accustomed to the DocBook DTD know that some elements are more logical places for metadata than others. For example, to identify the document editor's employee ID and the workFlowStage values for the mythical MyPubCo publishing company, the bibliomisc child of DocBook's info element is a sensible place, so I stored these triples there.
You will see the predicate and object but not the subject of the mpc:editor and mpc:workFlowStage triples because of the second trick mentioned earlier: when an RDFa parser doesn't see a subject, it assumes that the document itself is the subject. In this case, the parser will know that the document containing this content has an editor identified by http://mypubco.com/empid/53234 and that the document's workFlowStage status is "final review."
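The default-subject rule can be illustrated with a short Python sketch. It is a toy extractor under stated assumptions, not a conforming RDFa parser. The attribute layout of the two bibliomisc elements, the mpc namespace URI, the document URL, and the function name extract_triples are all hypothetical, and the markup in Listing 2 may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the real markup in Listing 2 may be laid out differently.
DOCBOOK = """\
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:mpc="http://mypubco.com/ns/">
  <info>
    <bibliomisc rel="mpc:editor" href="http://mypubco.com/empid/53234"/>
    <bibliomisc property="mpc:workFlowStage">final review</bibliomisc>
  </info>
</article>
"""

def extract_triples(xml_text, document_uri):
    """Collect (subject, predicate, object) tuples from rel/href and property attributes.

    Simplification: only an about attribute on the element itself is checked;
    when it is absent, the document URI becomes the subject.
    """
    triples = []
    for elem in ET.fromstring(xml_text).iter():
        subject = elem.get("about", document_uri)
        if elem.get("rel") and elem.get("href"):
            # rel + href: the object is a resource (a URI)
            triples.append((subject, elem.get("rel"), elem.get("href")))
        if elem.get("property"):
            # property: the object is a literal, from content or the element text
            value = elem.get("content") or (elem.text or "").strip()
            triples.append((subject, elem.get("property"), value))
    return triples

# Assumed URL for the sample document
triples = extract_triples(DOCBOOK, "http://example.com/dbrdfasample.xml")
for triple in triples:
    print(triple)
```

Because neither bibliomisc element names a subject, both triples in this sketch end up with the document URL as their subject.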
|Author's Note: As you compare the DocBook sample with the W3C Primer, note the examples from section 2.3 of the Primer. The W3C's HTML version uses HTML's second-most flexible element, div, to create containers around content that can hold the RDFa metadata attributes. DocBook, which adheres more strictly to semantically meaningful element names, has no equivalent to div, so I used phrase elements again.|
To demonstrate image metadata, something valuable to publishers, I added a bit more metadata that you won't find in the Primer. Screenshots can be a pain in software documentation: if the software gets upgraded after you take a screenshot, the screenshot may be out of date. So, I added triples to indicate the lastScreenShotDate and softwareRelease associated with the screenshot. Unlike the other metadata examples, these are not metadata about the containing document, but rather about a different resource referenced by the document: http://example.com/bob/photos/sunset.jpg. Because the DocBook inlinemediaobject element can have an info child element for metadata, just as the document itself can, I stored the image metadata there, where the info element's about attribute indicates the subject.
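As a rough illustration of how an about attribute overrides the default subject, the following Python sketch walks a DocBook fragment and passes the nearest enclosing about value down to child elements. It is a simplified toy, not a full RDFa processor. The date and release values, the mpc namespace URI, the document URL, the bibliomisc layout, and the function name collect are hypothetical placeholders. Only the image URL and the two property names come from the sample described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: property values and namespace URI are placeholders.
DOCBOOK = """\
<para xmlns="http://docbook.org/ns/docbook"
      xmlns:mpc="http://mypubco.com/ns/">
  <inlinemediaobject>
    <info about="http://example.com/bob/photos/sunset.jpg">
      <bibliomisc property="mpc:lastScreenShotDate">2009-01-15</bibliomisc>
      <bibliomisc property="mpc:softwareRelease">2.1</bibliomisc>
    </info>
    <imageobject>
      <imagedata fileref="http://example.com/bob/photos/sunset.jpg"/>
    </imageobject>
  </inlinemediaobject>
</para>
"""

def collect(elem, subject, triples):
    """Recursively gather property triples, inheriting the subject from ancestors."""
    # An about attribute changes the subject for this element and its descendants
    subject = elem.get("about", subject)
    if elem.get("property"):
        value = elem.get("content") or (elem.text or "").strip()
        triples.append((subject, elem.get("property"), value))
    for child in elem:
        collect(child, subject, triples)
    return triples

# Start with the (assumed) document URL as the default subject
triples = collect(ET.fromstring(DOCBOOK), "http://example.com/dbrdfasample.xml", [])
for triple in triples:
    print(triple)
```

In this sketch both triples take the image URL as their subject, because the bibliomisc elements sit inside the info element that carries the about attribute.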