After reviewing the method used to diagram the syntax of SQLite commands a couple of years ago, I was able to create what I believe is a better method for syntax diagramming. I recently revised my method, and would like to share it with the community.
Syntax diagrams are useful for visually conveying syntax, especially for long SQL commands with nested sets of optional elements. The existing format, known as railroad diagrams, has been around since the 1970s in various forms [1] and is still used by language and software developers, such as for SQLite and JSON. But I found that railroad diagrams, while appealing due to their simplicity, have a number of problems.
My method doesn't require any component shapes that aren't already in most diagramming software. It's instead just a new way to construct diagrams that, I believe, are improved through efficiency and features.
Here's an example of a railroad diagram for those who aren't familiar with them:
Figure 1. A railroad diagram example (SQLite CREATE TABLE). From sqlite.org.
A Brief Comparison
Here are a few of the problems with railroad diagrams:
- The "carriage-return" lines continuing the graph from the end of one horizontal sequence to the beginning of the next are clumsy and inefficient.
- The long lines required for looping around large arcs and providing alternate paths around large sets of optional elements are unwieldy and visually cluttering.
- The symbol vocabulary is too limited, which leads to the above and other problems.
Compare the railroad diagram above with the diagram of the same syntax using my method, below. The trade of numerous and looping lines for more symbols with less clutter should be obvious.
Figure 2. An example of my method (SQLite CREATE TABLE).
Here are a few attributes of my method that I see as an improvement:
- Optional elements and groups are additive to the main sequence. This has two advantages:
  - The non-optional path doesn't require a line parallel to the entire optional sequence, as in railroad diagrams, making it more space efficient.
  - There's no need for a continuous directed graph line connecting each optional choice at both beginning and end with the main sequence--they can "hang" off of it, making it more efficient, flexible, and less cluttered.
- A vertical orientation provides for large size without a need for graph lines "carriage-returning" back to the left of the screen (as in railroad diagrams), which is inefficient, cluttered, and can destabilize visual tracking by the reader. (Actually, any orientation can be used.)
- My design uses familiar flowcharting symbols, as do railroad diagrams, but of a larger yet limited number, to increase flexibility while being more efficient and still easily navigable visually.
In addition, my method adds a representational capability that railroad diagrams don't have: it can indicate limits on how many times an element may repeat in looped constructs, including for choices within sets. (Admittedly, this may be of limited value.)
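To make the idea of a repetition limit concrete, here is a minimal Python sketch. It is not part of the diagramming method itself: the `Repeat` class, its field names, and the `count_allowed` helper are hypothetical names chosen for illustration, and the limit of 3 is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Repeat:
    """Hypothetical node: `item` may appear min_count..max_count times."""
    item: str
    min_count: int = 0
    max_count: Optional[int] = None  # None means no stated upper limit


def count_allowed(node: Repeat, tokens: List[str]) -> bool:
    """Return True if every token is `node.item` and the count is within limits."""
    if any(tok != node.item for tok in tokens):
        return False
    n = len(tokens)
    if n < node.min_count:
        return False
    return node.max_count is None or n <= node.max_count


# A loop with no stated upper limit: one or more occurrences.
unbounded = Repeat("column-def", min_count=1)
# The same loop with an explicit upper limit (3 is an arbitrary value).
bounded = Repeat("column-def", min_count=1, max_count=3)

print(count_allowed(unbounded, ["column-def"] * 5))
print(count_allowed(bounded, ["column-def"] * 5))
```

The only difference between the two loops is the stated upper bound, which is the kind of information a repetition limit on a diagram would convey to the reader.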
Because it uses more graphical symbols and is more flexible, it naturally has a more complicated set of rules. However, I believe the rules are intuitive when interpreted visually and should be easily mastered. I'm sure most of them are obvious to the reader just from comparing the two diagrams above.
I didn't set out to create a diagramming method that matches EBNF one-to-one meta-syntactically. (EBNF, Extended Backus-Naur Form, is a text-based system for representing grammars, such as those for programming languages.) What did result, though, has a superset of features over railroad diagrams via a richer visual functionality.
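For readers who haven't seen EBNF, the following Python sketch shows the constructs any syntax notation has to cover (sequence, choice, optional element, repetition) and prints them as an EBNF-style rule. The class names and the `to_ebnf` function are hypothetical, and the rule is a heavily simplified illustration, not the actual SQLite CREATE TABLE grammar.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Lit:            # a literal keyword or symbol
    text: str

@dataclass(frozen=True)
class Ref:            # a reference to another rule
    name: str

@dataclass(frozen=True)
class Seq:            # elements in order
    parts: Tuple["Node", ...]

@dataclass(frozen=True)
class Alt:            # exactly one of several choices
    choices: Tuple["Node", ...]

@dataclass(frozen=True)
class Opt:            # zero or one occurrence
    body: "Node"

@dataclass(frozen=True)
class Rep:            # zero or more occurrences
    body: "Node"

Node = Union[Lit, Ref, Seq, Alt, Opt, Rep]


def to_ebnf(node: Node) -> str:
    """Render a node as EBNF-style text: [ ] optional, { } repetition, | choice."""
    if isinstance(node, Lit):
        return f'"{node.text}"'
    if isinstance(node, Ref):
        return node.name
    if isinstance(node, Seq):
        return ", ".join(to_ebnf(p) for p in node.parts)
    if isinstance(node, Alt):
        return "( " + " | ".join(to_ebnf(c) for c in node.choices) + " )"
    if isinstance(node, Opt):
        return "[ " + to_ebnf(node.body) + " ]"
    if isinstance(node, Rep):
        return "{ " + to_ebnf(node.body) + " }"
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Simplified, illustrative fragment only (not the real SQLite grammar).
create_table = Seq((
    Lit("CREATE"),
    Opt(Alt((Lit("TEMP"), Lit("TEMPORARY")))),
    Lit("TABLE"),
    Opt(Seq((Lit("IF"), Lit("NOT"), Lit("EXISTS")))),
    Ref("table_name"),
    Lit("("),
    Ref("column_def"),
    Rep(Seq((Lit(","), Ref("column_def")))),
    Lit(")"),
))

rule = "create_table = " + to_ebnf(create_table) + " ;"
print(rule)
```

A diagram, whether railroad or otherwise, is a visual rendering of the same tree: the optional and repeated nodes are where the drawing conventions of the two methods differ.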
Figure 3. Another example of my method (SQL Server SELECT). Density is higher, as a whole, but this isn't a problem for visual navigation.
My method may not fulfill the needs of all programming languages. For example, it may fall short for a given grammar if it supplies features the grammar doesn't need while omitting others that are crucial to it. Computer scientists may thus be displeased if there's some formality I've overlooked. But it will perform anywhere railroad diagrams do.
This is a minor development, obviously. EBNF, a long-lived, compact representational standard, is well-suited to formal language declarations and parsing for compiler development. The text-based syntax method used in many programming books is also compact and suited to a typographical preference. But syntax diagrams benefit communication and education, and have their place. I believe my method now provides an improved alternative to railroad diagrams because it's more efficient, more powerful, and (at least with larger diagrams) more visually navigable. More can be learned about it here.
About the Author
Thomas Knight has worked in IT intermittently since the 1990s and has more than 12 years of experience as a DBA and former developer. He has most recently worked on practical projects at the nexus of theory, ideal and method, such as the SDM discussed here.
1. "The Programming Language Pascal" by Niklaus Wirth, 1973, and the "Burroughs CANDE Language Manual", 1972.