While writing an earlier article (see "Build a Touch Utility with .NET"), which describes a console application to set file dates and times, I struggled with the problem of parsing and validating the command line arguments. I ended up writing some custom code for that article, even though it was obvious that parsing command lines is a generic operation. I promised to address the problem generically in a future article, and this is that article: it describes a set of classes to parse and validate console application command lines.
Investigating Command Lines
The "command line" is the portion of the line you type at a console prompt that follows the name of the executable application. For example, in the Touch application described in the earlier article, you could specify a single file or a folder, plus a set of arguments for setting various dates, so the full command might look like this:
touch c:\temp\*.doc 01/02/2002 12:00:00 AM -w -s -a
The command line consists of the text that follows the name of the executable ("touch" in this case). It specifies that the Touch utility should act on files in the c:\temp folder that have a .doc extension, as well as on matching files in all its subfolders (-s), setting the files' LastWriteDate (-w) and LastAccessDate (-a) to the date/time 01/02/2002 12:00:00 AM.
As you begin to look more closely at command lines, you'll find that they consist of one or more parts, often (but not always) separated by spaces. These "parts" are called command line parameters. Some parameters are commonly called flags or options; I'll use the term flags in this article to clearly differentiate flag-type entries from optional entries, meaning parameters that the user may or may not enter. Flags are usually short parameters, one or two characters long, prefaced with a hyphen or a forward slash. Not all flags are short, though; for example, the Windows ipconfig command accepts the flag /flushdns. Although application documentation usually lists flags individually, it's a common convention that users can enter simple flags sequentially after a single hyphen or slash. For example, you could write the command line above with the -w, -s, and -a flags following a single hyphen:
c:\temp\*.doc 01/02/2002 12:00:00 AM -wsa
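A parser can handle the combined form by expanding it back into individual flags before doing anything else. The following is a minimal sketch, not the code from the classes described in this article; the module and function names are hypothetical, and it assumes that every character after the prefix is a separate one-letter flag.

```vbnet
Imports System
Imports System.Collections.Generic

Module FlagExpander
    ' Expands a combined flag token such as "-wsa" into "-w", "-s", "-a".
    ' Assumption: each character after the prefix is a one-letter flag.
    ' Tokens that do not start with "-" or "/" are returned unchanged.
    Function ExpandFlags(token As String) As List(Of String)
        Dim result As New List(Of String)()
        If token.Length < 2 OrElse (token(0) <> "-"c AndAlso token(0) <> "/"c) Then
            result.Add(token)
            Return result
        End If

        Dim prefix As String = token.Substring(0, 1)
        For i As Integer = 1 To token.Length - 1
            result.Add(prefix & token(i))
        Next
        Return result
    End Function
End Module
```

Note that this naive expansion would also break a multi-character flag such as /flushdns into single letters, so a real parser would need to check each token against its list of known multi-character flags first.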
To complicate matters, you'll often find flag parameters that take associated values; in other words, the parameter actually consists of two related values, usually a flag followed by a string. For example, many applications that process data let you specify an output file with a -o flag followed by the output file name.
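Once the command line has been split into tokens, picking up the value that belongs to a flag can be as simple as reading the token that follows it. Here is a small sketch under that assumption; the function name and the file names in the usage note are hypothetical.

```vbnet
Imports System

Module FlagValueReader
    ' Returns the token that immediately follows the given flag.
    ' Returns Nothing when the flag is absent or is the last token,
    ' which a caller could report as a missing value.
    Function GetFlagValue(tokens As String(), flag As String) As String
        Dim index As Integer = Array.IndexOf(tokens, flag)
        If index < 0 OrElse index = tokens.Length - 1 Then
            Return Nothing
        End If
        Return tokens(index + 1)
    End Function
End Module
```

For the tokens "data.csv", "-o", "report.txt", calling GetFlagValue with "-o" returns "report.txt". The comparison is case-sensitive, which may or may not be what a given application wants.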
It's fairly easy to parse command lines where the parameters are all flags, or are all required, and where the program forces the user to enter command line arguments in a specific order. Matching known flags in specific sequences is simple, as is matching known parameters that must appear in a particular position. Such programs put the burden on the user to conform to the program. However, less draconian applications tend to work the other way around: they let users enter command line parameters in any reasonable order, simplifying input at the cost of more parsing logic.
As applications grow more complex, they tend to accumulate parameters. The more complex the program, the more likely it is to have a large number of command-line options, and the parsing process becomes commensurately more difficult.
For example, a utility to copy files might let you enter the source and destination filenames in either order by using an option flag to identify which is which. In the following example, -s identifies the source file and -d identifies the destination. The program accepts command lines that specify both flags, either one, or neither, and adjusts its action accordingly. When the command line contains no flags, it assumes that the first file name is the source file and the second is the destination.
' both flags are present
-s c:\somefile.txt -d c:\myfiles\somefile.txt
' only the destination flag specified.
' the program assumes that the second
' file name is the source file.
-d c:\myfiles\somefile.txt c:\somefile.txt
' assume source then destination
c:\somefile.txt c:\myfiles\somefile.txt
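The rules for this hypothetical copy utility can be sketched as follows. This is only an illustration of the logic just described, assuming the command line has already been split into tokens; the module and function names are made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic

Module CopyArguments
    ' Resolves the source and destination file names from the tokens.
    ' Returns a two-element array: {source, destination}.
    Function Resolve(tokens As String()) As String()
        Dim source As String = Nothing
        Dim destination As String = Nothing
        Dim unflagged As New List(Of String)()

        Dim i As Integer = 0
        While i < tokens.Length
            If tokens(i) = "-s" AndAlso i + 1 < tokens.Length Then
                source = tokens(i + 1)
                i += 2
            ElseIf tokens(i) = "-d" AndAlso i + 1 < tokens.Length Then
                destination = tokens(i + 1)
                i += 2
            Else
                unflagged.Add(tokens(i))
                i += 1
            End If
        End While

        ' Fill whatever the flags did not supply, in order of appearance.
        For Each name As String In unflagged
            If source Is Nothing Then
                source = name
            ElseIf destination Is Nothing Then
                destination = name
            End If
        Next

        Return New String() {source, destination}
    End Function
End Module
```

With only the -d flag present, the flagged name becomes the destination and the remaining name becomes the source; with no flags at all, the first name is the source and the second is the destination.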
In VB.NET, you can obtain the command line passed to your application via the Command() function. Many developers immediately assume that you can parse a command line by simply splitting it wherever spaces occur, using the String.Split method. That works fine for dates and times because the spaces delimit the date, time, and AM/PM specifier. But you run into problems when you need to parse long filenames or other parameters that might themselves contain spaces, and with concatenated flags, which aren't delimited by spaces. Instead, a command-line parser must be able to recognize quoted strings, including any embedded spaces, as a single parameter, and be able to recognize flags even when they're not entered separately:
' assume source then destination
' note the spaces in the quoted file names
"c:\some name.txt" "c:\myfiles\some name.txt"
For some console applications, you may not know in advance exactly what information the user will enter, but you can enforce rules of nearly any complexity by using regular expressions.
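As a small illustration of pattern-based checking, the sketch below tests individual entries against regular expressions with Regex.IsMatch. The two patterns are assumptions made up for this example (a two-digit/two-digit/four-digit date and a hyphen or slash followed by letters); a real application would supply patterns that match its own rules.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternCheck
    ' Hypothetical patterns, for illustration only.
    Public Const DatePattern As String = "^\d{2}/\d{2}/\d{4}$"
    Public Const FlagPattern As String = "^[-/][A-Za-z]+$"

    ' Returns True when the whole entry matches the given pattern.
    Function Matches(entry As String, pattern As String) As Boolean
        Return Regex.IsMatch(entry, pattern)
    End Function
End Module
```

A pattern check like this only confirms the shape of an entry; "99/99/2002" would pass DatePattern, so an entry that must be a real date still needs a type conversion afterward.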
Generic Parsing Guidelines
Now that you know the types of parameters that command lines contain, you can write out some generic guidelines for parsing command lines:
- Command lines consist of zero or more entries separated by spaces. Entries with embedded spaces appear between double quotes.
- Some entries need to be strongly typed (converted to dates or times, or treated as filenames) before validation.
- Some entries are flags or options. These always begin with a hyphen or slash, but can be combined behind one hyphen, without spaces. Flags may consist of multiple characters, such as -flag.
- Some entries are free-form text but must match a pattern, such as a file specification.
- Some entries are required, and some are optional. Some must follow or precede specific types of entries, such as a flag/value combination. Some entries must appear in a specific position; for others, the position doesn't matter.
Building on these guidelines, a command-line parser must be able to:
- Split command lines into their component parts and recognize quoted strings.
- Differentiate flag entries from other entries and recognize flags even when they aren't delimited by white space.
- Enforce position, both absolute (entry must appear at a specific index) and relative to some other entry (for example, an entry must follow "-f" or must appear before or after a date entry).
- Validate entered (and missing) parameters by checking that all required parameters exist, that entries with specific positional requirements are in the correct positions, that they follow or precede other entries as specified, and that each entry matches its specified type (date, filename), pattern (regular expression, file specification), or value.
Because users often make mistakes, a generic parser should also let developers handle errors. Mistakes may consist of missing data, where the user did not enter a required value; invalid data, such as an invalid file path or a malformed date; or extra data, such as unrecognized flag values. Developers can choose to ignore extra values in otherwise valid command lines or treat them as errors. The parser should return information for all three types of mistakes.
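One way to return all three kinds of mistakes is to collect them in separate lists and let the calling code decide what counts as an error. The sketch below is a simplified illustration, not the design of the classes in this article: the ParseReport class, the Check function, and its parameters are all hypothetical, and it assumes that flags start with a hyphen and that the caller supplies a function to validate every other entry.

```vbnet
Imports System
Imports System.Collections.Generic

Class ParseReport
    Public ReadOnly Missing As New List(Of String)()   ' required flags not entered
    Public ReadOnly Invalid As New List(Of String)()   ' values that failed validation
    Public ReadOnly Extra As New List(Of String)()     ' unrecognized flags
End Class

Module Validator
    Function Check(tokens As IEnumerable(Of String),
                   requiredFlags As IEnumerable(Of String),
                   knownFlags As ICollection(Of String),
                   isValidValue As Func(Of String, Boolean)) As ParseReport
        Dim report As New ParseReport()
        Dim seen As New HashSet(Of String)()

        For Each token As String In tokens
            If token.StartsWith("-", StringComparison.Ordinal) Then
                If knownFlags.Contains(token) Then
                    seen.Add(token)
                Else
                    report.Extra.Add(token)
                End If
            ElseIf Not isValidValue(token) Then
                report.Invalid.Add(token)
            End If
        Next

        For Each flag As String In requiredFlags
            If Not seen.Contains(flag) Then report.Missing.Add(flag)
        Next
        Return report
    End Function
End Module
```

Because the extra entries are reported separately rather than rejected, the caller can ignore them or treat them as errors, as the application requires.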
Finally, the parser should not restrict developers: it should be flexible enough to perform only the required tasks, such as simply splitting the command line into tokens and returning those, unaltered, so developers can apply custom validation rules.