While writing an earlier article (see “Build a Touch Utility with .NET”), which describes a console application that sets file dates and times, I struggled with parsing and validating the command-line arguments. I ended up writing some custom code for that article, even though parsing command lines is obviously a generic operation. I promised to address the problem generically in a future article, and that's what this article does: it describes a set of classes that parse and validate console application command lines.
Investigating Command Lines
The “command line” is the portion of the line you type at a console prompt that follows the name of the executable. For example, with the Touch application described in the earlier article, you could specify a single file or a folder, a date and time, and a set of flags that select which file dates to set, so the command line might look like this:
touch c:\temp\*.doc 01/02/2002 12:00:00 AM -w -s -a
The command line consists of the text that follows the name of the executable (“touch” in this case). It tells the Touch utility to act on files with a .doc extension in the c:\temp folder and all its subfolders (-s), setting each file's last write time (-w) and last access time (-a) to 01/02/2002 12:00:00 AM.
As you look more closely at command lines, you'll find that they consist of one or more parts, often (but not always) separated by spaces. These parts are called command-line parameters. Some parameters are commonly called flags or options. I'll use the term flags in this article to clearly differentiate flag-type entries from optional entries (parameters that the user may or may not enter). Flags are usually short parameters, one or two characters long, prefaced with a hyphen or a forward slash. Not all flags are short, though; for example, the Windows ipconfig command accepts the flag /flushdns. Although application documentation usually lists flags individually, it's a common convention that users can enter simple flags sequentially after a single hyphen or slash. For example, you could write the command line above with the -w, -s, and -a flags following a single hyphen:
c:\temp\*.doc 01/02/2002 12:00:00 AM -wsa
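Expanding a combined flag such as -wsa back into its individual flags is one of the jobs a parser has to take on. The following VB.NET sketch shows one way to do it; ExpandFlags and its knownFlags parameter are hypothetical names, not part of the article's parser. It expands a token only when every character after the prefix is a known single-character flag, so a multi-character flag such as /flushdns passes through unchanged. A real parser should check its list of multi-character flags first, because a long flag made up entirely of known single letters would otherwise be split.

```vbnet
Imports System
Imports System.Collections.Generic

Module FlagExpansionSketch
    ' Hypothetical helper: expands a token such as "-wsa" into "-w", "-s", "-a"
    ' when every character after the prefix is a known single-character flag.
    ' Tokens that don't qualify (e.g. "/flushdns" or a date) come back unchanged.
    Function ExpandFlags(token As String, knownFlags As String) As List(Of String)
        Dim result As New List(Of String)()
        Dim isFlag As Boolean = token.Length > 1 AndAlso
            (token(0) = "-"c OrElse token(0) = "/"c)
        If isFlag Then
            Dim body As String = token.Substring(1)
            Dim allKnown As Boolean = True
            For Each c As Char In body
                If knownFlags.IndexOf(c) < 0 Then
                    allKnown = False
                    Exit For
                End If
            Next
            If allKnown Then
                ' Re-attach the original prefix ("-" or "/") to each flag.
                For Each c As Char In body
                    result.Add(token(0) & c)
                Next
                Return result
            End If
        End If
        result.Add(token)
        Return result
    End Function

    Sub Main()
        For Each f As String In ExpandFlags("-wsa", "wsa")
            Console.WriteLine(f)
        Next
    End Sub
End Module
```

Running Main prints -w, -s, and -a on separate lines.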
To complicate matters, you'll often find flag parameters that take associated values; in other words, the parameter actually consists of two related values, usually a flag followed by a string. For example, many applications that process data let you specify an output file with a -o flag followed by the output file name:
-o c:\temp\somefile.txt
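A flag/value pair is easy to pick out once the command line has been split into tokens: find the flag and take the token after it. In this sketch, GetFlagValue is a hypothetical helper. It treats a following token that starts with a hyphen or slash as a missing value, an assumption you would need to revisit if legitimate values can begin with those characters.

```vbnet
Imports System
Imports System.Collections.Generic

Module FlagValueSketch
    ' Hypothetical helper: returns the token that follows valueFlag (e.g. "-o"),
    ' or Nothing if the flag is absent or is not followed by a value.
    Function GetFlagValue(tokens As IList(Of String), valueFlag As String) As String
        For i As Integer = 0 To tokens.Count - 2
            If String.Equals(tokens(i), valueFlag, StringComparison.OrdinalIgnoreCase) Then
                Dim candidate As String = tokens(i + 1)
                ' Another flag in the value slot means the user left the value out.
                If candidate.StartsWith("-", StringComparison.Ordinal) OrElse
                   candidate.StartsWith("/", StringComparison.Ordinal) Then
                    Return Nothing
                End If
                Return candidate
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim tokens As String() = {"-o", "c:\temp\somefile.txt", "-w"}
        Console.WriteLine(GetFlagValue(tokens, "-o"))
    End Sub
End Module
```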
It's fairly easy to parse command lines where the parameters are all flags, or are all required, and where the program forces the user to enter command-line arguments in a specific order. Matching known flags in specific sequences is straightforward, as is matching known parameters that must appear in a particular position. Such programs put the burden on the user to conform to the program. Less draconian applications work the other way around: they let users enter command-line parameters in any reasonable order, simplifying input at the cost of more parsing logic.
As applications grow more complex, they tend to accumulate parameters. The more complex the program, the more command-line options it is likely to have, and the parsing process becomes commensurately more difficult.
For example, a utility to copy files might let you enter the source and destination filenames in either order by using a flag to identify which is which. In the following example, the -s flag identifies the source file while the -d flag identifies the destination. The program accepts command lines that specify both, either, or none of the flags and adjusts its action accordingly. When the command line contains no flags, it assumes that the first file name is the source file and the second is the destination.
Examples:
' both flags are present
-s c:\somefile.txt -d c:\myfiles\somefile.txt
' only the destination flag is specified;
' the program assumes that the second
' file name is the source file
-d c:\myfiles\somefile.txt c:\somefile.txt
' no flags: assume source, then destination
c:\somefile.txt c:\myfiles\somefile.txt
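The copy-utility rules can be expressed in a few lines. This sketch uses ResolveFiles as a hypothetical name. It assigns flagged names first, and then any unflagged names fill whichever slot is still empty, source before destination. Extra unflagged names are silently ignored here; a stricter version would report them.

```vbnet
Imports System
Imports System.Collections.Generic

Module SourceDestSketch
    ' Hypothetical sketch of the copy-utility rules: -s marks the source,
    ' -d marks the destination, and unflagged names fill the empty slots
    ' in source-then-destination order. Surplus names are ignored.
    Sub ResolveFiles(tokens As IList(Of String), ByRef source As String, ByRef dest As String)
        source = Nothing
        dest = Nothing
        Dim loose As New List(Of String)()
        Dim i As Integer = 0
        While i < tokens.Count
            Dim t As String = tokens(i).ToLowerInvariant()
            If (t = "-s" OrElse t = "-d") AndAlso i + 1 < tokens.Count Then
                If t = "-s" Then source = tokens(i + 1) Else dest = tokens(i + 1)
                i += 2 ' consume the flag and its file name
            Else
                loose.Add(tokens(i))
                i += 1
            End If
        End While
        For Each name As String In loose
            If source Is Nothing Then
                source = name
            ElseIf dest Is Nothing Then
                dest = name
            End If
        Next
    End Sub

    Sub Main()
        Dim src As String = Nothing
        Dim dst As String = Nothing
        ResolveFiles(New String() {"-d", "c:\myfiles\somefile.txt", "c:\somefile.txt"}, src, dst)
        Console.WriteLine("source: " & src)
        Console.WriteLine("destination: " & dst)
    End Sub
End Module
```

For the second example above, Main prints source: c:\somefile.txt and destination: c:\myfiles\somefile.txt.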
In VB.NET, you can obtain the command line passed to your application via the Command() function. Many developers immediately assume that they can parse a command line by simply splitting it wherever spaces occur, using the String.Split method. That works fine for dates and times, because spaces delimit the date, the time, and the AM/PM specifier. But you run into problems with long filenames or other parameters that contain spaces, and with concatenated flags, which aren't delimited by spaces at all. Instead, a command-line parser must be able to recognize a quoted string (including any embedded spaces) as a single parameter, and must recognize flags even when they're not entered separately:
' assume source then destination
' note the spaces in the quoted file names
"c:\some name.txt" "c:\myfiles\some name.txt"
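A quote-aware tokenizer only needs to track whether it is currently inside a pair of double quotes. This sketch uses Tokenize as a hypothetical name; it is not the article's Tokenizer class. It splits on whitespace outside quotes, drops the quote characters, and does not handle escaped quotes. Main contrasts it with String.Split on the example above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module TokenizerSketch
    ' Hypothetical tokenizer: splits on whitespace, but text between double
    ' quotes stays in one token. The quotes themselves are dropped, and a bare
    ' "" pair yields an empty token. Escaped quotes are NOT handled.
    Function Tokenize(commandLine As String) As List(Of String)
        Dim tokens As New List(Of String)()
        Dim current As New StringBuilder()
        Dim inQuotes As Boolean = False
        Dim hasToken As Boolean = False
        For Each c As Char In commandLine
            If c = """"c Then
                inQuotes = Not inQuotes
                hasToken = True
            ElseIf Char.IsWhiteSpace(c) AndAlso Not inQuotes Then
                If hasToken Then
                    tokens.Add(current.ToString())
                    current.Clear()
                    hasToken = False
                End If
            Else
                current.Append(c)
                hasToken = True
            End If
        Next
        If hasToken Then tokens.Add(current.ToString())
        Return tokens
    End Function

    Sub Main()
        Dim line As String = """c:\some name.txt"" ""c:\myfiles\some name.txt"""
        ' String.Split cuts the quoted names apart at every space...
        Console.WriteLine(line.Split(" "c).Length)
        ' ...while the quote-aware tokenizer keeps each name whole.
        For Each t As String In Tokenize(line)
            Console.WriteLine(t)
        Next
    End Sub
End Module
```

String.Split returns four pieces for that line, while Tokenize returns the two file names intact.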
For some console applications, you may not know in advance exactly what information the user will enter, but you can enforce rules of nearly any complexity by using regular expressions.
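In .NET, the Regex class in System.Text.RegularExpressions handles this kind of rule. The job-code format below is purely hypothetical; the point is that anchoring the pattern with ^ and $ makes IsMatch test the whole token rather than any substring of it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexRuleSketch
    ' Hypothetical rule: a job code is two uppercase letters, a hyphen, and
    ' four digits (e.g. "AB-1234"). ^ and $ force a whole-token match.
    Private ReadOnly JobCodePattern As New Regex("^[A-Z]{2}-\d{4}$")

    Function IsValidJobCode(token As String) As Boolean
        Return JobCodePattern.IsMatch(token)
    End Function

    Sub Main()
        Console.WriteLine(IsValidJobCode("AB-1234"))   ' True
        Console.WriteLine(IsValidJobCode("ab-12"))     ' False
    End Sub
End Module
```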
Generic Parsing Guidelines
Now that you know the types of parameters that command lines contain, you can write out some generic guidelines for parsing command lines:
- Command lines consist of zero or more entries separated by spaces. Entries with embedded spaces appear between double quotes.
- Some entries need to be strongly typed (converted to dates or times, or treated as filenames) before validation.
- Some entries are flags or options. These always begin with a hyphen or slash, but simple single-character flags can be combined behind one hyphen, without spaces. Flags may also consist of multiple characters, such as -flag.
- Some entries are free-form text but must match a pattern, such as a file specification.
- Some entries are required, and some are optional. Some must follow or precede specific types of entries, such as a flag/value combination. Some entries must appear in a specific position; for others, the position doesn’t matter.
Building on these guidelines, a command-line parser must be able to:
- Split command lines into their component parts and recognize quoted strings.
- Differentiate flag entries from other entries and recognize flags even when they aren’t delimited by white space.
- Enforce position, both absolute (an entry must appear at a specific index) and relative to some other entry (for example, an entry must follow “-f” or must appear before or after a date entry).
- Validate entered (and missing) parameters by checking that all required parameters exist, that entries with specific positional requirements are in the correct positions, that they follow or precede other entries as specified, and that each entry matches its specified type (date, filename), pattern (regular expression, file specification), or value.
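Once the tokens are available, the positional requirements above come down to a few index comparisons. In this sketch, MatchesAt, FollowsFlag, Precedes, LooksLikePath, and LooksLikeDate are hypothetical helpers. Each rule is passed as a Func(Of String, Boolean), so the same checks work for flags, dates, or file names. LooksLikePath is deliberately crude (it just looks for a backslash).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module PositionSketch
    ' Hypothetical type rules used to describe entries.
    Function LooksLikeDate(t As String) As Boolean
        Dim d As DateTime
        Return DateTime.TryParse(t, CultureInfo.InvariantCulture, DateTimeStyles.None, d)
    End Function

    Function LooksLikePath(t As String) As Boolean
        Return t.IndexOf("\"c) >= 0
    End Function

    ' Absolute position: does the token at a fixed index satisfy the rule?
    Function MatchesAt(tokens As IList(Of String), index As Integer,
                       rule As Func(Of String, Boolean)) As Boolean
        Return index >= 0 AndAlso index < tokens.Count AndAlso rule(tokens(index))
    End Function

    ' Relative position: does the token right after the flag satisfy the rule?
    Function FollowsFlag(tokens As IList(Of String), flag As String,
                         rule As Func(Of String, Boolean)) As Boolean
        Dim i As Integer = tokens.IndexOf(flag)
        Return i >= 0 AndAlso i + 1 < tokens.Count AndAlso rule(tokens(i + 1))
    End Function

    ' Relative order: does the first token matching ruleA come before
    ' the first token matching ruleB?
    Function Precedes(tokens As IList(Of String), ruleA As Func(Of String, Boolean),
                      ruleB As Func(Of String, Boolean)) As Boolean
        Dim a As Integer = -1
        Dim b As Integer = -1
        For i As Integer = 0 To tokens.Count - 1
            If a < 0 AndAlso ruleA(tokens(i)) Then a = i
            If b < 0 AndAlso ruleB(tokens(i)) Then b = i
        Next
        Return a >= 0 AndAlso b >= 0 AndAlso a < b
    End Function

    Sub Main()
        Dim tokens As String() = {"-f", "c:\temp\junk.txt", "01/02/2002"}
        Console.WriteLine(MatchesAt(tokens, 0, Function(t As String) t = "-f"))
        Console.WriteLine(FollowsFlag(tokens, "-f", AddressOf LooksLikePath))
        Console.WriteLine(Precedes(tokens, AddressOf LooksLikePath, AddressOf LooksLikeDate))
    End Sub
End Module
```

All three calls in Main print True for the tokens -f c:\temp\junk.txt 01/02/2002.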
Because users often make mistakes, a generic parser should also let developers handle errors. Mistakes fall into three groups: missing data, where the user did not enter a required value; invalid data, such as an invalid file path or a malformed date; and extra data, such as unrecognized flags. Developers can choose to ignore extra values in otherwise valid command lines or treat them as errors. The parser should return information about all three types of mistakes.
Finally, the parser should not restrict developers. It should be flexible enough to perform only the tasks a developer needs, such as simply splitting the command line into tokens and returning those tokens unaltered, so developers can apply custom validation rules.
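One way to meet that requirement is to collect each kind of mistake in its own list instead of stopping at the first problem. The sketch below assumes a hypothetical rule set: a required -t <date> pair plus optional -w, -s, and -a flags. Validate and its three list parameters are illustrative names only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module ErrorReportSketch
    ' Hypothetical rule set: "-t <date>" is required; "-w", "-s", and "-a"
    ' are optional flags; any other token is extra data.
    Sub Validate(tokens As IList(Of String), missing As List(Of String),
                 invalid As List(Of String), extra As List(Of String))
        Dim sawDate As Boolean = False
        Dim parsed As DateTime
        Dim i As Integer = 0
        While i < tokens.Count
            Dim t As String = tokens(i)
            If t = "-t" Then
                sawDate = True
                If i + 1 >= tokens.Count Then
                    missing.Add("-t requires a date")
                ElseIf Not DateTime.TryParse(tokens(i + 1), CultureInfo.InvariantCulture,
                                             DateTimeStyles.None, parsed) Then
                    invalid.Add("malformed date: " & tokens(i + 1))
                End If
                i += 2 ' skip the flag and its value
            ElseIf t = "-w" OrElse t = "-s" OrElse t = "-a" Then
                i += 1
            Else
                extra.Add(t)
                i += 1
            End If
        End While
        If Not sawDate Then missing.Add("-t <date> is required")
    End Sub

    Sub Main()
        Dim missing As New List(Of String)()
        Dim invalid As New List(Of String)()
        Dim extra As New List(Of String)()
        Validate(New String() {"-t", "13/45/2002", "-x"}, missing, invalid, extra)
        For Each m As String In invalid
            Console.WriteLine("Invalid: " & m)
        Next
        For Each m As String In extra
            Console.WriteLine("Extra: " & m)
        Next
    End Sub
End Module
```

For the tokens -t 13/45/2002 -x, Main reports the malformed date as invalid and -x as extra; whether the extra list counts as a failure is left to the caller.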
How the Command-Line Parser Works
The sample parser in this article meets these guidelines. It consists of several classes in a CommandLineParse namespace. The CommandLineParser class (see Listing 1) controls the sequence of actions involved in parsing a command line and serves as a repository for both matched and unmatched parameters and for error messages. To set up the CommandLineParser, you populate its CommandLineEntryCollection (see Listing 2) with CommandLineEntry objects (see Listing 3) that you create by calling the parser's CreateEntry method (see Listing 1). For each entry, you must pass CreateEntry a value from the CommandTypeEnum enumeration that specifies the type of data and, optionally, the value you expect for that entry. The CommandTypeEnum enumeration is shown below:
Public Enum CommandTypeEnum
    ' a file specification, such as "*.txt"
    Filespec = 1
    ' a short date string, e.g. 08/12/2002
    ShortDateString = 2
    ' a long date string, e.g. 08/12/2002 12:00:01 AM
    LongDateString = 3
    ' any string value
    Value = 4
    ' text validated with a regular expression
    RegExpression = 5
    ' a value treated as a single- or
    ' multiple-character option that must
    ' be preceded by "/" or "-"
    Flag = 6
    ' a file that must already exist
    ExistingFile = 7
End Enum
These seven types serve to make the CommandLineParser both useful and flexible. It’s useful because it can recognize and validate common input parameter types, such as files and dates, thus eliminating most common command-line-parsing code. It’s flexible because you can use the Value type for any arbitrary value, or the RegExpression type to validate complex entries.
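To illustrate how an entry type can drive first-level validation, here is a sketch of a per-type check built around the enumeration above. Accepts, its pattern and expected parameters, and the exact date formats are assumptions for illustration; the formats follow the examples in the enum comments. This is not the article's ValidateValue method, whose details appear in the listings. The LongDateString case expects the whole date and time as a single token.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO
Imports System.Text.RegularExpressions

Module TypeCheckSketch
    ' Same members as the article's CommandTypeEnum.
    Public Enum CommandTypeEnum
        Filespec = 1
        ShortDateString = 2
        LongDateString = 3
        Value = 4
        RegExpression = 5
        Flag = 6
        ExistingFile = 7
    End Enum

    ' Hypothetical first-level check. "pattern" is used only for RegExpression;
    ' "expected" (a flag name or fixed value) only for Flag and Value.
    Function Accepts(kind As CommandTypeEnum, token As String,
                     Optional pattern As String = Nothing,
                     Optional expected As String = Nothing) As Boolean
        Dim d As DateTime
        Select Case kind
            Case CommandTypeEnum.Filespec
                ' Crude check: reject characters Windows never allows in paths.
                Return token.IndexOfAny(New Char() {"<"c, ">"c, "|"c, """"c}) < 0
            Case CommandTypeEnum.ShortDateString
                Return DateTime.TryParseExact(token, "MM/dd/yyyy",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, d)
            Case CommandTypeEnum.LongDateString
                Return DateTime.TryParseExact(token, "MM/dd/yyyy hh:mm:ss tt",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, d)
            Case CommandTypeEnum.Value
                Return expected Is Nothing OrElse
                    String.Equals(token, expected, StringComparison.OrdinalIgnoreCase)
            Case CommandTypeEnum.RegExpression
                Return pattern IsNot Nothing AndAlso Regex.IsMatch(token, pattern)
            Case CommandTypeEnum.Flag
                Return token.Length > 1 AndAlso (token(0) = "-"c OrElse token(0) = "/"c) AndAlso
                    (expected Is Nothing OrElse
                     String.Equals(token.Substring(1), expected, StringComparison.OrdinalIgnoreCase))
            Case CommandTypeEnum.ExistingFile
                Return File.Exists(token)
            Case Else
                Return False
        End Select
    End Function

    Sub Main()
        Console.WriteLine(Accepts(CommandTypeEnum.ShortDateString, "08/12/2002"))
        Console.WriteLine(Accepts(CommandTypeEnum.LongDateString, "08/12/2002 12:00:01 AM"))
        Console.WriteLine(Accepts(CommandTypeEnum.Flag, "-f", expected:="f"))
    End Sub
End Module
```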
To perform the parse, the CommandLineParser creates a Tokenizer object that splits the command line into its component parts. The Tokenizer returns a Tokens collection containing the individual parameters. The parser then passes the Tokens collection and its CommandLineEntryCollection to a TokenAssigner, which tries to assign each token to a matching CommandLineEntry object by looping through the tokens and setting each candidate object's Value property.
Setting the Value property causes the CommandLineEntry object to perform a first-level validation: it checks the token against its own type and settings and rejects any token that doesn't match them.
The TokenAssigner returns an UnmatchedTokensCollection object that contains all the command line parameters for which the TokenAssigner could not find a matching CommandLineEntry object. The TokenAssigner can return unmatched tokens even after a successful parse (see Figure 1).
Figure 1: Overview of the Parse operation. The calling code creates and populates a CommandLineParser object and then passes it a command line string. The parser uses a Tokenizer class to split the command line into tokens and a TokenAssigner class to match the CommandLineEntry objects in its Entries collection with the tokens. The calling code has access to any unmatched tokens, the Entries collection, and a list of error messages generated during the parse operation.
Author's Note: Don't confuse unmatched tokens with unmatched CommandLineEntry items. A successful parse does not necessarily populate every defined CommandLineEntry, because some entries may have their Required property set to False, and the command line may not contain a matching token for those entries.
If the TokenAssigner completes without errors, the Parser calls a
As implemented in the sample code, the parser ignores extra parameters entered by the user, but you can access them through the UnmatchedEntries property and treat them appropriately for your application.
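The assignment step can be pictured with a simplified model. In this sketch, SketchEntry stands in for CommandLineEntry, and its Value setter rejects a token by throwing ArgumentException. The article doesn't say how CommandLineEntry signals a rejection, so that choice, the SketchEntry class, and AssignTokens are all hypothetical. Each token is offered to the still-unmatched entries in order, and tokens that no entry accepts are returned, much like the unmatched tokens the TokenAssigner reports.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for CommandLineEntry: the Value setter performs a
' first-level check and rejects (throws on) tokens the entry can't accept.
Public Class SketchEntry
    Public ReadOnly Name As String
    Private ReadOnly _accepts As Func(Of String, Boolean)
    Private _value As String

    Public Sub New(name As String, accepts As Func(Of String, Boolean))
        Me.Name = name
        _accepts = accepts
    End Sub

    Public ReadOnly Property IsMatched As Boolean
        Get
            Return _value IsNot Nothing
        End Get
    End Property

    Public Property Value As String
        Get
            Return _value
        End Get
        Set(v As String)
            If Not _accepts(v) Then
                Throw New ArgumentException(Name & " rejects """ & v & """")
            End If
            _value = v
        End Set
    End Property
End Class

Module AssignerSketch
    ' Offers each token to every still-unmatched entry, in order; returns the
    ' tokens that no entry accepted.
    Function AssignTokens(tokens As IList(Of String), entries As IList(Of SketchEntry)) As List(Of String)
        Dim unmatched As New List(Of String)()
        For Each token As String In tokens
            Dim taken As Boolean = False
            For Each entry As SketchEntry In entries
                If entry.IsMatched Then Continue For
                Try
                    entry.Value = token
                    taken = True
                    Exit For
                Catch ex As ArgumentException
                    ' Rejected by this entry: try the next one.
                End Try
            Next
            If Not taken Then unmatched.Add(token)
        Next
        Return unmatched
    End Function

    Sub Main()
        Dim entries As New List(Of SketchEntry)()
        entries.Add(New SketchEntry("flag -f", Function(t As String) t = "-f"))
        entries.Add(New SketchEntry("file", Function(t As String) t.IndexOf("\"c) >= 0))
        Dim leftover As List(Of String) =
            AssignTokens(New String() {"-x", "-f", "c:\temp\junk.txt"}, entries)
        For Each e As SketchEntry In entries
            Console.WriteLine(e.Name & " = " & e.Value)
        Next
        Console.WriteLine("Unmatched: " & String.Join(", ", leftover))
    End Sub
End Module
```

First-fit assignment is greedy: an earlier, looser entry can claim a token that a later entry needed, which is one reason constraints such as MustFollowEntry matter.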
Setting Up the Sample Parser
The sample project CommandLineParserTest included with the downloadable code shows how to use the parser. First, add to your test application a reference to the assembly that contains the CommandLineParse namespace, import the namespace, and then create a CommandLineParser instance:
' At the top of the file
Imports CommandLineParse

' ...then, inside a procedure:
Dim parser As CommandLineParser
parser = New CommandLineParser(Command())
Next, populate the parser's Entries collection (a CommandLineEntryCollection) by creating CommandLineEntry objects and setting their properties. For example, the following method creates two CommandLineEntry objects that specify a flag type entry (-f) followed by an ExistingFile type entry. With these entries, the command line -f c:\temp\junk.txt parses successfully as long as c:\temp\junk.txt exists:
Sub SetupCommandLineEntries(ByVal parser As CommandLineParser)
    Dim anEntry As CommandLineEntry
    parser.Errors.Clear()
    parser.Entries.Clear()

    ' create a flag type entry that accepts a -f (file)
    ' flag (meaning the next parameter is a file
    ' name) and is required
    anEntry = parser.CreateEntry _
        (CommandLineParse.CommandTypeEnum.Flag, "f")
    anEntry.Required = True
    parser.Entries.Add(anEntry)

    ' store the flag entry in a local reference
    ' for use with the next CommandLineEntry's
    ' MustFollowEntry property
    Dim flagEntry As CommandLineEntry
    flagEntry = anEntry

    ' now create an ExistingFile type entry that must
    ' follow the -f flag
    anEntry = parser.CreateEntry _
        (CommandTypeEnum.ExistingFile)
    anEntry.MustFollowEntry = flagEntry
    anEntry.Required = True
    parser.Entries.Add(anEntry)
End Sub
In this scenario, both entries are required, and the ExistingFile entry must follow the flag entry regardless of how many other entries you may later create that precede or follow these two. After setting up the Entries collection, call the CommandLineParser.Parse method to initiate the parse and validation (see Listing 1).
The Parse method returns a Boolean value with the overall result of the parse operation.
If parser.Parse() Then
    Console.WriteLine("Successful parse")
    Console.WriteLine("")
Else
    Console.WriteLine("Parse failed")
    ' each item in Errors is an error message string
    For Each sErr As String In parser.Errors
        Console.WriteLine("Reason: " & sErr)
    Next
    Console.WriteLine("")
End If
Figure 2 shows the sample CommandLineParserTest application's results after a successful request using the CommandLineEntry objects described above and the command line -f c:\temp\junk.txt.
Figure 2: Application results after a successful request using the command line -f c:\temp\junk.txt.
In contrast, if you give the parser an invalid command line, such as -f c:\BadFile.txt, where the file doesn't exist (see Figure 3), or one with an unrecognized flag, such as -x c:\temp\junk.txt (see Figure 4), the Parse method returns False, and the sample application displays the errors that accumulated during the parse operation as well as any unmatched parameters.
To summarize: to use the parser, add a reference to the assembly that contains the CommandLineParse namespace, create a CommandLineParser instance, and then create a CommandLineEntry instance for each type of entry you want users to be able to enter on the command line. For each entry, you must at minimum specify the CommandTypeEnum value and pass a reference to the parser.
Extending the Sample Parser
You could alter or extend the sample parser in several ways. For example, you could define an individual CommandLineEntry subclass for each type and eliminate the long Select Case structure in the ValidateValue method. You could create Exception classes that inherit from ApplicationException to simplify checking for errors. That would also make it easier to move the error messages out of the code and into localizable resource files. You could add a Number type that converts string entries to a designated numeric type and format, using Min and Max properties to verify that the entry lies within a specific range. Min and Max properties would be useful for date types as well.
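The suggested Number type and Min/Max properties could rest on a check like the following. TryParseNumber and InRange are hypothetical names. InRange is generic over IComparable(Of T), so the same range test serves both integers and DateTime values.

```vbnet
Imports System
Imports System.Globalization

Module NumberRangeSketch
    ' Hypothetical range test shared by numbers and dates: true when
    ' min <= value <= max according to the type's own ordering.
    Function InRange(Of T As IComparable(Of T))(value As T, min As T, max As T) As Boolean
        Return value.CompareTo(min) >= 0 AndAlso value.CompareTo(max) <= 0
    End Function

    ' Hypothetical "Number" entry check: the token must parse as an Integer
    ' (invariant culture) and fall within [min, max].
    Function TryParseNumber(token As String, min As Integer, max As Integer,
                            ByRef result As Integer) As Boolean
        Return Integer.TryParse(token, NumberStyles.Integer, CultureInfo.InvariantCulture, result) AndAlso
               InRange(result, min, max)
    End Function

    Sub Main()
        Dim n As Integer
        Console.WriteLine(TryParseNumber("42", 1, 100, n))    ' True
        Console.WriteLine(TryParseNumber("250", 1, 100, n))   ' False
        Dim stamp As New DateTime(2002, 1, 2)
        Console.WriteLine(InRange(stamp, New DateTime(2002, 1, 1), New DateTime(2002, 12, 31)))  ' True
    End Sub
End Module
```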
Finally, as implemented, the parser doesn't fail immediately when it encounters a condition that causes an overall parse failure; instead, it simply adds error messages to the Errors collection. That makes the parser slower when an error occurs early in the command line, but it also gives developers the greatest possible amount of information about what the parser is doing. In addition, the sample code is not highly optimized; you can probably find numerous ways to make it faster.