Building an IF Function Parser
The code to parse an IF
formula consists of a .NET class, callable by VBA in Excel. The heart of the parser is the .NET framework's Regular Expression class, used to identify each IF
function. Syntactically, an IF function in Excel always takes this form:
It's worth pointing out that any of the items A
, or Z
in the preceding formula could be another IF
function or indeed any
built in or user-defined function in Excel. The key to isolating IF
statements is to identify where each function begins and ends. The opening and closing parentheses provide an indication; however it's not quite as simple as that. First, you need to identify the opening or closing parentheses that signify either the beginning or end of an IF
function. But remember, parentheses can also occur in literal strings. Moreover, other functions can contain parentheses, so the parser must unambiguously identify the specific parentheses belonging to the IF
Using regular expressions simplifies the problem considerably. The parser uses the .NET framework class because Excel VBA doesn't include a robust regular expression parser. For example, the code below captures all the IF
functions in a formula:
string patternIF= "IF\\(";
Regex reg = new Regex(
MatchCollection matches = reg.Matches(sourceCopy);
Because parentheses are crucial, the code needs to handle braces that are part of other Excel formulas. To identify those formulas that do not take any parameters (such as Date()), you can use the code below:
string pattern = @"[a-zA-Z\0-9]*\(\)";
Regex reg = new Regex(
MatchCollection matches = reg.Matches(ifTest);
The parser must also identify all the literal strings in a formula, using this pattern:
string localPattern = @"\x22[\sa-zA-Z\(\)]*\x22";
Regex regLocal = new Regex(
So the logic follows this procedure:
- A user selects a formula
- The parser identifies all the IF functions the formula contains, and then figures out which of those is the innermost IF function.
- It replaces any literal strings and all built-in or user-defined functions in the formula with tokens, using regular expression pattern matching.
- Because IF functions can be nested, the parser also needs to work out the relationship between the various IF functions to indent the pseudo code properly.
- Finally, the parser breaks down each IF statement into three parts, the IF part, the Then part and the Else part. As it reconstructs the formula from the tokens, the display code ensures that each part of the "IF" function appears on a separate line, and adds indentation based on where the IF function lies in the whole formula.