Regular Expressions
Finding a key name at the beginning of a line, followed by "=", is an easy job as far as regular expressions go. If you want to learn more about regular expressions, start at this help topic: ms-help://MS.NETFrameworkSDKv1.1/cpguidenf/html/cpconcomregularexpressions.htm; that's where I started. I crafted the following search expression to find the "URL =" text in the INI file and to capture the text that follows the "=" on that line:
^URL\s*=\s*(?<1>.*)
Broken down into its parts, this expression tells my code to look for a string with the following characteristics:
- Beginning of a line
- The text URL
- Zero or more white space characters
- =
- Zero or more white space characters
- Zero or more characters of any kind except a newline (that is, the rest of the line), captured into group 1 so that you can retrieve the text later, in code.
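To see how each part of the expression behaves, here is a minimal sketch that runs it against a few sample lines. The TryGetUrl helper, the PatternDemo module, and the sample lines are hypothetical names and data made up for illustration; only the pattern itself comes from the text above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternDemo
    ' Hypothetical helper: returns the text captured in group 1,
    ' or Nothing if the line does not match the pattern.
    Function TryGetUrl(ByVal line As String) As String
        Dim m As Match = Regex.Match(line, "^URL\s*=\s*(?<1>.*)")
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Made-up sample lines: the first two should match; the third
        ' starts with spaces and the fourth has a different key name.
        Dim lines() As String = { _
            "URL=http://www.example.com/", _
            "URL   =   http://www.example.com/", _
            "  URL = http://www.example.com/", _
            "URLS = something else"}

        For Each line As String In lines
            Dim value As String = TryGetUrl(line)
            If value Is Nothing Then
                Console.WriteLine("no match: [{0}]", line)
            Else
                Console.WriteLine("match:    [{0}] -> [{1}]", line, value)
            End If
        Next
    End Sub
End Module
```

Without RegexOptions.Multiline, "^" matches only at the very start of the input string, which is why this sketch tests one line at a time.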
Listing 2 shows the revised GetIniValue procedure, this time using regular expressions to do its work. Obviously, explaining the code in detail is beyond the scope of what I can do here, but in "big strokes," the code takes the following actions, which assume that you've added an Imports statement for the System.Text.RegularExpressions namespace:
- Use the String.Format method to create the correct search string:
' Build up the match string, replacing {0} with
' the key name to find.
Dim strMatch As String = _
String.Format("^{0}\s*=\s*(?<1>.*)", keyName)
- Create a new regular expression, using the search string and appropriate options:
Dim regex As New Regex(strMatch, _
    RegexOptions.Multiline Or _
    RegexOptions.IgnoreCase)
' Get match, if it's in there.
Dim match As Match = regex.Match(iniText)
- Retrieve the group containing the results, and return the value:
Dim result As String
' Regex.Match never returns Nothing, so check
' Success to see whether the key was found.
If match.Success Then
    result = match.Groups(1).Value.Trim()
End If
Return result
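Pulled together, the steps above might look like the following minimal sketch. It is not the article's Listing 2: the function signature (iniText and keyName passed as strings), the local names pattern, rx, and m, the IniReader module, and the sample INI text are all assumptions made for illustration. The sketch also adds one thing the steps don't show, wrapping the key name in Regex.Escape so that a character such as "." in a key is matched literally.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IniReader
    ' Hypothetical signature: returns the value for keyName found in
    ' iniText, or Nothing when no line starts with that key.
    Function GetIniValue(ByVal iniText As String, _
                         ByVal keyName As String) As String
        ' Escape the key so characters such as "." are treated literally.
        Dim pattern As String = _
            String.Format("^{0}\s*=\s*(?<1>.*)", Regex.Escape(keyName))

        ' Multiline lets "^" match at the start of every line;
        ' IgnoreCase lets "url" match "URL".
        Dim rx As New Regex(pattern, _
            RegexOptions.Multiline Or RegexOptions.IgnoreCase)

        Dim m As Match = rx.Match(iniText)
        If m.Success Then
            ' Trim removes the trailing carriage return that "." can
            ' capture when lines end in CR/LF.
            Return m.Groups(1).Value.Trim()
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Made-up INI content for demonstration only.
        Dim sample As String = _
            "[Settings]" & Environment.NewLine & _
            "url = http://www.example.com/" & Environment.NewLine
        Console.WriteLine(GetIniValue(sample, "URL"))
    End Sub
End Module
```

One caveat with this pattern: because "\s" also matches line breaks, a key with nothing after its "=" can pick up the following line as its value, so a stricter version might use "[ \t]*" in place of "\s*".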
Doesn't this code just ooze elegance? Could it get much cleaner than that? It has only one conditional statement and really not much code at all. I really thought I had the best solution.
Only one problem: Brian had originally "pooh-poohed" regular expressions, figuring they'd be slower. Guess what? They are. In some simple tests that ran both versions a large number of times, the regular expression version took around three times as long to run as the original version, on average. You could mitigate much of the overhead by caching the Regex object in memory rather than creating a new one on each call (worth doing if you perform this operation often), and you can get some further speed improvement by compiling the regular expression with the RegexOptions.Compiled option. Even with these changes, the regular expression version took longer to execute.
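The caching idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the tests described above: the CachedIniReader module and the GetCachedRegex and GetIniValueCached names are hypothetical, the Dictionary(Of String, Regex) cache requires a version of the .NET Framework with generics (a Hashtable would play the same role on version 1.1), and the cache is not thread-safe.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CachedIniReader
    ' One compiled Regex per key name, created on first use and reused
    ' on later calls. Key names are compared without regard to case.
    Private ReadOnly cache As New Dictionary(Of String, Regex)( _
        StringComparer.OrdinalIgnoreCase)

    ' Hypothetical helper: builds the Regex for keyName once, then
    ' returns the same instance on every later request.
    Function GetCachedRegex(ByVal keyName As String) As Regex
        Dim rx As Regex = Nothing
        If Not cache.TryGetValue(keyName, rx) Then
            rx = New Regex( _
                String.Format("^{0}\s*=\s*(?<1>.*)", Regex.Escape(keyName)), _
                RegexOptions.Multiline Or RegexOptions.IgnoreCase Or _
                RegexOptions.Compiled)
            cache.Add(keyName, rx)
        End If
        Return rx
    End Function

    ' Hypothetical variant of GetIniValue that uses the cached Regex.
    Function GetIniValueCached(ByVal iniText As String, _
                               ByVal keyName As String) As String
        Dim m As Match = GetCachedRegex(keyName).Match(iniText)
        If m.Success Then Return m.Groups(1).Value.Trim()
        Return Nothing
    End Function
End Module
```

RegexOptions.Compiled trades a higher one-time construction cost for faster matching, so it tends to pay off only when the same Regex instance is reused many times, which is what the cache arranges.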
What have I learned through this exercise? Clearly, the most elegant code doesn't always win. Sometimes, brute force is a better choice, both in terms of readability and in terms of performance. One could, I suppose, draw a parallel between the house and the code: the original version of the house was a lot easier to live in, just as the original version of the code performs better. I'm hoping the spiffed-up house helps us sell faster, but spiffing up the code, in this case, didn't help anyone very much.