Getting Started with Regular Expressions : Page 5

Regular expressions, also referred to as "regex" in the developer community, are extremely powerful pattern matching and substitution tools. This article introduces you to regular expressions, what they are, why you would want to use them, and finally, how you can begin putting them to work in Visual Studio .NET.



The System.Text.RegularExpressions Namespace
In addition to the ASP.NET RegularExpressionValidator control, you can work directly with the classes of the .NET Framework regular expression engine. These classes live in the System.Text.RegularExpressions namespace.

The Regex Class
The Regex class handles the majority of the work in the System.Text.RegularExpressions namespace. Its constructor is critical because it takes the most important element of a regular expression, the pattern. You can call the constructor in one of two ways. (A parameterless constructor also exists, but it is protected and intended for derived classes, so you cannot call it directly.)

   'Passing the string pattern
   New Regex(pattern)
   'Passing the string pattern and option settings
   New Regex(pattern, options)
The pattern parameter must be a string. The options parameter, if passed, must be a member (or a bitwise combination of members) of the RegexOptions enumeration.

The RegexOptions enumeration contains the options that you can set when you create a Regex object. IgnoreCase is a commonly used option that overrides the default case-sensitive behavior of Regex. Include this option if you want a case-insensitive regular expression. Another way to specify case insensitivity is to add (?i) to the beginning of the pattern.

   ^(?i)[a-z]{3}$
This expression will match abc, AbC, and ABC.
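As a quick check that the two approaches behave the same way, the following console sketch compares RegexOptions.IgnoreCase with the inline (?i) form. The module name, the variable names, and the sample strings are illustrative assumptions, and Console.WriteLine stands in for a form so the sample can run on its own.

```vb
Imports System
Imports System.Text.RegularExpressions

Module IgnoreCaseDemo
    Sub Main()
        ' Case insensitivity requested through the options parameter
        Dim oWithOption As New Regex("^[a-z]{3}$", RegexOptions.IgnoreCase)
        ' Case insensitivity requested inline with (?i)
        Dim oInline As New Regex("^(?i)[a-z]{3}$")

        ' Hypothetical sample inputs; "ab1" is included as a non-match
        For Each sample As String In New String() {"abc", "AbC", "ABC", "ab1"}
            Console.WriteLine("{0}: {1} / {2}", sample, _
                oWithOption.IsMatch(sample), oInline.IsMatch(sample))
        Next
    End Sub
End Module
```

The first three samples should report True for both expressions, and "ab1" should report False for both because the digit falls outside [a-z].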

Since matching is one of the most commonly performed operations, let's start with it. The code below determines if you've entered a valid U.S. phone number into a textbox.



   ' Anchor with ^ and $ so the entire input must be a phone number
   Dim oRegEx As New Regex( _
      "^((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}$")
   Dim x As Boolean
   x = oRegEx.IsMatch(Me.TextBox1.Text)
   If x Then
      MessageBox.Show("Valid!")
   Else
      MessageBox.Show("Invalid!!!")
   End If
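IsMatch() reports success when the pattern occurs anywhere in the input, so a validation pattern generally needs the ^ and $ anchors to require that the whole string match. The sketch below illustrates the difference with a simplified seven-digit pattern. The module name and sample strings are assumptions made for the illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AnchorDemo
    Sub Main()
        Dim Sample As String = "Call 555-1234 today"

        ' Unanchored: True, because the pattern occurs somewhere in the string
        Console.WriteLine(Regex.IsMatch(Sample, "\d{3}-\d{4}"))

        ' Anchored: False, because the string contains more than the number
        Console.WriteLine(Regex.IsMatch(Sample, "^\d{3}-\d{4}$"))

        ' Anchored against an exact number: True
        Console.WriteLine(Regex.IsMatch("555-1234", "^\d{3}-\d{4}$"))
    End Sub
End Module
```

Use the unanchored form when you are searching text and the anchored form when you are validating a field.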
The IsMatch() method returns True if it finds the pattern anywhere in the passed string, False otherwise. The static (Shared) version of IsMatch accepts the string to search, the pattern to check it against, and, optionally, the RegexOptions required.

   If Regex.IsMatch(Me.TextBox2.Text, _
      "[a-z]{3}", RegexOptions.IgnoreCase) Then
      MessageBox.Show("Found three letters in a row")
   End If
Match Object
The Regex.Match method returns a Match object that provides detailed information about a match, including whether or not a match was found, the value (the text matched), the index (the position in the searched string), and the length of the matched text.

   Dim SearchString As String _
      = "A cobra is a venomous snake!"
   Dim Pattern As String = "\bve\w*"
   Dim oMatch As Match
   oMatch = Regex.Match(SearchString, _
      Pattern, RegexOptions.IgnoreCase)
   If oMatch.Success Then
      MessageBox.Show(oMatch.Value)
      ' Index and Length are Integers, so convert them for display
      MessageBox.Show(oMatch.Index.ToString())
      MessageBox.Show(oMatch.Length.ToString())
   End If
The pattern in the above code will match any word (note the \b metacharacter) that begins with "ve". It matches the word "venomous", so oMatch.Success returns True. oMatch.Value contains "venomous", oMatch.Index contains 13, and oMatch.Length contains 8.
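The Success property matters most when nothing matches. The following console sketch searches the same sentence for a word that is not there. The "\bpy\w*" pattern and the module name are assumptions chosen only to force a failed match.

```vb
Imports System
Imports System.Text.RegularExpressions

Module MatchFailureDemo
    Sub Main()
        Dim SearchString As String = "A cobra is a venomous snake!"
        ' No word in the sentence begins with "py"
        Dim oMatch As Match = Regex.Match(SearchString, "\bpy\w*")

        If oMatch.Success Then
            Console.WriteLine("Found {0} at index {1}", _
                oMatch.Value, oMatch.Index)
        Else
            ' A failed match still returns a Match object; its Value is empty
            Console.WriteLine("No match; Value is empty: {0}", _
                oMatch.Value.Length = 0)
        End If
    End Sub
End Module
```

Because Regex.Match always returns a Match object, check Success before you rely on Value, Index, or Length.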

Either Regex.IsMatch() or Match.Success will work if you're only looking for a single match. What if you want to find all of the occurrences in a string that match the pattern? This is where NextMatch() comes in.

NextMatch() will return the next match in the searched string starting from the end of the current match. You can place NextMatch() inside a loop to iterate through a searched string to find all the occurrences of the pattern.

   Dim SearchString As String = _
      "A cobra is a very, very venomous snake!"
   Dim Pattern As String = "\bve\w*"
   Dim oMatch As Match
   Dim MatchHits As Integer = 0
   oMatch = Regex.Match(SearchString, _
      Pattern, RegexOptions.IgnoreCase)
   Do While oMatch.Success
      MatchHits = MatchHits + 1
      ' Report the current match before advancing to the next one
      MessageBox.Show(oMatch.Value)
      MessageBox.Show(oMatch.Index.ToString())
      MessageBox.Show(oMatch.Length.ToString())
      oMatch = oMatch.NextMatch()
   Loop
The previous code will find three matches to the pattern ("very", "very", and "venomous"), starting with the first occurrence of the word "very". The loop runs for as long as the current match succeeds, and the NextMatch() method is called within the loop to find the next pattern match.

While this technique may work for you in some circumstances, it has limitations. It doesn't provide a way to index into the matches, and it doesn't provide any metadata about the match set, such as a match count. That is why the previous example had to maintain the MatchHits variable manually.

MatchCollection
If you need the ability to iterate through the match set, if you need to know how many matches occurred, or if you need to be able to do ad-hoc indexing into the match set, then MatchCollection is the object for you.

   Dim SearchString As String = _
      "A cobra is a very, very venomous snake!"
   Dim oMatch As Match
   Dim oMatchCollection As MatchCollection
   Dim oRegEx As New Regex("\bve\w*")
   oMatchCollection = oRegEx.Matches(SearchString)
   MessageBox.Show(oMatchCollection.Count.ToString())
   For Each oMatch In oMatchCollection
      MessageBox.Show(oMatch.Value)
      MessageBox.Show(oMatch.Index.ToString())
      MessageBox.Show(oMatch.Length.ToString())
   Next
The above code does almost exactly the same thing as the preceding code sample that used NextMatch(). The difference is that this code uses the Matches method to create a MatchCollection object.
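The ad-hoc indexing mentioned earlier can be sketched as follows. This console version uses the static Regex.Matches method and indexes straight into the returned collection. The module and variable names are assumptions for the illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module MatchCollectionDemo
    Sub Main()
        Dim SearchString As String = _
            "A cobra is a very, very venomous snake!"
        Dim oMatches As MatchCollection = _
            Regex.Matches(SearchString, "\bve\w*")

        ' The collection knows how many matches it holds: 3
        Console.WriteLine(oMatches.Count)

        ' Index directly to the first match: very
        Console.WriteLine(oMatches(0).Value)

        ' Index directly to the last match: venomous, at index 24
        Dim oLast As Match = oMatches(oMatches.Count - 1)
        Console.WriteLine(oLast.Value)
        Console.WriteLine(oLast.Index)
    End Sub
End Module
```

No counter variable is needed, and any match can be reached without walking through the ones before it.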

Replacement Strings
Replacement does exactly what you expect it to do: it finds a piece of text that matches a pattern and replaces it with another.

   Dim SearchString As String = _
     "A cobra is a very, very venomous snake!"
   Dim Pattern As String = "\bven\w*"
   Dim ReplacementString As String = "friendly"
   Dim NewString As String
   NewString = _
    Regex.Replace(SearchString, Pattern, _
   ReplacementString)
The above code will find all occurrences of the word "venomous" and replace each one with the word "friendly".

The replacement capabilities of regular expressions go well beyond swapping one word for another. For example, you could read an HTML file and remove all of the bold tags (<b> and </b>) with a simple Regex.Replace() call.

   Dim SearchString As String = _
     "A cobra is a very, <B>very</B> venomous snake!"
   Dim Pattern As String = "(<b>)|(</b>)"
   Dim ReplacementString As String = ""
   Dim NewString As String
   NewString = _
   Regex.Replace(SearchString, Pattern, _
   ReplacementString, RegexOptions.IgnoreCase)
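This code results in NewString containing "A cobra is a very, very venomous snake!" The bold tags are gone, and because IgnoreCase is set, the uppercase <B> and </B> in the search string match the lowercase pattern.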

   (<b>)|(</b>)|(<i>)|(</i>)
This pattern expands on the previous example to remove italics tags as well.
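To see the expanded pattern at work, the console sketch below strips both bold and italics tags from a sample string. The sample HTML, the names, and the shorter "</?[bi]>" pattern are assumptions added for illustration; the shorter form is offered as an equivalent way to write the same alternation, not as the article's own pattern.

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripTagsDemo
    Sub Main()
        Dim Html As String = _
            "A <I>cobra</I> is a very, <B>very</B> venomous snake!"

        ' The pattern from the article: four alternatives
        Dim LongPattern As String = "(<b>)|(</b>)|(<i>)|(</i>)"
        ' Assumed shorthand: "<", an optional "/", then b or i, then ">"
        Dim ShortPattern As String = "</?[bi]>"

        ' Both calls print: A cobra is a very, very venomous snake!
        Console.WriteLine(Regex.Replace(Html, LongPattern, "", _
            RegexOptions.IgnoreCase))
        Console.WriteLine(Regex.Replace(Html, ShortPattern, "", _
            RegexOptions.IgnoreCase))
    End Sub
End Module
```

Neither pattern handles tags that carry attributes or extra whitespace, so treat this as a starting point rather than a general HTML cleaner.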

My goal at the beginning of this article was to introduce you to the world of regular expressions and how to use them in Visual Studio .NET. While they can look complex and overwhelming to the uninformed, once you break them apart they're really not that bad to work with and they provide a powerful tool to add to your development toolbox.

Resources
You will find plenty of help and resources on regular expressions on the Web. Here are just a few of the resources you'll find:

Software
RegexDesigner.NET (freeware)
http://www.sellsbrothers.com/tools/

RegexDesigner.NET is a powerful visual tool for helping you construct and test .NET Regular Expressions. When you are happy with your regular expression, RegexDesigner.NET lets you integrate it into your application through native C# or VB .NET code generation and compiled assemblies (usable from any .NET language).

http://www.regexlib.com is a Web site with an extensive collection of contributed regular expression patterns. You can submit your patterns and/or test your patterns before you implement them with the Regular Expression Tester.

Books
Mastering Regular Expressions, Second Edition, Jeffrey Friedl

Regular Expressions with .NET (ebook), by Dan Appleman

Visual Basic .NET Text Manipulation Handbook: String Handling and Regular Expressions, by Paul Wilton, Craig McQueen, François Liger



Jim Duffy is founder and president of TakeNote Technologies, an award-winning training and software development company. He has a BS degree in Computer and Information Systems and over 18 years of programming and training experience. He is an energetic trainer, skilled developer, and has been published in leading developer-oriented publications. Jim, a recent Microsoft MVP award recipient, is a popular speaker at regional user groups and developer conferences. Additional information about Jim and TakeNote Technologies can be found at www.takenote.com.