Regular expressions. The name doesn't conjure up any grandiose ideas about what they are all about. How could it with the word "regular" in the title? For those of you who struggled to learn how to use them, you're probably thinking they should be renamed irregular expressions. How could something that looks like this...
...be called regular??? (That pattern, by the way, will validate nearly 99% of all e-mail addresses, but more about that later.)
I've broken this article down into three sections:
- Describe regular expressions
- Analyze some common patterns
- Implement common expressions in Visual Studio .NET
At the end of the article I've listed resources including Web sites, books, and regular expression software.
What Are Regular Expressions?
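A regular expression allows you to efficiently search for strings within other strings using a pattern-matching expression. You've probably used something very much like a regular expression before and didn't even know it. For example, have you ever used *.txt to search for or filter a list of files? If so, congratulations! Strictly speaking, *.txt is a wildcard pattern rather than a full regular expression, but it rests on the same idea, so you're already well on your way to being a regular expression user!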
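The comparison with *.txt can be made concrete with a minimal sketch. It is written in Python purely for illustration (this article itself targets .NET), and the file names and the regex are assumptions invented for the example. It filters the same list twice, once with the wildcard and once with a regular expression that expresses the same rule.

```python
import fnmatch
import re

# Hypothetical file names, invented for this illustration.
files = ["notes.txt", "report.doc", "todo.txt", "txt.png"]

# Wildcard filter: * stands for "any run of characters".
wildcard_hits = [name for name in files if fnmatch.fnmatchcase(name, "*.txt")]

# A regular expression for the same rule: any characters (.*),
# then a literal dot (\.), then "txt". fullmatch() requires the
# whole file name to match, not just part of it.
txt_pattern = re.compile(r".*\.txt")
regex_hits = [name for name in files if txt_pattern.fullmatch(name)]

print(wildcard_hits)
print(regex_hits)
```

Both filters select the same names. The regex is longer because it spells out what the wildcard implies: the dot has to be escaped, since an unescaped dot means "any character" in a regular expression.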
Regular expressions play a key role in all kinds of text-manipulation tasks. Common uses include searching (matching) and search-and-replace. You can also use regular expressions to test for specific conditions in a string, text file, Web page, or XML stream. You could use regular expressions as the basis for a program that filters spam from incoming mail. In this type of situation, you might use a regular expression to determine whether the "From:" e-mail address is the e-mail address of a known spammer. As a matter of fact, many e-mail filtering programs use regular expressions for exactly this purpose.
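The spam-filtering idea can be sketched in a few lines. This is a rough illustration in Python, not how any particular mail filter works: the blocked domains, the function name is_from_known_spammer, and the sample headers are all made up for the example.

```python
import re

# Hypothetical blocklist: both domains are invented for this example.
# The pattern looks for a From: line whose address ends in one of them.
SPAM_SENDER = re.compile(
    r"^From:.*@(cheap-pills\.example|win-big\.example)>?\s*$",
    re.IGNORECASE | re.MULTILINE,  # ignore case; let ^ and $ match per line
)

def is_from_known_spammer(raw_headers: str) -> bool:
    """Return True if the From: header's address is in a blocked domain."""
    return SPAM_SENDER.search(raw_headers) is not None

headers = (
    "To: you@example.com\n"
    "From: Deals <offers@win-big.example>\n"
    "Subject: Hi"
)
print(is_from_known_spammer(headers))
print(is_from_known_spammer("From: Alice <alice@example.org>"))
```

The first call reports a match and the second does not. A real filter would need a maintained list of senders and more careful header parsing, but the core test is a single regex search.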
One of the drawbacks of regular expressions is that they tend to be easier to write than they are to read. Here's an example of a very common regex pattern.
If this pattern looks overwhelming right now, don't worry, you'll know what it means by the time you finish reading this article.
You construct regex patterns using a series of different characters called metacharacters. Table 1 lists the most common metacharacters.
Table 1: Metacharacter listing
Metacharacter   Description
.               Matches any single character except the newline character.
[ ]             Matches any one of the enclosed characters. You can specify a range using a hyphen, such as [0-9].
x|y             Matches either x or y.
^               Matches the beginning of the string.
$               Matches the end of the string.
?               Matches 0 or 1 instances of the preceding character.
+               Matches 1 or more instances of the preceding character.
\               Indicates that the next character should not be interpreted as a regular expression special character.
*               Matches 0 or more instances of the preceding character.
( )             Groups a series of characters.
{n}             Matches exactly n instances of the preceding character (where n is an integer).
{n,}            Matches at least n instances of the preceding character (where n is an integer).
{n,m}           Matches at least n and at most m instances of the preceding character (where n and m are integers).
\s              Matches a single white space character, including space, tab, form feed, and line feed (same as [ \f\n\r\t\v]).
\w              Matches any alphanumeric character, including the underscore (same as [A-Za-z0-9_]).
\d              Matches a digit character (same as [0-9]).
\f              Matches a form feed.
\n              Matches a line feed.
\r              Matches a carriage return.
\t              Matches a tab.
\v              Matches a vertical tab.
\b              Matches a word boundary, that is, the position between a word character and a non-word character such as a space.
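To see several of the metacharacters from Table 1 in action, here is a small sketch that runs a handful of patterns against sample strings and prints what each one finds. It uses Python's re module for illustration only; the patterns and sample strings are invented for the example, and other regex engines can differ in details (for instance, in how \d and \w treat non-English characters).

```python
import re

# Each entry: (pattern, sample text, what the pattern illustrates).
# All patterns and sample strings are made up for this illustration.
examples = [
    (r"^\d{5}$", "90210", "anchors ^ and $ around exactly five digits"),
    (r"colou?r", "color", "? makes the preceding u optional"),
    (r"[A-C]\d+", "B42", "a character range, then one or more digits"),
    (r"(ab){2,3}", "ababab", "a group repeated two to three times"),
    (r"cat|dog", "hotdog", "alternation: either cat or dog"),
    (r"3\.14", "3.14", "the backslash makes the dot literal"),
    (r"\bcat\b", "a cat nap", "word boundaries around cat"),
]

for pattern, text, note in examples:
    match = re.search(pattern, text)
    found = match.group(0) if match else None
    print(f"{pattern!r:14} on {text!r:12} -> {found!r}  ({note})")
```

Every pattern in the list finds a match in its sample text. Changing the input shows where the metacharacters bite: \bcat\b finds nothing in "concatenate", because \b matches a position between a word character and a non-word character rather than a character of its own, and ^\d{5}$ rejects "9021" because {5} demands exactly five digits.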