Getting Started with Regular Expressions

Regular expressions, also referred to as "regex" in the developer community, are extremely powerful pattern matching and substitution tools. This article introduces you to regular expressions, what they are, why you would want to use them, and finally, how you can begin putting them to work in Visual Studio .NET.



Regular expressions. The name doesn't conjure up any grandiose ideas about what they are all about. How could it, with the word "regular" in the name? If you struggled to learn how to use them, you probably think they should be renamed irregular expressions. How could something that looks like this...

   
   ^[\w-]+(?:\.[\w-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7}$

...be called regular? (That pattern, by the way, will validate nearly 99% of all e-mail addresses, but more about that later.)
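
To see the pattern at work before it is explained, here is a minimal sketch. It uses Python's re module rather than .NET, purely as an assumption to keep the example short and runnable. The helper name is_valid_email and the sample addresses are hypothetical, and the pattern is written on a single line.

```python
import re

# The e-mail pattern from above, written on a single line.
EMAIL_PATTERN = re.compile(r"^[\w-]+(?:\.[\w-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7}$")

def is_valid_email(address: str) -> bool:
    """Return True if the address fits the pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None

# Made-up sample addresses: the first two fit the pattern, the last two do not.
for candidate in ["jane.doe@example.com", "jane@mail.example.org",
                  "jane@example", "@example.com"]:
    print(candidate, is_valid_email(candidate))
```

The last two samples fail because the pattern requires at least one dot-separated domain part after the @ sign and at least one character before it.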

I've broken this article down into three sections.

  • Describe regular expressions
  • Analyze some common patterns
  • Implement common expressions in Visual Studio .NET.
At the end of the article I've listed resources including Web sites, books, and regular expression software.

What Are Regular Expressions?
A regular expression allows you to efficiently search for strings within other strings using a pattern matching expression. You've probably used something very close to a regular expression before and didn't even know it. For example, have you ever used *.txt to search for or filter a list of files? If so, congratulations! That wildcard pattern is a simpler relative of the regular expression: it applies the same idea of describing text with a pattern, using a smaller syntax.
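
The *.txt wildcard and its regular expression counterpart can be compared side by side. The sketch below uses Python's fnmatch and re modules as stand-ins (an assumption, since the article targets .NET), and the file names are made up. Note that in regex syntax an asterisk on its own does not mean "anything"; the equivalent of *.txt is .*\.txt.

```python
import fnmatch
import re

filenames = ["notes.txt", "report.doc", "todo.txt", "txt"]  # made-up sample names

# Wildcard filter, as typed into a file dialog or command prompt.
# fnmatchcase compares case-sensitively on every platform.
glob_hits = [name for name in filenames if fnmatch.fnmatchcase(name, "*.txt")]

# Regex equivalent: "." is any character, "*" repeats it, "\." is a literal dot.
regex_hits = [name for name in filenames if re.fullmatch(r".*\.txt", name)]

print(glob_hits)
print(regex_hits)
```

Both filters select the same two files; the bare name "txt" is rejected by each because it contains no dot.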

Regular expressions play a key role in all kinds of text-manipulation tasks. Common uses include searching (matching) and search-and-replace. You can also use regular expressions to test for specific conditions in a string, text file, Web page, or XML stream. You could use regular expressions as the basis for a program that filters spam from incoming mail. In this type of situation, you might use a regular expression to determine whether the "From:" e-mail address is the e-mail address of a known spammer. As a matter of fact, many e-mail filtering programs use regular expressions for exactly this reason.
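
The three uses just mentioned (searching, search-and-replace, and testing a condition) can be sketched in a few lines. The example uses Python's re module as a stand-in for the .NET classes; the message text, the spammer.example domain, and the is_known_spammer helper are all invented for illustration and do not describe how any real mail filter works.

```python
import re

# Invented sample message.
message = "From: deals@spammer.example\nSubject: You have won 1000 dollars\n"

# Searching: find the first run of digits in the text.
first_number = re.search(r"\d+", message)
print(first_number.group())

# Search-and-replace: mask every run of digits with "#".
masked = re.sub(r"\d+", "#", message)
print(masked)

# Testing a condition: does the From: header use a blocked domain?
# re.MULTILINE lets ^ and $ match at the start and end of each line.
BLOCKED_FROM = re.compile(r"^From: .*@spammer\.example$", re.MULTILINE)

def is_known_spammer(text: str) -> bool:
    """Hypothetical check against a single hard-coded domain."""
    return BLOCKED_FROM.search(text) is not None

print(is_known_spammer(message))
```

A real filter would load its list of blocked senders from a data source instead of hard-coding one domain in the pattern.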

One drawback of regular expressions is that they tend to be easier to write than they are to read. Here's an example of a very common regex pattern.

   ((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}

If this pattern looks overwhelming right now, don't worry, you'll know what it means by the time you finish reading this article.
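
Before taking the pattern apart, it can help to see which strings it accepts. This is a quick sketch in Python, used here as a stand-in for .NET; the sample phone numbers are fictional.

```python
import re

# The pattern shown above, unchanged.
PHONE_PATTERN = re.compile(r"((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}")

samples = ["(425) 555-0123", "425-555-0123", "555-0123", "425 555 0123"]
for text in samples:
    # fullmatch requires the whole string to fit the pattern.
    print(text, bool(PHONE_PATTERN.fullmatch(text)))
```

The first three samples fit the pattern, with or without an area code. The last one does not, because the pattern expects a hyphen before the final four digits.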

You construct regex patterns using a series of different characters called metacharacters. Table 1 lists the most common metacharacters.

Table 1: Metacharacter listing

Metacharacter    Meaning

Character Matching
.                Matches any single character except the newline character.
[ ]              Matches any one of the enclosed characters. You can specify a range using a hyphen, such as [0-9].
x|y              Matches either x or y.

Position Matching
^                Matches beginning of string.
$                Matches end of string.

Repetition Matching
?                Matches 0 or 1 instances of the preceding character or group.
+                Matches 1 or more instances of the preceding character or group.
\                Indicates that the next character should not be interpreted as a regular expression special character.
*                Matches 0 or more instances of the preceding character or group.
( )              Groups a series of characters.
{n}              Matches exactly n instances of the preceding character or group (where n is an integer).
{n,}             Matches at least n instances of the preceding character or group (where n is an integer).
{n,m}            Matches at least n and at most m instances of the preceding character or group (where n and m are integers).

Special Characters
\s               Matches a single white space character, including space, tab, form feed, and line feed (same as [ \f\n\r\t\v]).
\w               Matches any alphanumeric character, including the underscore (same as [A-Za-z0-9_]).
\d               Matches a digit character (same as [0-9]).
\f               Matches a form feed.
\n               Matches a line feed.
\r               Matches a carriage return.
\t               Matches a tab.
\v               Matches a vertical tab.
\b               Matches a word boundary, that is, the position between a word character (\w) and a non-word character, such as the point where a word meets a space.
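
Several rows of Table 1 can be checked with short experiments. The sketch below uses Python's re module (an assumption; the metacharacters shown behave the same way in most regex engines for these cases), and the patterns and sample strings are made up for illustration.

```python
import re

# Each tuple: (pattern, text, whether the whole text should match).
examples = [
    (r"c.t", "cat", True),         # . matches any single character
    (r"c.t", "ct", False),         # ...but it must match exactly one
    (r"[0-9]", "7", True),         # [ ] matches one enclosed character
    (r"cat|dog", "dog", True),     # x|y matches either alternative
    (r"colou?r", "color", True),   # ? makes the "u" optional
    (r"ab+", "a", False),          # + needs at least one "b"
    (r"ab*", "a", True),           # * allows zero "b"s
    (r"\d{3}", "123", True),       # {n} exactly n digits
    (r"\d{2,}", "1", False),       # {n,} at least n digits
    (r"\d{2,4}", "12345", False),  # {n,m} between n and m digits
    (r"3\.14", "3x14", False),     # \. is a literal dot, not "any character"
    (r"(ab){2}", "abab", True),    # ( ) groups characters for repetition
]

for pattern, text, expected in examples:
    matched = re.fullmatch(pattern, text) is not None
    print(f"{pattern!r:12} {text!r:10} {matched}")
    assert matched == expected

# \b matches a position, not a character: find "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")
print(whole_words)
```

The final line finds two matches, the standalone "cat" at the start and the one before the period, which shows that a word boundary occurs next to punctuation and at the start of the string, not only next to spaces.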

 

 



