Tip of the Day
Language: VB.NET
Expertise: Intermediate
May 18, 2004

Suffering from Spam?

One of the major ways that spammers get email addresses is by crawling Web pages and extracting any email addresses they find. As part of your own Web site development, it's a good idea to check your own Web pages to make sure you didn't accidentally include any email addresses. Here's a little VB.NET utility that can scan a Web page and extract most email addresses.

Private Sub cmdScan_Click(ByVal sender As System.Object, _
   ByVal e As System.EventArgs) Handles cmdScan.Click
   ScanAPage(txtURL.Text)
End Sub

The Regex class (in System.Text.RegularExpressions) does the scanning. The pattern used here breaks down as follows:

\w+ One or more letters, digits, or underscores
@ The @ symbol
(\w+) A group of one or more letters, digits, or underscores
( Start of a second group consisting of:
\. A period followed by
\w+ One or more letters, digits, or underscores
)+ End of the group, which may repeat one or more times

' Requires Imports System.Text.RegularExpressions at the top of the file
Private m_EmailRE As New Regex("\w+@(\w+)(\.\w+)+")
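
To see how the pattern behaves in practice, here is a minimal console sketch that runs it over a short string. The module name PatternDemo and the sample text are made up for illustration and are not part of the original utility.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternDemo
    Sub Main()
        ' Same pattern as the tip; VB string literals need no backslash escaping.
        Dim emailRE As New Regex("\w+@(\w+)(\.\w+)+")

        ' Hypothetical sample text, not taken from any real page.
        Dim sample As String = _
            "Write to jane_doe@example.com or first.last@mail.example.org. " & _
            "Released as version@2.0 (bob@my-site.example)."

        ' Prints one match per line:
        '   jane_doe@example.com
        '   last@mail.example.org
        '   version@2.0
        For Each m As Match In emailRE.Matches(sample)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

The second address loses its "first." prefix because \w does not match a period, "version@2.0" is reported even though it is not an address, and the hyphenated domain in the last address prevents any match at all.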

Now, this expression is not perfect: it will match some strings that are not valid email addresses, and it will miss or truncate some valid ones (for example, addresses with a hyphen in the domain or a period before the @). Keep in mind what it is designed for, though. This application isn't a tool for spammers to extract email addresses; it's a tool to help you spot email addresses you've inadvertently left in your code. So false positives aren't harmful.

The AddToList function checks for duplicates before listing the email addresses:

Private Sub AddToList(ByVal s As String)
   If Not ListBox1.Items.Contains(s.ToLower) Then
      ListBox1.Items.Add(s.ToLower)
   End If
End Sub

The ScanAPage routine does the real work. It uses the WebRequest object to retrieve the specified Web page, then applies the regular expression to find every substring that matches the email pattern, and adds each one to the list:

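' Requires Imports System.Net and Imports System.Text.RegularExpressions
Private Sub ScanAPage(ByVal url As String)
   Dim pageinfo As String = ""
   Dim response As WebResponse = Nothing
   Dim sr As IO.StreamReader = Nothing
   Try
      ' Create can throw on a malformed URL, so it belongs inside the Try
      Dim req As WebRequest = WebRequest.Create(url)
      response = req.GetResponse()
      sr = New IO.StreamReader(response.GetResponseStream())
      pageinfo = sr.ReadToEnd()
   Catch ex As Exception
      ' Page could not be retrieved; nothing to scan
      Exit Sub
   Finally
      ' Release the reader and the connection whether or not the read succeeded
      If Not sr Is Nothing Then sr.Close()
      If Not response Is Nothing Then response.Close()
   End Try

   ' Get email addresses and add to list
   ' (Matches never returns Nothing; an empty collection simply skips the loop)
   For Each onematch As Match In m_EmailRE.Matches(pageinfo)
      AddToList(onematch.Value)
   Next
End Sub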
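
If you want to check page text without a form or a network connection, the matching and duplicate-checking steps can be pulled into a standalone routine. The following is only a sketch under a few assumptions: the module name EmailExtractor and the helper ExtractEmails are hypothetical, and the generic collections it uses require a newer framework than the one this tip was written for (HashSet arrived in .NET 3.5).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module EmailExtractor
    ' Same pattern as the tip.
    Private ReadOnly EmailRE As New Regex("\w+@(\w+)(\.\w+)+")

    ' Hypothetical helper: returns each distinct address once, lowercased,
    ' in the order it first appears in the text.
    Function ExtractEmails(ByVal pageText As String) As List(Of String)
        Dim seen As New HashSet(Of String)()
        Dim results As New List(Of String)()
        For Each m As Match In EmailRE.Matches(pageText)
            Dim address As String = m.Value.ToLowerInvariant()
            ' HashSet.Add returns False when the value is already present.
            If seen.Add(address) Then
                results.Add(address)
            End If
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up markup containing the same address twice in different case.
        Dim html As String = _
            "<a href=""mailto:Info@Example.com"">info@example.com</a> sales@example.com"

        ' Prints:
        '   info@example.com
        '   sales@example.com
        For Each address As String In ExtractEmails(html)
            Console.WriteLine(address)
        Next
    End Sub
End Module
```

Keeping the extraction separate from the ListBox makes it easy to run the same check over saved files or build output, not just live pages.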

Dan Appleman is the president of Desaware, Inc. and the author of a number of programming books.

Dan Appleman
 