

Tip of the Day
Language: Web Development
Expertise: Beginner
Sep 30, 1998

Capturing Search Engine Results

Question:
I would like to have a page that allows a user to conduct a search. I would like the action of the search to be an ASP page that in turn queries yahoo.com, altavista.digital.com, and excite.com, and returns the results back to a string variable in my ASP page.

I would then like to parse through this string, pulling out the relevant links from each search engine and create a new HTML stream with the links from all of the search engines, indicating the original source after the link.

What is the best way to capture content from another site for parsing and reformatting?

Thanks!

Answer:
The Microsoft Internet Transfer control makes it a snap to capture HTML from any Web site. The control has a number of properties and methods that are useful, but one of the most important is the OpenURL method. Using a single call to this method, you can retrieve HTML into a variable. You can then parse this variable in any way that you like.
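
As a minimal sketch of that single call, the fragment below retrieves one page and reports its length. It assumes a form that already hosts an Internet Transfer Control named Inet1; the button name cmdShowSource, the URL, and the 30-second time-out are illustrative choices, not values from the original application.

```vb
Private Sub cmdShowSource_Click()
    Dim strPage As String

    ' Assumed value: wait at most 30 seconds before a time-out error is raised
    Inet1.RequestTimeout = 30

    ' icString asks for the page as text rather than as a byte array
    strPage = Inet1.OpenURL("http://www.yahoo.com/", icString)

    ' The whole page is now in strPage and can be parsed like any other string
    Debug.Print Len(strPage) & " characters retrieved"
End Sub
```

OpenURL does not return until the transfer has finished, so the variable is ready to parse on the very next line.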

The following excerpt is from a Visual Basic application that uses an Internet Transfer Control named Inet1. The procedure calls the OpenURL method and saves the page's HTML in a variable. The URL to retrieve is read from a text box on the form named txtURL. The HTML is then parsed, and each hyperlink found in it is displayed in a list box on the form named List1.

Private Sub cmdGetLinks_Click()
    
    On Error GoTo ErrorHandler
        
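    ' Declare every variable the procedure uses. strNextHref holds a
    ' character position, so it is numeric despite its "str" prefix.
    Dim strHTML As String, strTemp As String
    Dim strTheURL As String, strNextChar As String
    Dim strNextHref As Long, intLookForNextHREFHere As Long
    Dim intWhereIsTheEQ As Long, intLookHere As Long, intNextDelimiter As Long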
    
    List1.Clear
    List1.AddItem "Finding Links..."
    
    cmdGetLinks.Enabled = False
    
    ' OpenURL does not return until the page has been retrieved
    strTemp = Inet1.OpenURL(txtURL.Text)
    
    List1.Clear
    strHTML = LCase(strTemp)
    cmdGetLinks.Enabled = True

    intLookForNextHREFHere = 1
    
    Do
        strNextHref = InStr(intLookForNextHREFHere, strHTML, "href")
        
        If strNextHref < 1 Then Exit Do
    
        intWhereIsTheEQ = InStr(strNextHref, strHTML, "=")
        
        intLookHere = intWhereIsTheEQ + 1
        
        strNextChar = Mid(strHTML, intLookHere, 1)
        
        Const DOUBLE_QUOTE = 34 ' ASCII NUMBER
        
        If strNextChar = Chr(DOUBLE_QUOTE) Then
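            ' The delimiter is a double-quote, look for the next one
            intNextDelimiter = InStr(intLookHere + 1, strHTML, Chr(DOUBLE_QUOTE))
            ' Stop if the closing quote is missing (InStr returns 0)
            If intNextDelimiter = 0 Then Exit Do
            ' Read from strTemp so the URL keeps its original case;
            ' LCase does not change character positions
            strTheURL = Trim(Mid(strTemp, intLookHere + 1, intNextDelimiter - intLookHere - 1))
            List1.AddItem strTheURL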
        Else
            ' DOUBLE QUOTE CHARACTER NOT USED,
            ' LOOK FOR START OF URL (STOP AT THE END OF THE PAGE
            ' SO THE LOOP CANNOT RUN FOREVER)
            Do Until Len(Trim(strNextChar)) > 0 Or intLookHere > Len(strHTML)
                intLookHere = intLookHere + 1
                strNextChar = Mid(strHTML, intLookHere, 1)
            Loop
            
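            ' NOW LOOK FOR END OF URL BY LOCATING THE NEXT SPACE CHARACTER
            ' OR THE NEXT HTML CLOSE TAG CHARACTER (>) WHICHEVER IS NEAREST.
            ' InStr RETURNS 0 WHEN NOTHING IS FOUND, SO A ZERO MUST NOT WIN.
            Dim intNextSpace As Long
            intNextSpace = InStr(intLookHere, strHTML, " ")
            intNextDelimiter = InStr(intLookHere, strHTML, ">")
            If intNextSpace > 0 And (intNextDelimiter = 0 Or intNextSpace < intNextDelimiter) Then
                intNextDelimiter = intNextSpace
            End If
            If intNextDelimiter = 0 Then Exit Do
        
            ' READ FROM strTemp SO THE URL KEEPS ITS ORIGINAL CASE
            strTheURL = Trim(Mid(strTemp, intLookHere, intNextDelimiter - intLookHere))
            List1.AddItem strTheURL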
                        
        End If
        
        ' FIND NEXT URL
        intLookForNextHREFHere = intNextDelimiter
    Loop
    
    Exit Sub
    
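ErrorHandler:
    Const REQUEST_TIMED_OUT = 35761
    ' Re-enable the button in case the error occurred before it was restored
    cmdGetLinks.Enabled = True
    Select Case Err.Number
        Case REQUEST_TIMED_OUT
            MsgBox Err.Number & " - " & Err.Description
        Case Else
            ' Report unexpected errors instead of silently ignoring them
            MsgBox "Unexpected error " & Err.Number & " - " & Err.Description
    End Select
End Sub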
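
To get closer to what the question asks for (links from several sites, each labeled with its source), the same idea can be wrapped in a function and called once per site. The sketch below is only one possible arrangement: GetQuotedLinks and cmdSearchAll are hypothetical names, the two URLs are placeholders for each engine's real query URL, and for brevity only double-quoted href values are handled.

```vb
' Hypothetical helper: returns the double-quoted href values found in strPage
Private Function GetQuotedLinks(ByVal strPage As String) As Collection
    Dim colLinks As New Collection
    Dim strLower As String
    Dim strMarker As String
    Dim lngPos As Long, lngStart As Long, lngEnd As Long

    strMarker = "href=" & Chr(34)
    strLower = LCase(strPage)       ' search copy; same length as strPage
    lngPos = InStr(1, strLower, strMarker)

    Do While lngPos > 0
        lngStart = lngPos + Len(strMarker)              ' first character of the URL
        lngEnd = InStr(lngStart, strLower, Chr(34))     ' closing quote
        If lngEnd = 0 Then Exit Do                      ' unterminated attribute
        ' Copy from the original text so the URL keeps its case
        colLinks.Add Mid(strPage, lngStart, lngEnd - lngStart)
        lngPos = InStr(lngEnd + 1, strLower, strMarker)
    Loop

    Set GetQuotedLinks = colLinks
End Function

Private Sub cmdSearchAll_Click()
    Dim varSources As Variant
    Dim varSource As Variant
    Dim varLink As Variant

    ' Placeholder URLs: substitute each engine's real query URL
    varSources = Array("http://www.yahoo.com/", "http://www.excite.com/")

    List1.Clear
    For Each varSource In varSources
        For Each varLink In GetQuotedLinks(Inet1.OpenURL(CStr(varSource), icString))
            ' Show the link followed by the site it came from
            List1.AddItem varLink & "  (" & varSource & ")"
        Next varLink
    Next varSource
End Sub
```

This sketch has no error handling; a time-out in OpenURL would still need an error handler like the one in cmdGetLinks_Click.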
DevX Pro
 