
Capturing Search Engine Results

Question:
I would like to have a page that allows a user to conduct a search. I would like the action of the search to be an ASP page that in turn queries yahoo.com, altavista.digital.com, and excite.com, and returns the results back to a string variable in my ASP page.

I would then like to parse this string, pulling out the relevant links from each search engine and creating a new HTML stream that contains the links from all of the search engines, with the original source indicated after each link.

What is the best way to capture content from another site for parsing and reformatting?

Thanks!

Answer:
The Microsoft Internet Transfer Control makes it a snap to capture HTML from any Web site. The control has a number of useful properties and methods, but one of the most important is the OpenURL method. With a single call to this method, you can retrieve a page's HTML into a variable and then parse it in any way that you like.

The following excerpt is from a Visual Basic application that uses an Internet Transfer Control named Inet1. The procedure calls the OpenURL method and saves the page's HTML in a variable. The URL to retrieve is taken from a text box on the form named txtURL. The HTML is then parsed, and each hyperlink found in it is displayed in a list box on the form named List1.

Private Sub cmdGetLinks_Click()
    Const DOUBLE_QUOTE = 34            ' ASCII code of the " character
    Const REQUEST_TIMED_OUT = 35761

    Dim strTemp As String, strHTML As String
    Dim strNextChar As String, strTheURL As String
    Dim intLookForNextHREFHere As Long, intNextHref As Long
    Dim intWhereIsTheEQ As Long, intLookHere As Long
    Dim intNextDelimiter As Long, intSpace As Long

    On Error GoTo ErrorHandler

    List1.Clear
    List1.AddItem "Finding Links..."
    cmdGetLinks.Enabled = False

    strTemp = Inet1.OpenURL(txtURL.Text)

    List1.Clear
    ' Search a lowercase copy, but extract each URL from the original
    ' text so that its case is preserved
    strHTML = LCase(strTemp)
    cmdGetLinks.Enabled = True
    intLookForNextHREFHere = 1

    Do
        intNextHref = InStr(intLookForNextHREFHere, strHTML, "href")
        If intNextHref < 1 Then Exit Do

        intWhereIsTheEQ = InStr(intNextHref, strHTML, "=")
        If intWhereIsTheEQ < 1 Then Exit Do
        intLookHere = intWhereIsTheEQ + 1
        strNextChar = Mid(strHTML, intLookHere, 1)

        If strNextChar = Chr(DOUBLE_QUOTE) Then
            ' The delimiter is a double quote; look for the closing one
            intLookHere = intLookHere + 1
            intNextDelimiter = InStr(intLookHere, strHTML, Chr(DOUBLE_QUOTE))
        Else
            ' No double quote; skip whitespace to reach the start of the URL
            Do Until Len(Trim(strNextChar)) > 0 Or intLookHere > Len(strHTML)
                intLookHere = intLookHere + 1
                strNextChar = Mid(strHTML, intLookHere, 1)
            Loop

            ' The URL ends at the next space or the next HTML close tag
            ' character (>), whichever is nearest
            intNextDelimiter = InStr(intLookHere, strHTML, ">")
            intSpace = InStr(intLookHere, strHTML, " ")
            If intSpace > 0 And (intSpace < intNextDelimiter Or intNextDelimiter = 0) Then
                intNextDelimiter = intSpace
            End If
        End If

        If intNextDelimiter < 1 Then Exit Do    ' Unterminated attribute
        strTheURL = Trim(Mid(strTemp, intLookHere, intNextDelimiter - intLookHere))
        List1.AddItem strTheURL

        ' Find the next URL
        intLookForNextHREFHere = intNextDelimiter
    Loop

    Exit Sub

ErrorHandler:
    cmdGetLinks.Enabled = True          ' Do not leave the button disabled
    Select Case Err.Number
        Case REQUEST_TIMED_OUT
            MsgBox Err.Number & " - " & Err.Description
        Case Else
            ' Other errors are ignored
    End Select
End Sub
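
To combine results from several search engines and show the source after each link, as the question asks, the same idea can be wrapped in two small functions. What follows is only a minimal sketch under stated assumptions: the names ExtractQuotedLinks and LinksAsHTML are hypothetical, the code assumes the same Inet1 control on the form, it handles only href values enclosed in double quotes, and the search URL for each engine must be supplied by the caller, since the query format of each engine is not covered here.

```vb
' Hypothetical helper: returns the double-quoted href values found in strHTML.
Private Function ExtractQuotedLinks(ByVal strHTML As String) As Collection
    Dim colLinks As New Collection
    Dim lngPos As Long, lngStart As Long, lngEnd As Long

    ' vbTextCompare makes the search case-insensitive without
    ' changing the case of the URLs themselves
    lngPos = InStr(1, strHTML, "href=""", vbTextCompare)
    Do While lngPos > 0
        lngStart = lngPos + Len("href=""")
        lngEnd = InStr(lngStart, strHTML, """")
        If lngEnd = 0 Then Exit Do          ' Unterminated attribute; stop
        colLinks.Add Mid$(strHTML, lngStart, lngEnd - lngStart)
        lngPos = InStr(lngEnd + 1, strHTML, "href=""", vbTextCompare)
    Loop

    Set ExtractQuotedLinks = colLinks
End Function

' Hypothetical helper: retrieves one results page and returns its links
' as HTML, each followed by the name of the engine it came from.
' Assumes an Internet Transfer Control named Inet1 on the same form.
Private Function LinksAsHTML(ByVal strEngine As String, _
                             ByVal strSearchURL As String) As String
    Dim strHTML As String, strOut As String
    Dim varLink As Variant

    strHTML = Inet1.OpenURL(strSearchURL, icString)
    For Each varLink In ExtractQuotedLinks(strHTML)
        strOut = strOut & "<a href=""" & varLink & """>" & varLink & _
                 "</a> (" & strEngine & ")<br>" & vbCrLf
    Next

    LinksAsHTML = strOut
End Function
```

Calling LinksAsHTML once per engine and concatenating the returned strings gives a single HTML stream. Note that this sketch copies every link on the page, including navigation links and relative URLs, so some filtering for the "relevant" links would still be needed for each engine.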