Question:
I would like to have a page that allows a user to conduct a search. I would like the action of the search to be an ASP page that in turn queries yahoo.com, altavista.digital.com, and excite.com, and returns the results back to a string variable in my ASP page.
I would then like to parse through this string, pulling out the relevant links from each search engine and create a new HTML stream with the links from all of the search engines, indicating the original source after the link.
What is the best way to capture content from another site for parsing and reformatting?
Thanks!
Answer:
The Microsoft Internet Transfer control makes it a snap to capture HTML from any Web site. The control has a number of properties and methods that are useful, but one of the most important is the OpenURL method. Using a single call to this method, you can retrieve HTML into a variable. You can then parse this variable in any way that you like.
The following excerpt is from a Visual Basic application that uses an Internet Transfer control named Inet1. The procedure calls the OpenURL method with the URL entered in a text box named txtURL and saves the returned HTML in a variable. It then parses the HTML and displays each hyperlink it finds in a list box named List1.
Private Sub cmdGetLinks_Click()
    Const DOUBLE_QUOTE = 34          ' ASCII code of the double-quote character
    Const REQUEST_TIMED_OUT = 35761  ' Error raised by the control on a timeout

    Dim strHTML As String, strNextChar As String, strTheURL As String
    ' Positions are Long because a page can exceed the 32,767 limit of Integer
    Dim lngLookForNextHREFHere As Long, lngNextHref As Long
    Dim lngWhereIsTheEQ As Long, lngLookHere As Long, lngNextDelimiter As Long
    Dim lngSpace As Long, lngClose As Long

    On Error GoTo ErrorHandler

    List1.Clear
    List1.AddItem "Finding Links..."
    cmdGetLinks.Enabled = False

    ' Retrieve the page as a string
    strHTML = Inet1.OpenURL(txtURL.Text)
    List1.Clear
    cmdGetLinks.Enabled = True

    lngLookForNextHREFHere = 1
    Do
        ' vbTextCompare matches HREF in any case without lowercasing the URLs
        lngNextHref = InStr(lngLookForNextHREFHere, strHTML, "href", vbTextCompare)
        If lngNextHref < 1 Then Exit Do
        lngWhereIsTheEQ = InStr(lngNextHref, strHTML, "=")
        If lngWhereIsTheEQ < 1 Then Exit Do

        ' Skip any spaces between the equal sign and the start of the URL
        lngLookHere = lngWhereIsTheEQ + 1
        strNextChar = Mid(strHTML, lngLookHere, 1)
        Do While strNextChar = " "
            lngLookHere = lngLookHere + 1
            strNextChar = Mid(strHTML, lngLookHere, 1)
        Loop

        If strNextChar = Chr(DOUBLE_QUOTE) Then
            ' The delimiter is a double-quote; look for the next one
            lngLookHere = lngLookHere + 1
            lngNextDelimiter = InStr(lngLookHere, strHTML, Chr(DOUBLE_QUOTE))
        Else
            ' No double-quote: the URL ends at the next space or the next
            ' tag-closing character (>), whichever is nearest
            lngSpace = InStr(lngLookHere, strHTML, " ")
            lngClose = InStr(lngLookHere, strHTML, ">")
            If lngSpace > 0 And (lngSpace < lngClose Or lngClose = 0) Then
                lngNextDelimiter = lngSpace
            Else
                lngNextDelimiter = lngClose
            End If
        End If
        If lngNextDelimiter < 1 Then Exit Do ' Unterminated URL; stop parsing

        strTheURL = Trim(Mid(strHTML, lngLookHere, lngNextDelimiter - lngLookHere))
        List1.AddItem strTheURL

        ' Find the next URL
        lngLookForNextHREFHere = lngNextDelimiter
    Loop
    Exit Sub

ErrorHandler:
    cmdGetLinks.Enabled = True ' Do not leave the button disabled after a failure
    Select Case Err.Number
        Case REQUEST_TIMED_OUT
            MsgBox Err.Number & " - " & Err.Description
        Case Else
            MsgBox "Unexpected error " & Err.Number & " - " & Err.Description
    End Select
End Sub
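To get closer to what the question asks for (links from several engines merged into one HTML stream, each followed by its source), the same idea can be wrapped in two functions. This is only a rough sketch, not a finished solution: ExtractLinks and BuildCombinedResults are hypothetical names, the code assumes it lives in the same form as Inet1, and for brevity ExtractLinks only recognizes HREF values enclosed in double quotes.

```vb
' Hypothetical helper: returns the double-quoted HREF values found in strHTML.
Private Function ExtractLinks(ByVal strHTML As String) As Collection
    Dim colLinks As New Collection
    Dim lngStart As Long, lngEnd As Long

    lngStart = InStr(1, strHTML, "href=""", vbTextCompare)
    Do While lngStart > 0
        lngStart = lngStart + Len("href=""")      ' First character of the URL
        lngEnd = InStr(lngStart, strHTML, """")   ' Closing double-quote
        If lngEnd = 0 Then Exit Do
        colLinks.Add Mid(strHTML, lngStart, lngEnd - lngStart)
        lngStart = InStr(lngEnd, strHTML, "href=""", vbTextCompare)
    Loop
    Set ExtractLinks = colLinks
End Function

' Hypothetical helper: fetches each search URL with Inet1 and returns one HTML
' fragment that lists every link followed by the engine it came from.
Private Function BuildCombinedResults(varNames As Variant, varURLs As Variant) As String
    Dim strOut As String
    Dim i As Long
    Dim varLink As Variant

    strOut = "<ul>" & vbCrLf
    For i = LBound(varURLs) To UBound(varURLs)
        For Each varLink In ExtractLinks(Inet1.OpenURL(CStr(varURLs(i))))
            strOut = strOut & "<li><a href=""" & varLink & """>" & varLink & _
                     "</a> (" & varNames(i) & ")</li>" & vbCrLf
        Next varLink
    Next i
    BuildCombinedResults = strOut & "</ul>"
End Function
```

A call might look like BuildCombinedResults(Array("Yahoo", "AltaVista", "Excite"), Array(strYahooURL, strAltaVistaURL, strExciteURL)), where the three string variables are placeholders for each engine's query URL, which is not shown here. The sketch returns every link on each page, including navigation links and relative URLs, so picking out only the actual search results would still need engine-specific filtering.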