
Scan HTML Code for Data Extraction


Question:
Is there a way (using VB) to load a Web page (without actually showing the page through a browser or otherwise) and then extract the HTML code for use in extracting data (whether by scanning the actual HTML or saving it as a text file for later automated scanning)?

Answer:
Yes, there is. Place the WebBrowser control (the Microsoft Internet Controls component) on a form and call its Navigate method with the address of the page you want to scan. Downloading is asynchronous, so wait until the page has finished loading (for example, in the control's DocumentComplete event) before you read from it. You can then use the WebBrowser's Document object to reach the DHTML object model of the downloaded document. The following code returns the HTML of the document:

s = WebBrowser1.Document.All(0).OuterHTML
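
Because the download is asynchronous, that line only works once the page has arrived. What follows is a minimal sketch of one way to put the pieces together, not a definitive implementation. It assumes a WebBrowser control named WebBrowser1 on the form; the URL and the output file name page.txt are placeholders chosen for illustration. Setting Visible to False is an assumption about how to keep the page off screen, and the control normally still loads the page while hidden.

```vb
' Form code. Assumes a WebBrowser control named WebBrowser1 on the form.
Option Explicit

Private Sub Form_Load()
    ' Assumption: hiding the control keeps the page from being shown
    WebBrowser1.Visible = False
    ' Placeholder URL; Navigate returns before the download finishes
    WebBrowser1.Navigate "http://www.example.com/"
End Sub

' DocumentComplete fires for each frame as well as for the top-level page
Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Dim s As String
    Dim f As Integer

    ' Ignore frame completions; act only on the top-level document
    If Not (pDisp Is WebBrowser1.Object) Then Exit Sub

    ' Same expression as above: the outer HTML of the first element
    s = WebBrowser1.Document.All(0).OuterHTML

    ' Save the HTML as a text file for later automated scanning
    f = FreeFile
    Open App.Path & "\page.txt" For Output As #f
    Print #f, s
    Close #f
End Sub
```

If All(0) turns out to be something other than the root element on a given page (a leading comment, for instance), WebBrowser1.Document.documentElement.outerHTML is an alternative worth trying. Once the text is in s or in the file, it can be scanned with the usual string functions such as InStr and Mid$.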

devx-admin
