Scan HTML Code for Data Extraction

Question:
Is there a way (using VB) to load a Web page (without actually showing the page through a browser or otherwise) and then extract the HTML code for use in extracting data (whether by scanning the actual HTML or saving it as a text file for later automated scanning)?

Answer:
Yes, there is. Place the WebBrowser control (from the Microsoft Internet Controls component) on a form and load the page you want to scan with the Navigate method. If the page should not be displayed, set the control’s Visible property to False. Downloading is asynchronous, so wait until the page has finished loading (the control raises its DocumentComplete event) before reading it. You can then use the WebBrowser’s Document object to reach the downloaded document’s DHTML object model. The following code returns all the HTML in a document:

s = WebBrowser1.Document.All(0).OuterHTML
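
Because the download is asynchronous, that line is only safe to run once the page has arrived. The following is a minimal sketch of one way to arrange this in VB6, not a definitive implementation. It assumes a form containing a WebBrowser control named WebBrowser1; the URL and the output file name page.txt are placeholders chosen for illustration.

```vb
' Form code. Assumes a WebBrowser control named WebBrowser1 on the form
' (Project > Components > Microsoft Internet Controls).
Option Explicit

Private Sub Form_Load()
    WebBrowser1.Visible = False                     ' keep the page hidden
    WebBrowser1.Navigate "http://www.example.com/"  ' placeholder URL
End Sub

' Raised when a document finishes loading; on framed pages it is
' raised once per frame and then once for the top-level page.
Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Dim s As String
    Dim f As Integer

    ' Ignore frame-level events; act only on the top-level document.
    If Not (pDisp Is WebBrowser1.Object) Then Exit Sub

    ' Same expression as in the answer above.
    s = WebBrowser1.Document.All(0).OuterHTML

    ' Save the HTML as a text file for later scanning
    ' (page.txt is a hypothetical file name).
    f = FreeFile
    Open App.Path & "\page.txt" For Output As #f
    Print #f, s
    Close #f
End Sub
```

Reading the document inside DocumentComplete, rather than immediately after calling Navigate, avoids scanning a page that has not finished downloading.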