

Fix Up Your HTML with HTML Tidy and .NET : Page 4

When standards change, your development efforts must often change with them. But change doesn't always have to be painful. If you're trying to upgrade your HTML pages to the latest standards, fix unclosed tags, find and fix deprecated features, and format all your Web pages consistently, HTML Tidy is just what the doctor ordered.





A Word of Caution
HTML Tidy is not a panacea for all your markup problems, and you should be prepared for the fact that it may change working HTML into reformed HTML or XHTML that no longer "works." This is usually because the "working" HTML does not in fact comply with its doctype (explicit or implied), but your particular browser produces what appears to be correct behavior anyway.

For example, I've been guilty of nesting TABLE tags within SPAN tags. The HTML 4.0 Transitional doctype I've been using doesn't permit that, but Internet Explorer 6 gives me the effect I want all the same. If I were to update my doctype to XHTML 1.0, however, my tables would no longer position "correctly." While you can generally rely on HTML Tidy to alert you to potential problems like this, its resolutions may not make immediate sense unless you understand the logic behind its decisions. In this case, HTML Tidy was given the following source:

<span id="span_1">
<table>
<tr><td>Test</td></tr>
</table>
<span id="span_2">
</span>
</span>

HTML Tidy then produced the following output:

<span id="span_1">
</span>
<table>
<tr>
<td>Test</td>
</tr>
</table>
<span id="span_1">
<span id="span_2">
</span>
</span>

Duplicating the <span> tag might look like an error, but the only sensible way to fix the illegal nesting is to close the <span> before the table and reopen it afterward. HTML Tidy's diagnostics, which appear in NETTidy's output panel, explain what it has done:

TidyWarning: (6, 1): missing </span> before <table>
TidyWarning: (10, 4): inserting implicit <span>
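
If you want to act on these diagnostics rather than just read them, for instance to jump to the offending line in an editor, each one can be split into its parts. The following is a minimal sketch in Python, not part of NETTidy itself. It assumes every diagnostic follows the "TidySeverity: (line, column): message" shape of the two samples above, which may not hold for all output; the names Diagnostic and parse_diagnostic are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Diagnostic(NamedTuple):
    """One parsed diagnostic line (hypothetical helper type)."""
    severity: str   # e.g. "Warning"
    line: int       # line number reported by HTML Tidy
    column: int     # column number reported by HTML Tidy
    message: str    # free-text description of the fix


# Assumed format, inferred only from the two sample lines in the article:
#   TidyWarning: (6, 1): missing </span> before <table>
_DIAG_RE = re.compile(
    r"^Tidy(?P<severity>\w+): "
    r"\((?P<line>\d+), (?P<column>\d+)\): "
    r"(?P<message>.+)$"
)


def parse_diagnostic(text: str) -> Optional[Diagnostic]:
    """Return a Diagnostic, or None if the line does not match the assumed format."""
    match = _DIAG_RE.match(text.strip())
    if match is None:
        return None
    return Diagnostic(
        severity=match["severity"],
        line=int(match["line"]),
        column=int(match["column"]),
        message=match["message"],
    )


SAMPLES = [
    "TidyWarning: (6, 1): missing </span> before <table>",
    "TidyWarning: (10, 4): inserting implicit <span>",
]

if __name__ == "__main__":
    for raw in SAMPLES:
        print(parse_diagnostic(raw))
```

Returning None for lines that do not match, rather than raising an error, lets the sketch skip any output that uses a different layout.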

Despite such minor problems, HTML Tidy is a powerful API for parsing, altering, and formatting HTML, and it continues to be developed and refined. As you've seen, it's easy to incorporate into your .NET projects, and it's worth downloading and using for its diagnostics alone. NETTidy leverages only a little of its power; there's a lot left under the hood, so I encourage you to use it as a springboard for further development in your own projects.

Alex Hildyard is a freelance software consultant and writer, specializing in Web technology.