Fix Up Your HTML with HTML Tidy and .NET

When standards change, your development efforts must often change with them. But change doesn't always have to be painful. If you're trying to upgrade your HTML pages to the latest standards, fix unclosed tags, find and fix deprecated features, and format all your Web pages consistently, HTML Tidy is just what the doctor ordered.

Designing the NETTidy Application
The overall goal of this project was to redeploy the HTML Tidy library as a no-frills batch converter. Some subsidiary goals helped keep things simple.

The first was to choose a subset of HTML Tidy's configuration options in order to perform a small but specific task as completely as possible. For example, there are all kinds of options to replace ampersands and quotation marks, wrap particular markup sections in specific ways, or interpret specific markup tags. But I decided to stick with the options that control the horizontal layout of the page: anything related to indentation, block specification, column width, and tab size. I also threw in a couple of "smart" options simply because they're so useful: one that removes the guff from HTML documents exported from Word 2000, and another that replaces "font" and "center" tags with stylesheet rules. Fundamentally, though, NETTidy remains an application for editors who want to format code blocks to specified widths with minimal fuss.
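NETTidy's own wrapper code isn't reproduced here, so what follows is only a sketch of how that option subset might be expressed. It assumes the settings are handed to HTML Tidy as a configuration file of "option: value" lines. The option names (indent, indent-spaces, wrap, tab-size, word-2000, and clean) are standard Tidy options, but the LayoutSettings and TidyConfig types are hypothetical.

```csharp
using System.Text;

// Hypothetical container for the layout-related settings NETTidy exposes.
// Defaults for tab and wrap mirror the sample config values (4 and 80).
public sealed class LayoutSettings
{
    public bool Indent { get; set; } = true;
    public int IndentSpaces { get; set; } = 2;
    public int WrapColumn { get; set; } = 80;   // 0 turns wrapping off in Tidy
    public int TabSize { get; set; } = 4;
    public bool StripWord2000 { get; set; }      // remove Word 2000 export clutter
    public bool Clean { get; set; }              // font/center tags -> CSS rules
}

public static class TidyConfig
{
    // Emits one "option: value" line per setting, in Tidy's config-file syntax.
    public static string BuildConfig(LayoutSettings s)
    {
        var sb = new StringBuilder();
        sb.AppendLine("indent: " + (s.Indent ? "auto" : "no"));
        sb.AppendLine("indent-spaces: " + s.IndentSpaces);
        sb.AppendLine("wrap: " + s.WrapColumn);
        sb.AppendLine("tab-size: " + s.TabSize);
        sb.AppendLine("word-2000: " + YesNo(s.StripWord2000));
        sb.AppendLine("clean: " + YesNo(s.Clean));
        return sb.ToString();
    }

    private static string YesNo(bool value) => value ? "yes" : "no";
}
```

Writing the result to a file lets you try a given combination with the tidy command-line tool's -config switch before wiring it into a UI.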

The second was to make NETTidy as fault-tolerant as possible. Case in point: I originally added a set of radio buttons that let you choose between HTML, XHTML, and XML output. I thought this was a good idea, but I quickly decided against it after accidentally selecting "XML" output and running a set of HTML files through the converter. Admittedly I ended up with perfectly good XML, so perhaps it seems a bit churlish to complain. But while Internet Explorer understood my original HTML-based site, it couldn't make head or tail of my XML-based one; worse, converting the XML back to HTML was non-trivial, and I hadn't made a backup. So to prevent scenarios like this from happening again, I removed the buttons and instead hardwired the following rules: "If a file's extension is .HTM or .HTML, convert its content to XHTML, but if it's .XML, stick with XML." Yes, this does mean you can't explicitly request an XHTML to XML transformation, but I can't actually see why you'd want one. Feel free to change the code to suit your needs if you have esoteric requirements.
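The hard-wired rule boils down to a lookup on the file extension. Here's a minimal sketch, assuming a hypothetical OutputMode enum and treating any other extension as "leave the file alone" (the article doesn't say what NETTidy does with other files). The comparison ignores case, since Windows file names do.

```csharp
using System;
using System.IO;

// Hypothetical result of the extension check.
public enum OutputMode
{
    Xhtml,  // .htm / .html files are converted to XHTML
    Xml,    // .xml files stay XML
    Skip    // anything else is left untouched (an assumption)
}

public static class OutputRules
{
    public static OutputMode ModeFor(string path)
    {
        // Path.GetExtension returns the extension including the dot, e.g. ".HTM"
        string ext = Path.GetExtension(path);

        if (string.Equals(ext, ".htm", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(ext, ".html", StringComparison.OrdinalIgnoreCase))
            return OutputMode.Xhtml;

        if (string.Equals(ext, ".xml", StringComparison.OrdinalIgnoreCase))
            return OutputMode.Xml;

        return OutputMode.Skip;
    }
}
```

In Tidy's own terms, the XHTML case corresponds to the output-xhtml option and the XML case to input-xml, which tells Tidy to parse the file with its XML parser rather than the error-correcting HTML one.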

There was still the issue of my mangled HTML files, though. So I added a few lines of code to back up files to the temp directory before NETTidy gets its claws into them. That way, in the worst case, you can simply copy them back if you change your mind or NETTidy's results don't meet your needs.
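Backing up a file before converting it takes little more than Path.GetTempPath and File.Copy. The sketch below shows one way to do it; it isn't NETTidy's code, and the per-run folder name is hypothetical.

```csharp
using System;
using System.IO;

public static class Backup
{
    // Creates a folder under the user's temp directory for one batch run.
    // The "NETTidy-backup-<timestamp>" naming scheme is hypothetical.
    public static string NewBackupRoot()
    {
        string name = "NETTidy-backup-" + DateTime.Now.ToString("yyyyMMdd-HHmmss");
        string root = Path.Combine(Path.GetTempPath(), name);
        Directory.CreateDirectory(root);   // no-op if it already exists
        return root;
    }

    // Copies the file into backupRoot before conversion; returns the copy's path.
    public static string BackupFile(string sourcePath, string backupRoot)
    {
        string target = Path.Combine(backupRoot, Path.GetFileName(sourcePath));
        File.Copy(sourcePath, target, overwrite: true);
        return target;
    }
}
```

Because the copies land in a single folder, two files with the same name from different subdirectories would overwrite each other. That's fine when converting one directory, but a recursive converter would want to preserve relative paths.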

Persisting UI Preferences
Finally, it seemed like a good idea to persist whatever options you had chosen between sessions. The obvious place to do this was in the application's .config file, so I took a cue from the article "How to Make Your .NET Windows Forms Configuration Files Dynamic," by Russell Jones, DevX's Executive Editor. This meant side-stepping the System.Configuration namespace and accessing the file directly as XML. As a result, I wasn't actually obliged to follow the app.config file's traditional format, but I chose to stick with it anyway, just for good measure. The application serializes preferences (combo box selections and radio button and checkbox states) to and from the file, using an XPath query to locate each setting, as follows:

   // Select the "value" attribute of the <add> element whose key is "tab"
   XmlNode node = doc.DocumentElement.SelectSingleNode(
      "//add[@key='tab']/@value");
   // Serialize the txtTab text box to the config file ...
   node.Value = txtTab.Text;
   // ... or deserialize the txtTab text box from the config file
   txtTab.Text = node.Value;
Configuration properties are located within <add> nodes as follows:

      <add key="tab" value="4" />
      <add key="wrap" value="80" />
      ... etc.
The query pulls out the "value" attribute of any "add" element whose "key" attribute is "tab," wherever that element appears within the document. To avoid unhelpful error messages if you fail to deploy the config file together with the application, I've also added a fallback: if there's no config file, the application simply doesn't preserve preferences.
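Pulling these pieces together, a small preference store might look like the following. It's a sketch under assumptions: the PreferenceStore class and its methods are hypothetical, it reads and writes the file with XmlDocument rather than System.Configuration, and it returns null when the config file is missing so the caller can simply skip persistence.

```csharp
using System.IO;
using System.Xml;

// Hypothetical wrapper around the app's .config file, read as plain XML.
public sealed class PreferenceStore
{
    private readonly string _path;
    private readonly XmlDocument _doc;

    private PreferenceStore(string path, XmlDocument doc)
    {
        _path = path;
        _doc = doc;
    }

    // Returns null when the config file wasn't deployed; callers then skip persistence.
    public static PreferenceStore TryOpen(string path)
    {
        if (!File.Exists(path)) return null;
        var doc = new XmlDocument();
        doc.Load(path);
        return new PreferenceStore(path, doc);
    }

    // Selects the value attribute of the <add> element whose key matches.
    // Keys are spliced into the XPath, so use simple fixed names like "tab".
    private XmlNode ValueNode(string key) =>
        _doc.DocumentElement.SelectSingleNode("//add[@key='" + key + "']/@value");

    // Deserialize: read a setting, falling back to a default if it's absent.
    public string Get(string key, string fallback)
    {
        XmlNode node = ValueNode(key);
        return node != null ? node.Value : fallback;
    }

    // Serialize: update an existing setting (keys not in the file are ignored).
    public void Set(string key, string value)
    {
        XmlNode node = ValueNode(key);
        if (node != null) node.Value = value;
    }

    public void Save() => _doc.Save(_path);
}
```

On startup you would call TryOpen and populate each control with Get; on exit, call Set for each control and then Save, skipping both steps when TryOpen returned null.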

Tricks with the TreeView Control
You'll also notice that I've subclassed TreeNode to derive a class named StateTreeNode, which exposes a single Boolean property called EverOpened. This was to make browsing for a directory of files to convert more efficient. When you start the application, the tree is initially populated through a call to System.Environment.GetLogicalDrives(). But to give graphical feedback of which drives contain subdirectories (and can therefore be "expanded" in the tree), you need to go a level deeper. The application refers to EverOpened when you expand a node in the tree, to determine whether it has previously shown you that node's list of subdirectories, or whether it needs to go off to the drive in question and physically retrieve them. Hitting "F5" resets the flag, forcing the currently selected node to recalculate its immediate subdirectory structure.

You can see this in action by using the TreeView to browse to some location on your hard drive, and then adding or removing a subdirectory. Upon hitting F5, NETTidy picks up your changes and refreshes the display, just like Windows Explorer. The TreeView indexes folders on your hard drive on demand, rather than having to build a complete view in advance, which improves the application's responsiveness and reduces its startup time. It also means nodes within the tree retain their individual expanded or collapsed state until you explicitly refresh them, reducing flicker and making browsing easier. I personally work with TreeView controls a lot, and have found this subclassing technique invaluable, so feel free to reuse it in your own projects.
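Here's a minimal sketch of the StateTreeNode technique. Only the class name and the EverOpened property come from the article. The DirectoryTree helper, the "..." placeholder child used to make the expand button appear, and the choice to repopulate immediately on F5 are assumptions. Wire OnBeforeExpand to the TreeView's BeforeExpand event and OnKeyDown to its KeyDown event.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

// A TreeNode that remembers whether its subdirectories have been read from disk.
public class StateTreeNode : TreeNode
{
    public StateTreeNode(string directoryPath, string label) : base(label)
    {
        DirectoryPath = directoryPath;
    }

    public bool EverOpened { get; set; }
    public string DirectoryPath { get; }
}

// Hypothetical helper holding the lazy-loading logic.
public static class DirectoryTree
{
    // Dummy child that makes the TreeView draw an expand (+) button.
    private const string Placeholder = "...";

    public static void AddDrives(TreeView tree)
    {
        foreach (string drive in Environment.GetLogicalDrives())
        {
            var node = new StateTreeNode(drive, drive);
            AddPlaceholderIfHasSubdirs(node);   // look one level deeper
            tree.Nodes.Add(node);
        }
    }

    // Handler for TreeView.BeforeExpand: only hit the disk the first time.
    public static void OnBeforeExpand(object sender, TreeViewCancelEventArgs e)
    {
        if (e.Node is StateTreeNode node && !node.EverOpened)
            Populate(node);
    }

    // Handler for TreeView.KeyDown: F5 resets the flag and re-reads the selected node.
    public static void OnKeyDown(object sender, KeyEventArgs e)
    {
        if (e.KeyCode == Keys.F5 && sender is TreeView tree &&
            tree.SelectedNode is StateTreeNode node)
        {
            node.EverOpened = false;
            Populate(node);
        }
    }

    // Replaces the node's children with its current subdirectories, sorted by name.
    public static void Populate(StateTreeNode node)
    {
        node.Nodes.Clear();
        string[] dirs = SafeGetDirectories(node.DirectoryPath);
        Array.Sort(dirs, StringComparer.OrdinalIgnoreCase);
        foreach (string dir in dirs)
        {
            var child = new StateTreeNode(dir, Path.GetFileName(dir));
            AddPlaceholderIfHasSubdirs(child);
            node.Nodes.Add(child);
        }
        node.EverOpened = true;
    }

    private static void AddPlaceholderIfHasSubdirs(StateTreeNode node)
    {
        if (SafeGetDirectories(node.DirectoryPath).Length > 0)
            node.Nodes.Add(new TreeNode(Placeholder));
    }

    // Unready drives and protected folders are treated as empty.
    private static string[] SafeGetDirectories(string path)
    {
        try { return Directory.GetDirectories(path); }
        catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
        {
            return Array.Empty<string>();
        }
    }
}
```

Because each node reads only its immediate children (plus a one-level peek to decide whether to show the expand button), startup cost is limited to the drive list, and untouched branches never hit the disk.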
