

Building Wiki Web Sites with ASP.NET and SQL Server : Page 4

You can easily build Wiki Web sites with ASP.NET and SQL Server and provide your teams with one of the most powerful ways of collaborating on the Web.





The DotWiki Parser
The Wiki class, one of the most important classes in the DotWiki, parses text coming from the database and returns a browser-friendly version of it. For example, a user might enter the following text in a Web page:

The song HelloDolly was written by LouisArmstrong.
Louis was a great jazz musician.

The Wiki class analyzes this text and replaces all CamelCase words with hyperlinks. Since HTML does not honor carriage return or line feed characters, the Wiki class also adds an HTML <br> tag wherever the user entered a carriage return, so that the browser displays the text on the corresponding lines. Here is what the Wiki class returns for the text above. (The parser adds the <a> and <br> tags.)

The song <a href=default.aspx?topic=HelloDolly>HelloDolly</a> was written by <a href=default.aspx?topic=LouisArmstrong>LouisArmstrong</a>.<br>
Louis was a great jazz musician.

The parser replaces each CamelCase word with a hyperlink that loads the Default page again. Notice that each CamelCase word points to a different topic in the database.

Although most programming languages provide functions to find exact matches of a string inside another string, each language provides its own way of doing this. (For example, VB uses the InStr function and C/C++ uses the strstr function.) On the other hand, regular expressions have basically standardized the task of finding patterns inside strings. (See the RegEx article sidebar.)

Regular expressions are a mathematical notation for describing patterns, and developers use them widely to find patterns in strings. .NET provides the Regex class in the System.Text.RegularExpressions namespace.
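To make the difference between an exact-match search and a pattern search concrete, here is a minimal sketch. The sample sentence and the \d+ pattern are illustrative assumptions and do not come from the DotWiki source.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternSearchSketch
    Sub Main()
        ' Hypothetical sample text, used only for illustration.
        Dim sample As String = "Room 12 seats 40 people."

        ' Exact match: IndexOf looks for one literal substring.
        Dim position As Integer = sample.IndexOf("40")
        Console.WriteLine("Literal 40 starts at index " & position.ToString())

        ' Pattern match: \d+ describes any run of one or more digits.
        For Each m As Match In Regex.Matches(sample, "\d+")
            Console.WriteLine("Number " & m.Value & " starts at index " & m.Index.ToString())
        Next
    End Sub
End Module
```

The literal search has to know the text it is looking for in advance, while the pattern search finds every number without naming any of them.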

Words in CamelCase notation follow a pattern in which the first character is an uppercase character, followed by a few lowercase characters, followed by another uppercase character, followed by more lowercase characters. For example, LouisArmstrong follows this pattern.

The DotWiki project described in this article would consider the words CoDeMagazine and The20thCentury to be CamelCase words.

The DotWiki uses the regular expression below to detect words in CamelCase notation:

    [A-Z]\w*[a-z]\w*[A-Z]\w*(?=\b)

Although regular expressions intimidate many developers at first (the expression described above looks anything but regular), you'll find the syntax relatively easy to understand once you know the basics. Table 1 describes what each component of the aforementioned regular expression means.

Table 1: Analysis of the regular expression used by the DotWiki to detect words in CamelCase notation.

Component   Meaning
[A-Z]       We are looking for an uppercase character (A-Z)
\w*         That can be followed by zero or more (*) word characters (\w)
[a-z]       That must be followed by at least one lowercase character (a-z)
\w*         That can be followed by zero or more (*) word characters (\w)
[A-Z]       That must be followed by one uppercase character
\w*         That can be followed by zero or more (*) word characters (\w)
(?=\b)      That must end in a word boundary (\b)
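The following sketch tries the pattern against a handful of words so you can see which ones it treats as CamelCase. The IsCamelCase helper and the word list are hypothetical additions for illustration and are not part of the DotWiki source.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CamelCaseCheck
    ' The CamelCase pattern analyzed in Table 1.
    Const CamelCaseRegEx As String = "[A-Z]\w*[a-z]\w*[A-Z]\w*(?=\b)"

    ' Hypothetical helper: True only when the pattern matches the whole word.
    Function IsCamelCase(ByVal word As String) As Boolean
        Dim m As Match = Regex.Match(word, CamelCaseRegEx)
        Return m.Success AndAlso m.Value = word
    End Function

    Sub Main()
        Dim words() As String = {"LouisArmstrong", "CoDeMagazine", "The20thCentury", "Louis", "NASA", "jazz"}
        For Each word As String In words
            Console.WriteLine(word & ": " & IsCamelCase(word).ToString())
        Next
    End Sub
End Module
```

The first three words match. Louis has no second uppercase character, NASA has no lowercase character, and jazz does not start with an uppercase character, so the pattern rejects them.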

This snippet demonstrates how you can use this regular expression in .NET to search for CamelCase words and replace them with hyperlinks:

    CamelCaseRegEx = "[A-Z]\w*[a-z]\w*[A-Z]\w*(?=\b)"
    ParsedText = Regex.Replace(TextToParse, _
        CamelCaseRegEx, _
        AddressOf EvaluateCamelCaseWord)

The Regex.Replace call scans through TextToParse. Each time it finds text that matches CamelCaseRegEx, it calls the EvaluateCamelCaseWord method, passing the match, so that you can manipulate the matching text however you want. The string that EvaluateCamelCaseWord returns replaces the matched text in the result.

The following code snippet shows EvaluateCamelCaseWord:

    Public Shared Function EvaluateCamelCaseWord( _
        ByVal m As Match) As String
        Return "<a href=default.aspx?topic=" + _
            m.Value + ">" + m.Value + "</a>"
    End Function
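Putting the two snippets together, a self-contained console sketch might look like the following. The module name and the LinkCamelCaseWords wrapper are hypothetical; the pattern and the evaluator follow the article.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CamelCaseLinkSketch
    Const CamelCaseRegEx As String = "[A-Z]\w*[a-z]\w*[A-Z]\w*(?=\b)"

    ' Regex.Replace calls this once per match; the returned string
    ' takes the place of the matched word.
    Function EvaluateCamelCaseWord(ByVal m As Match) As String
        Return "<a href=default.aspx?topic=" & m.Value & ">" & m.Value & "</a>"
    End Function

    ' Hypothetical wrapper around the Regex.Replace call.
    Function LinkCamelCaseWords(ByVal textToParse As String) As String
        Return Regex.Replace(textToParse, CamelCaseRegEx, AddressOf EvaluateCamelCaseWord)
    End Function

    Sub Main()
        Dim sample As String = "The song HelloDolly was written by LouisArmstrong."
        Console.WriteLine(LinkCamelCaseWords(sample))
    End Sub
End Module
```

Only HelloDolly and LouisArmstrong are wrapped in hyperlinks. Words such as The and Louis contain a single uppercase character, so the pattern leaves them alone.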

The DotWiki also uses regular expressions to look for e-mail addresses in the text and replace them with mailto hyperlinks, so that users can click an address and have their browser launch their e-mail client. The following code snippet demonstrates how the DotWiki looks for e-mail addresses using a regular expression.

    EmailRegEx = "\w+[@][^\s]+"
    ParsedText = Regex.Replace(TextToParse, _
        EmailRegEx, AddressOf EvaluateEmailAddress)

The next code snippet shows the code that replaces each e-mail address it matches with a hyperlinked version of that address. Notice that this snippet also HTML-encodes the e-mail address so that spammers cannot easily lift it from the DotWiki Web pages.

    Public Shared Function EvaluateEmailAddress( _
        ByVal m As Match) As String
        Dim EmailAddress As String = m.Value.ToLower()
        Dim MailTo As String = "mailto:" + EmailAddress
        Return "<a href=" + Chr(34) + _
            HtmlEncoded(MailTo) + Chr(34) + ">" + _
            HtmlEncoded(EmailAddress) + "</a>"
    End Function
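The article does not show the HtmlEncoded helper, so the sketch below substitutes a hypothetical version that writes every character as a numeric character reference. That is one plausible way to hide an address from simple harvesters, and it may differ from what the DotWiki actually does. The sample sentence is also an assumption.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module EmailLinkSketch
    Const EmailRegEx As String = "\w+[@][^\s]+"

    ' Hypothetical stand-in for the article's HtmlEncoded helper, which is
    ' not shown: emits each character as a numeric reference such as &#97;.
    Function HtmlEncoded(ByVal value As String) As String
        Dim encoded As New StringBuilder()
        For Each c As Char In value
            encoded.Append("&#").Append(Convert.ToInt32(c)).Append(";")
        Next
        Return encoded.ToString()
    End Function

    ' Called once per e-mail match; returns the mailto hyperlink.
    Function EvaluateEmailAddress(ByVal m As Match) As String
        Dim emailAddress As String = m.Value.ToLower()
        Dim mailTo As String = "mailto:" & emailAddress
        Return "<a href=""" & HtmlEncoded(mailTo) & """>" & HtmlEncoded(emailAddress) & "</a>"
    End Function

    Sub Main()
        Dim sample As String = "Write to Editor@Example.com for details."
        Console.WriteLine(Regex.Replace(sample, EmailRegEx, AddressOf EvaluateEmailAddress))
    End Sub
End Module
```

Note that [^\s]+ keeps matching until the next whitespace character, so an address followed directly by a period or comma would carry that punctuation into the link.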

Inside the Wiki class you will see more uses of regular expressions to search for and replace HTTP references, carriage returns, and other HTML tags that are interesting for our purposes.
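As an illustration of the carriage return case, the sketch below turns line breaks into HTML <br> tags. The pattern and the AddLineBreakTags name are assumptions; the expression the Wiki class really uses may differ.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module LineBreakSketch
    ' Assumed pattern: a CR+LF pair as one unit, or a lone CR or LF.
    Const LineBreakRegEx As String = "\r\n|\r|\n"

    ' Hypothetical helper: replaces every line break with a <br> tag.
    Function AddLineBreakTags(ByVal textToParse As String) As String
        Return Regex.Replace(textToParse, LineBreakRegEx, "<br>")
    End Function

    Sub Main()
        Dim sample As String = "First line." & vbCrLf & "Second line."
        Console.WriteLine(AddLineBreakTags(sample))
    End Sub
End Module
```

Listing \r\n first in the alternation keeps a Windows-style line break from producing two tags.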

Although I could have built my own parser to manually scan through the text and replace CamelCase words, regular expressions allow me to do it in one line of code. A simple call to Regex.Replace with the right regular expression does the job. Furthermore, I can use the same technique to search for and replace e-mail addresses and other patterns that interest me. Listing 2 shows the complete code for the WikiText method that performs the parsing on the topic information.
