Add Boolean Searches to Your .NET Applications : Page 3

Writing your own search engine isn't as difficult as you might think. If you customarily wrestle with Explorer's seemingly unconfigurable "Search for Files and Folders" or squint helplessly for light at the end of the dark tunnel that is Visual Studio.NET's integrated help system, then you'll probably understand where the impetus for this article originated.





Indexing Documents
Having built a search tree, the next task is to search through the document base for candidates that satisfy its constraints. The interface class IDocument provides an abstract description of a searchable object. Classes that implement IDocument must expose a Name method for identification purposes, and must be able to search their content for a specified token. Accordingly, the IDocument interface consists of the following pair of methods:

string Name();
bool Find(string str);

IDocument introduces an additional level of abstraction, for two reasons:

  • It allows the search engine to remain loosely coupled with the search content, meaning that content does not need to be text-based; it simply needs to be searchable.
  • It facilitates simultaneous searching of disparate media—for example, Word documents, RSS news feeds, SOAP packets or Windows help files.
For demonstration purposes, I've provided only two concrete implementations of IDocument. These are the lightweight FileDocument and XMLDocument classes, both of which lazily load their respective documents on first access, and simply call System.String's IndexOf method to search for string-based tokens on request.
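As a rough illustration of the lazy-loading pattern described above, here is a minimal sketch of a file-backed IDocument. The class name LazyFileDocument is hypothetical and is not the FileDocument class from the sample source; the case-insensitive comparison is also an assumption, since the article only says that the implementations call IndexOf.

```csharp
using System;
using System.IO;

// The two members the article lists for IDocument.
public interface IDocument
{
    string Name();
    bool Find(string str);
}

// Hypothetical file-backed document (not the article's FileDocument).
// The file is read on the first call to Find and cached afterwards.
public class LazyFileDocument : IDocument
{
    private readonly string path;
    private string content;   // stays null until the first search

    public LazyFileDocument(string path)
    {
        this.path = path;
    }

    public string Name()
    {
        return path;
    }

    public bool Find(string str)
    {
        if (content == null)
        {
            content = File.ReadAllText(path);   // lazy load on first access
        }
        // Assumption: tokens are matched case-insensitively.
        return content.IndexOf(str, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Because the search engine would only ever see the interface, a class wrapping an RSS feed or a help file could be swapped in without touching the tree-walking code.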

Determining whether a document matches the search criteria means walking the search tree, looking for an unbroken path of descent from root to leaf. For pattern matching, you're interested only in nodes that contain string token data, so the search code silently skips over tree nodes that represent opening and closing parentheses, and instead moves straight on to their children.
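The walk described above might look something like the following sketch. The node layout is an assumption made for illustration, not the QueryTree from the sample source: each node holds an optional token (null for a parenthesis node), a negation flag, and a list of children, where a parent and child are ANDed and sibling children are alternatives. For brevity the sketch searches a plain string; in the article's design the check would go through IDocument.Find instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node shape. Token is null for a grouping (parenthesis) node.
public class Node
{
    public string Token;
    public bool Negated;
    public List<Node> Children = new List<Node>();

    public Node(string token, bool negated)
    {
        Token = token;
        Negated = negated;
    }
}

public static class TreeWalker
{
    // Returns true if at least one root-to-leaf path starting at 'node'
    // is satisfied by 'text'.
    public static bool Matches(Node node, string text)
    {
        if (node.Token != null)
        {
            bool found = text.IndexOf(node.Token, StringComparison.OrdinalIgnoreCase) >= 0;
            if (found == node.Negated)
            {
                return false;   // the path is broken at this node
            }
        }
        // Grouping nodes carry no token, so they fall straight through to their children.
        if (node.Children.Count == 0)
        {
            return true;        // reached a leaf along an unbroken path
        }
        foreach (Node child in node.Children)
        {
            if (Matches(child, text))
            {
                return true;    // one surviving branch is enough
            }
        }
        return false;
    }
}
```

Under this layout, a query such as hobbits & (dwarves | (wizards & !elves)) becomes a hobbits root with one grouping child, whose two branches are dwarves and wizards followed by a negated elves node.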

Putting It All Together
The sample source code is relatively self-explanatory. Figure 4 shows the final class hierarchy, although the demo class TestSearch (within the same namespace) is certainly the easiest way to get a feel for how the various pieces fit together.
Figure 4. Final Class Hierarchy: The figure shows the final class hierarchy for the sample search project.
TestSearch is a vanilla console application; it indexes any XML files present in the project's source directory (I've supplied a few test files to get you started), and lets you enter arbitrarily complex Boolean search strings at the command line along the lines of:

hobbits & (dwarves | (wizards & !elves))

If you break the search engine's syntax rules, it tells you so. If you adhere to them, it tells you how many documents match your search. Here's the basic process to search a set of documents:

string search_query = ....;
QueryBuilder builder = new QueryBuilder(search_query);
if (builder.Validate())
{
   // Query is valid, so build a tree
   QueryTree tree = builder.BuildTree();

   // Build list of resources to index
   IDocument[] docs = ...

   // Retrieve matching documents
   IDocument[] matches = tree.GetMatches(docs);

   // Do things with document matches
}

I hope you'll find it easy to incorporate this search capability into your own projects—Boolean searching is something I find hard to live without.

Alex Hildyard is a freelance software consultant and writer, specializing in Web technology.