Having built a search tree, the next task is to search the document base for candidates that satisfy its constraints. The IDocument interface provides an abstract description of a searchable object. Classes that implement IDocument must expose a Name property for identification purposes, and must be able to search their content for a specified token. Accordingly, the IDocument interface consists of the Name property and the following method:
bool Find(string str)
IDocument introduces an additional level of abstraction, for several reasons:
- It allows the search engine to remain loosely coupled with the search content, meaning that content does not need to be text-based; it simply needs to be searchable.
- It facilitates simultaneous searching of disparate media, for example Word documents, RSS news feeds, SOAP packets, or Windows help files.
For demonstration purposes, I've provided only two concrete implementations of IDocument. These are the lightweight FileDocument and XMLDocument classes, both of which lazily load their respective documents on first access, and simply call System.String's IndexOf method to search for string-based tokens on request.
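To make the lazy-loading idea concrete, here is a minimal sketch of a file-backed document. The class name LazyFileDocument, its IsLoaded property, and the case-insensitive comparison are my own assumptions for illustration; the sample project's FileDocument may well be written differently.

```csharp
using System;
using System.IO;

// The interface as described in the text: a Name property plus a Find method.
public interface IDocument
{
    string Name { get; }
    bool Find(string str);
}

// Hypothetical file-backed document (not the article's FileDocument).
public class LazyFileDocument : IDocument
{
    private readonly string path;
    private string content;   // stays null until the first search

    public LazyFileDocument(string path)
    {
        this.path = path;
    }

    // Identify the document by its file name.
    public string Name
    {
        get { return Path.GetFileName(path); }
    }

    // Exposed only so the lazy behavior can be observed.
    public bool IsLoaded
    {
        get { return content != null; }
    }

    public bool Find(string str)
    {
        if (content == null)
        {
            // Lazy load: read the file only when a search actually needs it.
            content = File.ReadAllText(path);
        }
        // Case-insensitive matching is an assumption; the text only says IndexOf is used.
        return content.IndexOf(str, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Constructing a LazyFileDocument costs nothing beyond storing the path, so indexing a large directory stays cheap until a query actually touches a given file.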
Determining whether a document matches the search criteria means walking the search tree, looking for an unbroken path of descent from root to leaf. For pattern matching, you're interested only in nodes that contain string token data, so the search code silently skips over tree nodes that represent opening and closing parentheses, and instead moves straight on to their children.
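The walk described above can be sketched roughly as follows. Everything here is hypothetical: the QueryNode shape, the Negated flag, the TreeMatcher helper, and the in-memory TextDocument are stand-ins I've invented to illustrate the idea, and the sample project's QueryTree may encode queries differently. The sketch assumes that descending from parent to child means AND, and that sibling nodes are alternatives (OR).

```csharp
using System;
using System.Collections.Generic;

public interface IDocument
{
    string Name { get; }
    bool Find(string str);
}

// Hypothetical in-memory document, used only to exercise the walk.
public class TextDocument : IDocument
{
    private readonly string name;
    private readonly string text;

    public TextDocument(string name, string text)
    {
        this.name = name;
        this.text = text;
    }

    public string Name { get { return name; } }

    public bool Find(string str)
    {
        return text.IndexOf(str, StringComparison.Ordinal) >= 0;
    }
}

// Hypothetical tree node: Token is null for a parenthesis (grouping) node.
public class QueryNode
{
    public string Token;
    public bool Negated;
    public List<QueryNode> Children = new List<QueryNode>();

    public QueryNode(string token, bool negated)
    {
        Token = token;
        Negated = negated;
    }

    public QueryNode Add(QueryNode child)
    {
        Children.Add(child);
        return this;
    }
}

public static class TreeMatcher
{
    // True if some unbroken path from this node down to a leaf is satisfied.
    public static bool Matches(QueryNode node, IDocument doc)
    {
        if (node.Token != null)
        {
            bool found = doc.Find(node.Token);
            // A plain token must be present; a negated token must be absent.
            if (found == node.Negated) return false;   // path is broken here
        }
        // Grouping nodes carry no token, so fall straight through to the children.
        if (node.Children.Count == 0) return true;     // reached a leaf
        foreach (QueryNode child in node.Children)
        {
            if (Matches(child, doc)) return true;      // one surviving path is enough
        }
        return false;
    }
}
```

Under these assumptions, a query such as hobbits & (dwarves | !elves) becomes a hobbits node whose single child is a grouping node with two children, dwarves and a negated elves; a document matches as soon as one of those root-to-leaf paths holds.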
Putting it all together
The sample source code is relatively self-explanatory. Figure 4 shows the final class hierarchy, although the demo class TestSearch (within the same namespace) is certainly the easiest way to get a feel for how the various pieces fit together.
Figure 4. Final Class Hierarchy: The figure shows the final class hierarchy for the sample search project.
TestSearch is a vanilla console application; it indexes any XML files present in the project's source directory (I've supplied a few test files to get you started), and lets you enter arbitrarily complex Boolean search strings at the command line along the lines of:
hobbits & (dwarves | (wizards & !elves))
If you break the search engine's syntax rules, it tells you so. If you adhere to them, it tells you how many documents match your search. Here's the basic process to search a set of documents:
string search_query = ....;
QueryBuilder builder = new QueryBuilder(...);
// Query is valid, so build a tree
QueryTree tree = builder.BuildTree();
// Build list of resources to index
IDocument[] docs = ...
// Retrieve matching documents
IDocument[] matches = tree.GetMatches(docs);
// Do things with document matches
I hope you'll find it easy to incorporate this search capability into your own projects. Boolean searching is something I find hard to live without.