Improving Performance While Adding Full-Text Search
I've come full circle to the initial discussion of trade-offs between normal form, query scenarios, and performance. Neither of the two extremes (Fully normalize your point data and store nothing but BLOBs) are likely to be correct. Instead, the right approach varies from application to application.
For the final example, take one piece of data derivable from Ink (the recognized text) and associate it with an Ink store. Full-text search requires a considerable departure from the SQL standard, so this example is highly specific to SQL Server 2005.
First, add an ID field of SQL type
uniqueidentifier to the
Ink table and set it to be the primary key. Create another table,
RecoResults, which also has a
uniqueidentifierId, a foreign key to Ink table (
InkId) an
ntext field called
Alternate, and another called
RecoConfidence. (See
Figure 2.)
 | |
| Figure 2. Storing Ink: The figure shows the process of storing Ink-derived data for full-text search of recognized handwriting. |
After creating the
RecoResults table, open SQL Server Management Studio, right-click on the table, and choose "Define Full-Text Index
" to create a full-text index for the
Alternate column via a simple wizard.
Back in Visual Studio in the Example1 event-handler for
Save, modify the SqlCommand so that it includes a new GUID to ID the Ink. After the Ink is inserted, add a call to a new method:
AddRecognizedText(conn,
(Guid)cmd.Parameters["@Id"].Value,
io.Ink.Strokes);
To call
AddRecognizedText(), you need an open SqlConnection, the GUID of the row in which you've stored the Ink's data, and the Strokes collection that represents the pen-down to pen-up strokes in the Ink. In the Tablet PC SDK, recognition is done by a RecognitionContext object. The first block of code in
Listing 3 is the boilerplate code to get a default context, here called
ctxt.
You could recognize the Strokes as a single whole, but in many applications, an Ink object contains a mix of text and drawing areas.
|
|
You could recognize the Strokes as a single whole, but in many applications an Ink object contains a mix of text and drawing areas. (For instance, in an image annotation application, you might draw dent #1 and draw an arrow to the relevant portion of the photo.) Divvying up a Strokes collection into text and non-text is done by the Divider class.
After instantiating the divider object and calling
Divide() on it, you end up with a DivisionResult object. Typically, this holds some mix of
Drawing,
Paragraph,
Line, and
Segment results, which you can iterate over by calling
ResultsByType() and passing the appropriate type.
Keeping in mind that you've only requested
Paragraphs, iterate each DivisionResult and use the
ctxt to perform recognition on the
Strokes collection of the DivisionResult. If the
RecognitionStatus is
NoError, you know that you have some recognized text. The handwriting recognition typically returns more than one RecognitionAlternate, each of which has a
RecognitionConfidence value of
Strong,
Intermediate, or
Poor. The recognized text is available from the
ToString() method of the RecognitionAlternate. In
Listing 3, the innermost loop inserts each possible Recognition object, along with its Confidence value, into the
RecoResults table of the database. To query this data, use the SQL Server-specific
Contains() function. Here's an example.
SELECT Id, Ink, Length FROM Ink, RecoResults WHERE
RecoResults.InkId = Ink.ID AND
Contains(RecoResults.Alternate, 'foo') AND
RecoResults.Confidence = 'Strong'
An interesting thing about this query is that you cannot use the
DISTINCT qualifier, as the presence of an image column disqualifies that argument. So you can either accept duplicates in the return (but remember that a duplicate here means duplicating the
byte[] blob) or use a two-stage query in a stored procedure (which is certainly what I'd choose in a production application), selecting the
DISTINCT InkIds from the
RecoResults table and then using those to query the
Ink table.
Fill Up With Ink
Although it is not possible to add Ink directly into a database, the techniques shown in this article show how easy it is to convert Ink to and from a BLOB and work on that basis. Databinding is also easy, although as discussed in Example 2, a client-side DataSet containing the entire database's worth of Ink data is unlikely to be a good idea. Instead, look at your usage and query scenarios, and directly store the Ink-derived metadata in appropriate tables (or, if you prefer, XML columns) as you did in Example3. At that point, recovering the desired Ink is simply a matter of working back to the ink data via standard database joins. It couldn't be easier!