Improving Performance While Adding Full-Text Search
I've come full circle to the initial discussion of trade-offs between normal form, query scenarios, and performance. Neither of the two extremes (Fully normalize your point data and store nothing but BLOBs) are likely to be correct. Instead, the right approach varies from application to application.
For the final example, take one piece of data derivable from Ink (the recognized text) and associate it with an Ink store. Full-text search requires a considerable departure from the SQL standard, so this example is highly specific to SQL Server 2005.
First, add an ID field of SQL type uniqueidentifier
to the Ink
table and set it to be the primary key. Create another table, RecoResults
, which also has a uniqueidentifierId
, a foreign key to Ink table (InkId
) an ntext
field called Alternate
, and another called RecoConfidence
. (See Figure 2
|Figure 2. Storing Ink: The figure shows the process of storing Ink-derived data for full-text search of recognized handwriting.|
After creating the RecoResults
table, open SQL Server Management Studio, right-click on the table, and choose "Define Full-Text Index
" to create a full-text index for the Alternate
column via a simple wizard.
Back in Visual Studio in the Example1 event-handler for Save
, modify the SqlCommand so that it includes a new GUID to ID the Ink. After the Ink is inserted, add a call to a new method:
To call AddRecognizedText()
, you need an open SqlConnection, the GUID of the row in which you've stored the Ink's data, and the Strokes collection that represents the pen-down to pen-up strokes in the Ink. In the Tablet PC SDK, recognition is done by a RecognitionContext object. The first block of code in Listing 3
is the boilerplate code to get a default context, here called ctxt
|You could recognize the Strokes as a single whole, but in many applications, an Ink object contains a mix of text and drawing areas.|
You could recognize the Strokes as a single whole, but in many applications an Ink object contains a mix of text and drawing areas. (For instance, in an image annotation application, you might draw dent #1 and draw an arrow to the relevant portion of the photo.) Divvying up a Strokes collection into text and non-text is done by the Divider class.
After instantiating the divider object and calling Divide()
on it, you end up with a DivisionResult object. Typically, this holds some mix of Drawing
, and Segment
results, which you can iterate over by calling ResultsByType()
and passing the appropriate type.
Keeping in mind that you've only requested Paragraphs
, iterate each DivisionResult and use the ctxt
to perform recognition on the Strokes
collection of the DivisionResult. If the RecognitionStatus
, you know that you have some recognized text. The handwriting recognition typically returns more than one RecognitionAlternate, each of which has a RecognitionConfidence
value of Strong
, or Poor
. The recognized text is available from the ToString()
method of the RecognitionAlternate. In Listing 3
, the innermost loop inserts each possible Recognition object, along with its Confidence value, into the RecoResults
table of the database. To query this data, use the SQL Server-specific Contains()
function. Here's an example.
SELECT Id, Ink, Length FROM Ink, RecoResults WHERE
RecoResults.InkId = Ink.ID AND
Contains(RecoResults.Alternate, 'foo') AND
RecoResults.Confidence = 'Strong'
An interesting thing about this query is that you cannot use the DISTINCT
qualifier, as the presence of an image column disqualifies that argument. So you can either accept duplicates in the return (but remember that a duplicate here means duplicating the byte
blob) or use a two-stage query in a stored procedure (which is certainly what I'd choose in a production application), selecting the DISTINCT InkIds
from the RecoResults
table and then using those to query the Ink
Fill Up With Ink
Although it is not possible to add Ink directly into a database, the techniques shown in this article show how easy it is to convert Ink to and from a BLOB and work on that basis. Databinding is also easy, although as discussed in Example 2, a client-side DataSet containing the entire database's worth of Ink data is unlikely to be a good idea. Instead, look at your usage and query scenarios, and directly store the Ink-derived metadata in appropriate tables (or, if you prefer, XML columns) as you did in Example3. At that point, recovering the desired Ink is simply a matter of working back to the ink data via standard database joins. It couldn't be easier!