dcsimg
LinkedIn
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Fast Text Processing in SQL Server : Page 3

Processing text or long strings usually reduces SQL to a prosaic procedural language. Learn a few techniques for facilitating speedy text processing in SQL.


advertisement

SQL Server 2005 Recursive Techniques (cont'd)

Now, go back to the Listing 1. Replace lines 4 and 6 of the code with the following lines:

4 (SELECT 1 as startPos, CHARINDEX (' A',@str, 1) as  spacePos  
. . . . .

6  SELECT spacePos + 1, CHARINDEX (' A',@str, spacePos + 1) 

Running this updated script, Listing 1 will produce the following result:


startPos    spacePos    singleWord
1           14          A lonely moon
15          30          about the sky, 
31          62          A gentle flower in the breeze, 
63          160         A giant cloud of smoky stars,. . .sweet voice of melody
161         175         And moist warm
176         189         air, which we
190         0           all breathe  Tremendous power. . . everyone believe...

At first glance, the result doesn't make any sense; you just got all the phrases that start with a specific letter (A). But as you will see later, this option can be very useful when you want to format the text. Thus, when you split and load text into a table, you get easy access to each separate word, and that gives you a lot of flexibility and processing power.

But what if you need even more granularity? What if you need to work with each individual letter, instead of the whole word? Well, you can do that using the following script:


Listing 2. How to Convert by Letters and Load Text into the Table


SET QUOTED_IDENTIFIER OFF GO DECLARE @str varchar(2000); SELECT @str = "A lonely moon about the sky, A gentle flower in the breeze, A giant cloud of smoky stars, Some sticky honey made by bees. The soft sweet voice of melody And moist warm air, which we all breathe Tremendous power of our minds Which can make everyone believe..."; 1 SELECT @str = REPLACE(@str, CHAR(10), ' '); 2 SELECT @str = REPLACE(@str, CHAR(13), ''); 3 WITH ProcessText AS 4 (SELECT 1 as startPos, SUBSTRING (@str,1,1) as letter 5 UNION ALL 6 SELECT startPos + 1, SUBSTRING (@str,startPos+1, 1) 7 FROM ProcessText WHERE startPos <= len(@str)) 8 SELECT * INTO #letters 9 FROM ProcessText 10 OPTION(MAXRECURSION 0); SELECT * FROM #letters; Result: startPos letter ----------- ------ 1 A 2 3 l 4 o 5 n 6 e 7 l 8 y 9 10 m 11 o 12 o 13 n . . . . . . . 249 250 b 251 e 252 l 253 i 254 e 255 v 256 e 257 . 258 . 259 . 260

The solution in Listing 2 is based on the same idea as the solution in Listing 1. The query recursively applies the SUBSTRING() function to the original string, incrementing the starting position by one on each recursion's step, and stops processing when the starting position reaches the end of the string (text).

The solution from Listing 2 gives you direct access to each letter in the text, which gives you additional flexibility and processing power. Now, you can easily determine the number of letters in the text:


SELECT COUNT(*) AS NumLetters
   FROM #letters
   WHERE letter <> '';

Result:

NumLetters
-----------
219

You can determine how often each distinct letter has been used:


SELECT COUNT(*) AS NumLetters,letter  
   FROM #letters
   WHERE letter <> ''
   GROUP BY letter;

Result:

NumLetters  letter
----------- ------
4           ,
4           .

14          a
6           b
6           c
6           d
32          e
. . . . . . .

13          t
4           u
3           v
7           w
8           y
1           z

If you change Listing 2 slightly by replacing lines 4 and 6 with the following lines:


4 (SELECT 1 as startPos, SUBSTRING (@str,1,2) as letter  
  . . . . . . . . . . . .

6 SELECT startPos + 2, SUBSTRING (@str,startPos+2, 2)

You will get the consequent pairs of letters in the result:


startPos    letter
----------- ------
1           A 
3           lo
5           ne
7           ly
9            m
11          oo
13          n 
15           a
17          bo
. . . . . . . .

If you replace lines 4 and 6 with the following lines:


4 (SELECT 1 as startPos, SUBSTRING (@str,1,2) as letter  
  . . . . . . . . . . . .

6 SELECT startPos + 2, SUBSTRING (@str,startPos+2, 2)

You will get the consequent pairs of letters with overlapping:


startPos    letter
----------- ------
1           A 
2           lo
3           on
4           ne
5           el
6           ly
7           y 
. . . . . . .

The preceding two cases allow text splitting by two letters (it actually can be any number of letters) and potentially are very useful for text formatting or finding duplicates or extra spaces in the text (more on that later).



Close Icon
Thanks for your registration, follow us on our social networks to keep up-to-date