Login | Register   
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Fast Text Processing in SQL Server : Page 3

Processing text or long strings usually reduces SQL to a prosaic procedural language. Learn a few techniques for facilitating speedy text processing in SQL.


advertisement

SQL Server 2005 Recursive Techniques (cont'd)

Now, go back to the Listing 1. Replace lines 4 and 6 of the code with the following lines:

4 (SELECT 1 as startPos, CHARINDEX (' A',@str, 1) as spacePos . . . . . 6 SELECT spacePos + 1, CHARINDEX (' A',@str, spacePos + 1)

Running this updated script, Listing 1 will produce the following result:



startPos spacePos singleWord 1 14 A lonely moon 15 30 about the sky, 31 62 A gentle flower in the breeze, 63 160 A giant cloud of smoky stars,. . .sweet voice of melody 161 175 And moist warm 176 189 air, which we 190 0 all breathe Tremendous power. . . everyone believe...

At first glance, the result doesn't make any sense; you just got all the phrases that start with a specific letter (A). But as you will see later, this option can be very useful when you want to format the text. Thus, when you split and load text into a table, you get easy access to each separate word, and that gives you a lot of flexibility and processing power.

But what if you need even more granularity? What if you need to work with each individual letter, instead of the whole word? Well, you can do that using the following script:

Listing 2. How to Convert by Letters and Load Text into the Table

SET QUOTED_IDENTIFIER OFF GO DECLARE @str varchar(2000); SELECT @str = "A lonely moon about the sky, A gentle flower in the breeze, A giant cloud of smoky stars, Some sticky honey made by bees. The soft sweet voice of melody And moist warm air, which we all breathe Tremendous power of our minds Which can make everyone believe..."; 1 SELECT @str = REPLACE(@str, CHAR(10), ' '); 2 SELECT @str = REPLACE(@str, CHAR(13), ''); 3 WITH ProcessText AS 4 (SELECT 1 as startPos, SUBSTRING (@str,1,1) as letter 5 UNION ALL 6 SELECT startPos + 1, SUBSTRING (@str,startPos+1, 1) 7 FROM ProcessText WHERE startPos <= len(@str)) 8 SELECT * INTO #letters 9 FROM ProcessText 10 OPTION(MAXRECURSION 0); SELECT * FROM #letters; Result: startPos letter ----------- ------ 1 A 2 3 l 4 o 5 n 6 e 7 l 8 y 9 10 m 11 o 12 o 13 n . . . . . . . 249 250 b 251 e 252 l 253 i 254 e 255 v 256 e 257 . 258 . 259 . 260

The solution in Listing 2 is based on the same idea as the solution in Listing 1. The query recursively applies the SUBSTRING() function to the original string, incrementing the starting position by one on each recursion's step, and stops processing when the starting position reaches the end of the string (text).

The solution from Listing 2 gives you direct access to each letter in the text, which gives you additional flexibility and processing power. Now, you can easily determine the number of letters in the text:

SELECT COUNT(*) AS NumLetters FROM #letters WHERE letter <> ''; Result: NumLetters ----------- 219

You can determine how often each distinct letter has been used:

SELECT COUNT(*) AS NumLetters,letter FROM #letters WHERE letter <> '' GROUP BY letter; Result: NumLetters letter ----------- ------ 4 , 4 . 14 a 6 b 6 c 6 d 32 e . . . . . . . 13 t 4 u 3 v 7 w 8 y 1 z

If you change Listing 2 slightly by replacing lines 4 and 6 with the following lines:

4 (SELECT 1 as startPos, SUBSTRING (@str,1,2) as letter . . . . . . . . . . . . 6 SELECT startPos + 2, SUBSTRING (@str,startPos+2, 2)

You will get the consequent pairs of letters in the result:

startPos letter ----------- ------ 1 A 3 lo 5 ne 7 ly 9 m 11 oo 13 n 15 a 17 bo . . . . . . . .

If you replace lines 4 and 6 with the following lines:

4 (SELECT 1 as startPos, SUBSTRING (@str,1,2) as letter . . . . . . . . . . . . 6 SELECT startPos + 2, SUBSTRING (@str,startPos+2, 2)

You will get the consequent pairs of letters with overlapping:

startPos letter ----------- ------ 1 A 2 lo 3 on 4 ne 5 el 6 ly 7 y . . . . . . .

The preceding two cases allow text splitting by two letters (it actually can be any number of letters) and potentially are very useful for text formatting or finding duplicates or extra spaces in the text (more on that later).



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap