SQL Server 2000 and SQL Server 2005 Techniques
The recursive queries using CTE are available only in SS2005, so to produce the same results in SS2000 as the techniques in the previous section did, you need to apply dynamic SQL. The following script (see
Listing 3) demonstrates that technique:
Listing 3. How to Convert by Words and Load Text into the Table, Using Dynamic SQL
SET QUOTED_IDENTIFIER OFF
GO
SET NOCOUNT ON
DECLARE @str varchar(8000)
SELECT @str = "James Joyce (1882-1941) was an Irish writer and poet, and is
widely considered one of the most significant writers of the 20th century."
SELECT @str = REPLACE(@str, CHAR(13),'')
SELECT @str = REPLACE(@str, CHAR(10), '')
IF EXISTS(SELECT name FROM sysobjects WHERE name = 'dynWords' AND type = 'U')
DROP TABLE dynWords
CREATE TABLE dynWords(
wordID int identity(1,1) not null,
word varchar(30) not null)
SELECT @Str = 'INSERT INTO dynWords(word) SELECT A="' +
REPLACE(@str, ' ', '"UNION ALL SELECT"') + '"'
EXECUTE(@str);
SELECT * FROM dynWords;
Result:
wordID word
------ -------------
1 James
2 Joyce
3 (1882-1941)
4 was
. . . . . . . . . . . .
20 of
21 the
22 20th
23 century.
The logic in this example is very simple. You need to add the header "INSERT INTO #dynWords(word) SELECT A=" to the original text and then replace the spaces between the words with the phrase "UNION ALL SELECT". You will get the following string:
INSERT INTO #dynWords(word) SELECT A="James"UNION ALL SELECT"Joyce"… UNION ALL SELECT"century."
When you execute this string dynamically, the result will be loaded into the #dynWords table.
Now, let's try to determine:
- The number of words in the text
- The length of each word
- The position of each empty space between the words
Listing 4 is the solution to accomplish this.
Listing 4. Processing Converted Text, Using SQL (Part 1)
-- 1. The number of words in the text
SELECT COUNT(*) AS 'Total words in the text:'
FROM dynWords
WHERE word NOT IN ('', ' ', CHAR(13), CHAR(10));
Result:
Total words in the text:
------------------------
23
-- 2. The length of each word
SELECT *, LEN(word) wordLength FROM dynWords;
Result:
wordID word wordLength
----------- ------------------------------ -----------
1 James 5
2 Joyce 4
3 (1882-1941) 11
. . . . . . . . . . . . . . . . . . . . . . . . . . .
21 the 3
22 20th 4
23 century. 8
-- 3. The numeric position of each empty space between the words
PRINT 'The numeric position of each empty space between the words:'
SELECT tbl2.*, tbl1.c1 as spacePos
FROM (SELECT c1 = SUM(LEN(t1.word)+1), c2 = t2.wordID
FROM dynWords AS t1 INNER JOIN dynWords AS t2
ON t1.wordID<=t2.wordID
GROUP BY t2.wordID) tbl1
INNER JOIN (SELECT * FROM dynWords) tbl2
ON tbl1.c2 = tbl2.wordID
Result:
The numeric position of each empty space between the words:
wordID word spacePos
----------- ------------------------------ -----------
1 James 6
2 Joyce 11
3 (1882-1941) 23
4 was 27
. . . . . . . . . . . . . . . . . . . . . . . . . . .
20 of 117
21 the 121
22 20th 126
23 century. 135
Queries 1 and 2 are self-explanatory. Query 3 is much more interesting, though it uses a pretty well known technique with self-joins. The aggregate function SUM(LEN(t1.word)+1) accumulates the number of letters (including the spaces) in the words preceding the current one. Using COUNT() instead of SUM(), you could get the number of words preceding the current one--in other words, the order number of each word in the text, which already is represented by the identity column wordID.
Now, let's proceed and find:
- The word "and" and words that start with a letter "w"
- The position of each phrase that starts with a letter "w"
Listing 5 is the solution to accomplish this.
Listing 5. Processing Converted Text, Using SQL (Part 2)
-- 1) Find words "and" and words, starting with a letter "w".
SELECT wordID, word
FROM dynWords
WHERE word LIKE 'w%'OR word LIKE 'AND';
Result:
wordID word
----------- --------------
4 was
7 writer
8 and
10 and
12 widely
19 writers
-- 2) The position of each phrase, starting with a letter "w".
SET QUOTED_IDENTIFIER OFF
GO
SET NOCOUNT ON
DECLARE @str varchar(8000)
SELECT @str = "James Joyce (1882-1941) was an Irish writer and poet, and is
widely considered one of the most significant writers of the 20th century."
SELECT @str = REPLACE(@str, CHAR(13),' ')
SELECT @str = REPLACE(@str, CHAR(10), '')
IF EXISTS(SELECT name FROM sysobjects WHERE name = 'dynWords' AND type = 'U')
DROP TABLE dynWords
CREATE TABLE dynWords(wordID int identity(1,1) not null,
word varchar(100) not null)
SELECT @str = 'INSERT INTO dynWords(word) SELECT A="' +
REPLACE(@str, ' w', '"UNION ALL SELECT"w') + '"';
EXECUTE(@str)
PRINT 'The position of each phrase starting with "w":'
SELECT tbl2.*, tbl1.c1 as startPos FROM
(SELECT (SUM(LEN(t1.word)+1)+1) as c1, c2 = t2.wordID
FROM dynWords AS t1 INNER JOIN dynWords AS t2
ON t1.wordID < t2.wordID
GROUP BY t2.wordID) tbl1
INNER JOIN (SELECT * FROM dynWords) tbl2
ON tbl1.c2 = tbl2.wordID
Result:
The position of each phrase starting with "w":
wordID word startPos
----------- ---------------------------------------------- ---------
2 was an Irish 24
3 writer and poet, and is 37
4 widely considered one of the most significant 61
5 writers of the 20th century. 107
SQL Server Speed Gap: 2005 vs. 2000
Text conversion and processing techniques are very powerful and flexible; you can use them in many projects. The techniques with dynamic SQL are more restrictive (a 8,000-character limit for the varchar data type or 4,000 for nvarchar) than the recursive techniques, which are faster, but you can use them only in SS2005.