Login | Register   
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Powerful, Flexible Text-Formatting Solutions in SQL Server : Page 2

You can build very powerful and flexible text-formatting solutions in SQL Server. Learn how to apply techniques that handle the simplest to the more complex text-formatting tasks.


advertisement

Simple Text Formatting

Now suppose you have text that you want to format in such a way that each row will consist of exactly nine words. (I chose nine, but a row can contain any reasonable number of words.) You'd just need to split the text and load it into the table, and then remove all the extra empty spaces using the technique in Listing 1. Finally, you would restore the original text, counting the words. Listing 2 shows the solution.

Listing 2. Text Formatting by the Number of Words in a Row

SET QUOTED_IDENTIFIER OFF GO SET NOCOUNT ON DECLARE @str varchar(2000) SELECT @str = "Benjamin Franklin (January 17, 1706–April 17, 1790) was one of the most prominent of Founders and early political figures and statesmen of the United States. Considered the earliest of the Founders, Franklin was noted for his ingenuity, diversity of interests, and wit, and was extraordinarily influential in the development of the American Revolution despite never holding national elective office. Born in Boston, Massachusetts, to a tallow-maker, Franklin became a newspaper editor, printer, and merchant in Philadelphia, Pennsylvania, becoming very wealthy. He spent many years in England and published the famous Poor Richard's Almanack and Pennsylvania Gazette. He formed both the first public lending library and volunteer fire department in America as well as the Junto, a political discussion club."; -- preparation before splitting SELECT @str = REPLACE(@str, CHAR(13), ' ' ); SELECT @str = REPLACE(@str, CHAR(10), ' '); -- splitting the text using recursion technique WITH ProcessText AS (SELECT 1 as startPos, CHARINDEX (' ',@str, 1) as spacePos UNION ALL SELECT spacePos + 1, CHARINDEX (' ',@str, spacePos + 1) FROM ProcessText WHERE spacePos > 0) SELECT startPos, spacePos, SUBSTRING(@str, startPos, spacePos - startPos) AS singleWord INTO #words FROM ProcessText WHERE spacePos <> 0 UNION ALL SELECT MAX(spacePos) + 1, 0, RIGHT(@str, LEN(@str) - MAX(spacePos)) FROM ProcessText OPTION(MAXRECURSION 0); -- deleting extra spaces (rows with empty string in singleWord column DELETE #words WHERE singleWord = ''; -- restoring and formatting the original text DECLARE @str2 varchar(8000), @formatter int, @cnt int SELECT @str2 = '', @formatter = 9, @cnt = 0; SELECT @cnt = @cnt + 1, @str2 = CASE WHEN @cnt % @formatter = 0 THEN @str2 + singleWord + char(10) ELSE @str2 + singleWord + ' ' END FROM #words SELECT @str2 DROP TABLE #words Results: ------------------------------------------------------------------------------ Benjamin Franklin (January 17, 1706–April 17, 1790) was one of the most prominent of Founders and early political figures and statesmen of the United States. Considered the earliest of the Founders, Franklin was noted for his ingenuity, diversity of interests, and wit, and was extraordinarily influential in the development of the American Revolution despite never holding national elective office. Born in Boston, Massachusetts, to a tallow-maker, Franklin became a newspaper editor, printer, and merchant in Philadelphia, Pennsylvania, becoming very wealthy. He spent many years in England and published the famous Poor Richard's Almanac and Pennsylvania Gazette. He formed both the first public lending library and volunteer fire department in America as well as the Junto, a political discussion club.

I used the recursive technique to convert and load text into the table. ("Fast Text Processing in SQL Server" explains that technique in detail.)



You also can experiment with the formatter in Listing 2. For example, if you change the @formatter value from 9 to, say 5, you will get a result where each row of the text consists of exactly five words:

Results: ---------------------------------------------- Benjamin Franklin (January 17, 1706–April 17, 1790) was one of the most prominent of Founders and early political figures and statesmen of the United States. Considered the earliest of the Founders, Franklin was noted for his ingenuity, diversity of interests, . . . . . . . . . . . . . . . . . . . first public lending library and volunteer fire department in America as well as the Junto, a political discussion club.

Now suppose you don't want to format text by the number of the words. Instead, you want to have the rows to have a certain length (e.g., 50 characters in each row). This task is more complicated than the previous one, because you can't make rows of exactly 50 characters without breaking words. However, with some approximation, formatting by number of characters works (see Listing 3).

Listing 3. Text Formatting by the Number of Characters in a Row

SET QUOTED_IDENTIFIER OFF GO SET NOCOUNT ON DECLARE @str varchar(2000) SELECT @str = "Benjamin Franklin (January 17, 1706–April 17, 1790) was one of the most prominent of Founders and early political figures and statesmen of the United States. Considered the earliest of the Founders, Franklin was noted for his ingenuity, diversity of interests, and wit, and was extraordinarily influential in the development of the American Revolution despite never holding national elective office. Born in Boston, Massachusetts, to a tallow-maker, Franklin became a newspaper editor, printer, and merchant in Philadelphia, Pennsylvania, becoming very wealthy. He spent many years in England and published the famous Poor Richard's Almanack and Pennsylvania Gazette. He formed both the first public lending library and volunteer fire department in America as well as the Junto, a political discussion club."; SELECT @str = REPLACE(@str, CHAR(13), ' '); SELECT @str = REPLACE(@str, CHAR(10), ' '); WITH ProcessText AS (SELECT 1 as startPos, CHARINDEX (' ',@str, 1) as spacePos UNION ALL SELECT spacePos + 1, CHARINDEX (' ',@str, spacePos + 1) FROM ProcessText WHERE spacePos > 0) SELECT startPos, spacePos, SUBSTRING(@str, startPos, spacePos - startPos) AS singleWord INTO #words FROM ProcessText WHERE spacePos <> 0 UNION ALL SELECT MAX(spacePos) + 1, 0, RIGHT(@str, LEN(@str) - MAX(spacePos)) FROM ProcessText OPTION(MAXRECURSION 0); DELETE #words WHERE singleWord = ''; DECLARE @str2 varchar(8000), @formatter int, @cnt int SELECT @str2 = '', @formatter = 50, @cnt = 1; SELECT @cnt = CASE WHEN @cnt > = 0 AND @cnt < @formatter THEN @cnt + len(singleWord) + 2 ELSE 1 END, @str2 = CASE WHEN @cnt = 1 THEN @str2 + singleWord + char(10) ELSE @str2 + singleWord + ' ' END FROM #words SELECT @str2 DROP TABLE #words Results: ----------------------------------------------------------------- Benjamin Franklin (January 17, 1706–April 17, 1790) was one of the most prominent of Founders and early political figures and statesmen of the United States. Considered the earliest of the Founders, Franklin was noted for his ingenuity, diversity of interests, and wit, and was extraordinarily influential in the development of the American Revolution despite never holding national elective office. Born in Boston, Massachusetts, to a tallow-maker, Franklin became a newspaper editor, printer, and merchant in Philadelphia, Pennsylvania, becoming very wealthy. He spent many years in England and published the famous Poor Richard's Almanack and Pennsylvania Gazette. He formed both the first public lending library and volunteer fire department in America as well as the Junto, a political discussion club.

The code in Listing 3 is very similar to the solution from Listing 2. However, it is quite difficult to make the code work with adequate approximation. I had to tune the final query, adding an addend 2 to the very end of the next line:

WHEN @cnt > = 0 AND @cnt < @formatter THEN @cnt + len(singleWord) + 2

That adjustment allowed me to count the free spaces between the words and some special formatting characters (like line feed), when I calculated the row length.



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap