emoving extra empty spaces is a pretty common task that seems very simple, but in reality it is not as trivial as you may assume. The first impulsive solution that many programmers try is replacing two empty spaces with one, which works as long as the number of empty spaces doesn't exceed two. The problem starts when text has three, four, or more empty spaces between words. If you try repeatedly replacing every two empty spaces with one in those scenarios, you can completely remove some of the empty spaces between the words.
The conventional approach for such a task would be checking for empty spaces, character by character, in a loop and removing the extra spaces. However, there is another approach (see Listing 1).
Listing 1. Removing Extra Empty Spaces
SET QUOTED_IDENTIFIER OFF
SET NOCOUNT ON
DECLARE @str varchar(2000)
SELECT @str = " Bipartite graphs are useful for modelling matching problems."
CREATE TABLE #words(singleWord varchar(50))
SELECT @str = REPLACE(@str, CHAR(13), ' ');
SELECT @str = REPLACE(@str, CHAR(10), ' ');
-- Split by words and load text into the table
SELECT @str = 'INSERT INTO #words(singleWord) SELECT A="' +
REPLACE(@str, ' ', '"UNION ALL SELECT"') + '"'
-- Delete spaces between the words
SELECT * FROM #words;
DELETE #words WHERE singleWord = '';
SELECT * FROM #words;
-- Restore original text without extra empty spaces
SELECT @str = '';
SELECT @str = @str + singleWord + ' ' FROM #words;
DROP TABLE #words
Using any of the techniques that were explained in the article "Fast Text Processing in SQL Server" (Listing 1 uses the technique with dynamic SQL), you can convert and load text into a table. The empty strings will represent the empty spaces between the words:
(17 row(s) affected)
All you would need to do now is delete the rows from the table #words where a singleWord column has the empty string values. Then you'd need to restore the original text, using a simple word accumulator, built on a string variable
@str. The final result would be a normalized (without extra spaces) string: "Bipartite graphs are useful for modelling matching problems."
|Author's Note: I received a few emails from readers complaining that some scripts from my article "Fast Text Processing in SQL Server" are not working. As it turns out, they used case-sensitive server collations. Please, be aware that my solutions have been tested for SQL_Latin1_General_CP1_CI_AS case-insensitive collation only. For case-sensitive collations, adjust the solutions correspondingly.