The following routine extracts all the words from a source string and returns a collection. Optionally, the result contains only unique words.
This code is considerably simpler than an equivalent “pure” VB solution because it takes advantage of the RegExp object in the Microsoft VBScript Regular Expression type library.
' Get a collection of all the words in a string
' If the second argument is True, only unique words are returned
'
' NOTE: requires a reference to the
'       Microsoft VBScript Regular Expression type library

Function GetWords(ByVal Text As String, Optional DiscardDups As Boolean) As Collection
    Dim re As New RegExp
    Dim ma As Match

    ' the following pattern means that we're looking for a word character (\w)
    ' repeated one or more times (the + suffix), and that occurs on a word
    ' boundary (the leading and trailing \b sequences)
    re.Pattern = "\b\w+\b"
    ' search for *all* occurrences
    re.Global = True

    ' initialize the result
    Set GetWords = New Collection

    ' we need to ignore errors, if duplicates are to be discarded
    On Error Resume Next

    ' the Execute method does the search and returns a MatchCollection object
    For Each ma In re.Execute(Text)
        If DiscardDups Then
            ' if duplicates are to be discarded, we just use the word as the
            ' key of the collection item: the Add method fails on a duplicate
            ' key (keys are compared case-insensitively) and the error is ignored
            GetWords.Add ma.Value, ma.Value
        Else
            ' otherwise just add to the result
            GetWords.Add ma.Value
        End If
    Next
End Function
Here is an example of how you can use the routine:
' Count how many articles appear in a source string
' held in the txtSource textbox control
Dim v As Variant
Dim count As Long

For Each v In GetWords(txtSource.Text)
    Select Case LCase$(v)
        Case "the", "a", "an"
            count = count + 1
    End Select
Next
MsgBox "Found " & count & " articles."
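If adding a reference to the type library is inconvenient, the same idea can probably be expressed with late binding. The following is only a sketch under that assumption: the name GetWordsLateBound is hypothetical and not part of the original routine, and the RegExp object is created through CreateObject("VBScript.RegExp") instead of being declared with an early-bound type. It also limits error suppression to the single Add statement that can fail on a duplicate key.

```vb
' Late-bound sketch: no project reference to the RegExp type library is needed.
' GetWordsLateBound is a hypothetical name used only for illustration.
Function GetWordsLateBound(ByVal Text As String, _
    Optional ByVal DiscardDups As Boolean = False) As Collection
    Dim re As Object        ' VBScript.RegExp, created at run time
    Dim ma As Object        ' each Match returned by Execute
    Dim result As Collection

    Set re = CreateObject("VBScript.RegExp")
    ' one or more word characters delimited by word boundaries
    re.Pattern = "\b\w+\b"
    ' search for all occurrences, not just the first one
    re.Global = True

    Set result = New Collection
    For Each ma In re.Execute(Text)
        If DiscardDups Then
            ' Add raises a run-time error when the key already exists
            ' (keys are compared case-insensitively); suppress errors
            ' for this one statement only, then restore normal handling
            On Error Resume Next
            result.Add ma.Value, ma.Value
            On Error GoTo 0
        Else
            result.Add ma.Value
        End If
    Next

    Set GetWordsLateBound = result
End Function
```

It is called the same way as GetWords, for example GetWordsLateBound(txtSource.Text, True). Late binding trades compile-time type checking and IntelliSense for the convenience of not having to set the reference.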