Image of Navigational Map linked to Home / Contents / Search Doing the Split

by Ross Mack - GUI Computing
Image of Line Break

'Listen to the outworlders, Caleb. They may seem strange and their tongue is foreign but there are things they know which we do not. Do their craft not run before the wind like stallions? There are things they can teach us just as there are many things we can teach them.'

- Aluröch the Seer

One of the things about working with languages other than the one you usually work with (your native tongue ?) is that you discover things that your normal language does not do, or does not do so well. You discover features that you wish you still had when you get back to the language you usually use. I recently experienced this when I spent quite a while writing code in Perl (yes, Perl) on Unix boxes instead of my usual VB. Perl is a very cool language and has quite a few features that VB does not. However, some of them would be quite inappropriate. But some would be very handy.

For example, Perl has some very strong Regular Expression handling. You can easily parse whole files using regular expressions very easily. However, that's not what I'm here to talk about on this occasion. Perhaps when I have written a regular expression library for VB I'll come back to that.

A couple of the functions I found I could no longer live without when I got back to VB were split() and join(). The purpose of split() is essentially to take a string and break it up into an array of strings based on a specified delimiter. Let's take the example of the following string:

Using the split function you could pass in this string and specify ":" as the delimiter and split() would return an array of five items as follows:

I think that's cool.

So, what does the join() function do? Just the opposite of split(), join() takes an array of items and joins them using the specified delimiter into a single string. For example you could take the array of five items returned by split() and pass them into join() specifying "#" as the delimiter and that would result in the following string:

I think that's cool too.

Note to Perl enthusiasts (and I think you know who you are) : Yes, I know I'm not using the correct Perl nomenclature. That's intentional. Get over it.

Unfortunately, Perl has a different variable reference structure to VB so the functions I have built to emulate these functions in VB are not quite the same. Also, they don't support some of the weird and wonderful things you can do in Perl using regular expressions and the such, but they cover most cases you will run into with VB. The code is vanilla VB and I have used it in VB3, VB4 and VB5 with only minor modifications if any. As you can see it's not overly complicated, but I have discovered that since I wrote it and keep a copy with me I find places to use it every where.

Function Join(sArray() As String, ByVal sDelim_IN As String) As String
   Dim sTemp As String
   Dim iStart As Integer
   Dim iEnd As Integer
   Dim iNum As Integer
   ' Initialise the variables we need 
   sTemp = ""
   iStart = LBound(sArray)
   iEnd = UBound(sArray)
   ' Loop through the array of items passed  
   For iNum = iStart To iEnd
      sTemp = sTemp & sArray(iNum)
       ' Unless it's the last item add the delimiter
      If iNum < iEnd Then
         sTemp = sTemp & sDelim_IN
      End If
   Next iNum
   ' return the concatenated string
   Join = sTemp
End Function

Function Split(ByVal sSource_IN As String, ByVal sDelim_IN As String, sArray() As String) As Integer
   Dim nDelimLen As Integer
   Dim nFieldCount As Integer
   Dim sCurrent As String
   Dim nPos As Integer

   ' Split's the passed string into a number of elements
   ' separated by the indicated delimiter. These fields
   ' are then written into the array passed, which is 1
   ' based. The number of fields found is returned. 
   Erase sArray
   nDelimLen = Len(sDelim_IN)
   sCurrent = sSource_IN
      ' look for the specified delimiter
      nPos = InStr(1, sCurrent, sDelim_IN)
      If nPos > 0 Then
         nFieldCount = nFieldCount + 1
         ReDim Preserve sArray(1 To nFieldCount)
         ' grab the string left of the delimiter and add
        ' to the array 
         sArray(nFieldCount) = Left$(sCurrent, nPos - 1)
         ' cut the used portion off the string  
         sCurrent = Mid$(sCurrent, nPos + nDelimLen)
      End If
   Loop Until nPos = 0
   If sCurrent <> "" Then
      ' Deal with the last item without a trailing delimiter
      nFieldCount = nFieldCount + 1
      ReDim Preserve sArray(1 To nFieldCount) As String
      sArray(nFieldCount) = sCurrent
   End If
   ' return the number of items in the array
   Split = nFieldCount
End Function

You will notice that Split uses Redim Preserve fairly abundantly. I usually try to avoid this, as it is a slow operation. However, the only way to avoid that would be to do virtually all the rest of the processing in Split twice so as to determine before dimensioning the array how many elements it should have. I have included a version of split() in the downloadable zip file that functions this way. I found it to be faster with large numbers of elements, but the element size was also a factor. Either way VB5 on a Pentium 166 can split a string into an 80 element array 25000 times in just over a minute. That seems fast enough.

Note: Split always re-dimensions the string array you pass it and kills any existing values in it. When it gets passed back it will always be a 1 based array with a number of elements specified by the integer returned by Split.

So where can you use split and join? There are multitudes of places.

Here is a simple one. Let's say you want to quickly get the filename portion from a full path and filename:

Function GetFileName(ByVal sIN as String) As String
   Dim nCount As Integer
   Dim sPath() As String

   ' Split the full path based on slashes. 
   nCount = Split(sIN, "\", sPath())
   ' return whatever cam after the last slash.
   GetFileName = sPath(nCount)
End Function 

Or, for you C programmers out there, a more terse version (you can thank David Thompson for this one):

Function GetFileName(ByVal sIN as String) As String
   Dim sPath() As String

   GetFileName = sPath(Split(sIN, "\", sPath()))
End Function

Another example; if you are importing a delimited text file you can use split to simply break up each line of the file into an array of values. The code may look like this:

Dim sLine as string
Dim nFieldCount as Integer
Dim sValues() as String
Dim hFile as integer

hFile = Freefile 
Open sFilename for input access read as hFile
Do While not eof(hFile)
   ' Read a line from the file
   Line Input #hFile, sLine
   ' Split the line on tabs
   nFieldCount = Split(sLine,Chr$(9),sValues)
   ' Check we have the right number of values
   if nFieldCount = gnDesiredFieldCount then
      ' We have the right number  
      sSQL = "INSERT INTO mytable VALUES ("
      sSQL = "'" & sValues(1) & "',>'" & sValues(2) & "')"
      ' Wrong number of fields  
      sSQL = "INSERT INTO myerrorstable VALUES ("
      sSQL = sSQL & "'" & sValues(1) & "')"
   end if
   db.Execute sSQL ' or whatever   
Close #hFile

As you can see, the import process becomes quite simple. In addition, you will notice that in the example I use the Chr$ function to pass a tab character into split as the delimiter. Split and join work fine with any sort of delimiter. You can freely use all manner of non-printing characters including Chr$(0). You can also use delimiters of any length as the delimiter length is accounted for in both functions. So, for example you could use "{I am a delimiter}" as your delimiter, although it incurs something of a size overhead.

Here is a classic situation, to read all the Setting names from a Section of an INI file you would use the following code:

Declare Function GetPrivateProfileString Lib "Kernel" (ByVal lpApplicationName As String, lpKeyName As Any, ByVal lpDefault As String, ByVal lpReturnedString As String, ByVal nSize As Integer, ByVal lpFileName As String) As Integer

Dim sBuff As String * 500
Dim nResult As Integer
Dim sSettings() As String

nResult = GetPrivateProfileString("MySection", 0&, "", sBuff, 500, "whatever.ini")
If nResult > 0 then ' We read something 
    sBuff = Left$(sBuff, nResult)
    ' Split the setting names out into an array of setting names. 
    nResult = Split(sBuff, Chr$(0), sSettings())
End If  

If the section is found in the INI file the names of all the settings in that section are returned into the provided string buffer delimited by ASCII character 0. I used to avoid this call, as it was always a pain to then extract all the setting names from the string, but not when you can use split. Just like a late night infomercial isn't it?

Of course, the amount of data these functions can deal with is limited by the effective size of VB strings. This means they have an absolute upper limit of just less than 64k, but you will run out of 'String Space' somewhere before that, depending on which version of VB you are using.

You can also store data in a hierarchical form by using different delimiters at different levels of the hierarchy and split them iteratively. However, you are probably better off using some other structure to do that unless you really have to represent all that data in a single string (See Jim Karabatsos' article on the iStreamable interface for a potential circumstance).

So, there it is, a useful snippet of code and a few ideas on where it can be used. But remember to use it wisely. The downloadable ZIP file includes a BAS module with Split for VB3 and the equivalent for VB5 (VB4 users should use one or the other as appropriate). It also includes a simple demonstration app in VB3 and in VB5.

Written by: Ross Mack
July '97

Image of Arrow linked to Previous Article Image of Arrow linked to Next Article
Image of Line Break