devxlogo

Use OLE Structured Storage to Get a List of Worksheet Names

Use OLE Structured Storage to Get a List of Worksheet Names

ActiveX Data Objects (ADO) is a proven technology to access various data sources on computers that run Microsoft Windows. It provides an abstract interface to data stored in a wide range of file formats such as Microsoft Excel; it also specifies methods to obtain metadata, such as schema information, of these files. ADO uses OLE DB Providers for data access and is implemented as a COM object.

But ADO also has its limitations. For example, the method OpenSchema(), when used with adSchemaTables query type to get a list of all tables (Excel “tables” are worksheets and named ranges), always returns an alphabetically sorted list of names and this sorting cannot be changed. There is no way in ADO to get the list of worksheets in an Excel workbook in the order the user sees them when Microsoft Excel opens this workbook.

The code fragment below provides a workaround for this limitation. It accesses an Excel workbook as a structured storage object. Structured Storage is a COM technology that allows a single file in a native file system to be used to store multiple OLE streams of data. The line numbers in the code below are for reference purposes only:

01    IStorage *pStorage = 0;02    HRESULT hResult = ::StgOpenStorage(L"C:\MyExcel.xls", NULL, STGM_READ |STGM_SHARE_DENY_WRITE, NULL, 0, &pStorage);03    IStream *pStream = 0;04    hResult = pStorage->OpenStream(L"Workbook", NULL, STGM_READ |STGM_SHARE_EXCLUSIVE, 0, &pStream);05    char buf[32000]; DWORD dwRead = 0; short id = 0, len = 0;06    while (1) {07        hResult = pStream->Read(&id, 2, &dwRead);08        if( (FAILED(hResult)) || (dwRead != 2) ) break;09        hResult = pStream->Read(&len, 2, &dwRead);10        if( (FAILED(hResult)) || dwRead != 2 || ((id == 0) && (len == 0)) ) break;11        if( (id & 0x00FF) == 0x85 ) {12            pStream->Read(buf, len, &dwRead);13            buf[len] = 0;14            cout << buf+8 << endl;15            continue;16        }17        if(len > sizeof(buf)) len = sizeof(buf);18        pStream->Read(buf, (long)len, &dwRead);19        if((DWORD)len != dwRead) break;20    }

The structure of this code fragment is similar to other programs that open Excel files as structured storage files. For example, the one that demonstrates how to determine the version of Microsoft Excel that wrote a given workbook?the first two lines open a structured storage file. In this code fragment, the file name, C:MyExcel.xls, is hard-coded. In a real application, the file name most likely would be passed as a const WCHAR pointer, possibly converted from a character string by using either the standard library function mbstowcs() or the Windows API function MultiByteToWideChar(). It is also recommended that the applications use StgIsStorageFile() function to check whether the file to be used is indeed a structured storage file. The call to StgOpenStorage() uses STGM_SHARE_DENY_WRITE mode to prevent OLE from making a snapshot copy of the file. You can also use the StgOpenStorageEx() function, which is an enhanced version of StgOpenStorage().

The lines 03-04 open the “Workbook” stream object within the Excel workbook file. For Excel versions prior to Excel 97, you have to use the stream name “Book” instead.

After opening a stream, the rest of the code (lines 05-20) reads all of the records in it, looking for BOUNDSHEET (also known as BUNDLESHEET) records that represent worksheets. The code on line 11 checks for the BOUNDSHEET record id, 0x85, and, if such a record is found, prints the worksheet name which starts at the offset eight of the beginning of the BOUNDSHEET record (for more information about Excel “BIFF” file format, you can read OpenOffice.org’s document here).

The code above can be easily modified to perform other functions, for example, to find the sheet’s name by its number. Some error handling was omitted for clarity and it is always a good idea to release the resources that are no longer needed, so the Release() method (inherited from the IUnknown interface) should be called for pStream and then for pStorage after you finish working with them.

Finally, this code will not work with files stored in the new Excel 2007 format, XLSX, introduced in Microsoft Vista. These files use an Open XML file format, which is a ZIP-compressed hierarchical structure of files. In XLSX files, worksheets are located in the “xlworksheets” folder inside ZIP archive.

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist