Use OLE Structured Storage to Get a List of Worksheet Names

Use OLE Structured Storage to Get a List of Worksheet Names

ActiveX Data Objects (ADO) is a proven technology to access various data sources on computers that run Microsoft Windows. It provides an abstract interface to data stored in a wide range of file formats such as Microsoft Excel; it also specifies methods to obtain metadata, such as schema information, of these files. ADO uses OLE DB Providers for data access and is implemented as a COM object.

But ADO also has its limitations. For example, the method OpenSchema(), when used with adSchemaTables query type to get a list of all tables (Excel “tables” are worksheets and named ranges), always returns an alphabetically sorted list of names and this sorting cannot be changed. There is no way in ADO to get the list of worksheets in an Excel workbook in the order the user sees them when Microsoft Excel opens this workbook.

The code fragment below provides a workaround for this limitation. It accesses an Excel workbook as a structured storage object. Structured Storage is a COM technology that allows a single file in a native file system to be used to store multiple OLE streams of data. The line numbers in the code below are for reference purposes only:

01    IStorage *pStorage = 0;02    HRESULT hResult = ::StgOpenStorage(L"C:\MyExcel.xls", NULL, STGM_READ |STGM_SHARE_DENY_WRITE, NULL, 0, &pStorage);03    IStream *pStream = 0;04    hResult = pStorage->OpenStream(L"Workbook", NULL, STGM_READ |STGM_SHARE_EXCLUSIVE, 0, &pStream);05    char buf[32000]; DWORD dwRead = 0; short id = 0, len = 0;06    while (1) {07        hResult = pStream->Read(&id, 2, &dwRead);08        if( (FAILED(hResult)) || (dwRead != 2) ) break;09        hResult = pStream->Read(&len, 2, &dwRead);10        if( (FAILED(hResult)) || dwRead != 2 || ((id == 0) && (len == 0)) ) break;11        if( (id & 0x00FF) == 0x85 ) {12            pStream->Read(buf, len, &dwRead);13            buf[len] = 0;14            cout << buf+8 << endl;15            continue;16        }17        if(len > sizeof(buf)) len = sizeof(buf);18        pStream->Read(buf, (long)len, &dwRead);19        if((DWORD)len != dwRead) break;20    }

The structure of this code fragment is similar to other programs that open Excel files as structured storage files. For example, the one that demonstrates how to determine the version of Microsoft Excel that wrote a given workbook?the first two lines open a structured storage file. In this code fragment, the file name, C:MyExcel.xls, is hard-coded. In a real application, the file name most likely would be passed as a const WCHAR pointer, possibly converted from a character string by using either the standard library function mbstowcs() or the Windows API function MultiByteToWideChar(). It is also recommended that the applications use StgIsStorageFile() function to check whether the file to be used is indeed a structured storage file. The call to StgOpenStorage() uses STGM_SHARE_DENY_WRITE mode to prevent OLE from making a snapshot copy of the file. You can also use the StgOpenStorageEx() function, which is an enhanced version of StgOpenStorage().

The lines 03-04 open the “Workbook” stream object within the Excel workbook file. For Excel versions prior to Excel 97, you have to use the stream name “Book” instead.

After opening a stream, the rest of the code (lines 05-20) reads all of the records in it, looking for BOUNDSHEET (also known as BUNDLESHEET) records that represent worksheets. The code on line 11 checks for the BOUNDSHEET record id, 0x85, and, if such a record is found, prints the worksheet name which starts at the offset eight of the beginning of the BOUNDSHEET record (for more information about Excel “BIFF” file format, you can read OpenOffice.org’s document here).

The code above can be easily modified to perform other functions, for example, to find the sheet’s name by its number. Some error handling was omitted for clarity and it is always a good idea to release the resources that are no longer needed, so the Release() method (inherited from the IUnknown interface) should be called for pStream and then for pStorage after you finish working with them.

Finally, this code will not work with files stored in the new Excel 2007 format, XLSX, introduced in Microsoft Vista. These files use an Open XML file format, which is a ZIP-compressed hierarchical structure of files. In XLSX files, worksheets are located in the “xlworksheets” folder inside ZIP archive.

Share the Post:
Heading photo, Metadata.

What is Metadata?

What is metadata? Well, It’s an odd concept to wrap your head around. Metadata is essentially the secondary layer of data that tracks details about the “regular” data. The regular

XDR solutions

The Benefits of Using XDR Solutions

Cybercriminals constantly adapt their strategies, developing newer, more powerful, and intelligent ways to attack your network. Since security professionals must innovate as well, more conventional endpoint detection solutions have evolved

AI is revolutionizing fraud detection

How AI is Revolutionizing Fraud Detection

Artificial intelligence – commonly known as AI – means a form of technology with multiple uses. As a result, it has become extremely valuable to a number of businesses across

AI innovation

Companies Leading AI Innovation in 2023

Artificial intelligence (AI) has been transforming industries and revolutionizing business operations. AI’s potential to enhance efficiency and productivity has become crucial to many businesses. As we move into 2023, several

data fivetran pricing

Fivetran Pricing Explained

One of the biggest trends of the 21st century is the massive surge in analytics. Analytics is the process of utilizing data to drive future decision-making. With so much of

kubernetes logging

Kubernetes Logging: What You Need to Know

Kubernetes from Google is one of the most popular open-source and free container management solutions made to make managing and deploying applications easier. It has a solid architecture that makes

ransomware cyber attack

Why Is Ransomware Such a Major Threat?

One of the most significant cyber threats faced by modern organizations is a ransomware attack. Ransomware attacks have grown in both sophistication and frequency over the past few years, forcing