Manage Database Metadata with POI and Excel

Databases are everywhere. In IT, you can’t build a substantial application without providing tables to store the data that the application will rely on (properties, configuration settings, etc.) and process. Maintaining a clear understanding of the role the columns play in your tables is essential, as maintenance needs and enhancements force tables to evolve.

Entity relationship diagrams (ERDs), while central to such development efforts, don’t go far enough when it comes to explaining the actual use of a particular column. Some ERD design tools do allow you to store metadata in them, but this forces anybody who wants to look up metadata definitions later to use those specific tools. Moreover, descriptions stored in design tools, while adequate for those involved in the design process, may not meet the needs of developers performing maintenance or creating enhancements after the application goes live. Consequently, the usage and meanings of columns and tables typically get modified, lost, or obscured over time.

A simple metadata dictionary can help you monitor and maintain this essential information. Just use JDBC to query your database for its metadata and then use the Apache POI library API to store it in a clear and organized way in Microsoft Excel. This article describes how to implement this solution. (See Sidebar: What Does POI Stand For? for a humorous explanation of the technology.)

The Apache POI overview page describes the project this way:

“The POI project is the master project for developing pure Java ports of file formats based on Microsoft’s OLE 2 Compound Document Format. OLE 2 Compound Document Format is used by Microsoft Office Documents, as well as by programs using MFC property sets to serialize their document objects.”

Figure 1. POI-Generated Excel Output File

Practically speaking, this means that the POI API enables you to write Java code that interacts with Microsoft Office applications without having to know anything about the underlying C++ Microsoft Foundation Classes. Figure 1 shows an Excel output file, generated with POI, containing some of the metadata that you can pull out of your database tables.

A closer look at POI and its code explains how it generated the Excel output. (Click here to download the code.) The data dictionary utility first connects to your database to collect the metadata from the tables. (To connect to any database using Java you need your database vendor’s JDBC drivers. The code in this article uses Oracle, so if you try it on your computer, you need to have Oracle’s classes12.jar in your classpath.) Lines 46-55 in the constructor make the connection to the database and set up the Excel output file:

        System.out.print("  Loading JDBC Driver  -> " + driverClass + "\n");
        Class.forName(driverClass).newInstance();
        System.out.print("  Connecting to        -> " + connectionURL + "\n");
        this.con = DriverManager.getConnection(connectionURL, userID, userPassword);
        System.out.print("  Connected as         -> " + userID + "\n");

        wb = new HSSFWorkbook();
        // The backslash must be escaped inside a Java string literal
        f = new File("c:\\poidatadict.xls");

To access the POI API you must have the POI jar file (see resources for download location) in your classpath. In the snippet above, the HSSFWorkbook variable wb is the top-level object of the Excel object hierarchy. As each table’s metadata is accessed, the utility creates a new worksheet in the workbook object, and as each of the table’s columns is processed, it adds a new row with three columns to hold the metadata (line 102):

        String[] types = {"TABLE"};
        resultSet = md.getTables(null, null, "%", types);

        // Get the table names
        while (resultSet.next()) {
            // Get the table name
            String tableName = resultSet.getString(3);

            // Get the table's catalog and schema names (if any)
            String tableCatalog = resultSet.getString(1);
            String tableSchema = resultSet.getString(2);

            // Constant-first comparison avoids a NullPointerException when no schema is reported
            if ("TESTSCHEMA".equals(tableSchema)) {
                int rowNum = 1;
                System.out.println("table name: " + tableName);
                sheet = wb.createSheet(tableName);
                ResultSet rset = md.getColumns(null, null, tableName.toUpperCase(), "%");

                // Heading row
                HSSFRow row = sheet.createRow(rowNum);
                row.createCell(ZERO).setCellValue("Column Name");
                row.createCell(ONE).setCellValue("Data Type");
                row.createCell(TWO).setCellValue("Description");

                // One worksheet row per table column
                while (rset.next()) {
                    rowNum++;
                    HSSFRow dataRow = sheet.createRow(rowNum);
                    dataRow.createCell(ZERO).setCellValue(rset.getString(4));
                    dataRow.createCell(ONE).setCellValue(codeToText(rset.getInt(5)));
                    //System.out.println("name: " + rset.getString(4) + "   type: " + rset.getString(5));
                }
                rset.close();
            }
        }

Since a database could contain multiple schemas, the code first sets up a filter to process only the tables in a particular schema (line 89). For each database table, it creates a new worksheet in the workbook. By passing the tableName variable to the createSheet( ) method, the code automatically sets up the worksheet with the name of the table appearing in the worksheet tab.
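Excel limits worksheet names to 31 characters and rejects a handful of characters (\ / ? * [ ] :), so a table name may need cleaning before it is passed to createSheet( ). The following is a minimal sketch of such a helper, not part of the article's utility; the class name SheetNames and the method toSheetName are hypothetical, and the sketch does not guard against two long table names truncating to the same sheet name.

```java
public class SheetNames {

    // Excel limits worksheet names to 31 characters.
    static final int MAX_LEN = 31;

    // Hypothetical helper: turns a table name into a name Excel accepts for a sheet tab.
    public static String toSheetName(String tableName) {
        // Replace the characters Excel does not allow in a sheet name: \ / ? * [ ] :
        String cleaned = tableName.replaceAll("[\\\\/?*\\[\\]:]", "_");
        // Truncate anything longer than the limit.
        return cleaned.length() > MAX_LEN ? cleaned.substring(0, MAX_LEN) : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(toSheetName("EMPLOYEES"));
        System.out.println(toSheetName("AUDIT/LOG:2005"));
    }
}
```

With a helper like this, the call in the listing above would become wb.createSheet(SheetNames.toSheetName(tableName)).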

This example registers the metadata values for only the column’s name and its data type, but you can extract any metadata that interests you as long as your database vendor’s JDBC spec implementation supports it (column size, number of decimal places, and whether the column is nullable are some examples of other metadata available using Oracle). Additionally, the utility creates an empty description field so you can add plain language explanations of the column’s usage afterwards. Lines 96-98 set up headings for these three columns. Since the utility uses zero-based column numbering when a new cell is created in a row, I employ static constants that are named to match the column (ZERO, ONE, etc.).
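As a rough illustration of pulling the extra fields mentioned above, the sketch below reads COLUMN_SIZE, DECIMAL_DIGITS, and NULLABLE from the current row of a getColumns( ) result set and turns them into text suitable for a cell. The column labels come from the JDBC specification; the class name ColumnDetails, its method names, and the output wording are assumptions made for this example and do not appear in the article's code.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ColumnDetails {

    // Turns the NULLABLE code reported by getColumns() into readable text.
    public static String nullableToText(int nullable) {
        switch (nullable) {
            case DatabaseMetaData.columnNoNulls:  return "NOT NULL";
            case DatabaseMetaData.columnNullable: return "NULL allowed";
            default:                              return "unknown";
        }
    }

    // Formats size and scale as "size,scale", or just "size" when the scale is zero.
    public static String sizeToText(int columnSize, int decimalDigits) {
        return decimalDigits > 0 ? columnSize + "," + decimalDigits
                                 : String.valueOf(columnSize);
    }

    // Reads the extra fields from the current row of a getColumns() result set.
    public static String[] readExtras(ResultSet rset) throws SQLException {
        int size = rset.getInt("COLUMN_SIZE");
        int scale = rset.getInt("DECIMAL_DIGITS");
        int nullable = rset.getInt("NULLABLE");
        return new String[] { sizeToText(size, scale), nullableToText(nullable) };
    }

    public static void main(String[] args) {
        System.out.println(sizeToText(10, 2));
        System.out.println(nullableToText(DatabaseMetaData.columnNoNulls));
    }
}
```

The two strings returned by readExtras( ) could be written to additional cells in each data row, with matching headings added to the heading row.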

The while loop (lines 100-106) evaluates each column in the table and adds its metadata to the cells of the worksheet. The private codeToText( ) method converts Oracle data type codes to plain text as each column’s metadata is evaluated. Lines 109-115 write out the data in the workbook object to the physical file on the filesystem and catch any exceptions:
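The body of codeToText( ) is not reproduced in this article. A minimal sketch of how such a method might look follows, assuming it maps the java.sql.Types code found in column 5 (DATA_TYPE) of the getColumns( ) result to a readable label. The class name TypeNames, the set of types covered, and the label text are illustrative assumptions, not the utility's actual implementation.

```java
import java.sql.Types;

public class TypeNames {

    // Hypothetical stand-in for the article's private codeToText() method.
    // Maps a java.sql.Types code to a readable label.
    public static String codeToText(int sqlType) {
        switch (sqlType) {
            case Types.CHAR:      return "CHAR";
            case Types.VARCHAR:   return "VARCHAR";
            case Types.NUMERIC:   return "NUMERIC";
            case Types.DECIMAL:   return "DECIMAL";
            case Types.INTEGER:   return "INTEGER";
            case Types.DATE:      return "DATE";
            case Types.TIMESTAMP: return "TIMESTAMP";
            case Types.CLOB:      return "CLOB";
            case Types.BLOB:      return "BLOB";
            // Fall back to the raw code so unmapped types stay visible in the sheet.
            default:              return "UNKNOWN (" + sqlType + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(codeToText(Types.VARCHAR));
        System.out.println(codeToText(Types.TIMESTAMP));
    }
}
```

Which codes a given driver reports for vendor-specific types varies, so the default branch keeps the numeric code rather than hiding it.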

        try {
            fos = new FileOutputStream(f);
            wb.write(fos);
            fos.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

As you can see, the POI API mirrors the way Excel itself works. The objects created are simply abstractions of the elements that make up the Excel application’s primary parts: workbook, worksheet, rows, and cells.

With this utility, you can easily create a database metadata container that anybody in your organization can access. While a certain amount of discipline is required to keep these worksheets up to date, the utility will allow you to begin improving your database housekeeping.
