early all applications require some sort of data integration?at minimum, retrieving some data and displaying it in the user interface. Usually, an application uses a combination of structured and unstructured data, which introduces a number of challenges. Often, you have to create, update, delete, and read these disparate data types within a single transaction?and that’s difficult when the structured data resides in a relational database but the unstructured data resides in a file system. SQL Server 2008’s new FILESTREAM feature helps solve this disjunct data problem by letting you store unstructured data in the file system while still preserving the transactional integrity of that data. This article explores the FILESTREAM feature and showcases how you can leverage it to gain more control over your own unstructured data.
Unstructured Data Options
With SQL Server 2005, when building an application that relies on both structured (relational) data and unstructured (non-relational) data, you had a couple of options:
- Store the relational data in the database and the non-relational data in a dedicated storage such as file systems and file servers. Although this approach can be cost-effective, it introduces additional complexity, because you need to manage the transactional integrity across both relational and non-relational systems.
- Store both relational and non-relational data in the database. For many years, databases have supported storage of non-relational data as Binary Large Objects, or BLOBs. SQL Server calls this data type a varbinary column. Although storing such data in the database is convenient, it typically comes at a higher cost (in disk space and storage/retrieval time) and can negatively impact the overall performance of the application.
However, with SQL Server 2008, the new FILESTREAM attribute combines with a varbinary column, so you can store the actual data on the server’s file system while managing and accessing it within the context of the database. The feature also gives SQL Server the capability of maintaining the integrity not only between records in the database but also between database records and external files associated with those records. Because this feature is implemented as just a new attribute of the existing varbinary(max) data type, developers can easily take advantage of FILESTREAM without having to completely re-architect their applications.
When to Use FILESTREAM
You should consider using FILESTREAM when any of the following conditions are true:
- When you are storing BLOBs with an average size of 1 MB or more
- When you need fast, read-only access to the data from your application
- When you want to access the BLOBs directly from your application’s middle tier code
- When you need to store the non-relational data and relational data in the database in the context of a single database transaction
By default, the FILESTREAM feature is disabled, so before you can use it, you must configure both the server and database instances using the following steps:
|Figure 1. Enabling FILESTREAM: You must enable FILESTREAM access for the desired SQL Server Instance before configuring the file stream access level for the database instance.|
- To enable FILESTREAM for the server instance, open up the SQL Server Configuration Manager and right-click on “SQL Server Services” from the list of services, then click Open. You’ll see a list of servers; right-click the SQL Server instance on which you want to enable FILESTREAM and select “Properties” from the context menu. Navigate to the “FILESTREAM” tab and check the “Enable FILESTREAM for Transact-SQL access” option (see Figure 1). You can also enable FILESTREAM for file I/O streaming access from this tab.
- To enable FILESTREAM for the database instance, execute the system stored procedure sp_configure and set the filestream_access_level parameter to 2, as shown below:
EXEC sp_configure filestream_access_level, 2 GO RECONFIGURE GO
The valid values for the filestream_access_level parameter are:
- 0?Disables FILESTREAM support for this instance. This is the default value.
- 1?Enables FILESTREAM for Transact-SQL access.
- 2?Enables FILESTREAM for Transact-SQL and Win32 streaming access.
After completing the server and database instance configurations, the next step is to create the actual database in which the data will reside. Because FILESTREAMs are specifically created for storing binary data in file systems, you use the CREATE DATABASE statement to create a special FILEGROUP and mark it as a stream:
CREATE DATABASE FILESTREAMExample ON PRIMARY ( NAME = FILESTREAMExample_Primary, FILENAME = 'c:ProjectsDevXDataFILESTREAMExample.mdf'), FILEGROUP FILESTREAMGroup CONTAINS FILESTREAM ( NAME = FILESTREAMExample_FileGroup, FILENAME = 'c:ProjectsDevXDataFILESTREAMExample') LOG ON ( NAME = FILESTREAMExample_Log, FILENAME = 'c:ProjectsDevXDataFILESTREAMExample.ldf') GO
Next, create a table, designating one of its columns as type VARBINARY(MAX) FILESTREAM:
CREATE TABLE Product ( ProductID INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL, Picture VARBINARY(MAX) FILESTREAM NULL, RowGuid UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL UNIQUE DEFAULT NEWID() ) GO
The preceding table definition assigns the Picture column as the varbinary(max) column, with the FILESTREAM attribute enabled. Note that the table with a FILESTREAM column must have at least one non-null unique ROWGUID column.
Any binary data you store inside the Picture column is not accessible from the file system; the only way to work with this binary data is through the standard CRUD (INSERT, UPDATE, and DELETE) SQL statements. The following example inserts a single row into the Product table.
INSERT INTO Product VALUES(1, 'Bicycle', 0x00, default)
That inserts a new row where the ProductID is 1, the Name contains Bicycle, the Picture column value is NULL, and the RowGuid column contains some default GUID value. Now you can retrieve this row from a .NET application, and read, overwrite, or append to its contents.
Using FILESTREAM to Write Data
For this example, consider a simple scenario that takes some user input, converts that into a byte array, and stores the information in the Picture column of the Product table. To follow along, create a Visual C# Windows Application named FileStreamExample. To the new project’s default form, add a Button named btnWriteFile, a TextBox named txtInput and a ListBox named lstResults. After that, double-click on the button to create a Click event handler containing the code shown in Listing 1, which begins by retrieving the connection string from app.config using the ConfigurationManager.ConnectionStrings property:
string connectionString = ConfigurationManager.ConnectionStrings ["fileStreamDB"].ConnectionString;
The connection string is stored in the app.config file as follows:
Next it opens a connection to the database, and assigns appropriate values to the SqlCommand object to retrieve the Products table row where ProductID is 1:
command.Connection = connection; //Get the PathName of the File from the database command.CommandText = "SELECT Picture.PathName(), " + "GET_FILESTREAM_TRANSACTION_CONTEXT() FROM Product " + "WHERE ProductID = 1";
The SQL statement specifies the new function GET_FILESTREAM_TRANSACTION_CONTEXT () that retrieves a reference to the transaction in which the session is currently running. You can bind the FILESTREAM file system operations to that transaction; however, the transaction must have already been started, and must not yet be aborted or committed. The function returns NULL when no explicitly-started transaction is available.
So, the code creates a new SqlTransaction object by calling the SqlConnection.BeginTransaction() method and assigning it to the SqlCommand object:
SqlTransaction transaction = connection.BeginTransaction(IsolationLevel.ReadCommitted); command.Transaction = transaction;
At this point, Listing 1 invokes the ExecuteReader() method, executing the SQL statement. After executing the query, it retrieves the path of the file stream and assigns it to a local variable:
string path = reader.GetString(0);
You need a stream to write to a file, so the next step is to create an instance of the SqlFileStream class, supplying the path, transaction context, file access enumeration, file options enumeration, and allocation size:
SqlFileStream stream = new SqlFileStream(path, (byte)reader.GetValue(1), FileAccess.Write, FileOptions.SequentialScan, 0);
The SqlFileStream class is a new class (introduced with SQL Server 2008) that provides access to FILESTREAM column data as a sequence of bytes. Table 1 provides a brief description of the properties exposed by the SqlFileStream class.
|Name||Returns the logical path of the SqlFileStream object that is supplied to the constructor.|
|TransactionContext||Allows you to specify the transaction context associated with the SqlFileStream object.|
|Length||Returns the length of the current stream in bytes.|
|Position||Allows you to specify the position within the current stream.|
|ReadTimeout||Specified in milliseconds, this property determines the timeout period for the stream at the time of reading.|
|WriteTimeout||Specified in milliseconds, this property determines the timeout period for the stream at the time of writing.|
Next, Listing 1 retrieves the user input, converts that to a byte array, and writes that to the file stream.
string contents = txtInput.Text; stream.Write((System.Text.Encoding.ASCII.GetBytes(contents)), 0, contents.Length); stream.Close();
Finally Listing 1 commits the transaction by calling the SqlTransaction.Commit() method.
That’s the basic process for writing to a database-managed file using the FILESTREAM feature. Now that you know how to write to a FILESTREAM column, reading from the column will be simpler.
Using FILESTREAM to Read Data
To the C# project’s default form, add a Button named btnReadFile, and paste the code in Listing 2 into its Click event:
For brevity, I’ll only discuss the code that differs from the previous section. When you create the SqlFileStream, you need to indicate that you are opening the file stream for read-only access with SequentialScan enabled.
SqlFileStream stream = new SqlFileStream(path, (byte)reader.GetValue(1),FileAccess.Read, FileOptions.SequentialScan, 0);
After that, read the contents of the file stream into a byte array, convert that into a string, and display it in the ListBox.
int length = (int) stream.Length; byte contents = new byte[length]; stream.Read(contents,0,length); string results = System.Text.Encoding.ASCII.GetString (contents); lstResults.Items.Add(results);
|Figure 2. Sample Project: At this point, you can read and write user-entered data by setting and reading the contents of the FILESTREAM column using the SqlFileStream Class.|
When you run the application, you’ll see a screen similar to that shown in Figure 2 (you’ll see more about the Append File button in the next section). When you click the “Write File” button, the application writes the contents of the TextBox to the file stream. When you click the “Read File” button, the application reads the contents from the file stream and displays that in the ListBox.
So far, you have seen how to read and write to a file stream. The next example shows the code required to append to an existing file stream in the database.
Using FILESTREAM to Append Data
Add a command button named btnAppendFile and modify its Click event handler as shown in Listing 3:
This time, when you instantiate the SqlFileStream object, you set the file access enumeration to ReadWrite, so you can both read from and write to the file stream.
SqlFileStream stream = new SqlFileStream(path, (byte)reader.GetValue(1), FileAccess.ReadWrite, FileOptions.SequentialScan, 0);
You then move the file stream pointer to the end of the file so that you can append to it.
Then, write the user input to the file stream using the SqlFileStream.Write() method.
stream.Write((System.Text.Encoding.ASCII.GetBytes (contents)), 0, contents.Length);
Finally, commit the transaction by calling the SqlTransaction.Commit() method.
Advantages of FILESTREAM
That’s the entire process. You can now read, write, and append data to a database-managed file. Although the process may be slightly more complex than using files or storing data directly in BLOBs, you’ll find that letting the database manage file storage has several advantages:
- You can access both relational and non-relational data using a single data store.
- SQL Server automatically includes the non-relational data (BLOB) during database backups and restores.
- There is no file size limitation. The 2GB maximum size limitation for a varbinary(max) column doesn’t apply to FILESTREAM; you are limited only by the available space on the NTFS file system.
- You can insert, update, and delete both relational and non-relational data within a single transaction.
- Using FILESTREAM is very efficient, because SQL Server doesn’t use buffer pool memory to operate on non-relational (BLOBs) data (as is the case with traditional BLOB columns).
- You can access non-relational data directly from within middle tier .NET code using ADO.NET without having to resort to a separate API.
- Depending on the size of the file, the NTFS file system can save and retrieve large BLOBs faster than SQL Server.
The examples in this article showed how to implement the new FILESTREAM feature. As you can see, FILESTREAM provides an easy-to-use transaction programming model when you want to store both relational and non-relational data in a single transaction.