Tracking and Resuming Large File Downloads in ASP.NET

Serving files that clients can download over the Internet couldn’t be easier, right? Just copy the downloadable files into your Web application’s directory, distribute the links and let IIS do all the work. Then again, serving files couldn’t be more of a pain in the neck. You don’t want your data to be accessible to the whole world. You don’t want your server crowded with hundreds of static files. Maybe you even want to download temporary files, creating them on the fly only when the client starts the download.

Unfortunately, that’s not possible using IIS’s default response to a download request. So normally, to gain control over the download process, developers link to a custom .aspx page, where they can check credentials, create the downloadable file, and push that file back to the client using:

   Response.WriteFile
   Response.End()

And that’s where the real troubles begin.

What’s the Problem?
The WriteFile method seems perfect; given a file name, it streams the binary data for that file down to the client. Until recently though, the WriteFile method was a notorious memory hog, loading the entire file into your server’s RAM to serve it (actually, it can use up to twice the file’s size). For large files, this causes severe memory problems, and can recycle the ASP.NET process itself. But in June 2004, Microsoft solved that issue via a hotfix (see Knowledge Base Article 823409). This hotfix is now part of .NET Framework 1.1 Service Pack 1.

Author’s Note: If you haven’t installed the .NET Framework version 1.1 Service Pack 1 (SP1), please do it now; SP1 provides numerous fixes and improvements.

Among other things, the hotfix introduced the TransmitFile method, which reads the disk file into a smaller memory buffer to transmit the file. Even though that solution solves the memory and recycling problems, it’s unsatisfying. You have no control over the life-cycle of the response. You can’t tell if the download completed properly, you have no way of knowing whether the download was interrupted, and (if you created a temporary file) you don’t know if or when you can delete the file. Even worse, if the download does fail, the TransmitFile method restarts it from the top upon the client’s next attempt.

One possible solution, implementing the Background Intelligent Transfer Service (BITS), is not an option for most sites, because it would ruin the attempt to maintain browser and OS independence on the client side.

The base for a satisfying solution came from Microsoft’s first attempt to solve the memory-cluttering problems that WriteFile caused (see Knowledge Base Article 812406). The workaround in that article shows a chunk-wise downloading process, which reads data from a file stream. Before the server sends each chunk of bytes to the client, it checks whether the client is still connected, using the Response.IsClientConnected property. If so, it continues streaming bytes; otherwise, it stops, preventing the server from delivering unnecessary data.

That’s the way to go, particularly when you’re downloading a temporary file. In the event that IsClientConnected returns False, you know that the download was interrupted and you must keep the file; whereas when the procedure completes successfully, you can delete it. In addition, to resume broken downloads, all you have to do is start streaming again from the point in the file where the client connection failed during the previous download attempt.
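The chunk-wise idea can be tried outside ASP.NET with plain streams. What follows is only a minimal sketch, not the sample's Listing 4: SendFrom, ConnectionCheck, AlwaysConnected, and the 8 KB chunk size are hypothetical names and values chosen for illustration, and the ConnectionCheck delegate merely stands in for Response.IsClientConnected.

```vb
Imports System
Imports System.IO

Module ChunkedCopySketch

    ' Hypothetical stand-in for Response.IsClientConnected
    Public Delegate Function ConnectionCheck() As Boolean

    ' Copies bytes from source to target in 8 KB chunks, starting at
    ' startOffset. Returns the number of bytes actually sent; a value
    ' smaller than the remaining file size means the copy was interrupted.
    Public Function SendFrom(ByVal source As Stream, ByVal target As Stream, _
            ByVal startOffset As Long, ByVal isConnected As ConnectionCheck) As Long
        Dim chunk(8191) As Byte
        Dim sent As Long = 0
        source.Seek(startOffset, SeekOrigin.Begin)
        While isConnected()
            Dim read As Integer = source.Read(chunk, 0, chunk.Length)
            If read = 0 Then Exit While   ' end of file reached
            target.Write(chunk, 0, read)
            sent += read
        End While
        Return sent
    End Function

    ' Test double: a client that never disconnects
    Public Function AlwaysConnected() As Boolean
        Return True
    End Function

    Sub Main()
        Dim data(19999) As Byte                 ' a 20,000-byte "file"
        Dim source As New MemoryStream(data)
        Dim target As New MemoryStream()
        ' Resume at byte 5,000, as if the first attempt had failed there
        Dim sent As Long = SendFrom(source, target, 5000, AddressOf AlwaysConnected)
        Console.WriteLine(sent)                 ' prints 15000
    End Sub
End Module
```

In a real handler the target would presumably be the response's output stream, and comparing the returned count with the number of bytes that were still outstanding tells you whether to keep or delete a temporary file.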

HTTP Protocol Header Support
It turns out that the HTTP protocol supports headers designed for use with interrupted downloads. Using a handful of HTTP headers, you can enhance your download procedure to comply fully with the HTTP Protocol Specification. The specifications work with ranges to provide everything you need to resume interrupted downloads.

Here’s how it works. First, a server that supports resuming downloads sends the Accept-Ranges header in its initial response. The server also sends an entity tag header, ETag, that contains a unique identification string.

The code below shows some of the headers that IIS sends back to the client in response to an initial download request, giving the client detailed information about the requested file.

   HTTP/1.1 200 OK
   Connection: close
   Date: Tue, 19 Oct 2004 15:11:23 GMT
   Accept-Ranges: bytes
   Last-Modified: Sun, 26 Sep 2004 15:52:45 GMT
   ETag: "47febb2cfd76c41:2062"
   Cache-Control: private
   Content-Type: application/x-zip-compressed
   Content-Length: 2844011

After receiving those headers, if the download is interrupted, Internet Explorer sends the ETag value back to the server with a subsequent download request, along with the Range header. The following code shows some of the headers that Internet Explorer sends to the server in an attempt to resume a broken download.

   GET http://192.168.100.100/download.zip HTTP/1.0
   Range: bytes=822603-
   Unless-Modified-Since: Sun, 26 Sep 2004 15:52:45 GMT
   If-Range: "47febb2cfd76c41:2062"

These headers show that Internet Explorer caches the entity tag provided by IIS and sends it back to the server in the If-Range header, which is one way to make sure that the download resumes with the exact same file. Unfortunately, not all browsers work exactly the same way. Other HTTP headers that a client might send to verify the file are If-Match, If-Unmodified-Since or Unless-Modified-Since. Apparently, the specification isn’t perfectly clear on whether client software must support such headers, or which ones it must use. Therefore, some clients don’t use any at all, while IE uses only If-Range and Unless-Modified-Since. It’s best to have your code check all of them. That way, your application can comply with HTTP at a very high level and work with multiple browsers. The Range header indicates the requested byte range, in this case the starting point from which the server should resume streaming the file.
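Checking "all of them" could look roughly like the sketch below. It is deliberately stricter than the specification: it accepts only exact matches against the current entity tag or the exact Last-Modified string, and it ignores forms such as `*`, lists of tags, or date ordering. CachedPartIsCurrent and AbsentOrEqual are hypothetical names, not the sample's own CheckIf... functions.

```vb
Imports System
Imports System.Collections.Specialized

Module ResumeValidatorSketch

    ' True when the named header is absent or equals the expected value
    Private Function AbsentOrEqual(ByVal headers As NameValueCollection, _
            ByVal name As String, ByVal expected As String) As Boolean
        Dim value As String = headers(name)
        Return value Is Nothing OrElse value.Trim() = expected
    End Function

    ' True when no validator sent by the client contradicts the file's
    ' current entity tag and Last-Modified value (exact matches only).
    Public Function CachedPartIsCurrent(ByVal headers As NameValueCollection, _
            ByVal entityTag As String, ByVal lastModifiedUtc As DateTime) As Boolean
        Dim quotedTag As String = """" & entityTag & """"
        Dim httpDate As String = lastModifiedUtc.ToString("r")

        ' If-Range may carry either an entity tag or a date
        Dim ifRange As String = headers("If-Range")
        If Not ifRange Is Nothing Then
            ifRange = ifRange.Trim()
            If ifRange <> quotedTag AndAlso ifRange <> httpDate Then Return False
        End If

        Return AbsentOrEqual(headers, "If-Match", quotedTag) _
            AndAlso AbsentOrEqual(headers, "If-Unmodified-Since", httpDate) _
            AndAlso AbsentOrEqual(headers, "Unless-Modified-Since", httpDate)
    End Function

    Sub Main()
        ' The headers Internet Explorer sent in the example above
        Dim headers As New NameValueCollection()
        headers.Add("If-Range", """47febb2cfd76c41:2062""")
        headers.Add("Unless-Modified-Since", "Sun, 26 Sep 2004 15:52:45 GMT")
        Dim modified As New DateTime(2004, 9, 26, 15, 52, 45)
        Console.WriteLine(CachedPartIsCurrent(headers, "47febb2cfd76c41:2062", modified))   ' prints True
    End Sub
End Module
```

A False result means the earlier chunk can no longer be trusted. Whether the handler then restarts the download or returns an error status depends on which header failed.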

When IIS receives a resume-download type request, it sends back a response that includes the headers shown below.

   HTTP/1.1 206 Partial Content
   Content-Range: bytes 822603-2844010/2844011
   Accept-Ranges: bytes
   Last-Modified: Sun, 26 Sep 2004 15:52:45 GMT
   ETag: "47febb2cfd76c41:2062"
   Cache-Control: private
   Content-Type: application/x-zip-compressed
   Content-Length: 2021408

Note that the preceding response has a different HTTP status code than the response to the original download request: 206 for the resume-download request vs. 200 for the initial download request. This indicates that the content about to come through the line is a partial file. This time, the Content-Range header specifies the exact amount and position of the bytes delivered.

Internet Explorer is very picky about these headers. If the initial response doesn’t contain an ETag header, IE will never even try to resume a download. Other clients I tested didn’t use the ETag header; they simply relied on the file name, requesting ranges and using the Last-Modified header (if they tried to verify the file at all).
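The numbers in those two headers follow directly from the requested range and the file size. As a small worked sketch (ContentRangeValue and RangeLength are hypothetical helper names, not part of the sample code):

```vb
Imports System

Module ContentRangeSketch

    ' Formats the Content-Range value for one byte range of a file
    Public Function ContentRangeValue(ByVal first As Long, ByVal last As Long, _
            ByVal fileLength As Long) As String
        Return "bytes " & first.ToString() & "-" & last.ToString() & _
            "/" & fileLength.ToString()
    End Function

    ' Both ends of a byte range are inclusive, hence the + 1
    Public Function RangeLength(ByVal first As Long, ByVal last As Long) As Long
        Return last - first + 1
    End Function

    Sub Main()
        Dim fileLength As Long = 2844011
        Dim first As Long = 822603            ' from "Range: bytes=822603-"
        Dim last As Long = fileLength - 1     ' open-ended range runs to the last byte
        Console.WriteLine(ContentRangeValue(first, last, fileLength))
        ' prints: bytes 822603-2844010/2844011
        Console.WriteLine(RangeLength(first, last))
        ' prints: 2021408
    End Sub
End Module
```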

The HTTP Protocol, a Step Further
The headers shown in the preceding section are sufficient to make the resuming-solution work. But that would not cover the HTTP specification entirely.

The Range header is capable of asking for more than one range in one single request, a feature called “multipart ranges.” Don’t confuse this with segmented downloading, which almost all downloading tools use to increase the speed of the download. These tools claim to improve download speed by opening two or more simultaneous connections, each of which requests a different range of the file.

The multipart ranges idea doesn’t open multiple connections, but it does let client software request, for example, the first ten and last ten bytes of a file in a single request/response cycle.
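To make the "first ten and last ten bytes" case concrete, here is a rough sketch of how a Range value with several entries might be resolved against a file size. It is not the sample's ParseRequestHeaderRange from Listing 3. ParseRanges and ByteRange are hypothetical names, and the code assumes .NET 2.0 or later for generics and Long.TryParse.

```vb
Imports System
Imports System.Collections.Generic

Module RangeParserSketch

    Public Structure ByteRange
        Public First As Long
        Public Last As Long
    End Structure

    ' Resolves a Range value such as "bytes=0-9,-10" against a file length.
    ' Returns Nothing when the value is malformed or a range is unusable.
    Public Function ParseRanges(ByVal header As String, _
            ByVal fileLength As Long) As List(Of ByteRange)
        Const Prefix As String = "bytes="
        If header Is Nothing OrElse _
            Not header.StartsWith(Prefix, StringComparison.Ordinal) Then Return Nothing

        Dim result As New List(Of ByteRange)
        For Each spec As String In header.Substring(Prefix.Length).Split(","c)
            Dim parts() As String = spec.Trim().Split("-"c)
            If parts.Length <> 2 Then Return Nothing

            Dim range As ByteRange
            If parts(0).Length = 0 Then
                ' Suffix form "-n": the last n bytes of the file
                Dim count As Long
                If Not Long.TryParse(parts(1), count) OrElse count <= 0 Then Return Nothing
                range.First = Math.Max(0L, fileLength - count)
                range.Last = fileLength - 1
            Else
                If Not Long.TryParse(parts(0), range.First) Then Return Nothing
                If parts(1).Length = 0 Then
                    range.Last = fileLength - 1      ' open-ended form "n-"
                ElseIf Not Long.TryParse(parts(1), range.Last) Then
                    Return Nothing
                End If
                range.Last = Math.Min(range.Last, fileLength - 1)
            End If

            If range.First < 0 OrElse range.First > range.Last Then Return Nothing
            result.Add(range)
        Next
        Return result
    End Function

    Sub Main()
        For Each r As ByteRange In ParseRanges("bytes=0-9,-10", 2844011)
            Console.WriteLine(r.First.ToString() & "-" & r.Last.ToString())
        Next
        ' prints 0-9, then 2844001-2844010
    End Sub
End Module
```

More than one entry in the result is what makes a request a multipart range request; a single entry is the ordinary resume case.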

To be completely honest, I could not find a single piece of software which really makes use of this feature. But I refused to write a disclaimer into my code reading “this is not fully HTTP compliant.” Murphy’s Law would surely spring into action if that feature were left out. However, multipart ranges are used in e-mail transfers, separating headers, plain text and attachments.

The Sample Code
Using this knowledge of how the client and server exchange header information to enable resumable downloads in conjunction with the idea of streaming chunks of a file at a time, you can add robust download management capability to your ASP.NET applications.

The way to achieve control over the download process is to intercept the download requests from the client, read the headers, and respond appropriately. Before .NET, you would have had to write an ISAPI (Internet Server API) application to accomplish this, but the .NET Framework provides an IHttpHandler interface which, when implemented in a class, lets you intercept and handle requests using only .NET code. This means that your application will have total control and responsibility over the download process; IIS’s automatic functionality will no longer be involved or available.

The sample code contains a custom HttpHandler class (called ZIPHandler) in the file HttpHandler.vb. ZIPHandler implements IHttpHandler and handles requests for .zip files.

To test the sample code, create a new virtual directory in IIS and copy the source files there. Create a file called download.zip and place it in that directory, too (note that IIS and ASP.NET cannot handle downloads larger than 2 GB, so make sure your file doesn’t exceed that size). Configure your IIS virtual directory to map the .zip extension through aspnet_isapi.dll.

The HttpHandler Class: ZIPHandler
After mapping the .zip extension through to ASP.NET, IIS calls the ZipHandler class’s ProcessRequest method (see Listing 1) each time a client requests a .zip file from the server.

The ProcessRequest method first creates an instance of a custom FileInformation class (see Listing 2), which encapsulates the download state (e.g. in-progress, broken, etc.). The sample code hard-codes the path to a sample file named download.zip. If you move the code to your own application, change it to open the requested file instead.

   ' ToDo - your code here 
   ' Using objRequest, determine which file has been 
   ' requested and open objFile with that file:
   ' Example:
   ' objFile = New Download.FileInformation
   ' ()
   objFile = New Download.FileInformation( _
      objContext.Server.MapPath("~/download.zip"))

Then, the procedure does a series of validation checks using the described HTTP headers (if the request provides them). It encapsulates each validation in a small private function, which returns True if the validation succeeds. If any validation check fails, the response terminates immediately, sending an appropriate StatusCode value.

   ' The two method tests must be combined with AndAlso;
   ' joined with Or, the condition is true for every
   ' request, including GET and HEAD.
   If Not objRequest.HttpMethod.Equals( _
      HTTP_METHOD_GET) AndAlso Not _
      objRequest.HttpMethod.Equals( _
      HTTP_METHOD_HEAD) Then
      ' Currently, only the GET and HEAD methods 
      ' are supported...
      objResponse.StatusCode = 501  ' Not implemented

   ElseIf Not objFile.Exists Then
      ' The requested file could not be retrieved...
      objResponse.StatusCode = 404  ' Not found

   ElseIf objFile.Length > Int32.MaxValue Then
      ' The file size is too large... 
      objResponse.StatusCode = 413  ' Request Entity 
                                    ' Too Large

   ElseIf Not ParseRequestHeaderRange(objRequest, _
      alRequestedRangesBegin, alRequestedRangesend, _
      objFile.Length, bIsRangeRequest) Then
      ' The Range request contained bad entries
      objResponse.StatusCode = 400  ' Bad Request

   ElseIf Not CheckIfModifiedSince(objRequest, _
      objFile) Then
      ' The entity is still unmodified...
      objResponse.StatusCode = 304  ' Not Modified

   ElseIf Not CheckIfUnmodifiedSince(objRequest, _
      objFile) Then
      ' The entity was modified since the requested 
      ' date... 
      objResponse.StatusCode = 412  ' Precondition failed

   ElseIf Not CheckIfMatch(objRequest, objFile) Then
      ' The entity does not match the request... 
      objResponse.StatusCode = 412  ' Precondition failed

   ElseIf Not CheckIfNoneMatch(objRequest, objResponse, _
      objFile) Then
      ' The entity does match the none-match request, 
      ' the response code was set inside the 
      ' CheckIfNoneMatch function

   Else
      ' Preliminary checks were successful...

One of these preliminary functions, ParseRequestHeaderRange (see Listing 3), checks to see if a client requested a file range, and thus a partial download. The method sets bIsRangeRequest to True if the requested range is valid (invalid ranges are those that exceed the file’s size or contain illogical numbers). If a range was requested, the CheckIfRange method validates the If-Range header.

If the requested range is valid, the code calculates the response size. If the client requested multiple ranges, the response size value contains multipart header length values.

If a sent header value could not be confirmed, the procedure handles this download request not as a partial download, but instead restarts, sending a new download stream from the top of the file.

   If bIsRangeRequest AndAlso _
      CheckIfRange(objRequest, objFile) Then
      ' This is a Range request... 

      ' If the Range arrays contain more than one entry,
      ' it even is a multipart range request...
      bMultipart = CBool( _
         alRequestedRangesBegin.GetUpperBound(0) > 0)

      ' Go through each Range to get the entire Response 
      ' length
      For iLoop = _
         alRequestedRangesBegin.GetLowerBound(0) _
         To alRequestedRangesBegin.GetUpperBound(0)
         ' The length of the content (for this range)
         iResponseContentLength += _
            Convert.ToInt32(alRequestedRangesend( _
            iLoop) - alRequestedRangesBegin(iLoop)) + 1

         If bMultipart Then
            ' If this is a multipart range request, 
            ' calculate the length of the intermediate 
            ' headers to send
            iResponseContentLength += _
               MULTIPART_BOUNDARY.Length
            iResponseContentLength += _
               objFile.ContentType.Length
            iResponseContentLength += _
               alRequestedRangesBegin( _
               iLoop).ToString.Length
            iResponseContentLength += _
               alRequestedRangesend( _
               iLoop).ToString.Length
            iResponseContentLength += _
               objFile.Length.ToString.Length
            ' 49 is the length of line break and other 
            ' needed characters in one multipart header
            iResponseContentLength += 49
         End If

      Next iLoop

      If bMultipart Then
         ' If this is a multipart range request,  
         ' we must also calculate the length of 
         ' the last intermediate header we must send
         iResponseContentLength += _
            MULTIPART_BOUNDARY.Length
         ' 8 is the length of dash and line break 
         ' characters
         iResponseContentLength += 8
      Else
         ' This is no multipart range request, so
         ' we must indicate the response range in
         ' the initial HTTP header 
         objResponse.AppendHeader( _
            HTTP_HEADER_CONTENT_RANGE, "bytes " & _
            alRequestedRangesBegin(0).ToString & "-" & _
            alRequestedRangesend(0).ToString & "/" & _
            objFile.Length.ToString)
      End If

      ' Range response 
      objResponse.StatusCode = 206 ' Partial Content

   Else
      ' This is not a Range request, or the requested 
      ' Range entity ID does not match the current entity 
      ' ID, so start a new download

      ' Indicate the file's complete size as content 
      ' length
      iResponseContentLength = _
         Convert.ToInt32(objFile.Length)

      ' Return a normal OK status...
      objResponse.StatusCode = 200
   End If

Next, the server must send a few important response headers, such as the content length, the ETag and the file’s content type.

   ' Write the content length into the Response
   objResponse.AppendHeader( _
      HTTP_HEADER_CONTENT_LENGTH, _
      iResponseContentLength.ToString)

   ' Write the Last-Modified Date into the Response
   objResponse.AppendHeader( _
      HTTP_HEADER_LAST_MODIFIED, _
      objFile.LastWriteTimeUTC.ToString("r"))

   ' Tell the client software that we accept 
   ' Range requests
   objResponse.AppendHeader( _
      HTTP_HEADER_ACCEPT_RANGES, _
      HTTP_HEADER_ACCEPT_RANGES_BYTES)

   ' Write the file's Entity Tag into the Response 
   ' (in quotes!)
   objResponse.AppendHeader(HTTP_HEADER_ENTITY_TAG, _
      """" & objFile.EntityTag & """")

   ' Write the Content Type into the Response
   If bMultipart Then
      ' Multipart messages have this special Type.
      ' In this case, the file's actual mime type is
      ' written into the Response at a later time...
      objResponse.ContentType = MULTIPART_CONTENTTYPE
   Else
      ' Single part messages have the file's content 
      ' type...
      objResponse.ContentType = objFile.ContentType
   End If

Everything is now prepared to begin downloading the file. You use a FileStream object to read byte chunks from the file. Set the State property of the FileInformation instance objFile to fsDownloadInProgress. As long as the client stays connected, the server reads chunks from the file and sends them to the client. The code sends special headers for multipart responses. Should the client break the connection, the server sets the file state to fsDownloadBroken. If the server completes sending the requested range or ranges, it sets the state to fsDownloadFinished (see Listing 4).

The FileInformation Helper Class
As you saw in the ZIPHandler section, FileInformation is a helper class which encapsulates the download state, e.g. in-progress, broken, etc. (see Listing 2 for the complete code).

To create an instance of FileInformation, you pass the class constructor the path to the requested file.

   Public Sub New(ByVal sPath As String)
      m_objFile = New System.IO.FileInfo(sPath)
   End Sub

FileInformation uses a System.IO.FileInfo object to get information about that file, which it exposes as properties, for example, whether the file exists, its full name, size, etc. The class also exposes a DownloadState enumeration that describes the various states of a download request:

   Enum DownloadState
      ' Clear: No download in progress, 
      ' the file can be manipulated
      fsClear = 1

      ' Locked: A dynamically created file must
      ' not be changed
      fsLocked = 2

      ' In Progress: File is locked, and download 
      ' is currently in progress
      fsDownloadInProgress = 6

      ' Broken: File is locked, download was in
      ' progress, but was cancelled 
      fsDownloadBroken = 10

      ' Finished: File is locked, download
      ' was completed
      fsDownloadFinished = 18
   End Enum

FileInformation also provides the EntityTag property value. The sample code has a hard-coded value in it, because the sample uses only one download file, which will not be changed. For a real-world application, where you’re serving multiple files or even creating files dynamically, your code must provide a unique EntityTag value for each file. In addition, that value must change each time you change or edit the file. This enables client software to verify whether the chunk it downloaded before is still up to date. Here’s the section that returns the hard-coded EntityTag value in the sample code.

   Public ReadOnly Property EntityTag() As String
      ' The EntityTag used in the initial (200) response 
      ' to, and in resume-Requests from clients 
      Get
         ' ToDo - your code here
         ' (Create a unique string for your file)
         '
         ' Please note, that this unique code must remain
         ' the same as long as the file does not change. 
         ' If the file DOES change or is edited, however,
         ' the code MUST change.
         Return "MyExampleFileID"
      End Get
   End Property

A simple and probably safe enough EntityTag could be a combination of the file name and the file’s last modified date. Whatever method you choose, please make sure that it is truly unique and can’t be confused with another file’s EntityTag. I prefer to name dynamically created files in my applications after the client, customer, and zip queue indexes, and use a GUID saved in a database for the EntityTag.
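As a minimal sketch of the name-plus-date idea (BuildEntityTag is a hypothetical helper, and the separator and hexadecimal tick format are arbitrary choices of mine, not something the sample prescribes):

```vb
Imports System

Module EntityTagSketch

    ' Combines the file name with the last-write timestamp, so the tag
    ' stays stable while the file is untouched and changes when it is edited.
    Public Function BuildEntityTag(ByVal fileName As String, _
            ByVal lastWriteUtc As DateTime) As String
        Return fileName.ToLowerInvariant() & ":" & lastWriteUtc.Ticks.ToString("x")
    End Function

    Sub Main()
        Dim before As String = BuildEntityTag("download.zip", _
            New DateTime(2004, 9, 26, 15, 52, 45))
        Dim after As String = BuildEntityTag("download.zip", _
            New DateTime(2004, 10, 19, 15, 11, 23))
        Console.WriteLine(before)
        Console.WriteLine(before = after)   ' prints False: the edit changed the tag
    End Sub
End Module
```

This assumes file names are unique within the site and contain no characters that are awkward inside a quoted header value. If either assumption fails, a stored GUID as described above is the safer route.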

The ZIPHandler class reads and sets the public State property. After a completed download, it sets State to fsDownloadFinished. At that point you can delete temporary files. Always call the Save method here, to persist the state.

   Public Property State() As DownloadState
      Get
         Return m_nState
      End Get
      Set(ByVal nState As DownloadState)
         m_nState = nState

         ' ToDo - optional
         ' At this point, you could delete the 
         ' file automatically. 
         ' If the state is set to Finished, you
         ' might not need the file anymore:

         ' If nState = _
         '   DownloadState.fsDownloadFinished Then
         '   Clear()
         ' Else
         '   Save()
         ' End If

         Save()
      End Set
   End Property

The ZIPHandler class should call the Save method whenever the file state changes, saving the file’s state so it can be displayed to the user at a later time. You can also use it to save the EntityTag you created. Do not save the file’s state and EntityTag value to the Application, Session, or Cache; that information must persist beyond any of those lifecycles.

   Private Sub Save()
      ' ToDo - your code here 
      ' (Save the state of this file's download 
      ' to a database or XML file...)
      '
      ' If you do not create files dynamically, 
      ' you do not need to save the state, of course.
   End Sub

As written, the sample code handles only one existing file (download.zip); however, you can enhance it to create requested files on demand.

When testing the sample code, your local system or LAN will probably be too fast to interrupt the download, so I recommend you use either a slow LAN connection (one way to simulate one is to reduce your site’s bandwidth in IIS) or a live server on the Internet.

Downloading can still be tough on the client side. Broken or misconfigured Web cache servers operated by ISPs can ruin large downloads, inducing corruption or early session termination. If your files exceed 255 MB in size, you might want to encourage your customers to use third-party download manager software, although some newer browsers have basic download managers built-in.

If you want to extend the sample code even more, it’s worth taking another look at the HTTP specifications. You could create MD5 digest values for the download and add them using the Content-MD5 header, providing a way to check the integrity of the download. The sample doesn’t cover HTTP methods other than GET and HEAD, either.
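For the digest idea, a sketch along these lines might serve as a starting point. ContentMd5Value is a hypothetical helper, and the code assumes the header value is the Base64 encoding of the 128-bit MD5 digest of the body, as the HTTP/1.1 specification describes for Content-MD5.

```vb
Imports System
Imports System.IO
Imports System.Security.Cryptography
Imports System.Text

Module ContentMd5Sketch

    ' Returns the Base64-encoded MD5 digest of everything left in the stream
    Public Function ContentMd5Value(ByVal content As Stream) As String
        Using hasher As MD5 = MD5.Create()
            Return Convert.ToBase64String(hasher.ComputeHash(content))
        End Using
    End Function

    Sub Main()
        ' A tiny in-memory body stands in for the download file
        Dim body As New MemoryStream(Encoding.ASCII.GetBytes("hello"))
        Console.WriteLine("Content-MD5: " & ContentMd5Value(body))
    End Sub
End Module
```

Hashing a large file on every request is expensive, so it probably makes sense to compute the digest once and store it alongside the EntityTag.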
