Login | Register   
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Limit Groups by Number Using Transact-SQL or MS Access

You can easily limit or group a recordset, but doing both to the same recordset takes a bit of hand coding. Find out how both SQL Server and MS Access can handle the task.


advertisement
imiting the number of records in a group is a common request. For example, suppose your sales director requests a custom report showing the top salesperson for each region. At first, limiting the number of records in that group seems simple: Use GROUP BY and TOP, but that doesn't work. Instead of limiting the number of records in each region, TOP retrieves just one record—whichever record happens to filter to the top of the recordset. You might work your way through a number of other possibilities with varying degrees of success, including:
  • Combining GROUP BY and HAVING
  • Adding a COUNT() function to the mix
  • Using Transact SQL (T-SQL) to jump right to a subquery

The third solution goes in the right direction, but getting just the right syntax might be baffling. A subquery using IN is the way to go, unless you're using MS Access to report the SQL Server data. Then, you don't need code at all—the Report objects will limit the groups.

The SQL Server Solution

To limit the number of records in a group, you might need to limit the groups at the source: SQL Server. Most SQL Server solutions are a bit convoluted—stored procedures cycling through records to compare and count specific values. Fortunately, there's a much easier solution. Try an IN subquery that uses the following syntax:


SELECT * FROM source d1 WHERE d1.groupfield IN (SELECT TOP n pkfield FROM source d2 WHERE d1.groupfield = d2.groupfield [ORDER BY groupsortfield]) [ORDER BY tablesortfield]

Both ORDER BY clauses are optional, but in most cases you'll use them. Unless you're retrieving values at random, it doesn't make much sense to omit ORDER BY from the subquery. Most of the arguments are self-explanatory, but just in case:

  • Source is the data source.
  • Groupfield is the field by which you're grouping the records.
  • N represents the value by which you want to limit each group.
  • Pkfield is the primary key field in source.
  • Groupsortfield is the field by which you sort the grouped records in the subquery.
  • Tablesortfield is the field by which you sort the resulting recordset.

In a nutshell, the subquery returns the top n records from a grouped recordset. The recordset is based on groupfield, which exists in both the main and sub derived tables. The main SELECT retrieves a record for each matching record in the subquery and TOP limits the number of records. Without TOP, the subquery would return all the records. You could say it's a sleight of hand, but it works!

Figure 1. Using a Subquery to Limit the Number of Records in a Group: Here is the result of running code that uses a subquery to limit the number of records in a group.

Figure 1 shows the result of running the following code against the PurchaseOrderDetail table in AdventureWorks:

SELECT PurchaseOrderID, UnitPrice FROM Purchasing.PurchaseOrderDetail d1 WHERE d1.PurchaseOrderDetailID IN (SELECT TOP 3 PurchaseOrderDetailID FROM Purchasing.PurchaseOrderDetail d2 WHERE d1.PurchaseOrderID = d2.PurchaseOrderID ORDER BY UnitPrice DESC) ORDER BY d1.PurchaseOrderDetailID

The ORDER BY clause in the subquery ensures that the subquery retrieves the highest-priced items for each purchase order. Since n is 3, the subquery returns three (or fewer) records for each PurchaseOrderDetailID value. The main query sorts the retrieved records by the purchase order number.

Figure 2. Using a Subquery to Limit the Number of Records in a Group: A JOIN doesn't complicate the basic syntax.

Working with multiple tables is just as easy. Simply include a JOIN as you normally would (see Figure 2). As you can see, this syntax uses an INNER JOIN between Purchasing.PurchaseOrderHeader and Purchasing.PurchaseOrderDetail.

SELECT d1.*, d2.* FROM Purchasing.PurchaseOrderHeader d1 INNER JOIN Purchasing.PurchaseOrderDetail d2 ON d1.PurchaseOrderID = d2.PurchaseOrderID WHERE d2.PurchaseOrderDetailID IN ( SELECT TOP 3 PurchaseOrderDetailID FROM Purchasing.PurchaseOrderDetail dd WHERE dd.PurchaseOrderID = d2.PurchaseOrderID ) ORDER BY d1.PurchaseOrderID

Figure 3. Running the Enhanced Statement: You can enhance the basic statement.

Once you're familiar with the basic syntax, you can enhance the statement as necessary. For instance, Figure 3 shows the result of running the enhanced statement in the following code:

SELECT d1.*, d2.* FROM Purchasing.PurchaseOrderHeader d1 INNER JOIN Purchasing.PurchaseOrderDetail d2 ON d1.PurchaseOrderID = d2.PurchaseOrderID WHERE d2.PurchaseOrderDetailID IN (SELECT TOP 3 PurchaseOrderDetailID FROM Purchasing.PurchaseOrderDetail dd WHERE dd.PurchaseOrderID = d2.PurchaseOrderID GROUP BY PurchaseOrderDetailID ) AND d1.PurchaseOrderID IN (SELECT PurchaseOrderID FROM Purchasing.PurchaseOrderDetail dd WHERE dd.PurchaseOrderID = d1.PurchaseOrderID GROUP BY PurchaseOrderID HAVING COUNT(PurchaseOrderID) >= 3 ) ORDER BY d1.PurchaseOrderID

The HAVING COUNT() component limits records to only those groups that have at least three records. In this syntax, the two limiting values are both 3, but they don't have to be the same.

Turning the Code into a Stored Procedure

The group-limiting solution presented so far is adequate, but an input parameter would make it more flexible. By turning the code into a stored procedure, you can pass the number of records you want to see in each group when you call the stored procedure. To turn this code into a stored procedure, open a new query window against AdventureWorks and execute this code:
Figure 4. Passing the Limiting Value Via a Call to the Stored Procedure: You can turn the code into a flexible stored procedure.

CREATE PROCEDURE uspLimitGroup @n int AS BEGIN SELECT H.*, D.* FROM Purchasing.PurchaseOrderHeader H INNER JOIN Purchasing.PurchaseOrderDetail D ON H.PurchaseOrderID = D.PurchaseOrderID WHERE D.PurchaseOrderDetailID IN ( SELECT TOP (@n) PurchaseOrderDetailID FROM Purchasing.PurchaseOrderDetail dd WHERE dd.PurchaseOrderID = D.PurchaseOrderID ) ORDER BY H.PurchaseOrderID END

Notice that the code substitutes the input parameter @n for the literal value 3 used by the previous code listings.

As you can see in Figure 4, passing the limiting value via a call to the stored procedure returns the same results. Simply pass the limiting value when you call the stored procedure. You could also pass the name of the tables, making the stored procedure even more flexible.



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap