loading..
Русский    English
22:56

Combination of detailed and aggregated data

Let the maximal and minimal prices among all the printers be output along with model and price of each printer.
This problem may be somewhat difficult for a beginner. This difficulty is in dilemma: grouping or summarizing over all set. When using grouping, the maximal and minimal values will be obtained not over all set, but for each subset defined by the grouping (in our case for each group with the same pair of values {model, price}). Taking into account that this combination does not repeat in Printer table of the learn database, we get three equal values of the price:

Console
Execute
  1. SELECT model, price, MIN(price) min_price, MAX(price) max_price
  2. FROM printer
  3. GROUP BY model, price;

But if we wouldn't use grouping, only the maximal and minimal values could be obtained, because standard syntax forbids (in view of ambiguous treatment of a result) usage of summarized values along with detailed ones without grouping by them:

Console
Execute
  1. SELECT MIN(price) min_price, MAX(price) max_price
  2. FROM printer;

The solution to the problem is fairly simple and not a single. For example, subqueries in SELECT clause  can be used. Such a subquery should return a single value, but not a set, as is the case:

Solution 1

Console
Execute
  1. SELECT model, price,
  2. (SELECT MIN(price) FROM Printer) min_price,
  3. (SELECT MAX(price) FROM Printer) max_price
  4. FROM printer;

More effective approach is to use subquery for calculation of aggregates in FROM clause along with cartesian product. You should not afraid cartesian product in this case because the subquery returns only one row which will be joined each row of detailed data.

Solution 2

Console
Execute
  1. SELECT model, price, min_price, max_price
  2. FROM printer CROSS JOIN
  3. (SELECT MIN(price) min_price, MAX(price) max_price
  4. FROM printer) X;

Why do we state that the second solution has better performance in respect to the first one? The matter is that the subquery will be calculated twice in the first query whereas once in the second one.  Besides, if the optimizer is not "clever" enough, the subqueries in the first solution will be calculated for each row of detailed data. Inspect the execution plans for your DBMS.

Let's consider the task where aggregates depend on current row of detailed data, for example:

Determine model and price of each printer as well as the maximal and minimal prices for printers of the same type.

We shall try to adapt the above approaches for solving this problem. Subqueries in Solution 1 should be rewritten as correlated ones:

Solution 1M

Console
Execute
  1. SELECT model, price, type,
  2. (SELECT MIN(price) FROM Printer P1 WHERE P1.type=P.type) min_price,
  3. (SELECT MAX(price) FROM Printer P1 WHERE P1.type=P.type) max_price
  4. FROM printer P;

As for Solution 2, we might take non-standard join CROSS APPLY (SQL Server), which uses correlated subquery in FROM clause.

 Solution 2M

Console
Execute
  1. SELECT model, price, P.type, min_price, max_price
  2. FROM Printer P CROSS APPLY
  3. (SELECT MIN(price) min_price, MAX(price) max_price
  4. FROM Printer P1
  5. WHERE P1.type=P.type) X;

The type column has been added in the solutions 1M and 2M for better insight into.

Bookmark and Share
Tags
aggregate functions Airport ALL AND AS keyword ASCII AVG Battles Bezhaev Bismarck C.J.Date calculated columns Cartesian product CASE cast CHAR CHARINDEX Chebykin check constraint classes COALESCE common table expressions comparison predicates Computer firm CONVERT correlated subqueries COUNT CROSS APPLY CTE data type conversion data types database schema date/time functions DATEADD DATEDIFF DATENAME DATEPART DATETIME DDL DEFAULT DEFAULT VALUES DELETE DISTINCT DML duplicates EXCEPT exercise (-2) exercise 19 exercise 23 exercise 32 More tags
The book was updated
month ago
©SQL-EX,2008 [Evolution] [Feedback] [About] [Links] [Team]
All right reserved.
Rambler's Top100