Combination of detailed and aggregated data

Let the maximal and minimal prices among all the printers be output along with model and price of each printer.
This problem may be somewhat difficult for a beginner. This difficulty is in dilemma: grouping or summarizing over all set. When using grouping, the maximal and minimal values will be obtained not over all set, but for each subset defined by the grouping (in our case for each group with the same pair of values {model, price}). Taking into account that this combination does not repeat in Printer table of the learn database, we get three equal values of the price:

select model, price, min(price) min_price, max(price) max_price
from printer
group by model, price;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

But if we wouldn’t use grouping, only the maximal and minimal values could be obtained, because standard syntax forbids (in view of ambiguous treatment of a result) usage of summarized values along with detailed ones without grouping by them:

select min(price) min_price, max(price) max_price
from printer;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

The solution to the problem is fairly simple and not a single. For example, subqueries in SELECT clause can be used. Such a subquery should return a single value, but not a set, as is the case:

Solution 1

select model, 
       price,
       (select min(price) from Printer) min_price,
       (select max(price) from Printer) max_price
from printer;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

More effective approach is to use subquery for calculation of aggregates in FROM clause along with cartesian product. You should not afraid cartesian product in this case because the subquery returns only one row which will be joined each row of detailed data.

Solution 2

select model, price, min_price, max_price
from printer 
    cross join
    (select min(price) min_price, max(price) max_price
     from printer
    ) X;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

Why do we state that the second solution has better performance in respect to the first one? The matter is that the subquery will be calculated twice in the first query whereas once in the second one. Besides, if the optimizer is not “clever” enough, the subqueries in the first solution will be calculated for each row of detailed data. Inspect the execution plans for your DBMS.

Let’s consider the task where aggregates depend on current row of detailed data, for example:

Determine model and price of each printer as well as the maximal and minimal prices for printers of the same type.

We shall try to adapt the above approaches for solving this problem. Subqueries in Solution 1 should be rewritten as correlated ones:

Solution 1M

select model, price, type,
    (select min(price) from Printer P1 where P1.type=P.type) min_price,
    (select max(price) from Printer P1 where P1.type=P.type) max_price
from printer P;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

As for Solution 2, we might take non-standard join CROSS APPLY (SQL Server), which uses correlated subquery in FROM clause.

Solution 2M

select model, price, P.type, min_price, max_price
from Printer P 
    cross apply
    (select min(price) min_price, max(price) max_price
     from Printer P1
     where P1.type=P.type
    ) X;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

The type column has been added in the solutions 1M and 2M for better insight into.

#Cartesian Product #Aggregate Functions #Correlated Subqueries #CROSS APPLY

Summarizing data using ROLLUP The sorting and NULL values