The sorting and NULL values

If a column in the ORDER BY clause allows nulls, you should take into account in the case of SQL Server that NULL values go in the beginning of the result set when sorting is done in ascending order, and in the end of the result set when sorting is done in descending order.

As NULL values are absent from the teaching databases available here (in view of conformity between these databases and ones used on the sql-ex.ru), the copy of PC table named PC_ was created with additional row containing NULL in the price column:

insert into PC_ values(13,2112,600,64,8,'24x',NULL);

It should be noted that table design allows NULLs in the price column.

You can assure yourself in the said above by executing the pair of following queries.

select * from PC_ order by price;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

select * from PC_ order by price DESC;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

Why could it be important? The matter is that when seeking extremal values, the method based on sorting is frequently used. Let’s consider the following task, for example.

Find out the models of PCs having the lowest price.

Sometimes this task is being solved by the following way:

select top 1 with ties model from PC_ order by price;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

WITH TIES keywords are used to deduce all the models with the lowest price, not only one. But we get model 2112 with unknown price as a result instead of models 1232 and 1260 which have lowest price indeed. To obtain correct result we need to exclude PCs with unknown price.

select top 1 with ties model 
from PC_ 
where price is not null 
order by price;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

Following this way, we encounter with another issue related with duplicates. The data contain two PCs of the model 1232 with the same lowest price, so the result set gives us the model 1232 twice. DISTINCT keyword can not be applied if SELECT list does not include column(s) used in the ORDER BY clause. The error we get says to us about it.

select
DISTINCT top 1 with ties model 
from PC_ 
where price is not null 
order by price;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

(ORDER BY items must appear in the select list if SELECT DISTINCT is specified.)

To get solution as required, we can include price in the select list and use the query obtained in such a way as a subquery. So,

select model 
from (select DISTINCT top 1 with ties model, price
      from PC_
      where price is not null
      order by price
     ) X;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

Note

There are no problems with NULLs when using aggregate functions, as NULLs are not included in treatment. Although subquery ought to be used also:

select DISTINCT model 
from PC_
where price = (select min(price) from PC_);

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

Note also that standard solution will work under all DBMSs, as it does not use specific features of a dialect.

By the way, can we use sorting-based method in other DBMS?

MySQL allows us to use DISTINCT without mandatory listing the sorting columns in SELECT clause. However MySQL has not analog of WITH TIES to solve the task extremely simply. Because of this we need to use subquery to deduce all the models with lowest price in the sorting-based method:

select DISTINCT model 
from PC_
where price = (select price
               from PC_ 
               where price is not null 
               order by price limit 1
              );

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

The same solution will work under PostgreSQL also, but the latter has a feature you usefully to know about. Namely, we can point where NULLs ought to be deduced - in the end or in the beginning of the result set. In view of solving our task, it is desired that the NULLs will go in the end of the sorted list. In this case we reduce the extra operation which eliminates NULL values.

select DISTINCT model 
from PC_
where price = (select price 
               from PC_ 
               order by price nulls last limit 1
              );

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

By the way, NULL values go in the end of result set when sorting is performed in ascending order. Because of this NULLS LAST keywords which we used in the above query can be omitted in the solution of the task.

select DISTINCT model 
from PC_
where price = (select price 
               from PC_ 
               order by price limit 1
              );

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

If we want that NULLs went in the beginning of the result set when sorting, we should write NULLS FIRST instead.

We could imitate in MySQL the behaviour of NULLS FIRST/LAST features. To do this, we’ll use the fact that boolean data type in this DBMS is presented as TINYINT(1). Specifically, this means that 0 is corresponding to FALSE, whereas nonzero values are considered true. With so doing, a logical expression that is true represents 1, i.e.

select a IS NULL as a, b IS NULL as b 
from (select NULL as a, 1 as b) x;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

gives us

a	b
1	0

Taking into account that 0 precedes 1 when sorting in ascending order, we can adapt for MySQL the solution obtained for PostgreSQL:

SELECT DISTINCT model 
FROM PC_
WHERE price = (SELECT price 
               FROM PC_ 
               ORDER BY price is NULL, price LIMIT 1
              );

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

Similarly PostgreSQL, Oracle places NULLs in the end of the result set when sorting in the ascending order. There are NULLS FIRST/LAST also, but analog of LIMIT/TOP N for limiting the number of returned rows is absent.

To imitate the above approach to solution of the task in Oracle, we can use embedded function ROWNUM. The function performs numbering of the rows of a query, but does it after execution of FROM and WHERE clauses, i.e. before execution of SELECT and ORDER BY clauses. This can be illustrated by the result of the following query:

select code, model,price, ROWNUM rn 
from PC_ 
ORDER BY price;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

CODE	MODEL	PRICE	RN
10	1260	350	10
9	1232	350	9
8	1232	350	8
7	1232	400	7
3	1233	600	3
1	1232	600	1
5	1121	850	5
2	1121	850	2
4	1121	850	4
6	1233	950	6
12	1233	970	12
11	1233	980	11
13	2112	NULL	13

We can see that the row number does not correspond to sorting order. It is easy to verify that the numbering corresponds to the code column. This is due to the fact that optimizer uses index on this column when executing the query. So, to find out the lowest price on the base of sorting, we need to use subquery:

SELECT price 
FROM (select model,price 
      from PC_ 
      ORDER BY price
     ) X
WHERE ROWNUM = 1;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

As in the case of MySQL and PostgreSQL, we’ll use this query for obtaining models which have price determined with aid of it.

select DISTINCT model 
from PC_ 
where price = (SELECT price 
               FROM (select model,price 
                     from PC_ 
                      ORDER BY price
                     ) X
                WHERE ROWNUM = 1
               );

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

Solomon said: “In much wisdom is much grief: and he that increaseth knowledge increaseth sorrow.”

Use standard solutions, I say. :-)

In conclusion, I want to tell about a way to solution which uses ranking functions. Solution idea is in ranking (RANK function is used) of the rows over ascending order of price with following choosing the (unique) rows having the rank of 1. The following query must work under any DBMS which support window functions:

SELECT DISTINCT model 
FROM (select model, Rank() OVER (ORDER BY price) rn 
      from PC_
      WHERE price IS NOT NULL
     ) X 
WHERE rn =1;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

The fact that NULL-values go first when sorting in ascending order (SQL Server), we can use in the “useful purposes”.

Assume that we need to deduce the list of flights in which flights from Rostov should go first, and then the others in alphabetic order of departure cities.

Here the function NULLIF(town_from,‘Rostov’), which will return NULL if city of departure is ’ Rostov ‘, very opportune to be useful.

The problem is being solved with the following query:

select trip_no, town_from, town_to
from Trip
order by NULLIF(town_from,'Rostov'), trip_no;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

#SQL Server #Sql-Ex.ru #Sorting #NULL #WITH TIES #Oracle #PostgreSQL #MySQL #NULLS FIRST #NULLS LAST #ROWNUM #TINYINT #BOOLEAN #NULLIF

Combination of detailed and aggregated data Aggregate function to aggregate function