Running totals
The correlated subquery may be used for calculation accumulation running totals — the usual practical problem.
In assumption of some rows` order the running total for each row is a sum of values for some numeric column in this row and for all rows above.
In the other words, the running total for first row in the ordered set would be equal to the numeric value in this row. For any other rows the running total would be equal to sum of numeric value in this row and the running total in the previous rows.
Let`s consider task like this.
Here is the query outputs information about outlays in point 2 in order of ascending of the date:
SELECT point, date, out
FROM Outcome_o o
WHERE point = 2
ORDER BY date;
[[ column ]] |
---|
NULL [[ value ]] |
point | date | out |
---|---|---|
2 | 2001-03-22 00:00:00.000 | 1440 |
2 | 2001-03-29 00:00:00.000 | 7848 |
2 | 2001-04-02 00:00:00.000 | 2040 |
Actually, to solve this problem we need to add one column for accumulating running total (run_tot). Correspondingly the theme, this column would be represent the correlated sub-query. In this subquery the total sum of the values in the out column would be calculated for this date and for all preceding dates.
SELECT point, date, out,
(SELECT SUM(out)
FROM Outcome_o
WHERE point = o.point AND date <= o.date) run_tot
FROM Outcome_o o
WHERE point = 2
ORDER BY point, date;
[[ column ]] |
---|
NULL [[ value ]] |
point | date | out | run_tot |
---|---|---|---|
2 | 2001-03-22 00:00:00.000 | 1440 | 1440 |
2 | 2001-03-29 00:00:00.000 | 7848 | 9288 |
2 | 2001-04-02 00:00:00.000 | 2040 | 11328 |
In fact, the usage of point 2 is dictated by wish to reduce the resulting set. If we want to get running totals for all points in the Outcome_o table, we need to close in comment this string
WHERE point = 2
And if we want to get running total for whole table we need to put off the condition for point`s equality:
point= o.point
But in this case we would get one total for different points which working in the same day. Here`s the such fragment from the resulting set:
point | date | out | run_tot |
---|---|---|---|
1 | 2001-03-29 00:00:00.000 | 2004 | 33599 |
2 | 2001-03-29 00:00:00.000 | 7848 | 33599 |
This is not an obstacle if we understand what we want to get. If we want to get the accumulation of the outlay by days we need to exclude the point and summarize the outlay by days.
SELECT date, SUM(out) out,
(SELECT SUM(out)
FROM Outcome_o
WHERE date <= o.date) run_tot
FROM Outcome_o o
GROUP BY date
ORDER BY date;
[[ column ]] |
---|
NULL [[ value ]] |
In the other case, we need to assign the order of the addition of the points outlays if the dates is coincided.
For instance, let
s order it by ascending of its numbers:
SELECT point, date, out,
(SELECT SUM(out)
FROM Outcome_o
WHERE date < o.date OR ( date = o.date AND point <= o.point)) run_tot
FROM Outcome_o o
ORDER BY date, point;
[[ column ]] |
---|
NULL [[ value ]] |
Suggested exercises: 69, 101, 134
Expansion of supporting window functions in SQLServer 2012 gives us opportunity to solve task on running totals in extremely simple manner.
As for our task, we expose the following new features:
- Usage of sorting in OVER clause when applying aggregate functions.
- Specifying a range which is being treated by aggregate function. In so doing the range can be limited as well as
unlimited, for example, it can extend from current row to beginning sorted row set.
Thus we can obtain running totals sorting data by date and calculating sum of values in a range from current row and up (unlimitedly) above, and we can do it with aid of a single function!
The exercise on previous page about running totals for the point 2 can now be solved as follows.
select point, date, out,
SUM(out) OVER (PARTITION BY point
ORDER BY date
RANGE
UNBOUNDED
PRECEDING -- from the current row to beginning
)
FROM Outcome_o o
WHERE point = 2
ORDER BY point, date;
To obtain running totals for each point separately, we remove from above query only the condition for the point 2:
select point, date, out,
SUM(out) OVER (PARTITION BY point ORDER BY date RANGE UNBOUNDED PRECEDING)
FROM Outcome_o o
ORDER BY point, date;
The above queries have to work under PostgreSQL and Oracle. As for MySQL, window functions was supported in MySQL 8.0.
Suggested exercises: 106