Oracle Analytic Queries Tutorial

Introduction

This section will kick things off with a simple analytic query, but first let’s take a look at the table we’ll be using throughout this article (the create and populate scripts are here):

DESC Employees;

Name       Null? Type
---------- ----- ------------
NAME             VARCHAR2(10)
LOCATION         VARCHAR2(8)
DEPARTMENT       VARCHAR2(10)
SALARY           NUMBER(7)
HIREDATE         DATE

SELECT * FROM Employees;

NAME      LOCATION DEPARTMENT SALARY HIREDATE
--------- -------- ---------- ------ ---------
Donna     Boston   IT          46000 03-JUN-05
Joe       Boston   IT          40000 28-APR-09
Ji Ping   Boston   IT          63000 14-DEC-10
Tonya     Detroit  Accounting  60000 04-NOV-08
Antonio   Detroit  Accounting  51000 08-OCT-12
Manoharan Boston   Marketing   60000 30-MAR-09
Juan      Boston   IT          75000 25-SEP-05
Shailaja  Detroit  Logistics   67000 22-JUN-12
Jamila    Boston   Marketing   45000 11-AUG-12

Now let’s rank the salaries from low to high. The employee with the lowest salary will be ranked #1, the employee with the second-lowest salary will be ranked #2, and so on.

SELECT
  Name,
  Salary,
  RANK() OVER (ORDER BY Salary) AS SalaryRank
FROM Employees
ORDER BY SalaryRank;
                 SALARY
NAME      SALARY   RANK
--------- ------ ------
Joe        40000      1
Jamila     45000      2
Donna      46000      3
Antonio    51000      4
Manoharan  60000      5  -- Manoharan and Tonya
Tonya      60000      5     are tied for fifth place
Ji Ping    63000      7  -- Ji Ping is in seventh place;
Shailaja   67000      8     this will be covered later
Juan       75000      9

The RANK function assigns a ranking based on the instructions in the OVER clause. The instructions in the example above are ORDER BY Salary, which tells Oracle to assign a ranking based on the salary, from low to high.

How Oracle evaluates analytic functions

Oracle first does joins, the WHERE clause, grouping, and the HAVING clause, then it evaluates analytic functions such as the RANK() in the query above. Put another way, analytic functions are applied against the final result set, just before it’s ordered. If a row from the queried table doesn’t make it through the join, WHERE, or HAVING, analytic functions won’t ever see it.

Oracle performs the ORDER BY last, after the analytic functions have been evaluated (Oracle always does the ORDER BY last, even on queries without analytic functions). That’s why the example above can order by SalaryRank – because it’s already been calculated by the time Oracle gets to ORDER BY.

The RANK() Analytic Function

As explained above, RANK() assigns a ranking based on the defined order. The ranking can be performed on any value, and it doesn’t have to be the same value that the results are ordered by. This query ranks employees based on the alphabetical order of their name but sorts results by salary, low to high:

SELECT
  Name,
  Department,
  Salary,
  RANK() OVER (ORDER BY Name) AS NameRank
FROM Employees
ORDER BY Salary;
NAME      DEPARTMENT SALARY NAMERANK
--------- ---------- ------ --------
Joe       IT          40000        5 -- The name "Joe" ranks fifth alphabetically
Jamila    Marketing   45000        3 -- The name "Jamila" ranks third
Donna     IT          46000        2 -- The name "Donna" ranks second
Antonio   Accounting  51000        1
Tonya     Accounting  60000        9
Manoharan Marketing   60000        7
Ji Ping   IT          63000        4
Shailaja  Logistics   67000        8
Juan      IT          75000        6

Granted, this isn’t a very useful “real world” query. Its purpose is to show that any value can be ranked, as well as most data types (numeric, character, date, timestamp, etc.), and that the ORDER BY in the analytic function is independent of the query’s final ORDER BY.

Ranking isn’t limited to “pure” columns; expressions are allowed too. This query ranks employees based on the first character of their name:

SELECT
  Name,
  Department,
  Salary,
  RANK() OVER (ORDER BY SUBSTR(Name, 1, 1)) AS InitialRank
FROM Employees
ORDER BY InitialRank;
NAME      DEPARTMENT SALARY INITIALRANK
--------- ---------- ------ -----------
Antonio   Accounting  51000           1
Donna     IT          46000           2
Joe       IT          40000           3
Juan      IT          75000           3
Ji Ping   IT          63000           3
Jamila    Marketing   45000           3
Manoharan Marketing   60000           7
Shailaja  Logistics   67000           8
Tonya     Accounting  60000           9

Again, not a useful query in the real world, but it shows that any value – even an expression – can be ranked.

Ranking high to low

Salary ranks normally go from high to low, unlike the previous example which goes from low to high. To rank salaries from high to low, specify ORDER BY Salary DESC in the ranking function, just like you’d use DESC in a “regular” ORDER BY clause:

SELECT
  Name,
  Salary,
  RANK() OVER (ORDER BY Salary DESC) AS SalaryRank
FROM Employees
ORDER BY SalaryRank;
NAME      SALARY SALARYRANK
--------- ------ ----------
Juan       75000          1
Shailaja   67000          2
Ji Ping    63000          3
Manoharan  60000          4
Tonya      60000          4
Antonio    51000          6
Donna      46000          7
Jamila     45000          8
Joe        40000          9

The DENSE_RANK Analytic Function

The query that uses RANK to rank salaries from high to low has Manoharn and Tonya in fourth place with salaries of 60,000. Antonio comes after Manoharan and Tonya with a salary of 51,000 but he’s in sixth place. There isn’t a fifth place. That’s just how RANK works: five employees have higher salaries than Antonio, so Antonio comes in sixth.

If you need to rank without gaps, use the DENSE_RANK function. The following query shows how DENSE_RANK compares to RANK. It also illustrates the fact that a query can have more than one analytic function:

SELECT
  Name,
  Salary,
  RANK() OVER (ORDER BY Salary DESC) AS SalaryRank,
  DENSE_RANK() OVER (ORDER BY Salary DESC) AS SalaryDenseRank
FROM Employees
ORDER BY SalaryDenseRank;
NAME      SALARY SALARYRANK SALARYDENSERANK
--------- ------ ---------- ---------------
Juan       75000          1               1
Shailaja   67000          2               2
Ji Ping    63000          3               3
Manoharan  60000          4               4
Tonya      60000          4               4
Antonio    51000          6               5
Donna      46000          7               6
Jamila     45000          8               7
Joe        40000          9               8

Note how DENSE_RANK is the same as RANK through fourth place. That’s where the first tie occurs. After the tie, RANK assigns the next place as sixth whereas DENSE_RANK assigns it as fifth. With DENSE_RANK there are no gaps in the rankings.

The ROW_NUMBER Analytic Function

In the preceding queries, the nine rows were ranked as {1, 2, 3, 4, 4, 6, 7, 8, 9} and {1, 2, 3, 4, 4, 5, 6, 7, 8}. What if you have nine rows and need an unbroken sequence of rankings from 1 through 9? That’s where ROW_NUMBER comes in: it ranks sequentially, without repeats and without gaps.

The following query shows the difference in behavior between the RANK, DENSE_RANK, and ROW_NUMBER analytic functions:

SELECT
  Name,
  Salary,
  RANK() OVER (ORDER BY Salary DESC) AS SalaryRank,
  DENSE_RANK() OVER (ORDER BY Salary DESC) AS SalaryDenseRank,
  ROW_NUMBER() OVER (ORDER BY Salary DESC) AS SalaryRowNumber
FROM Employees
ORDER BY SalaryRowNumber;
NAME      SALARY SALARYRANK SALARYDENSERANK SALARYROWNUMBER
--------- ------ ---------- --------------- ---------------
Juan       75000          1               1               1
Shailaja   67000          2               2               2
Ji Ping    63000          3               3               3
Manoharan  60000          4               4               4
Tonya      60000          4               4               5
Antonio    51000          6               5               6
Donna      46000          7               6               7
Jamila     45000          8               7               8
Joe        40000          9               8               9

Note how the SalaryRowNumber column has an unbroken sequence of one through nine.

How did Manoharan end up as row number 5 while Tonya got row number 6? The answer is: it just turned out that way. When there’s a tie, Oracle assigns the ROW_NUMBER value arbitrarily.

A quick query of the table will show that Manoharan is in the Marketing department while Tonya is in Accounting. To make the department name the tie-breaker, use it as the second ranking value, just as you would with a “regular” ORDER BY:

SELECT
  Name,
  Department,
  Salary,
  RANK() OVER (ORDER BY Salary DESC) AS SalaryRank,
  DENSE_RANK() OVER (ORDER BY Salary DESC) AS SalaryDenseRank,
  ROW_NUMBER() OVER (ORDER BY Salary DESC, Department) AS SalaryRowNumber
FROM Employees
ORDER BY SalaryRowNumber
NAME       DEPARTMENT     SALARY SALARYRANK SALARYDENSERANK SALARYROWNUMBER
---------- ---------- ---------- ---------- --------------- ---------------
Juan       IT              75000          1               1               1
Shailaja   Logistics       67000          2               2               2
Ji Ping    IT              63000          3               3               3
Tonya      Accounting      60000          4               4               4
Manoharan  Marketing       60000          4               4               5
Antonio    Accounting      51000          6               5               6
Donna      IT              46000          7               6               7
Jamila     Marketing       45000          8               7               8
Joe        IT              40000          9               8               9

Now Tonya gets row number 4 and Manoharan gets row number 5, because Tonya’s department sorts lower than Manoharan’s. Tie-breakers can also be used for RANK and DENSE_RANK.

The SUM Analytic Function

The analytic version of the SUM doesn’t need a GROUP BY to calculate a sum. This query calculates the sum for all employees in the company and includes it on each row of the results:

SELECT
  Name,
  Salary,
  SUM(Salary) OVER () AS TotalSalary
FROM Employees
ORDER BY Name;
NAME           SALARY TOTALSALARY
---------- ---------- -----------
Antonio         51000      507000 -- total salary for all employees
Donna           46000      507000
Jamila          45000      507000
Ji Ping         63000      507000
Joe             40000      507000
Juan            75000      507000
Manoharan       60000      507000
Shailaja        67000      507000
Tonya           60000      507000

Note the syntax: SUM(Salary) OVER (). The OVER keyword indicates that this is an analytic function, specifically the analytic version of SUM. The absence of instructions within the OVER() means “over all results”.

It might seem strange to include an ORDER BY instruction for the SUM function, but it can be done and it has a specific meaning: apply the function to all rows in order up to the current row. For SUM, this means to sum all rows up to the current row on a sliding basis. In other words, the ORDER BY can be used in conjunction with SUM to calculate running totals:

SELECT
  Name,
  Salary,
  SUM(Salary) OVER (ORDER BY Salary) AS RunningSal
FROM Employees
ORDER BY Salary;
NAME           SALARY RUNNINGSAL
---------- ---------- ----------
Joe             40000      40000
Jamila          45000      85000
Donna           46000     131000
Antonio         51000     182000
Manoharan       60000     302000
Tonya           60000     302000
Ji Ping         63000     365000
Shailaja        67000     432000
Juan            75000     507000

The COUNT, AVG, MIN, and MAX Analytic Functions

These analytic functions act much like the SUM analytic function, showing the overall count, average, minimum, and maximum on each row:

SELECT
  Name,
  Salary,
  COUNT(*) OVER () AS OverallCount,
  AVG(Salary) OVER () AS AverageSalary,
  MIN(Salary) OVER () AS MinSalary,
  MAX(Salary) OVER () AS MaxSalary
FROM Employees
ORDER BY Name;
NAME      SALARY OVERALLCOUNT AVERAGESALARY  MINSALARY  MAXSALARY
--------- ------ ------------ ------------- ---------- ----------
Antonio    51000            9    56333.3333      40000      75000
Donna      46000            9    56333.3333      40000      75000
Jamila     45000            9    56333.3333      40000      75000
Ji Ping    63000            9    56333.3333      40000      75000
Joe        40000            9    56333.3333      40000      75000
Juan       75000            9    56333.3333      40000      75000
Manoharan  60000            9    56333.3333      40000      75000
Shailaja   67000            9    56333.3333      40000      75000
Tonya      60000            9    56333.3333      40000      75000

As with the earlier SUM example that included an ORDER BY directive to calculate sums up to the current row, the COUNT, AVG, MIN, and MAX functions can define the “window” of rows using ORDER BY. This query calculates count, average, minimum, and maximum based on salary, and for ease of reading it orders the final results by salary rather than name:

SELECT
  Name,
  Salary,
  COUNT(*) OVER (ORDER BY Salary) AS OverallCount,
  AVG(Salary) OVER (ORDER BY Salary) AS AverageSalary,
  MIN(Salary) OVER (ORDER BY Salary) AS MinSalary,
  MAX(Salary) OVER (ORDER BY Salary) AS MaxSalary
FROM Employees
ORDER BY Salary
NAME           SALARY OVERALLCOUNT AVERAGESALARY  MINSALARY  MAXSALARY
---------- ---------- ------------ ------------- ---------- ----------
Joe             40000            1         40000      40000      40000
Jamila          45000            2         42500      40000      45000
Donna           46000            3    43666.6667      40000      46000
Antonio         51000            4         45500      40000      51000
Manoharan       60000            6    50333.3333      40000      60000
Tonya           60000            6    50333.3333      40000      60000
Ji Ping         63000            7    52142.8571      40000      63000
Shailaja        67000            8         54000      40000      67000
Juan            75000            9    56333.3333      40000      75000

If the final result is ordered by name, the analytic values are still the same because they’re calculated with their own ORDER BY instructions. In the query below, note how the running count, minimum, maximum and average calculated for each employee is the same, even though the final results are ordered by the employee’s name rather than their salary:

SELECT
  Name,
  Salary,
  COUNT(*) OVER (ORDER BY Salary) AS OverallCount,
  AVG(Salary) OVER (ORDER BY Salary) AS AverageSalary,
  MIN(Salary) OVER (ORDER BY Salary) AS MinSalary,
  MAX(Salary) OVER (ORDER BY Salary) AS MaxSalary
FROM Employees
ORDER BY Name
NAME           SALARY OVERALLCOUNT AVERAGESALARY  MINSALARY  MAXSALARY
---------- ---------- ------------ ------------- ---------- ----------
Antonio         51000            4         45500      40000      51000
Donna           46000            3    43666.6667      40000      46000
Jamila          45000            2         42500      40000      45000
Ji Ping         63000            7    52142.8571      40000      63000
Joe             40000            1         40000      40000      40000
Juan            75000            9    56333.3333      40000      75000
Manoharan       60000            6    50333.3333      40000      60000
Shailaja        67000            8         54000      40000      67000
Tonya           60000            6    50333.3333      40000      60000

LAG and LEAD

LAG and LEAD give direct access to values from rows before or after the current row, something that’s extremely difficult (and slow) to do with a non-analytic query.

The full syntax for LAG is:

LAG(expression [, number of rows [, default value]]) OVER([instructions])

The parameters are:

  • expression: the value to return from a prior row
  • number of rows: get the value from this many rows before the current row; the default is one row before
  • default value: the value to use when there isn’t a preceding row; the default is NULL

The LEAD function is similar except it takes values from rows after the current row.

The preceding/following rows are defined using ORDER BY. Here’s an example that lists employees ordered by salary and includes the salary of the preceding employee:

SELECT
  Name,
  Salary,
  LAG(Salary) OVER (ORDER BY Salary) AS PriorEmployeeSalary
FROM Employees
ORDER BY Salary;
NAME      SALARY PRIOREMPLOYEESALARY
--------- ------ -------------------
Joe        40000                     -- No prior row so prior salary is null
Jamila     45000               40000 -- Salary for Joe
Donna      46000               45000 -- Salary for Jamila
Antonio    51000               46000 -- Salary for Donna
Manoharan  60000               51000
Tonya      60000               60000
Ji Ping    63000               60000
Shailaja   67000               63000
Juan       75000               67000 -- Salary for Shailaja

Note how the each row has the salary for the employee in the previous row. To show the salary for the employee two rows before the current employee, specify 2 as the “number of rows” parameter:

SELECT
  Name,
  Salary,
  LAG(Salary, 2) OVER (ORDER BY Salary) AS PriorEmployeeSalary
FROM Employees
ORDER BY Salary;
                     PRIOR
                  EMPLOYEE
NAME       SALARY   SALARY
---------- ------ --------
Joe         40000          -- No "two rows prior" so value is null
Jamila      45000          -- No "two rows prior" so value is null
Donna       46000    40000 -- Salary for Joe two rows prior
Antonio     51000    45000 -- Salary for Jamila two rows prior
Manoharan   60000    46000 -- Salary for Donna two rows prior
Tonya       60000    51000
Ji Ping     63000    60000
Shailaja    67000    60000
Juan        75000    63000

If there isn’t a prior row, LAG defaults to returning null. To return a value other than null, use the third parameter. This query shows the salary from the prior row and assigns -1 as the default when there isn’t a prior row:

SELECT
  Name,
  Salary,
  LAG(Salary, 1, -1) OVER (ORDER BY Salary) AS PriorEmployeeSalary
FROM Employees
ORDER BY Salary;
                     PRIOR
                  EMPLOYEE
NAME       SALARY   SALARY
---------- ------ --------
Joe         40000       -1 -- no prior row so value is -1 as specified
Jamila      45000    40000
Donna       46000    45000
Antonio     51000    46000
Manoharan   60000    51000
Tonya       60000    60000
Ji Ping     63000    60000
Shailaja    67000    63000
Juan        75000    67000

Note that the second argument to LAG is 1, which means “one row prior”. This is the default value for the second argument, but because there’s a third argument it has to be stated explicitly.

The LEAD function’s behavior is similar to the LAG function, except it gets the value from the following row instead of the prior row:

SELECT
  Name,
  Salary,
  LEAD(Salary) OVER (ORDER BY Salary) AS NextEmployeeSalary
FROM Employees
ORDER BY Salary;
NAME      SALARY NEXTEMPLOYEESALARY
--------- ------ ------------------
Joe        40000              45000
Jamila     45000              46000
Donna      46000              51000
Antonio    51000              60000
Manoharan  60000              60000
Tonya      60000              63000 -- salary for Ji Ping
Ji Ping    63000              67000 -- salary for Shailaja
Shailaja   67000              75000 -- salary for Juan
Juan       75000                    -- no following row for Juan

QUARTILES AND QUINTILES AND N-TILES

The NTILE analytic function takes one argument – the number of “tiles” – and assigns an “n-tile” ranking for each row based on the ORDER BY. This query assigns a quartile (four “tiles”) to each of the salaries, from low to high:

SELECT
  Name,
  Salary,
  NTILE(4) OVER (ORDER BY Salary) AS SalQuartile
FROM Employees
ORDER BY Salary;
NAME      SALARY SALQUARTILE
--------- ------ -----------
Joe        40000           1
Jamila     45000           1
Donna      46000           1
Antonio    51000           2
Manoharan  60000           2
Tonya      60000           3
Ji Ping    63000           3
Shailaja   67000           4
Juan       75000           4

To get the quintile (subgroups of five), specify 5 as the argument to NTILE. To get a decile (subgroups of ten), specify 10. To get a percentile, specify 100. In short, the NTILE function will take any number and use it to assign subgroups. The number should be an integer, but if it isn’t it will be truncated to an integer (for example, 4.99999 will be truncated to 4).

PARTITIONING

The examples so far have shown how to apply rankings, sums, counts, and so forth across the entire result set. Many of the examples can be accomplished using non-analytic queries, though the analytic version is generally shorter and (after one becomes familiar with analytic queries) cleaner. The analytic version is usually faster as well because the non-analytic version tends to rely on complicated joins and subqueries.

Analytic queries take this power to the next level when they use partitioning. With partitioning, the analytic function is applied to subsets of the data rather than the entire result set. Partitioning with a non-analytic query is somewhere between complicated and impossible.

As with most analytic query concepts, partitioning is better explained with an example. This query ranks salary within each department; the only difference between it and an earlier query that ranks salaries for all employees is the PARTITION BY instruction:

SELECT
  Department,
  Name,
  Salary,
  RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS DeptRank
FROM Employees
ORDER BY Department, Salary DESC;
DEPARTMENT NAME       SALARY DEPTRANK
---------- ---------- ------ --------
Accounting Tonya       60000        1 -- Highest salary in accounting
Accounting Antonio     51000        2
IT         Juan        75000        1 -- Highest salary in IT
IT         Ji Ping     63000        2
IT         Donna       46000        3
IT         Joe         40000        4 -- Lowest salary in IT
Logistics  Shailaja    67000        1
Marketing  Manoharan   60000        1
Marketing  Jamila      45000        2

The PARTITION BY Department instructs the RANK function to rank within each department. Without a PARTITION BY the ranking is applied across the entire result set.

A partition can be defined by any valid expression; it’s not limited to pure column values. This query (not very useful in the real world) ranks salaries from high to low by the first letter of the employee’s last name.

SELECT
  SUBSTR(Name, 1, 1) AS FirstInit,
  Name,
  Salary,
  RANK() OVER (
    PARTITION BY SUBSTR(Name, 1, 1)
    ORDER BY Salary DESC) AS FirstInitRank
FROM Employees
ORDER BY FirstInit, Salary DESC;
FIRSTINIT NAME      SALARY FIRSTINITRANK
--------- --------- ------ -------------
A         Antonio    51000             1
D         Donna      46000             1
J         Juan       75000             1 -- Highest salary of names beginning with "J"
J         Ji Ping    63000             2
J         Jamila     45000             3
J         Joe        40000             4 -- Lowest salary of names beginning with "J"
M         Manoharan  60000             1
S         Shailaja   67000             1
T         Tonya      60000             1

MULTIPLE PARTITIONS

The examples shown so far have operated across just one partition (department, or first initial, or the entire result set). Analytic queries can have multiple partitions; this query ranks salary within location, within department, and across all employees:

SELECT
  Name,
  Location,
  Department,
  Salary,
  RANK() OVER (
    PARTITION BY Location
    ORDER BY Salary DESC) AS LocRank,
  RANK() OVER (
    PARTITION BY Department
    ORDER BY Salary DESC) AS DeptRank
FROM Employees
ORDER BY
  Location,
  Department,
  LocRank
NAME      LOCATION DEPARTMENT SALARY LOCRANK DEPTRANK
--------- -------- ---------- ------ ------- --------
Juan      Boston   IT          75000       1        1
Ji Ping   Boston   IT          63000       2        2
Donna     Boston   IT          46000       4        3
Joe       Boston   IT          40000       6        4
Manoharan Boston   Marketing   60000       3        1
Jamila    Boston   Marketing   45000       5        2
Tonya     Detroit  Accounting  60000       2        1
Antonio   Detroit  Accounting  51000       3        2
Shailaja  Detroit  Logistics   67000       1        1

AGGREGATE VALUES ON EVERY LINE

Normally, aggregate values such as SUM, COUNT, AVG, etc., require grouping. With analytic queries, aggregates can be reported along with individual row values. This query shows the employee’s salary alongside the average salary for their department and the average salary for the entire company:

SELECT
  Name,
  Department,
  Salary,
  AVG(Salary) OVER (PARTITION BY Department) AS DeptAverage,
  AVG(Salary) OVER () AS CompanyAverage
FROM Employees
ORDER BY
  Department,
  Salary;
NAME       DEPARTMENT     SALARY DEPTAVERAGE COMPANYAVERAGE
---------- ---------- ---------- ----------- --------------
Antonio    Accounting      51000       55500     56333.3333
Tonya      Accounting      60000       55500     56333.3333
Joe        IT              40000       56000     56333.3333
Donna      IT              46000       56000     56333.3333
Ji Ping    IT              63000       56000     56333.3333
Juan       IT              75000       56000     56333.3333
Shailaja   Logistics       67000       67000     56333.3333
Jamila     Marketing       45000       52500     56333.3333
Manoharan  Marketing       60000       52500     56333.3333

This query shows the employee’s salary alongside the minimum and maximum for their department:

SELECT
  Name,
  Department,
  Salary,
  MIN(Salary) OVER (PARTITION BY Department) AS DeptMin,
  MAX(Salary) OVER (PARTITION BY Department) AS DeptMax
FROM Employees
ORDER BY
  Department,
  Salary;
NAME       DEPARTMENT     SALARY    DEPTMIN    DEPTMAX
---------- ---------- ---------- ---------- ----------
Antonio    Accounting      51000      51000      60000
Tonya      Accounting      60000      51000      60000
Joe        IT              40000      40000      75000
Donna      IT              46000      40000      75000
Ji Ping    IT              63000      40000      75000
Juan       IT              75000      40000      75000
Shailaja   Logistics       67000      67000      67000
Jamila     Marketing       45000      45000      60000
Manoharan  Marketing       60000      45000      60000

CALCULATIONS USING ANALYTIC VALUES

Analytic values can be added and subtracted and formatted and passed to functions and treated like any other value. This query shows the employee’s salary, how far it is above the company minimum, and how far it is below the company maximum:

SELECT
  Name,
  Department,
  Salary,
  Salary - MIN(Salary) OVER () AS OverMin,
  MAX(Salary) OVER () - Salary AS UnderMax
FROM Employees
ORDER BY Salary;
NAME       DEPARTMENT     SALARY    OVERMIN   UNDERMAX
---------- ---------- ---------- ---------- ----------
Joe        IT              40000          0      35000
Jamila     Marketing       45000       5000      30000
Donna      IT              46000       6000      29000
Antonio    Accounting      51000      11000      24000
Manoharan  Marketing       60000      20000      15000
Tonya      Accounting      60000      20000      15000
Ji Ping    IT              63000      23000      12000
Shailaja   Logistics       67000      27000       8000
Juan       IT              75000      35000          0

This query ranks salaries high to low by department and displays the ranking as “Salary rank X of Y in (department)”:

SELECT
  Name,
  Department,
  Salary,
  'Salary rank ' ||
      RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) ||
      ' of ' ||
      COUNT(*) OVER (PARTITION BY DEPARTMENT) ||
      ' in ' ||
      Department
    AS DeptRank
FROM Employees
ORDER BY
  Department,
  Salary DESC;
NAME       DEPARTMENT     SALARY DEPTRANK
---------- ---------- ---------- --------------------------------
Tonya      Accounting      60000 Salary rank 1 of 2 in Accounting
Antonio    Accounting      51000 Salary rank 2 of 2 in Accounting
Juan       IT              75000 Salary rank 1 of 4 in IT
Ji Ping    IT              63000 Salary rank 2 of 4 in IT
Donna      IT              46000 Salary rank 3 of 4 in IT
Joe        IT              40000 Salary rank 4 of 4 in IT
Shailaja   Logistics       67000 Salary rank 1 of 1 in Logistics
Manoharan  Marketing       60000 Salary rank 1 of 2 in Marketing
Jamila     Marketing       45000 Salary rank 2 of 2 in Marketing

FIRST_VALUE and LAST_VALUE

Use the FIRST_VALUE and LAST_VALUE functions to get the first value within a partition based on an ORDER BY. What makes this powerful is that the first/last value doesn’t have to be the same as the ORDER BY value.

This query shows salary by department and uses the FIRST_VALUE function to show the name of the lowest earner in each department. The highest earner can be calculated with the LAST_VALUE function, but that function needs an additional analytic instruction which won’t be covered until the next section.

SELECT
  Name,
  Department,
  Salary,
  FIRST_VALUE(Name) OVER (PARTITION BY Department ORDER BY Salary) AS LowSalName
FROM Employees
ORDER BY
  Department,
  Salary;
NAME       DEPARTMENT     SALARY LOWSALNAME
---------- ---------- ---------- ----------
Antonio    Accounting      51000 Antonio
Tonya      Accounting      60000 Antonio
Joe        IT              40000 Joe
Donna      IT              46000 Joe
Ji Ping    IT              63000 Joe
Juan       IT              75000 Joe
Shailaja   Logistics       67000 Shailaja
Jamila     Marketing       45000 Jamila
Manoharan  Marketing       60000 Jamila

WINDOWING BY ROW

Analytic queries can also define a slice of rows within a partition. This slice is called a window or a sliding window. Let’s start with an example that shows employees in name order, with a running count of employees listed so far:

SELECT
  Name,
  COUNT(*) OVER (ORDER BY Name) AS TotSoFar
FROM Employees
ORDER BY Name
NAME      TOTSOFAR
--------- --------
Antonio          1
Donna            2
Jamila           3
Ji Ping          4
Joe              5
Juan             6
Manoharan        7
Shailaja         8
Tonya            9

Note how the TotSFar analytic instruction is ORDER BY Name. This implicitly defines a sliding window that starts with the lowest name and ends with the current row:

NAME      TOTSOFAR
--------- --------
Antonio          1 -- COUNT(*) window is first row (Antonio)
                      through the current row (Antonio)
Donna            2 -- COUNT(*) window is first row (Antonio)
                      through the current row (Donna)
Jamila           3 -- COUNT(*) window is first row (Antonio)
                      through the current row (Jamila)
. . .
Tonya            9 -- COUNT(*) window is first row (Antonio)
                      through the current row (Tonya)

Other windows can be defined. This query shows employees in name order, with a running count in reverse:

SELECT
  Name,
  COUNT(*) OVER (
    ORDER BY Name
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS CountDown
FROM Employees
ORDER BY Name
NAME        COUNTDOWN
---------- ----------
Antonio             9
Donna               8
Jamila              7
Ji Ping             6
Joe                 5
Juan                4
Manoharan           3
Shailaja            2
Tonya               1

This isn’t a great real-world example, but it’s enough to introduce analytic windows as defined by row. The sliding window here is the current row through the end of the partition. There isn’t a PARTITION BY in this query, so the partition is the entire result set.

The windowing syntax is as follows:

ROWS BETWEEN window-start AND window-end

window-start can be any of the following:

  • UNBOUNDED PRECEDING: start the window at the beginning of the partition.
  • n PRECEDING: start the window n rows before the current row.
  • CURRENT ROW: start the window at the current row.
  • n FOLLOWING: start the window n rows after the current row.

window-end can be any of the following:

  • n PRECEDING: end the window n rows before the current row.
  • CURRENT ROW: end the window at the current row.
  • n FOLLOWING: end the window n rows after the current row.
  • UNBOUNDED FOLLOWING: end the window at the end of the partition.

Note that CURRENT ROW, n PRECEDING, and n FOLLOWING are valid for defining a window’s start and end.

UNBOUNDED PRECEDING can only be used to define a window’s start, and UNBOUNDED FOLLOWING can only be used to define a window’s end. A further restriction is that the window-end cannot come before the window-start.

This query uses a sliding window to show the average salary of the two employees before the current employee. The window is defined by the employee’s name:

SELECT
  Name,
  Salary,
  AVG(Salary) OVER (
    ORDER BY Name
    ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS Prior2Avg
FROM Employees
ORDER BY Name;
NAME      SALARY PRIOR2AVG
--------- ------ ---------
Antonio    51000           -- no prior employees so no average
Donna      46000     51000 -- Antonio is the only prior employee
                              so the average is Antonio's salary
Jamila     45000     48500 -- Antonio and Donna are the two prior
                              employees; the average of their
                              salaries is $48,500
Ji Ping    63000     45500 -- Donna and Jamila are the two prior
                              employees; the average of their
                              salaries is $45,500
Joe        40000     54000
Juan       75000     51500
Manoharan  60000     57500
Shailaja   67000     67500
Tonya      60000     63500

This query uses a sliding window to show how many employees are remaining in the list:

SELECT
  Name,
  COUNT(*) OVER (
    ORDER BY Name
    ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
  ) AS Remaining
FROM Employees
ORDER BY Name;
NAME      REMAINING
--------- ---------
Antonio           8 -- 8 names left: 1 FOLLOWING (Donna)
                       through UNBOUNDED FOLLOWING (Tonya)
Donna             7
Jamila            6
Ji Ping           5
Joe               4 -- 4 names left: 1 FOLLOWING (Juan)
                       through UNBOUNDED FOLLOWING (Tonya)
Juan              3
Manoharan         2
Shailaja          1
Tonya             0 -- no names left (nothing FOLLOWING Tonya)

Windows can stretch from any start and end point within the partition. They’re not limited to rows before or rows after the current row. You can define a window where all the rows are before the current row, all the rows are after, or some are before and some are after.

WINDOWING BY ROW: LAST_VALUE

The FIRST_VALUE and LAST_VALUE amalytic functions were introduced earlier, but only the FIRST_VALUE function was shown. That’s because both functions work within a window, and the default window is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The first row will always be in that window, but the last row won’t.

Here’s an attempt to have each row show the name of the person with the highest salary; note that the results are incorrect:

SELECT
  Name,
  Salary,
  LAST_VALUE(Name) OVER (ORDER BY Salary) AS HighSalName
FROM Employees
ORDER BY Salary;
NAME      SALARY HIGHSALNAM
--------- ------ ----------
Joe        40000 Joe        -- should be Juan, but LAST_VALUE looks
                               from UNBOUNDED PRECEDING (Joe) to
                               CURRENT ROW (also Joe)
Jamila     45000 Jamila     -- should also be Juan, but LAST_VALUE
                               looks from UNBOUNDED PRECEDING (Joe)
                               to CURRENT ROW (Jamila)
Donna      46000 Donna      -- should also be Juan
Antonio    51000 Antonio
Manoharan  60000 Tonya
Tonya      60000 Tonya
Ji Ping    63000 Ji Ping
Shailaja   67000 Shailaja
Juan       75000 Juan

The query is incorrect because Oracle won’t look beyond the current row. To fix it, define a window that will look through the end of the partition:

SELECT
  Name,
  Salary,
  LAST_VALUE(Name) OVER (
    ORDER BY Salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS HighSalName
FROM Employees
ORDER BY Salary;
NAME      SALARY HIGHSALNAM
--------- ------ ----------
Joe        40000 Juan
Jamila     45000 Juan
Donna      46000 Juan
Antonio    51000 Juan
Manoharan  60000 Juan
Tonya      60000 Juan
Ji Ping    63000 Juan
Shailaja   67000 Juan
Juan       75000 Juan

Note that ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING looks across the entire partition, but to find the highest salary the range only needs to start at the current row, not at the beginning of the partition. This query also calculates the “high salary name” correctly but it starts at the current row instead of the top of the partition. Because it uses a smaller window it should be more efficient:

SELECT
  Name,
  Salary,
  LAST_VALUE(Name) OVER (
    ORDER BY Salary
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS HighSalName
FROM Employees
ORDER BY Salary;

WINDOWING BY RANGE

Windowing by row works for many scenarios, but here’s a scenario where it can fail: find the average salary of the employee who makes less than the current employee. Here’s the first attempt:

SELECT
  Name,
  Salary,
  AVG(Salary) OVER (
    ORDER BY Salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS AvgMakingLess
FROM Employees
ORDER BY Salary;
NAME      SALARY AVGMAKINGLESS
--------- ------ -------------
Joe        40000
Jamila     45000         40000
Donna      46000         42500
Antonio    51000    43666.6667
Manoharan  60000         45500 -- PROBLEM
Tonya      60000         48400 -- PROBLEM
Ji Ping    63000    50333.3333
Shailaja   67000    52142.8571
Juan       75000         54000

The problem here is that Manoharan and Tonya have the same salary, so the average of employees making less than them should be the same. It isn’t the same here because the window goes by row, and the rows preceding Tonya include Manoharan, who makes the same salary. Once again, a tie in the ranking is causing trouble.

For scenarios like this, define windows by RANGE rather than ROW. RANGE defines an offset from the value in the ORDER BY rather than an explicit number of rows. Take this window definition as an example:

ORDER BY SALARY RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING

This means “from the beginning of the partition up to values 1 less than the current row’s salary. If salaries are in US dollars, the window ends after any salary one dollar less than the current row’s salary.

The proper way show the average salary of employees who make less than the current employee (assuming US dollars) is to define a range from UNBOUNDED PRECEDING to 0.01 PRECEDING. Why 0.01? Because that’s one cent, and anyone making one cent less than the current employee has a lower salary:

SELECT
  Name,
  Salary,
  AVG(Salary) OVER (
    ORDER BY Salary
    RANGE BETWEEN UNBOUNDED PRECEDING AND 0.01 PRECEDING) AS AvgMakingLess
FROM Employees
ORDER BY Salary;
NAME       SALARY AVGMAKINGLESS
---------- ------ -------------
Joe         40000
Jamila      45000         40000
Donna       46000         42500
Antonio     51000    43666.6667
Manoharan   60000         45500 -- now it's correct
Tonya       60000         45500 -- now it's correct
Ji Ping     63000    50333.3333
Shailaja    67000    52142.8571
Juan        75000         54000

The examples so far have used a constant value when using n PRECEDING or n FOLLOWING. The PRECEDING and FOLLOWING values can also be an expression. To show the number of employees whose salary is 75% or less than the current employee’s salary, define a range that ends after 25% of the current salary:

SELECT
  Name,
  Salary,
  COUNT(*) OVER (
    ORDER BY Salary
    RANGE BETWEEN UNBOUNDED PRECEDING AND Salary * 0.25 PRECEDING) AS Num75PctLess
FROM Employees
ORDER BY Salary;
NAME      SALARY NUM75PCTLESS
--------- ------ ------------
Joe        40000            0
Jamila     45000            0 -- range is from beginning
                                 to 33,750 (45,000 - 25%);
                                 no employees in this range
Donna      46000            0
Antonio    51000            0 -- range is from beginning
                                 to 38,250 (51,000 - 25%);
                                 no employees in this range
Manoharan  60000            2 -- range is from beginning
                                 to 45,000 (60,000 - 25%);
                                 two employees in this range
Tonya      60000            2
Ji Ping    63000            3 -- range is from beginning
                                 to 47,250 (63,000 - 25%);
                                 three employees in this range
Shailaja   67000            3
Juan       75000            4 -- range is from beginning
                                 to 56,250 (75,000 - 25%);
                                 four employees in this range

If a RANGE PRECEDING or RANGE FOLLOWING value is zero, that’s an offset of zero, which means “the same value”. This query shows how many employees make the same salary as the current employee:

SELECT
  Name,
  Salary,
  COUNT(*) OVER (
    ORDER BY Salary
    RANGE BETWEEN 0 PRECEDING AND 0 FOLLOWING) AS MakingSame
FROM Employees
ORDER BY Salary;
NAME      SALARY MAKINGSAME
--------- ------ ----------
Joe        40000          1 -- one employee makes 40,000: Joe
Jamila     45000          1
Donna      46000          1
Antonio    51000          1
Manoharan  60000          2 -- two employees make 60,000: Manoharan and Tonya
Tonya      60000          2
Ji Ping    63000          1
Shailaja   67000          1
Juan       75000          1

KEEP FIRST and KEEP LAST

KEEP FIRST and KEEP LAST will operate on rows whose ranking is first or last within a partition. That sounds like something that can be done without a new keyword, but what sets these apart is that the ranking is independent of the value being calculated.

This is another case where an example will be a lot more descriptive. First, let’s take a look at employee names, salaries, and hire dates:

SELECT
  Name,
  Salary,
  TO_CHAR(HireDate, 'MM/DD/YYYY') AS Hire
FROM Employees
ORDER BY HireDate;
NAME      SALARY HIRE
--------- ------ ----------
Donna      46000 06/03/2005
Juan       75000 09/25/2005
Tonya      60000 11/04/2008
Manoharan  60000 03/30/2009
Joe        40000 04/28/2009
Ji Ping    63000 12/14/2010
Shailaja   67000 06/22/2012
Jamila     45000 08/11/2012
Antonio    51000 10/08/2012

This query shows employees and their salaries, along with the average salary for employees hired in the minimum (lowest) year in the table. Note that the lowest hire year is 2005, and two employees were hired in that year.

SELECT
  Name,
  Salary,
  AVG(Salary) KEEP (
    DENSE_RANK FIRST ORDER BY EXTRACT(YEAR FROM HireDate))
    OVER() AS MinYearAvg
FROM Employees
ORDER BY Salary;
NAME      SALARY MINYEARAVG
--------- ------ ----------
Joe        40000      60500
Jamila     45000      60500
Donna      46000      60500
Antonio    51000      60500
Manoharan  60000      60500
Tonya      60000      60500
Ji Ping    63000      60500
Shailaja   67000      60500
Juan       75000      60500

KEEP FIRST and KEEP LAST work in conjunction with the DENSE_RANK function, and only with the DENSE_RANK function. They also require a partition. In the example, the partition is the entire result set it’s defined simply as OVER(). If the desired results were the average salary for the earliest hire year by department, the partition would be defined as OVER (PARTITION BY Department).

CONCLUSION (FOR NOW)

Given the length of this tutorial, it’s tempting to think it covers every aspect of analytic queries. But there’s more – lots more. In a future article I’ll cover other analytic query features, including how to show a cumulative percentage for values, how to use analytic values within an aggregating (GROUP BY) query, and how to calculate the time span between rows with a DATE or TIMESTAMP column.

2 thoughts on “Oracle Analytic Queries Tutorial

  1. xyz

    it is really nice and intersting tutotorial which you have posted but i have a question if i have assigned a rank function it sorts by partion but if there is a tie i would like to sort it by name what is you suggestion for this

    Reply
  2. ed@lakenine.com Post author

    Sorry for the late reply XYZ, but I’ve been crazy busy the past few weeks. If you want to break a tie, just add the tie-breaker to the function’s ORDER BY, for example RANK() OVER (ORDER BY Salary, Name) AS SalaryRank.

    I hope this answered your question. If not, please post back and I promise I’ll answer more quickly this time.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *