Calculating Time Spans Between Rows in Oracle

Oracle’s analytic functions are great for summarizing data, but that’s only part of their power. The LEAD and LAG analytic functions let you work with values from preceding or following rows, something that’s difficult – and often impossible – to do with non-analytic queries.

This post shows how to use the LAG function to calculate time spans between events in a log file.

First, here’s the create script for the table we’ll be using:

CREATE TABLE EventLog (
  EventType VARCHAR2(10),
  EventTime TIMESTAMP);

INSERT INTO EventLog VALUES ('Open', TIMESTAMP '2013-11-08 15:33:34.125');
INSERT INTO EventLog VALUES ('Close', TIMESTAMP '2013-11-08 15:36:57.266');
INSERT INTO EventLog VALUES ('Open', TIMESTAMP '2013-11-08 16:27:44.038');
INSERT INTO EventLog VALUES ('Close', TIMESTAMP '2013-11-08 16:29:09.407');
INSERT INTO EventLog VALUES ('Open', TIMESTAMP '2013-11-08 22:43:32.551');
INSERT INTO EventLog VALUES ('Open', TIMESTAMP '2013-11-09 07:53:18.919');
INSERT INTO EventLog VALUES ('Close', TIMESTAMP '2013-11-09 08:09:25.571');

COMMIT;

The LAG Function

The analytic LAG function can take up to three arguments:

  • LAG(expression) will give the value of expression calculated from the prior row. If there isn’t a prior row, LAG will return null.
  • LAG(expression, number-of-rows) will give the value of expression calculated from the row that’s number-of-rows before the current row. If there isn’t a row that’s number-of-rows before the current row, LAG will return null.
  • LAG(expression, number-of-rows, default-value) will give the value of expression from number-of-rows prior, or if there isn’t a row number-of-rows before the current row it will return default-value instead.

The prior row is defined by the OVER() clause of the function. This posting has multiple examples of the OVER() clause.

Using LAG to Get a Prior Value

This example shows how to get the EventTime timestamp from the row before the current row. The “row before” is determined by the EventTime:

SELECT
  EventType,
  EventTime,
  LAG(EventTime) OVER (ORDER BY EventTime) AS PriorRowEventTime
FROM EventLog
ORDER BY EventTime;
EVENTTYPE EVENTTIME                     PRIORROWEVENTTIME
--------- ----------------------------  --------------------------
Open      08-NOV-13 03.33.34.125000 PM
Close     08-NOV-13 03.36.57.266000 PM  08-NOV-13 03.33.34.125000 PM
Open      08-NOV-13 04.27.44.038000 PM  08-NOV-13 03.36.57.266000 PM
Close     08-NOV-13 04.29.09.407000 PM  08-NOV-13 04.27.44.038000 PM
Open      08-NOV-13 10.43.32.551000 PM  08-NOV-13 04.29.09.407000 PM
Open      09-NOV-13 07.53.18.919000 AM  08-NOV-13 10.43.32.551000 PM
Close     09-NOV-13 08.09.25.571000 AM  09-NOV-13 07.53.18.919000 AM

Note that the PriorRowEventTime for each row is the EventTime value for the row immediately preceding it. The OVER (ORDER BY EventTime) clause says to get the prior value as if the list were ordered by EventTime.

Coincidentally, the last line of the query is ORDER BY EventTime. This clause specifies the display order for the results, not the row order for determining LAG values.

Use the second LAG parameter to get the EventTime for more than one row before the current row. This example gets the EventTime timestamp from two rows before the current row; the only difference between this and the previous query is the additional parameter to the LAG function:

SELECT
  EventType,
  EventTime,
  LAG(EventTime, 2) OVER (ORDER BY EventTime) AS PriorRowEventTime
FROM EventLog
ORDER BY EventTime;
EVENTTYPE EVENTTIME                     PRIORROWEVENTTIME
--------- ----------------------------  --------------------------
Open      08-NOV-13 03.33.34.125000 PM
Close     08-NOV-13 03.36.57.266000 PM
Open      08-NOV-13 04.27.44.038000 PM  08-NOV-13 03.33.34.125000 PM
Close     08-NOV-13 04.29.09.407000 PM  08-NOV-13 03.36.57.266000 PM
Open      08-NOV-13 10.43.32.551000 PM  08-NOV-13 04.27.44.038000 PM
Open      09-NOV-13 07.53.18.919000 AM  08-NOV-13 04.29.09.407000 PM
Close     09-NOV-13 08.09.25.571000 AM  08-NOV-13 10.43.32.551000 PM

In this case, the first two rows of the result don’t have a PriorEventTime value. That makes sense because there isn’t a row that’s two rows before the first or second row.

To return a default value when there isn’t a prior row or rows, use the third LAG parameter. This example is the same as the previous one except for the LAG default value, which is the beginning of the day for the current row’s EventTime:

SELECT
  EventType,
  EventTime,
  LAG(EventTime, 2, TRUNC(EventTime)) OVER (ORDER BY EventTime) AS PriorRowEventTime
FROM EventLog
ORDER BY EventTime;
EVENTTYPE EVENTTIME                     PRIORROWEVENTTIME
--------- ----------------------------  --------------------------
Open      08-NOV-13 03.33.34.125000 PM  08-NOV-13 12.00.00.000000 AM
Close     08-NOV-13 03.36.57.266000 PM  08-NOV-13 12.00.00.000000 AM
Open      08-NOV-13 04.27.44.038000 PM  08-NOV-13 03.33.34.125000 PM
Close     08-NOV-13 04.29.09.407000 PM  08-NOV-13 03.36.57.266000 PM
Open      08-NOV-13 10.43.32.551000 PM  08-NOV-13 04.27.44.038000 PM
Open      09-NOV-13 07.53.18.919000 AM  08-NOV-13 04.29.09.407000 PM
Close     09-NOV-13 08.09.25.571000 AM  08-NOV-13 10.43.32.551000 PM

Calculate a Simple Time Span Between Rows

The first example simply shows the time between an event and its prior event. It doesn’t do anything special with the event type.

SELECT
  EventType,
  EventTime,
  EventTime - LAG(EventTime) OVER (ORDER BY EventTime) AS Elapsed
FROM EventLog;
EVENTTYPE EVENTTIME                     ELAPSED
--------- ----------------------------  --------------------------
Open      08-NOV-13 03.33.34.125000 PM
Close     08-NOV-13 03.36.57.266000 PM  +000000000 00:03:23.141000
Open      08-NOV-13 04.27.44.038000 PM  +000000000 00:50:46.772000
Close     08-NOV-13 04.29.09.407000 PM  +000000000 00:01:25.369000
Open      08-NOV-13 10.43.32.551000 PM  +000000000 06:14:23.144000
Open      09-NOV-13 07.53.18.919000 AM  +000000000 09:09:46.368000
Close     09-NOV-13 08.09.25.571000 AM  +000000000 00:16:06.652000

Note that the first row doesn’t have a value for ELAPSED. That’s because there isn’t a row before the first row. In this case, the LAG function returns null, and if a mathematical expression has one or more null operands its result is always null.

The ELAPSED column is an INTERVAL DAY TO SECOND value, which represents a time span in days, accurate to fractional seconds.

An Aside: INTERVAL Data Types

Time for a quick aside. The ELAPSED column is an INTERVAL DAY TO SECOND value, which makes it easy to pull the day, hour, minute, and second values using the EXTRACT function. For example, assume an INTERVAL DAY TO SECOND column named ProcessDuration:

SELECT
  ProcessDuration,
  EXTRACT(DAY FROM ProcessDuration) AS "Days",
  EXTRACT(HOUR FROM ProcessDuration) AS "Hours",
  EXTRACT(MINUTE FROM ProcessDuration) AS "Minutes",
  EXTRACT(SECOND FROM ProcessDuration) AS "Seconds"
FROM Processes;
PROCESSDURATION     Days Hours Minutes Seconds
------------------- ---- ----- ------- -------
+01 05:37:41.664000    1     5      37  41.664

When a TIMESTAMP is subtracted from another TIMESTAMP, the result is always an INTERVAL DAY TO SECOND value. The same is true when a TIMESTAMP is subtracted from a DATE or when a DATE is subtracted from a TIMESTAMP.

When a DATE is subtracted from a DATE, the result is a numeric. The value of the numeric is the number of days. It can have a decimal component representing fractions of days – for example 1.5 means 1 1/2 days. You can pull hours, minutes and seconds from this value, but it takes a bit of math. It would be a lot easier if the DATE - DATE result were an INTERVAL DAY TO SECOND value. Luckily, the NUMTODSINTERVAL function will convert the numeric to an INTERVAL DAY TO SECOND value:

SELECT
  EXTRACT(DAY FROM NUMTODSINTERVAL(1.5, 'DAY')) AS "Days",
  EXTRACT(HOUR FROM NUMTODSINTERVAL(1.5, 'DAY')) AS "Hours"
FROM DUAL;
Days Hours
---- -----
   1    12

The NUMTODSINTERVAL function takes two arguments: a numeric value and the units it represents. The sample query below returns 11/2 days four different ways: for 1.5 days, for 36 hours, for 2,160 minutes, and 129,600 seconds (note that the output has most leading and trailing zeros removed so it will fit neatly on one line):

SELECT
  NUMTODSINTERVAL(1.5, 'DAY') AS "FromDays",
  NUMTODSINTERVAL(36, 'HOUR') AS "FromHours",
  NUMTODSINTERVAL(2160, 'MINUTE') AS "FromMinutes",
  NUMTODSINTERVAL(129600, 'SECOND') AS "FromSeconds"
FROM DUAL;
FromDays         FromHours        FromMinutes      FromSeconds
---------------- ---------------- ---------------- ----------------
+01 12:00:00.000 +01 12:00:00.000 +01 12:00:00.000 +01 12:00:00.000

That’s the end of the aside; now back to analytic functions and the time spans between rows.

Supplying a Default LAG Value

This section demonstrates the LAG function’s default-value parameter. The default value can be any expression; this example uses the beginning of the day based on the current row. The lag is for a single row, and because the third default-value parameter is used, the default lag value of 1 must be specified because it’s the second parameter:

SELECT
  EventType,
  EventTime,
  EventTime - LAG(EventTime, 1, TRUNC(EventTime)) OVER (ORDER BY EventTime) AS Elapsed
FROM EventLog;
EVENTTYPE EVENTTIME                     ELAPSED
--------- ----------------------------  --------------------------
Open      08-NOV-13 03.33.34.125000 PM  +000000000 15:33:34.125000
Close     08-NOV-13 03.36.57.266000 PM  +000000000 00:03:23.141000
Open      08-NOV-13 04.27.44.038000 PM  +000000000 00:50:46.772000
Close     08-NOV-13 04.29.09.407000 PM  +000000000 00:01:25.369000
Open      08-NOV-13 10.43.32.551000 PM  +000000000 06:14:23.144000
Open      09-NOV-13 07.53.18.919000 AM  +000000000 09:09:46.368000
Close     09-NOV-13 08.09.25.571000 AM  +000000000 00:16:06.652000

The first row now has an ELAPSED value. The value is 15 hours, 33 minutes, and 34.125 seconds, or the time span since midnight on November 13, 2013 (the TRUNCated value of the first row’s timestamp) and the event time of November 13, 2013 at 3:33:34.125. Put another way, the third argument of TRUNC(EventTime) refers to the EventTime of the current row.

Time Spans by Event Type

The examples so far have defined the LAG "window" as the entire table. As with most other analytic functions, the LAG window can be further refined by partitioning. This example limits the prior EventTime value to rows which have the same EventType:

SELECT
  EventType,
  EventTime,
  EventTime - LAG(EventTime) OVER (
    PARTITION BY EventType
    ORDER BY EventTime) AS PriorEventTime
FROM EventLog;

In this case, the prior EventTime comes from the row that has the most-recent EventTime and the same EventType. That’s why two of the rows have a null PriorEventType – each has the earliest EventTime for its type (Close or Open):

EVENTTYPE EVENTTIME                    PRIOREVENTTIME
--------- ---------------------------- --------------------------
Close     08-NOV-13 03.36.57.266000 PM
Close     08-NOV-13 04.29.09.407000 PM +000000000 00:52:12.141000
Close     09-NOV-13 08.09.25.571000 AM +000000000 15:40:16.164000
Open      08-NOV-13 03.33.34.125000 PM
Open      08-NOV-13 04.27.44.038000 PM +000000000 00:54:09.913000
Open      08-NOV-13 10.43.32.551000 PM +000000000 06:15:48.513000
Open      09-NOV-13 07.53.18.919000 AM +000000000 09:09:46.368000

The LEAD Analytic Function

The examples in this post all use the LAG function, which gets values for prior rows. To get values for trailing rows, use the LEAD function. Its parameters are the same, and its behavior is the same except it operates on rows that follow the current row instead of rows that precede it.

I’ll wrap up with a quick LEAD example. Here’s the first query in this post, but with LEAD instead of LAG:

SELECT
  EventType,
  EventTime,
  LEAD(EventTime) OVER (ORDER BY EventTime) AS NextRowEventTime
FROM EventLog
ORDER BY EventTime;

The results of this query are:

EVENTTYPE EVENTTIME                    NEXTROWEVENTTIME
--------- ---------------------------- ----------------------------
Open      08-NOV-13 03.33.34.125000 PM 08-NOV-13 03.36.57.266000 PM
Close     08-NOV-13 03.36.57.266000 PM 08-NOV-13 04.27.44.038000 PM
Open      08-NOV-13 04.27.44.038000 PM 08-NOV-13 04.29.09.407000 PM
Close     08-NOV-13 04.29.09.407000 PM 08-NOV-13 10.43.32.551000 PM
Open      08-NOV-13 10.43.32.551000 PM 09-NOV-13 07.53.18.919000 AM
Open      09-NOV-13 07.53.18.919000 AM 09-NOV-13 08.09.25.571000 AM
Close     09-NOV-13 08.09.25.571000 AM

Note that this time around, the last row has a null “other row” value. That’s because it doesn’t have a following row. As with the LEAD function, you can define a default value for this situation using the third function parameter.

Conclusion

The examples in this post calculate time differences between the current row and previous or following rows, but you’re not limited to time values. You can calculate numeric differences as well, for example the difference between gross sales in a list of products or the difference in GPA in a list of students.

Calculated values like this used to be outside the database domain – it was common to query just the raw numbers and have the front-end software calculate differences between rows. With analytic queries, these values can be brought back to the database, ensuring enterprise-wide consistency.

Leave a Reply

Your email address will not be published. Required fields are marked *