Oracle 12c Learning Series: In-database Archiving and Temporal Validity

One of the major challenges an Oracle DBA faces is dealing effectively with historical data. In a typical enterprise database, table data goes back several years and most of it is inactive. The challenge is to archive this data while keeping queries efficient.

If we archive old data outside the database, on tape for example, storage cost drops but the older data is no longer available for one-off reports. Regulatory compliance may also require that historical data remain accessible at all times.

If we keep old data inside the database, the database grows, and so do backup size and storage cost. In addition, queries against bigger tables run less efficiently, so there is a performance cost for queries, DML and DDL (for example, index creation and rebuild).

In 11g, older data is typically handled using one of the following two solutions:

  1. Partitioning: Distinguish data as old or new based on the creation date or some other date attribute in the table, and partition the table on that column. The DBA can then manually move older partitions to low-cost storage.
  2. Flashback Data Archive (the Total Recall feature in 11g): This feature automatically tracks changes to data over time and keeps the archived transactional data in a separate tablespace, based on the RETENTION parameter. Historical data can be queried using the flashback query AS OF clause. Archived data that has aged beyond RETENTION is purged automatically.
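
As a rough sketch of the two 11g approaches above, the statements below range-partition a table by date, move an old partition to cheaper storage, and enable a flashback data archive on the table. The table orders_hist, the tablespaces low_cost_ts and fda_ts, and the archive name fda_1yr are all hypothetical, and both tablespaces are assumed to exist already.

```sql
-- Hypothetical table, range-partitioned on a date column
create table orders_hist
( order_id    number,
  order_date  date,
  amt         number(10,2) )
partition by range (order_date)
( partition p2014 values less than (date '2015-01-01'),
  partition p2015 values less than (date '2016-01-01'),
  partition pmax  values less than (maxvalue) );

-- Manually move an older partition to a low-cost tablespace (assumed to exist)
alter table orders_hist move partition p2014 tablespace low_cost_ts;

-- Flashback data archive with a one-year retention (tablespace assumed to exist)
create flashback archive fda_1yr tablespace fda_ts retention 1 year;
alter table orders_hist flashback archive fda_1yr;

-- Later, once the table has been tracked for at least a day,
-- query it as it looked one day ago
select * from orders_hist as of timestamp (systimestamp - interval '1' day);
```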

Oracle 12c provides additional solutions to these challenges. There are two ways to handle archive data in Oracle 12c:

  1. In-database archiving
  2. Temporal validity

In-Database Archiving

Until now, it was very difficult to distinguish active data from inactive data at row level. In Oracle 12c this becomes possible by attaching a property to every row of a table that marks it as inactive; a row that is not marked inactive is active. Setting this property at row level separates active and inactive data within a table and lets you archive rows simply by marking them inactive. The key thing to understand is that the data remains in the table, where you can compress it, but the inactive part stays invisible to applications. There is also a session-level parameter that lets us see inactive as well as active data when required.

In-database archiving is enabled at table level, either when the table is created or later by altering it. The following example creates a table with in-database archiving enabled.


SQL> create table sales_archive
 2 (year    number(4),
 3 product  varchar2(10),
 4 amt      number(10,2))
 5 row archival;

Table created.

When you create a table with the “ROW ARCHIVAL” clause (line 5 in the above example), Oracle automatically adds an extra column, ORA_ARCHIVE_STATE, to the table. This column is added as a hidden column, so you won’t see it when you describe the table.


SQL> desc sales_archive
 Name                    Null?    Type
 ----------------------- -------- ----------------
 YEAR                             NUMBER(4)
 PRODUCT                          VARCHAR2(10)
 AMT                              NUMBER(10,2)

But you can check whether the column was added by querying the DBA_TAB_COLS view.


SQL> select table_name, column_name, column_id, HIDDEN_COLUMN from dba_tab_cols where table_name = 'SALES_ARCHIVE';

TABLE_NAME           COLUMN_NAME          COLUMN_ID  HID
-------------------- -------------------- ---------- ---
SALES_ARCHIVE        ORA_ARCHIVE_STATE               YES
SALES_ARCHIVE        YEAR                          1 NO
SALES_ARCHIVE        PRODUCT                       2 NO
SALES_ARCHIVE        AMT                           3 NO

Similarly, you can enable an existing table for row archival using alter table as shown below.


SQL> alter table sales row archival;

Table altered.

SQL> select table_name, column_name, column_id, HIDDEN_COLUMN from dba_tab_cols where table_name = 'SALES';

TABLE_NAME           COLUMN_NAME          COLUMN_ID  HID
-------------------- -------------------- ---------- ---
SALES                YEAR                          1 NO
SALES                PRODUCT                       2 NO
SALES                AMT                           3 NO
SALES                ORA_ARCHIVE_STATE               YES

The ORA_ARCHIVE_STATE column is what distinguishes active data from inactive data. By default, a newly inserted row is active, which is denoted by the value 0 in ORA_ARCHIVE_STATE. When rows are rarely accessed and no longer updated, they can be considered inactive, which is denoted by the value 1 (or any value other than zero) in ORA_ARCHIVE_STATE.

If you want to check the value of ORA_ARCHIVE_STATE column, you have to select the column explicitly in your select statement.


SQL> select year, product, amt, ORA_ARCHIVE_STATE from sales_archive;

      YEAR PRODUCT           AMT ORA_ARCHIV
---------- ---------- ---------- ----------
      2001 A                 100 0
      2002 B                 200 0
      2003 C                 300 0
      2004 D                 400 0
      2005 E                 500 0
      2006 F                 600 0
      2007 G                 700 0

7 rows selected.

As you can see, ORA_ARCHIVE_STATE is 0 in all the selected rows, which means all of this data is active.

Note that ORA_ARCHIVE_STATE is not set to 1 automatically when rows are rarely accessed. The DBA or the owner of the data has to decide which rows are no longer being accessed and manually update the ORA_ARCHIVE_STATE column for them.

The following DML shows how to mark data as inactive:

SQL> update sales_archive set ORA_ARCHIVE_STATE = DBMS_ILM.ARCHIVESTATENAME(1) where year = 2001;

1 row updated.

SQL> commit;

Commit complete.

In the above DML, we used the DBMS_ILM.ARCHIVESTATENAME function, which is the function Oracle provides for setting the value of the ORA_ARCHIVE_STATE column. You can also update this column directly to any non-zero value, with the same result. Here we updated one record, for year 2001, and marked it as inactive (archived).
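
To see what the function hands back, it can be selected from dual. This is only a small sketch: the aliases are made up for readability, and while the function is expected to return '0' for an input of 0 and '1' for any other input, it is worth confirming the output on your own release.

```sql
-- Inspect the values ARCHIVESTATENAME returns for active (0) and archived (1)
select dbms_ilm.archivestatename(0) as active_state,
       dbms_ilm.archivestatename(1) as archived_state
from   dual;
```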

If you query the table again, you won’t see the record that we updated above.


SQL> select year, product, amt, ORA_ARCHIVE_STATE from sales_archive;

      YEAR PRODUCT           AMT ORA_ARCHIV
---------- ---------- ---------- ----------
      2002 B                 200 0
      2003 C                 300 0
      2004 D                 400 0
      2005 E                 500 0
      2006 F                 600 0
      2007 G                 700 0

6 rows selected.

This is because, by default, Oracle shows only active records; records marked inactive are not shown. Oracle does this by adding a simple filter to your queries, as the explain plan below shows.

SQL> explain plan for
 2 select year, product, amt, ORA_ARCHIVE_STATE from sales_archive;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------
Plan hash value: 4043476784

----------------------------------------------------------------------------------
| Id | Operation         | Name          | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |               | 6    | 12210 | 3 (0)       | 00:00:01 |
|* 1 | TABLE ACCESS FULL | SALES_ARCHIVE | 6    | 12210 | 3 (0)       | 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("SALES_ARCHIVE"."ORA_ARCHIVE_STATE"='0')

Note
-----
 - dynamic statistics used: dynamic sampling (level=2)

17 rows selected.

You can make archived records visible again by setting the ROW ARCHIVAL VISIBILITY parameter to ALL at session level. The default value is ACTIVE, which is why we see only active records. The following example shows this.


SQL> alter session set row archival visibility = ALL;

Session altered.

SQL> select year, product, amt, ORA_ARCHIVE_STATE from sales_archive;

      YEAR PRODUCT           AMT ORA_ARCHIV
---------- ---------- ---------- ----------
      2001 A                 100 1
      2002 B                 200 0
      2003 C                 300 0
      2004 D                 400 0
      2005 E                 500 0
      2006 F                 600 0
      2007 G                 700 0

7 rows selected.

After setting row archival visibility to ALL, we can see the archived rows as well. This is one great advantage of in-database archiving: if an ad hoc report needs access to archived records, we can set row archival visibility to ALL at the beginning of the report and get the desired result. Afterwards you can set visibility back to ACTIVE.

SQL> alter session set row archival visibility = ACTIVE;

Session altered.
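
An ad hoc report of the kind described above might look roughly like the following sketch. It reuses the sales_archive table from this article; the column aliases row_count and total_amt are chosen here purely for illustration.

```sql
-- Make archived rows visible for this session only
alter session set row archival visibility = all;

-- Count rows and total the amount per archive state ('0' means active)
select ora_archive_state,
       count(*) as row_count,
       sum(amt) as total_amt
from   sales_archive
group  by ora_archive_state
order  by ora_archive_state;

-- Return to the default so later queries see only active rows
alter session set row archival visibility = active;
```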

You can also restore archived data to active by running an update statement on the archived rows with the DBMS_ILM.ARCHIVESTATENAME function, this time passing 0 as the value. You need to set ROW ARCHIVAL VISIBILITY to ALL before you run the update; otherwise the archived rows are not visible to it.
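
A minimal sketch of that restore, assuming the year 2001 row of sales_archive archived earlier in this article is the one to bring back:

```sql
-- Archived rows must be visible, or the update will not find them
alter session set row archival visibility = all;

-- Passing 0 marks the row as active again
update sales_archive
set    ora_archive_state = dbms_ilm.archivestatename(0)
where  year = 2001;

commit;

-- Switch back to the default visibility
alter session set row archival visibility = active;
```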

You can disable row archival for a table by using ALTER TABLE … NO ROW ARCHIVAL. When you disable row archival for a table, the ORA_ARCHIVE_STATE column is dropped immediately and all the data becomes visible to your select statements.

SQL> alter table sales no row archival;

Table altered.

Note that if you use CTAS (CREATE TABLE AS SELECT) to create a new table from an existing table, the new table will contain active as well as inactive data. In-database archiving applies only to selects and DML, not to DDL.

Temporal Validity and Temporal history

With in-database archiving, we distinguish active rows from inactive rows using the row archival column (ORA_ARCHIVE_STATE). You can achieve similar functionality using temporal validity. Many applications today have data that depends on dates relevant to the underlying business. For example, an insurance application records an insurance start date and an insurance end date, which together represent the valid time during which the policy is active for a customer. Date attributes that denote the valid time of a record are what temporal validity refers to. Temporal validity lets you keep active and inactive data together in the same table while providing the benefits of archiving inactive data. It helps in cases where it’s critical to know when certain data becomes valid and when it becomes invalid.

Similar to the business-relevant date attributes an application uses in a table, Oracle provides temporal validity by creating its own date attributes (start date and end date) to represent the valid-time dimension during which the record is active. The application user or DBA decides which records are active or inactive by updating the start date and end date of each record. Records whose end date has passed are inactive.

By applying an implicit filter on the valid-time dimension, queries can show rows that are currently valid or that will be valid in the future, and hide rows whose facts are not currently valid.

Temporal History

Before we go further, it helps to understand temporal history and how it differs from temporal validity. Temporal history was called Flashback Data Archive (FDA) in the previous release. To understand the difference, consider the following.

Temporal validity dates and times are different from the date and time when the record is created. The date and time when a record is created in the database is its temporal history. For example, if we have to create a new customer whose insurance policy starts today, we might actually create the record tomorrow or a week later but still enter today as the insurance start date. In that case the temporal validity date is today, but because the record is created a week later, the temporal history date is a week later.

We can use temporal history to get the past data. We can also use temporal history along with temporal validity to get rows which were valid in the past.

Defining temporal validity

You define a valid-time dimension at table creation time, or by altering a table. To create the valid-time dimension, you specify the PERIOD FOR clause in the table creation statement.

The following example creates a table with a valid-time dimension, explicitly defining the two date-time columns USER_TIME_START and USER_TIME_END:


SQL> create table policy
2 (cust_id         number,
3 policy_no        number,
4 policy_type      varchar2(20),
5 user_time_start  date,
6 user_time_end    date,
7 period for user_time (user_time_start, user_time_end));

Table created.

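In the above example, we explicitly specified the valid-time dimension columns in the create table statement. Even if we don’t specify the column names, Oracle creates the required columns for us. (The second example reuses the table name policy, so drop the first table before running it.)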


SQL> create table policy
2 (cust_id    number,
3 policy_no   number,
4 policy_type varchar2(20),
5 period for user_time);

Table created.

A valid-time dimension, represented by the new PERIOD FOR clause, consists of two date-time columns. If you don’t specify the columns explicitly, as in the second example, Oracle automatically creates two hidden columns whose names start with the name provided in the PERIOD FOR clause and end with “_START” and “_END”. So in the second example, the two hidden columns are user_time_start and user_time_end (the same names as in the first example).

The only difference between the first example and the second is the visibility of the valid-time dimension columns. In the first table the columns are visible because we created them explicitly, whereas in the second they are hidden.
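
A valid-time dimension can also be added to an existing table, and the hidden columns it creates can be checked in the data dictionary. The following is a minimal sketch, assuming a hypothetical table named claims and a hypothetical period name claim_time; no particular output is implied.

```sql
-- Hypothetical existing table without a valid-time dimension
create table claims
( claim_id   number,
  policy_no  number );

-- Add a valid-time dimension; Oracle creates the hidden start and end columns
alter table claims add period for claim_time;

-- List all columns of the table, including hidden ones
select column_name, data_type, hidden_column
from   user_tab_cols
where  table_name = 'CLAIMS'
order  by internal_column_id;
```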

If the columns are hidden, you need to name them explicitly in selects, updates and inserts. The following example shows an insert into the policy table that was created with implicit valid-time dimension columns:


SQL> insert into policy (cust_id, policy_no, policy_type, user_time_start, user_time_end) values (123, 3424, 'HEALTH',sysdate, null);

1 row created.

SQL> select * from policy;

   CUST_ID POLICY_NO  POLICY_TYPE
---------- ---------- --------------------
       123 3424       HEALTH

SQL> select cust_id, policy_no, policy_type, user_time_start from policy;

   CUST_ID POLICY_NO  POLICY_TYPE          USER_TIME_START
---------- ---------- -------------------- ----------------------------------------
       123 3424       HEALTH               20-JUN-16 12.33.53.000000 AM +05:30

As you can see, “select *” doesn’t show the hidden valid-time dimension columns. In the second statement we select the column explicitly and see its value.
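
Marking a record inactive under temporal validity comes down to setting its end date. As a small sketch using the row inserted above (the choice of systimestamp as the end time is an assumption made for illustration):

```sql
-- End the validity of this policy as of now; once the end time has passed,
-- the row counts as inactive
update policy
set    user_time_end = systimestamp
where  cust_id = 123
and    policy_no = 3424;

commit;

-- Hidden columns must be named explicitly to be displayed
select cust_id, policy_no, user_time_start, user_time_end
from   policy;
```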

We have now seen how to create valid-time columns in a table to separate active data from inactive data. Let’s look at how to filter on these columns and select only valid data. There are two methods of selecting active data by filtering on valid-time columns:

  • PERIOD FOR clause
  • DBMS_FLASHBACK_ARCHIVE package

PERIOD FOR clause

To filter valid data, you can use a SELECT statement with the new PERIOD FOR clause. One set of rows in the table is valid, based on the start and end dates of its valid-time dimension, and another set of rows in the same table is invalid because the time of interest falls outside those dates. Both sets of rows reside in the same table, but by limiting visibility to valid rows you can make queries and DML see only active data.

For each record that you insert into the table, you specify a valid-time start date and end date. These dates, entered manually by end users, determine when the data is active. The date when the record is actually inserted into the table is the transaction time, which is called temporal history in Oracle 12c.

You can use either the AS OF PERIOD FOR clause or the VERSIONS PERIOD FOR clause to display valid data. AS OF PERIOD FOR is used to see data that is valid as of a specific date, whereas VERSIONS PERIOD FOR is used to see data that is valid between two dates (a range).

The following example displays all policies that were active as of 01-Jan-2016:

select * from policy AS OF PERIOD FOR user_time to_date('01-JAN-2016','DD-MON-YYYY');

Here is another example, which shows data valid between 01-JAN-2016 and 01-FEB-2016.

select * from policy VERSIONS PERIOD FOR user_time between to_date('01-JAN-2016','DD-MON-YYYY') and to_date('01-FEB-2016','DD-MON-YYYY');
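
Conceptually, the AS OF PERIOD FOR query shown earlier behaves like a filter on the start and end columns, in which a NULL start or end means the row is unbounded on that side. The hand-written query below is only an approximation of that logic for illustration; the exact predicate Oracle generates (for example, how it handles time zones and the boundary instants) is best checked in the execution plan.

```sql
-- Approximate hand-written equivalent of:
--   select * from policy AS OF PERIOD FOR user_time to_date('01-JAN-2016','DD-MON-YYYY');
select *
from   policy
where  (user_time_start is null
        or user_time_start <= to_date('01-JAN-2016','DD-MON-YYYY'))
and    (user_time_end is null
        or user_time_end > to_date('01-JAN-2016','DD-MON-YYYY'));
```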

DBMS_FLASHBACK_ARCHIVE

You can also use the DBMS_FLASHBACK_ARCHIVE package to show valid data. Its ENABLE_AT_VALID_TIME procedure sets the visibility of data at session level. Once visibility is set, select statements on a table with a valid-time dimension show the data that was valid at the set time.

In the following example, we set the visibility of data as of a given time, 01-JAN-2016.
This means any select statement or DML in the same session will see the data valid as of 01-JAN-2016.


SQL> exec DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME('ASOF',to_date('01-JAN-2016','DD-MON-YYYY'));

PL/SQL procedure successfully completed.

You can also set the visibility to the current time, which shows data that is currently valid.


SQL> exec DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME('CURRENT');

PL/SQL procedure successfully completed.
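
To go back to seeing every row regardless of its valid time, the same procedure accepts the level 'ALL'. A short sketch in SQL*Plus, reusing the policy table from this article:

```sql
-- Make all rows visible again, whether or not they are currently valid
exec DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME('ALL');

-- Queries in this session now return valid and invalid rows alike
select cust_id, policy_no, policy_type, user_time_start, user_time_end
from   policy;
```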

Note that this visibility setting affects only selects and DML, not DDL; DDL sees the complete data (valid as well as invalid). Also, in-database archiving and temporal validity are NOT supported in a CDB; they are supported only in a non-CDB database.

Hope this helps !!
