Oracle Data Pump Technology – Oracle Database 10g

Introduction:

Oracle datapump technology refers to the new import and export functionality introduced in Oracle database 10g. New import/export technology is far superior then the traditional import/export tools available until Oracle database 9i. Data pump manages multiple, parallel stream of data to achieve maximum throughput.

Oracle claims that Data Pump allows you to decrease the total export time by two orders of magnitude in most data intensive export jobs. Imports are supposed to run 15-20 times faster then traditional method.

While Oracle continues to support the traditional import utility forever, the traditional export utility will get deprecated eventually.

Benefits of using Data Pump:

Following are some of the benefits of using Data Pump

Server based: Data Pump jobs are server based. Mean if you submit the job from the client and you close the client connection the job will continue to run on server.
Ability to restart data pump job: We can attach to existing job which is stalled or failed because of lack of space and start the job.
Parallel Execution capabilities: Data Pump provide PARALLEL parameter which enables you to provide number of parallel workers
Ability to attach to running job: Data Pump provide you capability to attach to running job and change parameter as required
Network mode of operation: Ability to import the database from some other database without having to create a dump file
Fine-grained data import capability: Provides finer control on which objects to import
Remapping Capabilities: Enhancement to traditional TOUSER parameter. You have the capability to remap tablespace, schemas and datafiles
Ability to estimate space requirement: You can estimate the space without even running the export job.

Data Pump Components

We have following data pump API available for use in the program

DBMS_DATAPUMP
DBMS_METADATA

DBMS_DATAPUMP is a package which has the actual data pump engine to export and import data from the database. DBMS_METADATA is used to extract and upload the metadata about data. These are the API packages and can be used directly in a program. Other then this the client binaries like expdp and impdp are also available.

DIRECTORY Object:

One of the benefit of using data pump is that the job runs on the server side unlike the traditional import/export programs. In the traditional import/export programs, Oracle used to make simple client connection to the database and extract the data. The process for import/export in the traditional method used to run on client machine. Meaning that if a job is started using the traditional export and import method, the dump file will get created at a place or machine on which you started the job.

Data pump is different. In that the job always runs on the server where the database resides and job does not make a simple client connection and use SQL to extract the data. The mechanism for data pump is different then traditional technology. The mechanism is explained in the next section of this post. For now the important thing to understand here is that the job runs on the sever side and the dump files also gets created on the server side. So how do we know the location for this dump file. Where exactly will it get created on the server?

To answer this we have a parameter called DIRECTORY. This is a new object introduced in Oracle database 10g. Using DIRECTORY parameter we can specify the location of dump file.

Before using DIRECTORY parameter in expdp or impdp, we need to first create this object in the database. Following is the way to create directory object in database.

SQL> Create directory test_dir as ‘/u01/app/oracle/exp_dump’ ;

Here the location /u01/app/oracle/exp_dump is the location on server and must exists. test_dir is the directory object that we created in the database.

In order for the user to use a directory object, you need to grant privilege to the user on that directory.

SQL> grant read, write on directory test_dir to avdeo;

While doing the export or import there are 3 ways to provide this directory object.

1) Using DIRECTORY parameter

You can specify the DIRECTORY parameters and all the files will get created or read from the that directory. Example

$ expdp hr/hr DIRECTORY=test_dir DUMPFILE=…

2) Using DIRECTORY:FILE notation

You use DIRECTORY:FILE notation if you want to specify different direct for dump file, log file etc. Example

$ expdp hr/hr DUMPFILE=test_dir:exp_file01.dmp LOGFILE=test_dir1:logfile01.log

3) Using DATA_DUMP_DIR environment variable

You can also set DATA_DUMP_DIR env variable to a directory object name and same will be used by the expdp and impdp tool. Example

$ export DATA_DUMP_DIR=test_dir

$ expdp hr/hr DUMPFILE=test.dmp …

Order of precedence for the above notation goes as follows. Oracle will first give the precedence to DIRECTORY:FILE notation (point #2). If you have submitted job giving DIRECTORY:FILE notation then oracle will ignore other notation. Second precedence is giving to DIRECTORY parameter. Oracle will look for DATA_DUMP_DIR environment variable only of you have not used DIRECTORY parameter or DIRECTORY:FILE notation.

Mechanics of Data Pump technology

Data Pump uses several processes to perform the job. This include the master process, the worker process, shadow process and the client process. Lets check out each of these processes

The master process

The master process or more accurately the Master Control Process (MCP) is the primary process for managing data pump job. MCP is a background process with a full name as <instance>_DMnn_<pid> and this process can be seen when a import or export job starts. There is 1 MCP process per job. If you are running 3 import or export job, you will see 3 such master process in that instance. The master process performs following tasks:

Create jobs and control them
Creates and manages worker process
Monitors the job and logs the progress
Maintains the job state and restart information
Manages necessary files, including the dump file set

When a job is submitted master process creates a table called mater table to log the location of various database objects in export dump file. Master table is the heart of every data pump export and import job. Master process maintains the job state and restart information in master table. Oracle creates master table in the schema of use who is running the data pump job at the beginning of export. Master table contains various set of information regarding the state of the running job, the location of various objects in export dump file, parameters of the job and status of all worker process. Master table has the same name as the name of the job. You can provide the name to the job you are submitting or by default the name of the job will be <schema>_<operation>_<mode>_<unique count>. Example SYS_EXPORT_SCHEMA_01

At the end of export, data pump export job will export the master table and drop it from the schema of the job is successful. If the job fails the master table will remain with all the information about the point of failure. Later on one can just attach to the job and restart the failed job after fixing the issue. Also master table will get deleted if you kill the job using KILL_JOB command.

Master process will also recreate the master table that gets saved at the end of export as the first thing while starting the import. The import process gets complete information about the dumps and objects its going to import using the master table. If the import is successful or forcefully killed by the user, the master table is dropped. If the import job fails or stopped by the user using STOP_JOB command, the master table remains in the database schema.

The worker process

Work process is the one who does the heavy duty work of actual loading and unloading operations. Worker process has the name of the process as <instance>_DWnn_<pid>. Number of worker process depends on the degree of parallelism as defined by the parameter PARALLEL. As the worker process exports and imports the objects they update the status of the same in master table. They update the master table with information about the status of various jobs: completed, pending or failed.

The Shadow process

When a client logs into the database server, the database creates an Oracle background foreground process to service data pump API request. The shadow process creates the job consisting of master table as well as master process. Once client detaches shadow process automatically disappears.

The Client Process

Client process is the one which calls data pump API. You have 2 client- expdp and impdp.

Interactive Data Pump

Running the export and import job in interactive mode is totally different concept in case of traditional import and export and data pump. In case of traditional import and export job when we give the export or import command without giving any parameters, the API will ask for some required mandatory parameters which we used to input and job used to start. But in case of data pump the meaning of interactive job is different. In case of data pump the API never ask you for any inputs of the parameters one by one. Only way you have to submit the job is to give all the parameters, either inline with job or in a parameter file.

Then what is interactive method in case of data pump?

Well the interactive method is used in data pump only when you decide to make some changes to the parameters of the already running job.

The way to get into the interactive mode is either by doing ctrl+c or using ATTACH parameter. Example

expdp hr/hr ATTACH=test_job_01

Data Pump Export Parameters

Following is the brief list of parameters that can be passed with data pump export job. For complete list you can do expdp help=y

DIRECTORY – Used to specify the location for dump file or logfile or SQL file.

DUMPFILE – Used to specify the name of dump file. And yes it replaces the FILE parameter in traditional export

FILESIZE – Used to specify the size of the dump file that it should create. If the size exceeds, then it will write the dump in another file if you have given multiple filenames in DUMPFILE parameter or used %U while giving the name of dump file.

PARFILE – Used to provide the name and location of parameter file which has all the required parameter for the export job to run

LOGFILE and NOLOGFILE – You can specify LOGFILE parameter to to create log file for export job. If you dont specify LOGFILE parameter, Oracle will still create the log file named export.log. Subsequent export jobs will overwrite it as this is the default name. You can specify NOLOGFILE parameter if you want to totally avoid creation of log file for your export job.

CONTENT – Using content parameter, you can filter what goes into the dump file. CONTENT parameter can take 3 values: ALL, DATA_ONLY, METADATA_ONLY

EXCLUDE and INCLUDE – Used for fine grained filtering of the objects that can go in export dump file

QUERY – Used to control the rows of the table that can go into the export dump file

ESTIMATE – Used to estimate the time and space required for export dump file in addition to doing export

ESTIMATE_ONLY – Used to only estimate the time and space requirement for the export job. This parameter will not export and data into the dump file.

NETWORK_LINK – Used to get the export dump from some other database into the current database server. This parameter expects a database network link. Example

expdp hr/hr DIRECTORY=test_dir NETWORK_LINK=finance@prod1 …

Here finance is the remote database on prod1 server from which you are exporting the data.

ATTACH – Used to connect to the currently running job.

JOBNAME – Used to provide custom name to the export jobs.

STATUS – Used to display the status of the job. This parameter takes the numeric values in seconds.

PARALLEL – Used to provide number of parallel workers who will perform the actual job

Data Pump Import Parameters

Some of the parameters in import job are same in export like PARFILE, DIRECTORY, DUMPFILE, LOGFILE, NOLOGFILE etc

Some of the other parameters in import are

SQLFILE – Used to extract DDL from the export dump file. If you specify the SQLFILE parameters it extracts the DDLs, no import takes place.

REUSE_DATAFILES – This parameter tells data pump whether it should existing datafile for creating tablespace during import. If REUSE_DATAFILES is set to Y then import utility will write over your existing datafiles.

CONTENT – Using content parameter, you can filter what goes into the dump file. CONTENT parameter can take 3 values: ALL, DATA_ONLY, METADATA_ONLY

TABLE_EXISTS_ACTION – In case the table already exist in the database which a import job is trying to import from the export dump file, you can use this parameter to tell import job what to do. The valid values are SKIP, APPEND, TRUNCATE, REPLACE

TRANSPORTABLE_TABLESPACE – used when you want to transport the tablespace from one database to another

REMAP_SCHEMA – Used to move objects from one user to another. Replacement of old TOUSER parameter.

REMAP_DATAFILE – Used to remap the data file names specially when transferring the data from one platform to another. Example from windows to UNIX

REMAP_TABLESPACE – Sometimes you want the tablespace in which you are importing the data to be different from the source database tablespace. This parameter comes handy for doing that.

NETWORK_LINK – This parameters is used to import the data directory into the database from some other database without having to create a dump file.

TRANSFORM – In case you don’t want to import the objects storage attributes but just the content, you can use this parameter. TRANSFORM parameter instruct the data pump import job to modify the DDL that creates the object during import. You can modify 2 basic attributes SEGMENT_ATTRIBUTE and STORAGE.

FLASHBACK_TIME – Enables you to import data consistent as of flashback time you specify in the import job. Replacement of old CONSISTENT parameter in traditional import utility.

Monitoring Data Pump Job

You can monitor data pump job using following views

DBA_DATAPUMP_JOBS
OWNER_NAME	       VARCHAR2(30)		User that initiated the job
JOB_NAME	       VARCHAR2(30)		Name of the job
OPERATION	       VARCHAR2(30)		Type of operation being performed
JOB_MODE	       VARCHAR2(30)		Mode of operation being performed
STATE	               VARCHAR2(30)		Current job state
DEGREE	               NUMBER		        Number of worker processes performing the operation
ATTACHED_SESSIONS      NUMBER		        Number of sessions attached to the job
DBA_DATAPUMP_SESSIONS

OWNER_NAME	 VARCHAR2(30)	 	User that initiated the job
JOB_NAME	 VARCHAR2(30)	 	Name of the job
SADDR	         RAW(4)	 	        Address of the session attached to the job

You can also use V$SESSION_LONGOPS to monitor long running export or import jobs.

Hope this helps !!

6 thoughts on “Oracle Data Pump Technology – Oracle Database 10g”

Rahul says:

December 21, 2009 at 6:35 pm

Excellent article for newbees. Thanks for sharing.

Sikkandar says:

May 19, 2010 at 6:22 pm

Wow! its very good guide to understand the Data Pump Technology and useful for entry level DBAs. Thanks and Continue your writing…

sunny says:

June 14, 2010 at 6:39 am

very good article..good flow of explaining…thank u for sharing..

Anil says:

August 15, 2012 at 12:45 pm

Hi. Good article. Can we use data pump for exporting and importing incremental data.( Table already exists in target database). ? Or is there any method for exporting and importing incremental data for HRMS tables in oracle apps

ajay kumar says:

July 3, 2014 at 3:24 pm

nice article,very very clear..

rst says:

July 21, 2015 at 6:51 am

Hi ;

Very nice presentation. Please keep it up.

I would like to ask on thing. You have mentioned following line.

MCP is a background process with a full name as _DMnn_ and this process can be seen when a import or export job starts. Please show where you have found it ?

I have checked , these commands are not showing MCP background process format.
ps -ef | grep crms
ps -ef | grep crms_dm0