One of the parameters we configure in a physical standby setup controls how long LGWR on the primary should wait for the physical standby to respond.
When changes happen on the primary, the resulting redo is shipped to the physical standby database. If the standby database is down, or the standby server is not reachable, we need a limit on how long the primary waits for the standby to respond before it moves ahead without trying to ship redo to the standby. This limit is defined by the NetTimeout property.
You can check the definition in the Oracle docs: http://docs.oracle.com/cd/E11882_01/server.112/e17023/dbpropref.htm#i101032
"The NetTimeout configurable database property specifies the number of seconds the LGWR waits for Oracle Net Services to respond to a LGWR request. It is used to bypass the long connection timeout in TCP."
One issue I ran into was my DG broker reporting the following error:
Protection Mode: MaxAvailability
orcl_b - Primary database
Error: ORA-16825: multiple errors or warnings, including fast-start failover-related errors or warnings, detected for the database
orcl_a - (*) Physical standby database
Warning: ORA-16817: unsynchronized fast-start failover configuration
(*) Fast-Start Failover target
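For readers following along, status output of this shape is typically what the broker's configuration summary prints. A minimal sketch of the commands, assuming you are already connected in DGMGRL as a suitably privileged user. The second command is optional and only adds fast-start failover detail:

```
DGMGRL> show configuration;
DGMGRL> show fast_start failover;
```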
When I checked the database info in verbose mode, I saw the following:
DGMGRL> show database verbose orcl_a
Database - orcl_a
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 1 minute 1 second
Apply Lag: 3 minutes 7 seconds
Real Time Query: OFF
This means that even though my database is in MaxAvailability mode, I still see lag and the standby is not getting in sync with the primary.
My broker log file (drc<ORACLE_SID>.log in the diagnostic_dest location) was showing the following error:
Redo transport problem detected: redo transport for database orcl_a has the following error:
ORA-16198: Timeout incurred on internal channel during remote archival
Data Guard Broker Status Summary:
Type                        Name     Severity  Status
Configuration               FSF      Warning   ORA-16607
Primary Database            orcl_b   Error     ORA-16825
Physical Standby Database   orcl_a   Warning   ORA-16817
Oracle error ORA-16198 indicates a timeout, which here must have been happening while the primary was contacting the standby site.
When I sanity checked the standby, everything was fine. So I checked the NetTimeout property, which defines how long the primary waits when contacting the standby.
I realized that the timeout value was very low on my system.
When you run show database verbose <unique name>, it lists the database properties, including:
NetTimeout = '4'
In my case it was set to 4 seconds, which is a very low value.
As soon as I set this value to around 10, everything was back to normal.
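For reference, a change like this can be made from DGMGRL roughly as follows. This is a sketch under assumptions: it edits the property on the standby entry (orcl_a, the same database whose verbose output showed NetTimeout = '4'), and it uses 10 only because that is the value that worked here. The first and last commands simply display the property before and after the change:

```
DGMGRL> show database orcl_a 'NetTimeout';
DGMGRL> edit database orcl_a set property 'NetTimeout' = 10;
DGMGRL> show database orcl_a 'NetTimeout';
```

Once the new value is in place, the warnings in show configuration should clear after the standby catches up, assuming the timeout was the only problem.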
There is no standard value for this property, but a typical value is between 10 and 30 seconds, depending on your network configuration. Basically, the primary should be able to contact the standby and hear back from it within this time limit.
The downside of keeping this value high is that if something goes wrong with your standby, your primary can hang for that long.
So, in my case, if I set NetTimeout to 10 seconds and something goes wrong with the standby, my primary database will keep trying to send redo to the standby for 10 seconds, and until then the commit won't complete (if I am in MaxAvailability mode).
So we need to balance the value of this property: high enough that normal network delays do not trigger timeouts, and low enough that a standby failure does not stall the primary for long.
Hope this helps !!