Oracle8 for UNIX: Providing Disaster Recovery with NetApp SnapMirror
Technology
TR3057 by Jerry Liu, Jeff Browning, O.C.P. M.C.S.E. and Tim Moore,
Network Appliance, Inc.
1. Purpose and Scope
This document describes techniques for setting up a disaster recovery
configuration for Oracle8 for UNIX® using Network Appliance™ SnapMirror
technology. Specifically, we cover the following issues:
- Description of the NetApp SnapMirror technology and its approach to
disaster recovery.
- The infrastructure required to support SnapMirror technology.
- How to set up a SnapMirror for use in an Oracle8™ environment.
- How disaster recovery works in practice.
- Resynching the mirror following recovery from a disaster.
2. Description of NetApp SnapMirror Technology
NetApp SnapMirror technology provides asynchronous mirroring of data
between filer volumes. Data on the source volume is periodically
replicated to the target at a user definable time interval, with the range
being from one minute to one month. At the end of each replication event,
the mirror target volume becomes an exact block for block copy of the
mirror source volume. At that point, the two volumes share identical data
content and characteristics. The mirror is initialized by effectively
copying the entire source volume to the target volume. Once this initial
copy is complete, replication events thereafter copy only changed blocks
from the source volume to the target volume. This provides a highly
efficient data replication mechanism.
Architecturally, SnapMirror software is a logical extension of the
NetApp WAFL™ file system, particularly the Snapshot feature. Using
Snapshots, you can create a read-only copy of an entire filer volume. This
copy is made by essentially saving only changed blocks after a particular
point in time. Two sequential Snapshots can then be compared and the
differences identified. Since this comparison takes place at the block
level, only the changed blocks need be sent to the mirror target. By
implementing the update transfers asynchronously, data latency issues
inherent with remote synchronous mirroring techniques are eliminated. The
elegance of these two design features becomes particularly apparent when
running mirror pairs over WAN topologies.
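The idea of comparing two sequential Snapshots at the block level can be illustrated with a small sketch. This is a toy model only: it represents a Snapshot as a Python dictionary from block number to block contents, which is our assumption for illustration and not a description of how WAFL stores or compares Snapshots. The function names are hypothetical.

```python
# Toy model: a snapshot is a mapping from block number to block contents.
# This is an illustration only, not how WAFL stores or compares Snapshots.


def changed_blocks(previous, current):
    """Return the blocks that differ between two snapshots (new or modified)."""
    return {
        block_no: data
        for block_no, data in current.items()
        if previous.get(block_no) != data
    }


def apply_update(target, delta, current_block_numbers):
    """Apply a delta to the target and drop blocks that are no longer in use."""
    updated = {n: d for n, d in target.items() if n in current_block_numbers}
    updated.update(delta)
    return updated


# Two sequential snapshots of the same (tiny) volume.
snap_a = {0: b"header", 1: b"row-1", 2: b"row-2"}
snap_b = {0: b"header", 1: b"row-1*", 3: b"row-3"}

# Only the delta crosses the network; the target starts as a copy of snap_a.
delta = changed_blocks(snap_a, snap_b)
mirror = apply_update(dict(snap_a), delta, set(snap_b))
```

In this sketch only two of the three blocks in use are transferred, and the mirror ends up identical to the second snapshot.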
For more information on how Snapshots work, see File System Design
for an NFS File Server Appliance by Dave Hitz, James Lau, and
Michael Malcolm. Those with NetApp on the Web (NOW) password access can
also check the following documentation, which covers Snapshots in detail:
http://now.netapp.com/knowledge/docs/ontap/rel53/html/sag/snap.htm.
(NOW is NetApp's online support site. It is open to all current
Network Appliance customers.)
3. Assumptions and Requirements
We assume that you are familiar with Oracle8 and the operation of
NetApp filers. We also assume that you are familiar with the operation of
your particular version of UNIX. All examples in this technical report are
from Oracle8 Enterprise Edition version 8.0.5.0.0 running under Sun
Solaris™ 7 operating system. The techniques contained in this paper may
require significant modifications to run within your environment.
The examples in this technical report assume the following:
- The name of the source filer is "lau."
- The name of the target filer is "hitz."
- The name of the administrative user account within Oracle is
"internal" and the password of this user is "oracle."
- All data is stored on a volume called "oracle" on lau.
- All data is mirrored on a volume called "oracle_mirror" on hitz.
4. Infrastructure
4.1. Oracle8 Server Machine
You need Oracle8 running on UNIX. We used Oracle8 Enterprise Edition
version 8.0.5.0.0, and Sun Solaris 7. In your installation, be sure that
your system satisfies the requirements for running Oracle8. For more
information on this issue, check the Oracle8 Installation Manual for your
target platform.
4.2. Source Filer
Any NetApp filer running Data ONTAP™ software version 5.3 or higher
will work. The NFS and SnapMirror licenses on the
filer must be activated. You must have a volume to
store the Oracle data which is sized adequately for
the database.
4.3. Target Filer
The target filer must be running the same Data ONTAP version as the
source filer. Further, the NFS and SnapMirror software licenses on the
target filer must also be activated. You must have a volume on the target
filer which is equal to or greater in size than the one on the source
filer, since SnapMirror software operates on a per volume basis.
4.4. Network
You need a network connection between the Oracle8 server machine and
the filer. We have used 100BaseT, FDDI and Gigabit Ethernet, all of which
work fine. A faster network will improve performance, of course.
You also need a network connection between the source filer and the
target filer of sufficient bandwidth to accommodate the anticipated data
change rate and SnapMirror software overhead. The choice of the network
connection type should be based upon the following parameters:
- Data transmission costs between the source and target filers.
- The source volume size.
- The data change rate.
- The SnapMirror software update schedule.
In addition to propagating the changed blocks, each replication event
requires an active map. For the initialization transfer (level 0), the
size of the active map is 0.003% of the source volume size. Only the
changed active map is transferred for each subsequent mirror iteration.
For example, a 100GB source volume will require an initial transfer of a
3MB active map for the mirror initialization (level 0) transfer as well as
all the data blocks of the source volume. Subsequently, only the changed
active map that will be between a minimum of 0 MB and a maximum of 3 MB
will be transferred for each subsequent mirror iteration, plus all changed
data blocks since the last transfer.
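The active map arithmetic above can be written down as a short calculation. This is a minimal sketch that uses only the 0.003% figure from the text. It assumes 1 GB = 1000 MB, which is what the 100GB to 3MB example implies, and the function names are ours.

```python
# Assumption from the text: the active map is 0.003% of the source volume size.
# Assumption for this sketch: 1 GB = 1000 MB, matching the 100GB -> 3MB example.
MAP_PARTS_PER_100K = 3  # 0.003% expressed as 3 parts in 100,000


def active_map_mb(volume_gb):
    """Full active map size in MB for a source volume of volume_gb."""
    return volume_gb * 1000 * MAP_PARTS_PER_100K / 100_000


def level0_transfer_mb(volume_gb):
    """Level 0 sends every data block of the volume plus the full active map."""
    return volume_gb * 1000 + active_map_mb(volume_gb)


def incremental_transfer_bounds_mb(volume_gb, changed_data_mb):
    """Lower and upper bound for one incremental update, in MB.

    The changed portion of the active map lies between 0 and the full map size.
    """
    return changed_data_mb, changed_data_mb + active_map_mb(volume_gb)
```

For the 100GB example, active_map_mb(100) gives the 3MB quoted above, and an incremental update carrying 50 MB of changed data would transfer between 50 and 53 MB in total.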
The following shows the network configuration we used to test this
solution:
5. Setting up a SnapMirror for Oracle Datafile Storage
Mirrors are as simple to set up and manage as filer volumes. The basic
steps are:
- Identify the source and target volumes.
- Estimate data change rates.
- Create a pair of configuration files.
- Start the mirror.
These steps are neither complex nor sequence dependent, but a few details
deserve attention.
The source and target volumes do not necessarily have to be of the same
size nor share the same disk geometry. The target volume can be bigger and
have different geometry (e.g., 4x9 GB drives versus 2x18 GB drives);
however, any geometry mismatch will result in a significant performance
penalty.
After target volume selection, there are three files on the target
filer and two on the source filer to configure. The first is
/etc/snapmirror.allow, which is a list of filer names which will be either
a mirror source or target. A common practice is to duplicate a single file
with all mirror participants to all filers. The second is
/etc/snapmirror.conf, which is required only on the target filer. It
contains the following:
- The source filer and volume names.
- The target filer and volume names.
- The maximum network bandwidth usage throttle.
- A list of incremental update times in minutes, hours, days and
months.
The third file, /etc/rc, does not strictly need to be changed, but
changing it makes SnapMirror administration simpler. The change is to add
the mirror initiation command "vol snapmirror on" to /etc/rc on both
filers. Note: this command can be placed anywhere after the network
interfaces are defined.
The following listing shows the configuration files used in the test
which we performed for this technical report:
mktg3@mktg3% cat /hitz/etc/snapmirror.conf
lau:oracle hitz:oracle_mirror kbs=5000 * * * *
mktg3@mktg3% cat /lau/etc/snapmirror.allow
hitz
lau
mktg3@mktg3% cat /lau/etc/rc
#Regenerated by registry Wed Apr 21 10:10:28 PDT 1999
#Auto-generated by setup Mon Apr 19 18:22:22 GMT 1999
hostname lau
ifconfig e0 `hostname`-e0 mediatype auto netmask 255.255.252.0
route add default 10.153.4.1 1
routed on
options dns.domainname 2700-1.netapp.com
options dns.enable on
...
vol snapmirror on
Interpreting this listing, the two filers involved in the mirror are
hitz and lau. lau is the source filer and hitz is the target filer. Both
filers have identical /etc/snapmirror.allow files, which is common
practice. Only one /etc/snapmirror.conf file is required, in this case on
hitz, which is the target. This file sets up the following SnapMirror
software parameters:
- The volume called oracle on lau will be replicated onto the volume
called oracle_mirror on hitz.
- A 5000 KB per second maximum throttle is set on the mirror.
- The SnapMirror interval is set to occur once per minute, every hour,
every day of the month and every day of the week.
- The only thing to notice in the /etc/rc file is the addition of the
"vol snapmirror on" command.
Once these configuration files are in place, the target volume is
placed offline and the "vol snapmirror on" command is issued on both
filers. At that point, a level zero replication event occurs. Once the
level zero replication event is complete, the mirror is fully configured
and running.
- The target volume will reflect a status of "online snapmirrored"
when queried with the "vol status" command.
- The target volume is now online in read-only mode.
- The source volume stays online and available to DBMS activities
throughout the level zero operation.
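The /etc/snapmirror.conf entry shown in the listing earlier in this section can be picked apart programmatically, which is handy when checking many mirror definitions. The following is a minimal sketch, assuming every entry has exactly the seven whitespace-separated fields seen in our test file and a single key=value option. It is not a complete parser for the file format, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class MirrorEntry(NamedTuple):
    source_filer: str
    source_volume: str
    target_filer: str
    target_volume: str
    throttle_kbs: int   # value of the kbs= option
    schedule: tuple     # the four schedule fields, kept as raw strings


def parse_entry(line):
    """Parse one entry laid out like the example in this report.

    Assumption: seven whitespace-separated fields, with a single kbs=N option.
    """
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    source, target, option = fields[:3]
    source_filer, source_volume = source.split(":", 1)
    target_filer, target_volume = target.split(":", 1)
    key, value = option.split("=", 1)
    if key != "kbs":
        raise ValueError(f"unexpected option {key!r}")
    return MirrorEntry(source_filer, source_volume, target_filer,
                       target_volume, int(value), tuple(fields[3:]))


entry = parse_entry("lau:oracle hitz:oracle_mirror kbs=5000 * * * *")
```

Here entry.source_filer is "lau", entry.target_volume is "oracle_mirror", and entry.throttle_kbs is 5000, matching the interpretation given above.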
6. SnapMirror Software Operation
SnapMirror has two distinct phases: initialization and incremental
update. The initialization phase consists of a level 0 replication event
in which a Snapshot is created on the source volume and then sent in its
entirety to the target volume. For example, a 250 GB mirror initialization
takes approximately 7.5 hours to complete across a 100BaseT full duplex
link, or 1.5 hours across a Gigabit link. The level 0 event serves to
initialize or "seed" the mirror volume since it contains every block in
the source volume as of the point in time when the Snapshot is
created.
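The initialization times quoted above imply an effective throughput that can be used to estimate level 0 times for other volume sizes. The sketch below is a rough calculation only. It assumes 1 GB = 10^9 bytes and that throughput scales linearly with volume size, neither of which is stated in this report, and the function names are ours.

```python
def effective_mbit_per_s(volume_gb, hours):
    """Effective throughput implied by moving volume_gb in the given time."""
    bits = volume_gb * 1e9 * 8          # assumes 1 GB = 10**9 bytes
    return bits / (hours * 3600) / 1e6


def estimate_hours(volume_gb, mbit_per_s):
    """Estimated level 0 time at a given effective throughput."""
    bits = volume_gb * 1e9 * 8
    return bits / (mbit_per_s * 1e6) / 3600


# Figures from the text: 250 GB in about 7.5 hours (100BaseT full duplex)
# and about 1.5 hours (Gigabit).
fast_ethernet = effective_mbit_per_s(250, 7.5)   # roughly 74 Mbit/s
gigabit = effective_mbit_per_s(250, 1.5)         # roughly 370 Mbit/s

# Scaling the 100BaseT figure to a 100 GB volume gives about 3 hours.
hours_100gb = estimate_hours(100, fast_ethernet)
```

Treat the result as an order-of-magnitude planning figure; actual times depend on filer load, network conditions and any bandwidth throttle in effect.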
After mirror initialization is complete, the filer examines
/etc/snapmirror.conf every minute to see if there are any scheduled
updates. This allows for modification of the mirror's configuration
without disrupting the mirror. When an incremental update schedule time is
due, a new Snapshot is taken and compared to the previous Snapshot. The
delta blocks and the block map file are sent to the mirror target. In
contrast to the level 0 initialization, the data mirrored is typically
much smaller. Note that at all times the mirror target file system is in a
consistent state.
Several special cases should be noted.
- If an initialization (level 0) replication event is interrupted for
more than 9 minutes (e.g., by a network outage or filer reboot), it will
abort. Partial level 0 events are not recoverable, so the process must be
restarted.
- The KB per second maximum throttle parameter in /etc/snapmirror.conf
can be changed at any time, and the edited values will take effect
within two minutes. The exception to this is mirror initialization where
a bandwidth throttle is already in effect. In this case, the bandwidth
throttle cannot be modified until the initialization phase is complete
or the process is interrupted.
- Incremental updates will not start until the level 0 initialization
is complete.
- Any incremental updates which are missed are simply skipped.
Subsequent incremental updates transmit any data which was skipped, so
no data is lost.
- Incremental updates in progress will run to completion according to
the configuration settings and available network bandwidth. If a new
incremental update is scheduled to start while an existing update is in
progress, it is considered a schedule miss and skipped.
- The SnapMirror process can be stopped and restarted at any time by
issuing the following command sequence on either filer:
vol snapmirror off
[arbitrary time interval]
vol snapmirror on
As long as the target volume remains read only during the time between
turning the SnapMirror process off and on, the mirror remains intact.
- To break the mirror, you issue the following command on the target
filer:
vol options oracle_mirror snapmirrored off
As a result the target volume is still online but changed
from read-only to read/write mode.
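The schedule-miss rule in the list above (an update that comes due while another is still running is skipped, and its data goes out with the next update) can be sketched as a small simulation. This is an illustrative model, not SnapMirror code. It assumes that an update which finishes exactly on a scheduled tick does not block that tick, and the function name is hypothetical.

```python
def run_schedule(ticks, duration_for):
    """Simulate which scheduled updates start and which are skipped.

    ticks:        scheduled start times in seconds, in ascending order
    duration_for: function giving how long an update started at time t runs
    """
    started, skipped = [], []
    busy_until = float("-inf")
    for t in ticks:
        if t < busy_until:
            # A previous update is still in progress: schedule miss.
            skipped.append(t)
        else:
            started.append(t)
            busy_until = t + duration_for(t)
    return started, skipped


# One update per minute for five minutes; the update that starts at t=60
# has a lot of changed data and takes 150 seconds, the others take 20.
ticks = range(0, 301, 60)
started, skipped = run_schedule(ticks, lambda t: 150 if t == 60 else 20)
```

In this run the updates due at 120 and 180 seconds are skipped, and the update at 240 seconds carries whatever changed in the meantime.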
7. Disaster Recovery - Breaking the Mirror
We performed the following test to demonstrate the use of SnapMirror
technology with Oracle8:
- We created a SnapMirror relationship between two filers. See Section 5 for
a description of the configuration.
- We started up an Oracle8 database with datafiles mounted on the
source filer. All datafiles, control files and log files were stored on
the filer.
- We ran the following script on this database from SQL*Plus:
DROP TABLE hammer_tab;
CREATE TABLE hammer_tab (num NUMBER, timestamp_col DATE);
DROP SEQUENCE hammer_seq;
CREATE SEQUENCE hammer_seq;
- We then created the following stored procedure from within
ProcedureBuilder:
PROCEDURE bighammer IS
  vnum NUMBER;
  vtimestamp DATE;
BEGIN
  LOOP
    SELECT hammer_seq.nextval, sysdate
      INTO vnum, vtimestamp
      FROM dual;
    INSERT INTO hammer_tab (num, timestamp_col)
      VALUES (vnum, vtimestamp);
    TEXT_IO.PUT_LINE(TO_CHAR(vnum) || '-' || TO_CHAR(vtimestamp,'SSSSS'));
    COMMIT;
  END LOOP;
END;
- We then executed this PL/SQL procedure from within ProcedureBuilder.
While this procedure was running, we allowed several SnapMirror events
to occur. Then we failed the source filer by shutting it down. We also
aborted the instance on the Oracle8 server. This simulated a disaster at
the source location which resulted in a complete loss of service of both
the filer and the database server. This resulted in the following
activity from within ProcedureBuilder (note that the last row inserted
has a key value of 1613):
1-49706
2-49706
3-49707
4-49707
5-49707
6-49707
7-49707
...
Many more like this
...
1607-49969
1608-49969
1609-49969
1610-49969
1611-49969
1612-49969
1613-49969
ORA-06510: PL/SQL: unhandled user-defined exception
ORA-06512: at "SYS.STANDARD", line 629
ORA-06512: at "SYS.STANDARD", line 646
ORA-06512: at "BIGHAMMER", line 21
ORA-06512: at "PU_005", line 1
PDE-PXC002 Program unit execution aborted due to unhandled exception (6510).
PL/SQL>
- After this, we broke the mirror by issuing the command on hitz:
vol options oracle_mirror snapmirrored off
This command forces the oracle_mirror volume
to change status from read-only to read/write mode.
- We then brought up an Oracle8 instance on another machine, which
mounted the now-active file system on hitz, the target filer. This
database accessed the datafiles which were created by the SnapMirror
process on hitz. We performed a recovery on this database. The following
log shows this activity:
$ svrmgrl
Oracle Server Manager Release 3.0.5.0.0 - Production
(c) Copyright 1997, Oracle Corporation. All Rights Reserved.
Oracle8 Enterprise Edition Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
SVRMGR> connect internal/oracle
Connected.
SVRMGR> startup mount
ORACLE instance started.
Total System Global Area       5193232 bytes
Fixed Size                       48656 bytes
Variable Size                  4243456 bytes
Database Buffers                819200 bytes
Redo Buffers                     81920 bytes
Database mounted.
SVRMGR> recover database
Media recovery complete.
SVRMGR> alter database open;
Statement processed.
SVRMGR> connect test/test
Connected.
SVRMGR> select to_char(max(num)) || '-' ||
     2> to_char(max(timestamp_col), 'SSSSS')
     3> from hammer_tab;
TO_CHAR(MAX(NUM))||'-'||TO_CHAR(MAX(TIMESTAMP_
----------------------------------------------
1033-49886
1 row selected.
- We then brought lau back up as well, restarted the source database
and ran the same query. Here is the result of that operation:
SVRMGR> select to_char(max(num)) || '-' ||
     2> to_char(max(timestamp_col), 'SSSSS')
     3> from hammer_tab;
TO_CHAR(MAX(NUM))||'-'||TO_CHAR(MAX(TIMESTAMP_
----------------------------------------------
1613-49969
1 row selected.
Note that the key value and timestamp for the source database match
precisely those which were reported to the client. This is consistently
our experience: When a NetApp filer experiences a dirty shutdown, no
transactions are lost, although a recovery may need to be applied.
However, the target database did lose 580 rows over about an 83 second
period. This is to be expected. The mirror interval was set to 60 seconds,
and the mirror relies on consistency points. Thus the mirror can be as
much as two minutes out-of-date with the interval set in this way.
(Obviously, a longer interval would result in more lost transactions, but
also lower overhead.) The bottom line is that the target database did
recover successfully albeit with a loss of a few very recent transactions.
For a disaster recovery solution, this is perfectly acceptable to the vast
majority of customers.
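The 580-row and 83-second figures follow directly from the two query results. As a minimal sketch, the markers printed by the test (key value, a hyphen, then seconds past midnight from the 'SSSSS' format) can be compared as follows. The function names are ours, and the calculation assumes both timestamps fall on the same day.

```python
def parse_marker(marker):
    """Split a 'num-SSSSS' marker into (key value, seconds past midnight)."""
    num, seconds = marker.split("-")
    return int(num), int(seconds)


def replication_gap(source_marker, target_marker):
    """Rows and seconds by which the target lags the source.

    Assumes both markers were recorded on the same calendar day.
    """
    source_num, source_sec = parse_marker(source_marker)
    target_num, target_sec = parse_marker(target_marker)
    return source_num - target_num, source_sec - target_sec


# Last row on the source (lau) versus last row recovered on the target (hitz).
rows_lost, seconds_lost = replication_gap("1613-49969", "1033-49886")
```

With the values from our test, rows_lost is 580 and seconds_lost is 83, which is within the two-minute worst case for a 60-second mirror interval.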
8. Resynching the Mirror
Currently the ability to restart a broken mirror is not available,
although this feature is planned for a future release. Presently, the
method for reestablishing a mirror is to reinitialize the mirror. If the
target volume has been brought online and then written to, this may need
to be done twice.
- Once to replicate the data back to the source volume. (In this case
the mirror relationship between the two filers is flipped.)
- Once to reinitialize with the mirror relationship back to normal.
An alternative is to back up the target volume's data to tape, and then
restore that data onto the source filer. This avoids the necessity to
reinitialize the mirror twice; however, you currently cannot 'seed' a
mirror from tape. This is planned for a future release.
9. Conclusions
The NetApp SnapMirror technology provides compelling advantages for the
Oracle DBA seeking to support disaster recovery of a mission-critical
Oracle database. Specifically:
- The network traffic generated by the mirror process can be throttled
to allow support for WAN connections.
- The mirror interval is user-configurable and can be changed on the
fly.
- In the event of a disaster on the source side of the mirror, the
target side can be easily brought online, and the Oracle database can
return to normal use after a brief recovery process.
- The asynchronous nature of SnapMirror software means that the
source filer will still operate normally when the network link to the
target filer is broken. This is not true of synchronous mirroring, where
the mirror link is essential to continued operation of the source
machine.
Using SnapMirror technology, the DBA can assure the customer that the
only transactions that may be lost (in case of server or filer failures)
are those that occur between the mirror events. Further, bringing the
database back to a consistent state is transparently handled by Oracle's
normal recovery process.
10. Caveats
The use of network-attached storage is supported by Oracle only in the
context of Network Appliance™ filers. For more information regarding use
of Network Appliance filers with Oracle, see: TR3023 Using ORACLE with
a Multiprotocol Filer by Bruce Clarke. You can also find support on Oracle's Web site for NetApp filers.
However, Network Appliance has not tested this configuration with any
version of UNIX other than Sun Solaris, and has certainly not tested with
all of the combinations of hardware and software options available on
Solaris. There may be significant differences in your configuration which
will alter the procedures necessary to accomplish the objectives outlined
in this paper. If you find that any of these procedures do not work in
your environment, please contact the authors immediately.