Oracle8 for UNIX: Providing Disaster Recovery with NetApp SnapMirror Technology
TR3057 by Jerry Liu, Jeff Browning, O.C.P. M.C.S.E. and Tim Moore, Network Appliance, Inc.





1. Purpose and Scope

This document describes techniques for setting up a disaster recovery configuration for Oracle8 for UNIX® using Network Appliance™ SnapMirror technology. Specifically, we cover the following issues:

  • Description of the NetApp SnapMirror technology and its approach to disaster recovery.
  • The infrastructure required to support SnapMirror technology.
  • How to set up a SnapMirror for use in an Oracle8™ environment.
  • How disaster recovery works in practice.
  • Resynching the mirror following recovery from a disaster.

2. Description of NetApp SnapMirror Technology

NetApp SnapMirror technology provides asynchronous mirroring of data between filer volumes. Data on the source volume is periodically replicated to the target at a user-definable interval ranging from one minute to one month. At the end of each replication event, the mirror target volume is an exact block-for-block copy of the mirror source volume as of that event, and the two volumes share identical data content and characteristics. The mirror is initialized by effectively copying the entire source volume to the target volume. Once this initial copy is complete, subsequent replication events copy only changed blocks from the source volume to the target volume. This provides a highly efficient data replication mechanism.

Architecturally, SnapMirror software is a logical extension of the NetApp WAFL™ file system, particularly the Snapshot feature. Using Snapshots, you can create a read-only copy of an entire filer volume. This copy is made by essentially saving only the blocks that change after a particular point in time. Two sequential Snapshots can then be compared and the differences identified. Since this comparison takes place at the block level, only the changed blocks need be sent to the mirror target. Because the update transfers are asynchronous, the data latency issues inherent in remote synchronous mirroring techniques are eliminated. The elegance of these two design features becomes particularly apparent when running mirror pairs over WAN topologies.
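The changed-block idea described above can be illustrated with a toy model. The sketch below is only an illustration and makes strong simplifying assumptions: a "volume" is a Python dictionary from block number to block contents, and the function names (take_snapshot, changed_blocks, apply_update) are hypothetical. It does not reflect how WAFL or SnapMirror software is actually implemented.

```python
# Toy model only: a "volume" is a dict mapping block number -> block contents.
# This is not how WAFL stores data; it just illustrates the changed-block idea.

def take_snapshot(volume):
    """Return a point-in-time copy of the volume's block map."""
    return dict(volume)

def changed_blocks(old_snap, new_snap):
    """Return (blocks to write, blocks to free) between two snapshots."""
    writes = {n: data for n, data in new_snap.items() if old_snap.get(n) != data}
    frees = [n for n in old_snap if n not in new_snap]
    return writes, frees

def apply_update(target, writes, frees):
    """Bring the mirror target up to date using only the delta."""
    target.update(writes)
    for n in frees:
        target.pop(n, None)

source = {0: b"ctl", 1: b"data-a", 2: b"redo-1"}
snap1 = take_snapshot(source)
target = dict(snap1)            # level 0: copy everything once

source[1] = b"data-b"           # block rewritten
source[3] = b"redo-2"           # block added
del source[0]                   # block freed
snap2 = take_snapshot(source)

writes, frees = changed_blocks(snap1, snap2)
apply_update(target, writes, frees)
print(sorted(writes), frees, target == snap2)   # [1, 3] [0] True
```

Only blocks 1 and 3 cross the "network" in this toy run, yet the target ends up identical to the second snapshot, which is the property the incremental update relies on.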

For more information on how Snapshots work, see: File System Design for an NFS File Server Appliance by Dave Hitz, James Lau, and Michael Malcolm. Those with NetApp On the Web ("NOW") password access can also check the following documentation, which covers Snapshots in detail: http://now.netapp.com/knowledge/docs/ontap/rel53/html/sag/snap.htm. (NetApp On the Web is NetApp's online support site. This site is open to all current Network Appliance customers.)

3. Assumptions and Requirements

We assume that you are familiar with Oracle8 and the operation of NetApp filers. We also assume that you are familiar with the operation of your particular version of UNIX. All examples in this technical report are from Oracle8 Enterprise Edition version 8.0.5.0.0 running under Sun Solaris™ 7 operating system. The techniques contained in this paper may require significant modifications to run within your environment.

The examples in this technical report assume the following:

  • The name of the source filer is "lau."
  • The name of the target filer is "hitz."
  • The name of the administrative user account within Oracle is "internal" and the password of this user is "oracle."
  • All data is stored on a volume called "oracle" on lau.
  • All data is mirrored on a volume called "oracle_mirror" on hitz.

4. Infrastructure

4.1. Oracle8 Server Machine

You need Oracle8 running on UNIX. We used Oracle8 Enterprise Edition version 8.0.5.0.0, and Sun Solaris 7. In your installation, be sure that your system satisfies the requirements for running Oracle8. For more information on this issue, check the Oracle8 Installation Manual for your target platform.

4.2. Source Filer

Any NetApp filer running Data ONTAP™ software version 5.3 or higher will work. The NFS and SnapMirror licenses on the filer must be activated. You must have a volume to store the Oracle data which is sized adequately for the database.

4.3. Target Filer

The target filer must be running the same Data ONTAP version as the source filer. Further, the NFS and SnapMirror software licenses on the target filer must also be activated. You must have a volume on the target filer which is equal to or greater in size than the one on the source filer, since SnapMirror software operates on a per volume basis.

4.4. Network

You need a network connection between the Oracle8 server machine and the filer. We have used 100BaseT, FDDI and Gigabit Ethernet, all of which work fine. A faster network will improve performance, of course.

You also need a network connection between the source filer and the target filer of sufficient bandwidth to accommodate the anticipated data change rate and SnapMirror software overhead. The choice of the network connection type should be based upon the following parameters:

  • Data transmission costs between the source and target filers.
  • The source volume size.
  • The data change rate.
  • The SnapMirror software update schedule.

In addition to propagating the changed blocks, each replication event requires an active map. For the initialization transfer (level 0), the size of the active map is 0.003% of the source volume size. For example, a 100 GB source volume requires a 3 MB active map for the mirror initialization (level 0) transfer, in addition to all the data blocks of the source volume. Each subsequent mirror iteration transfers only the changed portion of the active map, which in this example is between 0 MB and 3 MB, plus all data blocks changed since the last transfer.
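The sizing arithmetic above can be sketched as follows. This is a rough planning aid, not a NetApp formula: the function names are hypothetical, decimal units are assumed (1 GB = 1000 MB, 1 MB = 1000 KB), and the 57 MB change figure in the usage lines is an invented example value.

```python
ACTIVE_MAP_FRACTION = 0.003 / 100   # 0.003% of the source volume, per the text

def active_map_mb(volume_gb):
    """Upper bound on the active map size in MB (assumes 1 GB = 1000 MB)."""
    return volume_gb * 1000 * ACTIVE_MAP_FRACTION

def required_kb_per_sec(changed_mb, volume_gb, interval_sec):
    """Worst-case sustained rate needed to finish one update within the interval.

    Assumes the whole active map changes, which is the pessimistic case.
    """
    total_mb = changed_mb + active_map_mb(volume_gb)
    return total_mb * 1000 / interval_sec

# The 100 GB example from the text: about 3 MB of active map at level 0.
print(round(active_map_mb(100), 2))

# Hypothetical workload: 57 MB of changed blocks per 60-second interval.
print(round(required_kb_per_sec(57, 100, 60)))
```

Comparing the second figure with the link speed and with any configured throttle gives a first indication of whether updates can keep up with the chosen schedule.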

The following shows the network configuration we used to test this solution:

[Figure: Network Diagram]

5. Setting up a SnapMirror for Oracle Datafile Storage

Mirrors are as simple to set up and manage as filer volumes. Basic requirements are:

  • Identify the source and target volumes.
  • Estimate data change rates.
  • Create a pair of configuration files.
  • Start the mirror.

None of these steps is complex or sequence dependent, but a few details are worth noting.

The source and target volumes do not have to be the same size or share the same disk geometry. The target volume can be bigger and have a different geometry (e.g., 4x9 GB drives versus 2x18 GB drives); however, any geometry mismatch will result in a significant performance penalty.

After target volume selection, there are three files on the target filer and two on the source filer to configure. The first is /etc/snapmirror.allow, which lists the names of the filers that will act as either a mirror source or target. A common practice is to copy a single file naming all mirror participants to every filer. The second is /etc/snapmirror.conf, which is required only on the target filer. It contains the following:

  • The source filer and volume names.
  • The target filer and volume names.
  • The maximum network bandwidth usage throttle.
  • A list of incremental update times, given as minute, hour, day of the month and day of the week.

The third configuration file is not strictly required, but it makes SnapMirror administration simpler. The change is to add the mirror initiation command "vol snapmirror on" to /etc/rc on both filers. Note: this command can be placed anywhere after the network interfaces are defined.

The following listing shows the configuration files used in the test which we performed for this technical report:

mktg3@mktg3% cat /hitz/etc/snapmirror.conf
lau:oracle hitz:oracle_mirror kbs=5000 * * * *

mktg3@mktg3% cat /lau/etc/snapmirror.allow
hitz
lau

mktg3@mktg3% cat /lau/etc/rc
#Regenerated by registry Wed Apr 21 10:10:28 PDT 1999
#Auto-generated by setup Mon Apr 19 18:22:22 GMT 1999
hostname lau
ifconfig e0 `hostname`-e0 mediatype auto netmask 255.255.252.0
route add default 10.153.4.1 1
routed on
options dns.domainname 2700-1.netapp.com
options dns.enable on
...
vol snapmirror on

Interpreting this listing, the two filers involved in the mirror are hitz and lau. lau is the source filer and hitz is the target filer. Both filers have identical /etc/snapmirror.allow files, which is common practice. Only one /etc/snapmirror.conf file is required, in this case on hitz, which is the target. This file sets up the following SnapMirror software parameters:

  • The volume called oracle on lau will be replicated onto the volume called oracle_mirror on hitz.
  • A 5000 KB per second maximum throttle is set on the mirror.
  • The SnapMirror interval is set to occur once per minute, every hour, every day of the month and every day of the week.

In the /etc/rc file, the only thing to notice is the addition of the "vol snapmirror on" command.
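The /etc/snapmirror.conf entry interpreted above can also be broken into its parts mechanically. The following is a minimal sketch that handles only the seven-field layout shown in the listing (source, target, a single kbs= argument, and four schedule fields). The MirrorEntry and parse_entry names are hypothetical, and other argument forms that the real file may accept are not handled.

```python
from typing import NamedTuple, Optional

class MirrorEntry(NamedTuple):
    source_filer: str
    source_volume: str
    target_filer: str
    target_volume: str
    kbs: Optional[int]     # throttle in KB per second, None if not given
    schedule: tuple        # (minute, hour, day of month, day of week)

def parse_entry(line):
    """Parse one entry in the layout used by the listing in this report."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    source_filer, source_volume = fields[0].split(":", 1)
    target_filer, target_volume = fields[1].split(":", 1)
    # Assumption: the third field is a single "kbs=N" argument, as in the listing.
    key, _, value = fields[2].partition("=")
    kbs = int(value) if key == "kbs" else None
    return MirrorEntry(source_filer, source_volume,
                       target_filer, target_volume,
                       kbs, tuple(fields[3:]))

entry = parse_entry("lau:oracle hitz:oracle_mirror kbs=5000 * * * *")
print(entry.source_filer, entry.target_volume, entry.kbs, entry.schedule)
```

A check like this can be run against a copy of the file before it is placed on the target filer, to catch a missing field or a mistyped volume name early.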

Once these configuration files are in place, the target volume is placed offline and the "vol snapmirror on" command is issued on both filers. At that point, a level zero replication event occurs. Once the level zero replication event is complete, the mirror is fully configured and running.

  • The target volume will reflect a status of "online snapmirrored" when queried with the "vol status" command.
  • The target volume is now online in read-only mode.
  • The source volume stays online and available to DBMS activities throughout the level zero operation.

6. SnapMirror Software Operation

SnapMirror has two distinct phases: initialization and incremental update. The initialization phase consists of a level 0 replication event in which a Snapshot is created on the source volume and then sent in its entirety to the target volume. For example, a 250 GB mirror initialization takes approximately 7.5 hours to complete across a 100BaseT full duplex link, or 1.5 hours across a Gigabit link. The level 0 event serves to initialize or "seed" the mirror volume since it contains every block in the source volume as of the point in time when the Snapshot is created.
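The initialization times quoted above imply an effective throughput for each link type, which can be reused to estimate level 0 times for other volume sizes. The sketch below is back-of-the-envelope arithmetic only: the function names are hypothetical, decimal units are assumed (1 GB = 1000 MB, 1 MB = 1000 KB), and real transfer times depend on filer load and network conditions.

```python
def implied_mb_per_sec(volume_gb, hours):
    """Effective throughput implied by a quoted transfer time."""
    return volume_gb * 1000 / (hours * 3600)

def estimated_hours(volume_gb, mb_per_sec):
    """Estimated level 0 duration at a given effective throughput."""
    return volume_gb * 1000 / mb_per_sec / 3600

fast_ethernet = implied_mb_per_sec(250, 7.5)   # from the 100BaseT figure
gigabit = implied_mb_per_sec(250, 1.5)         # from the Gigabit figure
print(round(fast_ethernet, 1), round(gigabit, 1))   # 9.3 46.3

# Hypothetical 100 GB volume over the same 100BaseT link.
print(round(estimated_hours(100, fast_ethernet), 1))   # 3.0

# The same 250 GB volume limited by a 5000 KB/s throttle (taken as 5 MB/s).
print(round(estimated_hours(250, 5000 / 1000), 1))     # 13.9
```

The last line suggests that a throttle such as the kbs=5000 setting used in the test configuration, if it applies to the level 0 transfer, would lengthen initialization considerably compared with an unthrottled 100BaseT link.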

After mirror initialization is complete, the filer examines /etc/snapmirror.conf every minute to see if there are any scheduled updates. This allows for modification of the mirror's configuration without disrupting the mirror. When an incremental update schedule time is due, a new Snapshot is taken and compared to the previous Snapshot. The delta blocks and the block map file are sent to the mirror target. In contrast to the level 0 initialization, the data mirrored is typically much smaller. Note that at all times the mirror target file system is in a consistent state.
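The once-a-minute schedule check can be pictured as a cron-style match of the current time against the four schedule fields. The sketch below is an assumption-laden illustration, not the filer's logic: it supports only "*" and comma-separated integers, the function names are hypothetical, and the day-of-week numbering (0 for Sunday through 6 for Saturday) is assumed rather than taken from this report.

```python
from datetime import datetime

def field_matches(field, value):
    """True if a schedule field ('*' or comma-separated integers) covers value."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}

def is_due(schedule, when):
    """True if an update is scheduled for the minute containing 'when'."""
    minute, hour, day_of_month, day_of_week = schedule
    weekday = when.isoweekday() % 7   # assumption: 0 = Sunday ... 6 = Saturday
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day_of_month, when.day)
            and field_matches(day_of_week, weekday))

every_minute = ("*", "*", "*", "*")     # the schedule used in this report
half_hourly = ("0,30", "*", "*", "*")   # hypothetical alternative schedule
t = datetime(1999, 4, 21, 10, 30)
print(is_due(every_minute, t), is_due(half_hourly, t))   # True True
```

Because the file is re-read on every check, replacing the schedule fields changes which minutes match from the next check onward, without any need to restart the mirror.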

Several special cases should be noted.

  • If an initialization (level 0) replication event is interrupted for more than 9 minutes (e.g., by a network outage or filer reboot), it will abort. Partial level 0 events are not recoverable, so the process must be restarted.
  • The KB per second maximum throttle parameter in /etc/snapmirror.conf can be changed at any time, and the edited values will take effect within two minutes. The exception to this is mirror initialization where a bandwidth throttle is already in effect. In this case, the bandwidth throttle cannot be modified until the initialization phase is complete or the process is interrupted.
  • Incremental updates will not start until the level 0 initialization is complete.
  • Any incremental updates which are missed are simply skipped. Subsequent incremental updates transmit any data which was skipped, so no data is lost.
  • Incremental updates in progress will run to completion according to the configuration settings and available network bandwidth. If a new incremental update is scheduled to start while an existing update is in progress, it is considered a schedule miss and skipped.
  • The SnapMirror process can be stopped and restarted at any time by issuing the following command sequence on either filer:
    vol snapmirror off
    [arbitrary time interval]
    vol snapmirror on
    As long as the target volume remains read only during the time between turning the SnapMirror process off and on, the mirror remains intact.
  • To break the mirror, you issue the following command on the target filer:
    vol options oracle_mirror snapmirrored off
    As a result the target volume is still online but changed from read-only to read/write mode.
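The schedule-miss rule in the list above (an update that comes due while another is still running is skipped, and the next update that does run carries the accumulated changes) can be simulated in a few lines. This is a simplified model with hypothetical names and invented durations. It assumes that an update finishing exactly at a scheduled time leaves that slot free.

```python
def simulate(schedule_interval, update_durations):
    """Return (start_times, skipped_times) in seconds.

    schedule_interval: seconds between scheduled update times.
    update_durations: how long each update that actually starts will run.
    """
    started, skipped = [], []
    busy_until = 0
    durations = iter(update_durations)
    t = 0
    while True:
        if t < busy_until:
            skipped.append(t)        # previous update still running: schedule miss
        else:
            try:
                d = next(durations)
            except StopIteration:
                break                # no more updates to model
            started.append(t)
            busy_until = t + d
        t += schedule_interval
    return started, skipped

# One-minute schedule; the second update takes 130 seconds (invented figure).
started, skipped = simulate(60, [20, 130, 45])
print(started, skipped)   # [0, 60, 240] [120, 180]
```

In this run the slots at 120 and 180 seconds are missed, and the update that starts at 240 seconds is the one that would carry everything changed since the update begun at 60 seconds.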

7. Disaster Recovery - Breaking the Mirror

We performed the following test to demonstrate the use of SnapMirror technology with Oracle8:

  1. We created a SnapMirror relationship between two filers. See Section 5 for a description of the configuration.
  2. We started up an Oracle8 database with datafiles mounted on the source filer. All datafiles, control files and log files were stored on the filer.
  3. We ran the following script on this database from SQL*Plus:
    DROP TABLE hammer_tab;

    CREATE TABLE hammer_tab
        (num NUMBER,
        timestamp_col DATE);

    DROP SEQUENCE hammer_seq;

    CREATE SEQUENCE hammer_seq;
  4. We then created the following stored procedure from within ProcedureBuilder:
    PROCEDURE bighammer IS
        vnum NUMBER;
        vtimestamp DATE;
    BEGIN
        LOOP
            SELECT
                hammer_seq.nextval,
                sysdate
            INTO
                vnum,
                vtimestamp
            FROM
                dual;
            INSERT INTO hammer_tab
                (num,
                timestamp_col)
            VALUES
                (vnum,
                vtimestamp);
            TEXT_IO.PUT_LINE(TO_CHAR(vnum) || '-' ||
                TO_CHAR(vtimestamp,'SSSSS'));
            COMMIT;
        END LOOP;
    END;
  5. We then executed this PL/SQL procedure from within ProcedureBuilder. While this procedure was running, we allowed several SnapMirror events to occur. Then we failed the source filer by shutting it down. We also aborted the instance on the Oracle8 server. This simulated a disaster at the source location which resulted in a complete loss of service of both the filer and the database server. This resulted in the following activity from within ProcedureBuilder (note that the last row inserted has a key value of 1613):
    1-49706
    2-49706
    3-49707
    4-49707
    5-49707
    6-49707
    7-49707
    ... Many more like this ...
    1607-49969
    1608-49969
    1609-49969
    1610-49969
    1611-49969
    1612-49969
    1613-49969
    ORA-06510: PL/SQL: unhandled user-defined exception
    ORA-06512: at "SYS.STANDARD", line 629
    ORA-06512: at "SYS.STANDARD", line 646
    ORA-06512: at "BIGHAMMER", line 21
    ORA-06512: at "PU_005", line 1
    PDE-PXC002 Program unit execution aborted due to unhandled exception (6510).
    PL/SQL>
  6. After this, we broke the mirror by issuing the following command on hitz:
    vol options oracle_mirror snapmirrored off
    This command forces the oracle_mirror volume to change status from read-only to read/write mode.

  7. We then brought up an Oracle8 instance on another machine, which had mounted the now-active file system on hitz, the target filer. This database accessed the datafiles which were created by the SnapMirror process on hitz. We performed a recovery on this database. The following log shows this activity:
    $ svrmgrl

    Oracle Server Manager Release 3.0.5.0.0 - Production

    (c) Copyright 1997, Oracle Corporation.  All Rights Reserved.

    Oracle8 Enterprise Edition Release 8.0.5.0.0 - Production
    PL/SQL Release 8.0.5.0.0 - Production

    SVRMGR> connect internal/oracle
    Connected.
    SVRMGR> startup mount
    ORACLE instance started.
    Total System Global Area                          5193232 bytes
    Fixed Size                                          48656 bytes
    Variable Size                                     4243456 bytes
    Database Buffers                                   819200 bytes
    Redo Buffers                                        81920 bytes
    Database mounted.
    SVRMGR> recover database
    Media recovery complete.
    SVRMGR> alter database open;
    Statement processed.
    SVRMGR> connect test/test
    Connected.
    SVRMGR> select to_char(max(num)) || '-' ||
         2> to_char(max(timestamp_col), 'SSSSS')
         3> from hammer_tab;
    TO_CHAR(MAX(NUM))||'-'||TO_CHAR(MAX(TIMESTAMP_
    ----------------------------------------------
    1033-49886
    1 row selected.
  8. We then brought lau back up as well, restarted the source database and ran the same query. Here is the result of that operation:
    SVRMGR> select to_char(max(num)) || '-' ||
         2> to_char(max(timestamp_col), 'SSSSS')
         3> from hammer_tab;
    TO_CHAR(MAX(NUM))||'-'||TO_CHAR(MAX(TIMESTAMP_
    ----------------------------------------------
    1613-49969
    1 row selected.

Note that the key value and timestamp on the source database match precisely those reported to the client. This is consistently our experience: when a NetApp filer experiences a dirty shutdown, no transactions are lost, although a recovery may need to be applied. However, the target database did lose 580 rows over a period of about 83 seconds. This is to be expected. The mirror interval was set to 60 seconds, and the mirror relies on consistency points, so with this setting the mirror can be as much as two minutes out of date. (Obviously, a longer interval would result in more lost transactions, but also lower overhead.) The bottom line is that the target database did recover successfully, albeit with the loss of a few very recent transactions. For a disaster recovery solution, this is perfectly acceptable to the vast majority of customers.
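The loss figures quoted above follow directly from the two query results. As a quick check, the sketch below parses the "num-SSSSS" strings returned by the source and target databases and subtracts them. The parse_marker helper is a hypothetical name introduced only for this illustration.

```python
def parse_marker(marker):
    """Split 'num-SSSSS' into (row key, seconds past midnight)."""
    num, seconds = marker.split("-")
    return int(num), int(seconds)

src_num, src_sec = parse_marker("1613-49969")   # source database after restart
tgt_num, tgt_sec = parse_marker("1033-49886")   # mirror target after recovery

rows_lost = src_num - tgt_num
window_sec = src_sec - tgt_sec
print(rows_lost, window_sec)   # 580 83
```

The same subtraction gives a rough insert rate of about seven rows per second during the test, which helps when relating a chosen mirror interval to the number of transactions at risk.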

8. Resynching the Mirror

Currently the ability to restart a broken mirror is not available, although this feature is planned for a future release. Presently, the method for reestablishing a mirror is to reinitialize the mirror. If the target volume has been brought online and then written to, this may need to be done twice.

  • Once to replicate the data back to the source volume. (In this case the mirror relationship between the two filers is flipped.)
  • Once to reinitialize with the mirror relationship back to normal.

An alternative is to back up the target volume's data to tape, and then restore that data onto the source filer. This avoids the necessity to reinitialize the mirror twice; however, you currently cannot 'seed' a mirror from tape. This is planned for a future release.

9. Conclusions

The NetApp SnapMirror technology provides compelling advantages for the Oracle DBA seeking to support disaster recovery of a mission-critical Oracle database. Specifically:

  • The network traffic generated by the mirror process can be throttled to allow support for WAN connections.
  • The mirror interval is user-configurable and can be changed on the fly.
  • In the event of a disaster on the source side of the mirror, the target side can be easily brought online, and the Oracle database can return to normal use after a brief recovery process.
  • Because SnapMirror software is asynchronous and only near real time, the source filer will still operate normally when the network link to the target filer is broken. This is not true of synchronous mirroring, where the mirror link is essential to continued operation of the source machine.

Using SnapMirror technology, the DBA can assure the customer that the only transactions that may be lost (in case of server or filer failures) are those that occur between the mirror events. Further, bringing the database back to a consistent state is transparently handled by Oracle's normal recovery process.

10. Caveats

The use of network-attached storage is supported by Oracle only in the context of Network Appliance™ filers. For more information regarding use of Network Appliance filers with Oracle, see: TR3023 Using ORACLE with a Multiprotocol Filer by Bruce Clarke. You can also find support on Oracle's Web site for NetApp filers. However, Network Appliance has not tested this configuration with any version of UNIX other than Sun Solaris, and has certainly not tested with all of the combinations of hardware and software options available on Solaris. There may be significant differences in your configuration which will alter the procedures necessary to accomplish the objectives outlined in this paper. If you find that any of these procedures do not work in your environment, please contact the authors immediately.

 
     