NetApp vFiler DR with Data ONTAP Simulator Part 10: Disaster Failover

This article is part of a series.

A disaster failover is very similar to the planned failover described in part 9 of this series. Because the master fails unexpectedly, the vFiler cannot be stopped there beforehand, and the temporary replication from slave to master can only be set up later, before the rollback. When the master is powered on again, you have to make sure that the vFiler is not started automatically; otherwise the vFiler runs on master and slave simultaneously. Compared to the planned failover, steps 1, 4 and 5 move from the failover to the rollback.

Disaster Failover from master to slave:

  1. start vFiler DR on slave => “running”
netapp02> vfiler dr activate vfiler01@netapp01
CIFS local server is running.
Mon May  9 22:01:06 CEST [vfiler01@netapp02:cifs.startup.local.succeeded:info]: CIFS: CIFS local server is running.
Mon May  9 22:01:06 CEST [netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Mon May  9 22:01:06 CEST [vfiler01@netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes file is missing.
Mon May  9 22:01:06 CEST [vfiler01@netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Mon May  9 22:01:06 CEST [netapp02:wafl.scan.ownblocks.done:info]: Completed block ownership calculation on volume vol_vfiler01. The scanner took 0 ms.

Vfiler vfiler01 activated.
e0a: flags=0xe48867 mtu 1500
        inet 192.168.2.67 netmask 0xffffff00 broadcast 192.168.2.255
        ether 00:0c:29:61:01:2b (auto-1000t-fd-up) flowcontrol full
netapp02> Mon May  9 22:01:07 CEST [netapp02:cmds.vfiler.dr.activated:info]: Disaster recovery backup vFiler unit: 'vfiler01' of the vFiler unit at remote storage system: 'netapp01' was activated.
Mon May  9 22:01:11 CEST [vfiler01@netapp02:export.host.resolve.timeout:warning]: Trial 1 for the nameservice lookup request timed out.
Mon May  9 22:01:30 CEST [vfiler01@netapp02:nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the local server.
netapp02> vfiler status
vfiler0                          running
vfiler01                         running
  1. check state of SnapMirror => “Broken-off” on slave
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:02:30   Idle
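
The check in the last step can also be scripted. The following is a minimal Python sketch, assuming the text of `snapmirror status` has already been captured from the slave (how it is fetched, e.g. over SSH, is left out). The names `Relation`, `parse_snapmirror_status` and `state_of` are made up for this example and are not part of Data ONTAP.

```python
from typing import List, NamedTuple, Optional


class Relation(NamedTuple):
    source: str
    destination: str
    state: str
    lag: str
    status: str


def parse_snapmirror_status(text: str) -> List[Relation]:
    """Turn the table printed by `snapmirror status` into Relation records."""
    relations = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least five columns and "system:volume" in the
        # first two; this skips the banner, the header row and blank lines.
        if len(fields) >= 5 and ":" in fields[0] and ":" in fields[1]:
            source, destination, state, lag = fields[:4]
            status = " ".join(fields[4:])  # keep multi-word status text together
            relations.append(Relation(source, destination, state, lag, status))
    return relations


def state_of(relations: List[Relation], destination: str) -> Optional[str]:
    """Return the state of the first relation with this destination, or None."""
    for relation in relations:
        if relation.destination == destination:
            return relation.state
    return None


# Output copied from the transcript above.
sample = """\
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:02:30   Idle
"""

relations = parse_snapmirror_status(sample)
print(state_of(relations, "netapp02:vol_vfiler01"))
```

After a successful failover the printed state for netapp02:vol_vfiler01 should be Broken-off, as in the console output.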

As mentioned above, you have to make sure the vFiler is not started on the master (netapp01) as long as it is running on the slave (netapp02). One way to ensure this is to disable the automatic boot of Data ONTAP: if Data ONTAP does not boot, the vFiler won’t start either. After a failure you then boot Data ONTAP manually (command: boot_ontap) and stop the vFiler immediately (vfiler stop vfiler01). Alternatively, you can disable the IP configuration of the vFiler before booting Data ONTAP.
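
Whether the vFiler is accidentally running on both systems can be checked from the `vfiler status` output of master and slave. Below is a small Python sketch under the assumption that both outputs have been collected as plain text; `parse_vfiler_status` and `running_on_both` are hypothetical helper names, and only the states that appear in this series ("running", "stopped", "stopped, DR backup") are considered.

```python
from typing import Dict


def parse_vfiler_status(text: str) -> Dict[str, str]:
    """Map each vFiler name to its state string from `vfiler status` output."""
    states = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # name, then the rest of the line
        if len(parts) == 2:
            states[parts[0]] = parts[1].strip()
    return states


def is_running(state: str) -> bool:
    """True if the state string starts with the word 'running'."""
    return state.split(",")[0].strip() == "running"


def running_on_both(name: str, master_output: str, slave_output: str) -> bool:
    """Detect the situation the text warns about: vFiler active on both systems."""
    master_state = parse_vfiler_status(master_output).get(name, "")
    slave_state = parse_vfiler_status(slave_output).get(name, "")
    return is_running(master_state) and is_running(slave_state)


# Slave after the disaster failover (from the transcript above).
slave = "vfiler0                          running\nvfiler01                         running\n"
# Assumed master output if the vFiler came up again after a reboot.
master_bad = "vfiler0                          running\nvfiler01                         running\n"
# Assumed master output after `vfiler stop vfiler01`.
master_ok = "vfiler0                          running\nvfiler01                         stopped\n"

print(running_on_both("vfiler01", master_bad, slave))
print(running_on_both("vfiler01", master_ok, slave))
```

The first call reports the conflict, the second one the desired situation in which the vFiler is stopped on the master.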

Rollback from slave to master:

  1. resync from slave to master (-s for synchronous replication)
netapp01> vfiler dr resync -s vfiler01@netapp02
One can optionally provide an alternate ip
 path for sync snapmirroring
Alternate IP address/Hostname for remote filer netapp02 []:
Alternate IP address/Hostname for local filer netapp01 []:
netapp02's Administrative login: root
netapp02's Administrative password:

CIFS local server on vFiler vfiler01 is shutting down...

waiting for CIFS shut down (^C aborts)...

CIFS local server on vfiler vfiler01 has shut down...
Mon May  9 22:06:03 CEST [vfiler01@netapp01:telnet_0:notice]: IP address 192.168.2.68 is  removed from interface "e0a"
Configuring SnapMirror to mirror vfiler vfiler01's storage units from remote filer netapp02.
Starting snapmirror initialize commands. It
could take a very long time when the source or
destination filers are involved in many
simultaneous transfers. The console will not be
available until all initialize commands are
started successfully. Please use the
"snapmirror status" command on the source
filer to monitor the progress.

Mon May  9 22:06:07 CEST [netapp01:snapmirror.dst.resync.info:notice]: SnapMirror resync of vol_vfiler01 to netapp02:vol_vfiler01 is using netapp02(4082368507)_vol_vfiler01.27 as the base snapshot.
Mon May  9 22:06:07 CEST [netapp01:vFiler.storageUnit.off:warning]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now offline.
Mon May  9 22:06:08 CEST [netapp01:wafl.snaprestore.revert:info]: Reverting volume vol_vfiler01 to a previous snapshot.
Mon May  9 22:06:09 CEST [netapp01:vFiler.storageUnit.On:notice]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now online.
Revert to resync base snapshot was successful.
Mon May  9 22:06:10 CEST [netapp01:replication.dst.resync.success:notice]: SnapMirror resync of vol_vfiler01 to netapp02:vol_vfiler01 was successful.
SnapMirror transfer initiated for vfiler storage units.
  1. check SnapMirror from netapp02 (Source) to netapp01 (Destination) => additional entries with state “Snapmirrored” on master and “Source” on slave
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:39   Idle
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:06:42   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:07:17   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:00:21   Idle
  1. wait until the status of SnapMirror from netapp02 (Source) to netapp01 (Destination) is “In-sync”
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:00   In-sync
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:08:10   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:08:27   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:00:00   In-sync
  1. stop vFiler on slave => “stopped”
netapp02> vfiler stop vfiler01
vfiler01                         stopped
Mon May  9 22:09:45 CEST [netapp02:vf.stopped:warning]: vfiler: 'vfiler01'; stopped

netapp02> vfiler status
vfiler0                          running
vfiler01                         stopped
  1. start vFiler on master => “running”
netapp01> vfiler dr activate vfiler01@netapp02
Waiting for "vol_vfiler" to become stable.
Mon May  9 22:10:51 CEST [netapp01:snapmirror.sync.fail:notice]: Synchronous SnapMirror from netapp02_vfiler01_con:vol_vfiler to netapp01:vol_vfiler01 failed.
Mon May  9 22:10:58 CEST [netapp01:wafl.scan.ownblocks.done:info]: Completed block ownership calculation on volume vol_vfiler01. The scanner took 0 ms.
CIFS local server is running.
Mon May  9 22:10:58 CEST [vfiler01@netapp01:cifs.startup.local.succeeded:info]: CIFS: CIFS local server is running.
Mon May  9 22:10:58 CEST [netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Mon May  9 22:10:58 CEST [vfiler01@netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes file is missing.
Mon May  9 22:10:58 CEST [vfiler01@netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.

Vfiler vfiler01 activated.
e0a: flags=0xe48867 mtu 1500
        inet 192.168.2.66 netmask 0xffffff00 broadcast 192.168.2.255
        inet 192.168.2.69 netmask 0xffffff00 broadcast 192.168.2.255
        ether 00:0c:29:ee:ee:f2 (auto-1000t-fd-up) flowcontrol full
netapp01> Mon May  9 22:10:59 CEST [netapp01:cmds.vfiler.dr.activated:info]: Disaster recovery backup vFiler unit: 'vfiler01' of the vFiler unit at remote storage system: 'netapp02' was activated.
Mon May  9 22:11:22 CEST [vfiler01@netapp01:nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the local server.

netapp01> vfiler status
vfiler0                          running
vfiler01                         running
  1. check state of SnapMirror from netapp02 (Source) to netapp01 (Destination) => “Source” on netapp02 and “Broken-off” on netapp01
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Broken-off     00:03:45   Idle
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:12:19   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:13:03   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:04:29   Idle
  1. resync of master to slave => status of SnapMirror from netapp01 (Source) to netapp02 (Destination) “In-sync” (can take some time)
netapp02> vfiler dr resync -s vfiler01@netapp01
One can optionally provide an alternate ip
 path for sync snapmirroring
Alternate IP address/Hostname for remote filer netapp01 []:
Alternate IP address/Hostname for local filer netapp02 []:
netapp01's Administrative login: root
netapp01's Administrative password:

CIFS local server on vFiler vfiler01 is shutting down...

waiting for CIFS shut down (^C aborts)...

CIFS local server on vfiler vfiler01 has shut down...
Mon May  9 22:14:02 CEST [vfiler01@netapp02:telnet_0:notice]: IP address 192.168.2.68 is  removed from interface "e0a"
Configuring SnapMirror to mirror vfiler vfiler01's storage units from remote filer netapp01.
Starting snapmirror initialize commands. It
could take a very long time when the source or
destination filers are involved in many
simultaneous transfers. The console will not be
available until all initialize commands are
started successfully. Please use the
"snapmirror status" command on the source
filer to monitor the progress.

Mon May  9 22:14:06 CEST [netapp02:snapmirror.dst.resync.info:notice]: SnapMirror resync of vol_vfiler01 to netapp01:vol_vfiler01 is using netapp01(4082368508)_vol_vfiler01.4 as the base snapshot.
Mon May  9 22:14:06 CEST [netapp02:vFiler.storageUnit.off:warning]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now offline.
Mon May  9 22:14:07 CEST [netapp02:wafl.snaprestore.revert:info]: Reverting volume vol_vfiler01 to a previous snapshot.
Mon May  9 22:14:08 CEST [netapp02:vFiler.storageUnit.On:notice]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now online.
Revert to resync base snapshot was successful.
Mon May  9 22:14:08 CEST [netapp02:replication.dst.resync.success:notice]: SnapMirror resync of vol_vfiler01 to netapp01:vol_vfiler01 was successful.
SnapMirror transfer initiated for vfiler storage units.

netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Snapmirrored   00:00:00   In-sync
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:07:31   Idle
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Broken-off     00:07:56   Idle
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:00:00   In-sync
  1. delete the SnapMirror relations from slave to master
netapp02> snapmirror release vol_vfiler01 netapp01:vol_vfiler01
snapmirror release: vol_vfiler01 netapp01:vol_vfiler01: No release-able destination found that matches those parameters.  Use 'snapmirror destinations' to see a list of release-able destinations.
netapp01> snapmirror release vol_vfiler01 netapp01:vol_vfiler01
snapmirror release: vol_vfiler01 netapp01:vol_vfiler01: No release-able destination found that matches those parameters.  Use 'snapmirror destinations' to see a list of release-able destinations.
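
The two waiting steps of the rollback (until the status is “In-sync”) can be automated with a simple polling loop. This is only a sketch: `wait_for_in_sync` is a made-up name, and `fetch_status` stands for whatever mechanism returns the current `snapmirror status` text from the system in question, which is an assumption and not shown here.

```python
import time


def wait_for_in_sync(fetch_status, destination, interval=10.0, timeout=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the relation to `destination` reports status In-sync.

    fetch_status is a callable returning the text of `snapmirror status`.
    Returns True once In-sync is seen, False if the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        for line in fetch_status().splitlines():
            fields = line.split()
            # Columns: Source, Destination, State, Lag, Status
            if len(fields) >= 5 and fields[1] == destination and fields[4] == "In-sync":
                return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Canned outputs based on the transcript: first poll Idle, second poll In-sync.
idle = (
    "Snapmirror is on.\n"
    "netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:39   Idle\n"
)
in_sync = (
    "Snapmirror is on.\n"
    "netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:00   In-sync\n"
)
outputs = iter([idle, in_sync])

became_in_sync = wait_for_in_sync(
    lambda: next(outputs),
    "netapp01:vol_vfiler01",
    sleep=lambda seconds: None,  # do not really wait in this demonstration
)
print(became_in_sync)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; in actual use the defaults apply and `interval` and `timeout` should be chosen to match the amount of data to be resynchronized.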

As before the failover, the vFiler runs on the master (netapp01) again and the data is replicated from the master (netapp01) to the slave (netapp02).

netapp01> vfiler status
vfiler0                          running
vfiler01                         running
netapp01> snapmirror status
Snapmirror is on.
Source                     Destination                State          Lag        Status
netapp01:vol_vfiler01      netapp02:vol_vfiler01      Source         00:00:00   In-sync
netapp02> vfiler status
vfiler0                          running
vfiler01                         stopped, DR backup
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Snapmirrored   00:00:00   In-sync

All articles of the series
Part 1: Download of the files needed
Part 2: Configuration of the first simulator
Part 3: Configuration of the second simulator
Part 4: Create an aggregate and volume
Part 5: DNS Configuration
Part 6: Create vFiler and configure vFiler DR
Part 7: Synchronous vFiler DR
Part 8: Create shares on vFiler
Part 9: Planned Failover
Part 10: Disaster Failover