NetApp vFiler DR with Data ONTAP Simulator Part 10: Disaster Failover

This is part of an article series.

In principle, the disaster failover works much like the planned failover from Part 9. Because the master fails unexpectedly, however, the vFiler cannot be stopped on it. The temporary replication from the slave back to the master also cannot be set up while the slave is being activated. When the master is powered on again, you must therefore make sure that the vFiler does not start automatically; otherwise it would run on the slave and the master at the same time. In addition, once the master is available again, the replication from the slave to the master has to be set up and completed before the old master takes over the vFiler again. Compared with the planned failover, steps 1, 4 and 5 of the failover therefore move into the rollback.

Disaster failover from master to slave:

  1. Start vFiler DR on the slave => “running”
netapp02> vfiler dr activate vfiler01@netapp01
CIFS local server is running.
Mon May  9 22:01:06 CEST [vfiler01@netapp02:cifs.startup.local.succeeded:info]: CIFS: CIFS local server is running.
Mon May  9 22:01:06 CEST [netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Mon May  9 22:01:06 CEST [vfiler01@netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes file is missing.
Mon May  9 22:01:06 CEST [vfiler01@netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Mon May  9 22:01:06 CEST [netapp02:wafl.scan.ownblocks.done:info]: Completed block ownership calculation on volume vol_vfiler01. The scanner took 0 ms.

Vfiler vfiler01 activated.
e0a: flags=0xe48867 mtu 1500
        inet 192.168.2.67 netmask 0xffffff00 broadcast 192.168.2.255
        ether 00:0c:29:61:01:2b (auto-1000t-fd-up) flowcontrol full
netapp02> Mon May  9 22:01:07 CEST [netapp02:cmds.vfiler.dr.activated:info]: Disaster recovery backup vFiler unit: 'vfiler01' of the vFiler unit at remote storage system: 'netapp01' was activated.
Mon May  9 22:01:11 CEST [vfiler01@netapp02:export.host.resolve.timeout:warning]: Trial 1 for the nameservice lookup request timed out.
Mon May  9 22:01:30 CEST [vfiler01@netapp02:nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the local server.
netapp02> vfiler status
vfiler0                          running
vfiler01                         running
  2. Check the state of the SnapMirror => “Broken-off” on the slave
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:02:30   Idle
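
Instead of reading the table by eye, you can check the state from a script. The following is a minimal Python sketch. It assumes that the `snapmirror status` output has already been captured as a string; how it is fetched from the filer, for example over SSH, is left open. It also assumes the five-column layout shown above: Source, Destination, State, Lag, Status. The function and class names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class MirrorEntry(NamedTuple):
    """One row of the `snapmirror status` table."""
    source: str
    destination: str
    state: str
    lag: str
    status: str


def parse_snapmirror_status(text: str) -> List[MirrorEntry]:
    """Parse the table printed by `snapmirror status` into MirrorEntry rows."""
    entries = []
    for line in text.splitlines():
        # Split into at most five fields; the last one (Status) keeps any spaces.
        fields = line.split(None, 4)
        # Skip the "Snapmirror is on." banner, the header row and empty lines.
        if len(fields) != 5 or fields[0] == "Source":
            continue
        source, destination, state, lag, status = fields
        entries.append(MirrorEntry(source, destination, state, lag, status.strip()))
    return entries


def state_of(text: str, destination: str) -> Optional[str]:
    """Return the State column of the relationship with this destination, if any."""
    for entry in parse_snapmirror_status(text):
        if entry.destination == destination:
            return entry.state
    return None
```

Applied to the netapp02 output above, `state_of(output, "netapp02:vol_vfiler01")` returns `"Broken-off"`.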

As mentioned above, the vFiler on the master (netapp01) should not be started while it is still running on the slave (netapp02). One way to prevent this is to disable the automatic boot of Data ONTAP on the master. If Data ONTAP does not boot automatically, the vFiler is not started either. Data ONTAP must then be booted manually (command: boot_ontap), and the vFiler must be stopped immediately afterwards (vfiler stop vfiler01). Alternatively, the vFiler's IP configuration can be disabled before Data ONTAP is booted.

Rollback from slave to master

  1. Resync from slave to master (-s for synchronous replication)
netapp01> vfiler dr resync -s vfiler01@netapp02
One can optionally provide an alternate ip
 path for sync snapmirroring
Alternate IP address/Hostname for remote filer netapp02 []:
Alternate IP address/Hostname for local filer netapp01 []:
netapp02's Administrative login: root
netapp02's Administrative password:

CIFS local server on vFiler vfiler01 is shutting down...

waiting for CIFS shut down (^C aborts)...

CIFS local server on vfiler vfiler01 has shut down...
Mon May  9 22:06:03 CEST [vfiler01@netapp01:telnet_0:notice]: IP address 192.168.2.68 is  removed from interface "e0a"
Configuring SnapMirror to mirror vfiler vfiler01's storage units from remote filer netapp02.
Starting snapmirror initialize commands. It
could take a very long time when the source or
destination filers are involved in many
simultaneous transfers. The console will not be
available until all initialize commands are
started successfully. Please use the
"snapmirror status" command on the source
filer to monitor the progress.

Mon May  9 22:06:07 CEST [netapp01:snapmirror.dst.resync.info:notice]: SnapMirror resync of vol_vfiler01 to netapp02:vol_vfiler01 is using netapp02(4082368507)_vol_vfiler01.27 as the base snapshot.
Mon May  9 22:06:07 CEST [netapp01:vFiler.storageUnit.off:warning]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now offline.
Mon May  9 22:06:08 CEST [netapp01:wafl.snaprestore.revert:info]: Reverting volume vol_vfiler01 to a previous snapshot.
Mon May  9 22:06:09 CEST [netapp01:vFiler.storageUnit.On:notice]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now online.
Revert to resync base snapshot was successful.
Mon May  9 22:06:10 CEST [netapp01:replication.dst.resync.success:notice]: SnapMirror resync of vol_vfiler01 to netapp02:vol_vfiler01 was successful.
SnapMirror transfer initiated for vfiler storage units.
  2. Check the SnapMirror from netapp02 (source) to netapp01 (destination) => additional entries with state “Snapmirrored” on the master and “Source” on the slave
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:39   Idle
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:06:42   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:07:17   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:00:21   Idle
  3. Wait until the SnapMirror from netapp02 (source) to netapp01 (destination) is “In-sync”
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:00   In-sync
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:08:10   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:08:27   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:00:00   In-sync
  4. Stop the vFiler on the slave => “stopped”
netapp02> vfiler stop vfiler01
vfiler01                         stopped
Mon May  9 22:09:45 CEST [netapp02:vf.stopped:warning]: vfiler: 'vfiler01'; stopped

netapp02> vfiler status
vfiler0                          running
vfiler01                         stopped
  5. Start the vFiler on the master => “running”
netapp01> vfiler dr activate vfiler01@netapp02
Waiting for "vol_vfiler" to become stable.
Mon May  9 22:10:51 CEST [netapp01:snapmirror.sync.fail:notice]: Synchronous SnapMirror from netapp02_vfiler01_con:vol_vfiler to netapp01:vol_vfiler01 failed.
Mon May  9 22:10:58 CEST [netapp01:wafl.scan.ownblocks.done:info]: Completed block ownership calculation on volume vol_vfiler01. The scanner took 0 ms.
CIFS local server is running.
Mon May  9 22:10:58 CEST [vfiler01@netapp01:cifs.startup.local.succeeded:info]: CIFS: CIFS local server is running.
Mon May  9 22:10:58 CEST [netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Mon May  9 22:10:58 CEST [vfiler01@netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes file is missing.
Mon May  9 22:10:58 CEST [vfiler01@netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.

Vfiler vfiler01 activated.
e0a: flags=0xe48867 mtu 1500
        inet 192.168.2.66 netmask 0xffffff00 broadcast 192.168.2.255
        inet 192.168.2.69 netmask 0xffffff00 broadcast 192.168.2.255
        ether 00:0c:29:ee:ee:f2 (auto-1000t-fd-up) flowcontrol full
netapp01> Mon May  9 22:10:59 CEST [netapp01:cmds.vfiler.dr.activated:info]: Disaster recovery backup vFiler unit: 'vfiler01' of the vFiler unit at remote storage system: 'netapp02' was activated.
Mon May  9 22:11:22 CEST [vfiler01@netapp01:nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the local server.

netapp01> vfiler status
vfiler0                          running
vfiler01                         running
  6. Check the status of the SnapMirror from netapp02 (source) to netapp01 (destination) => “Source” on netapp02 and “Broken-off” on netapp01
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Broken-off     00:03:45   Idle
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:12:19   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:13:03   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:04:29   Idle
  7. Resync from master to slave => status of the SnapMirror from netapp01 (source) to netapp02 (destination) “In-sync” (takes a while)
netapp02> vfiler dr resync -s vfiler01@netapp01
One can optionally provide an alternate ip
 path for sync snapmirroring
Alternate IP address/Hostname for remote filer netapp01 []:
Alternate IP address/Hostname for local filer netapp02 []:
netapp01's Administrative login: root
netapp01's Administrative password:

CIFS local server on vFiler vfiler01 is shutting down...

waiting for CIFS shut down (^C aborts)...

CIFS local server on vfiler vfiler01 has shut down...
Mon May  9 22:14:02 CEST [vfiler01@netapp02:telnet_0:notice]: IP address 192.168.2.68 is  removed from interface "e0a"
Configuring SnapMirror to mirror vfiler vfiler01's storage units from remote filer netapp01.
Starting snapmirror initialize commands. It
could take a very long time when the source or
destination filers are involved in many
simultaneous transfers. The console will not be
available until all initialize commands are
started successfully. Please use the
"snapmirror status" command on the source
filer to monitor the progress.

Mon May  9 22:14:06 CEST [netapp02:snapmirror.dst.resync.info:notice]: SnapMirror resync of vol_vfiler01 to netapp01:vol_vfiler01 is using netapp01(4082368508)_vol_vfiler01.4 as the base snapshot.
Mon May  9 22:14:06 CEST [netapp02:vFiler.storageUnit.off:warning]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now offline.
Mon May  9 22:14:07 CEST [netapp02:wafl.snaprestore.revert:info]: Reverting volume vol_vfiler01 to a previous snapshot.
Mon May  9 22:14:08 CEST [netapp02:vFiler.storageUnit.On:notice]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now online.
Revert to resync base snapshot was successful.
Mon May  9 22:14:08 CEST [netapp02:replication.dst.resync.success:notice]: SnapMirror resync of vol_vfiler01 to netapp01:vol_vfiler01 was successful.
SnapMirror transfer initiated for vfiler storage units.

netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Snapmirrored   00:00:00   In-sync
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:07:31   Idle
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Broken-off     00:07:56   Idle
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:00:00   In-sync
  8. Delete the SnapMirror relationships from slave to master
netapp02> snapmirror release vol_vfiler01 netapp01:vol_vfiler01
snapmirror release: vol_vfiler01 netapp01:vol_vfiler01: No release-able destination found that matches those parameters.  Use 'snapmirror destinations' to see a list of release-able destinations.
netapp01> snapmirror release vol_vfiler01 netapp01:vol_vfiler01
snapmirror release: vol_vfiler01 netapp01:vol_vfiler01: No release-able destination found that matches those parameters.  Use 'snapmirror destinations' to see a list of release-able destinations.
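
During the rollback, each resync is followed by waiting for “In-sync”. A script can do this instead of re-running `snapmirror status` by hand. The following is a minimal Python sketch under these assumptions:
- `fetch_status` is a hypothetical callable that returns the current `snapmirror status` output of one filer as a string. How that output is obtained is not shown.
- The output uses the column layout shown above.
- The relationship is identified by its destination only, because the source name differs between the two filers (for example `netapp02_vfiler01_con:vol_vfiler01` on netapp01 and `netapp02:vol_vfiler01` on netapp02).
- The poll interval and timeout are arbitrary placeholders.

```python
import time
from typing import Callable, Optional


def sync_status(text: str, destination: str) -> Optional[str]:
    """Return the Status column (e.g. "Idle" or "In-sync") for one destination."""
    for line in text.splitlines():
        # Source, Destination, State, Lag, Status (Status may contain spaces).
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[1] == destination:
            return fields[4].strip()
    return None


def wait_until_in_sync(fetch_status: Callable[[], str], destination: str,
                       timeout: float = 3600.0, interval: float = 30.0) -> bool:
    """Poll `snapmirror status` until the relationship reports In-sync.

    Returns True as soon as In-sync is seen, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if sync_status(fetch_status(), destination) == "In-sync":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the wait after the first resync, the call could look like `wait_until_in_sync(fetch_netapp01_status, "netapp01:vol_vfiler01")`, where `fetch_netapp01_status` is again a hypothetical helper.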

As before the failover, the vFiler now runs on the master (netapp01) again, and its data is replicated to the slave (netapp02).

netapp01> vfiler status
vfiler0                          running
vfiler01                         running
netapp01> snapmirror status
Snapmirror is on.
Source                     Destination                State          Lag        Status
netapp01:vol_vfiler01      netapp02:vol_vfiler01      Source         00:00:00   In-sync
netapp02> vfiler status
vfiler0                          running
vfiler01                         stopped, DR backup
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Snapmirrored   00:00:00   In-sync
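
The `vfiler status` output can also be checked from a script. A useful guard is that the vFiler never runs on both filers at once, which is the concern raised earlier about restarting the master. A second check confirms the end state shown above. The following is a minimal Python sketch under these assumptions:
- Only the command output is passed in, without the prompt line.
- Each line is a vFiler name followed by its state, which may contain spaces, as in `stopped, DR backup`.
- All function names are hypothetical.

```python
from typing import Dict


def parse_vfiler_status(text: str) -> Dict[str, str]:
    """Map each vFiler name to its state as printed by `vfiler status`."""
    states = {}
    for line in text.splitlines():
        # First token is the vFiler name, the rest of the line is its state.
        parts = line.split(None, 1)
        if len(parts) == 2:
            states[parts[0]] = parts[1].strip()
    return states


def running_on_both(master_status: str, slave_status: str, vfiler: str) -> bool:
    """True if the vFiler is reported as running on both filers (must not happen)."""
    return (parse_vfiler_status(master_status).get(vfiler) == "running"
            and parse_vfiler_status(slave_status).get(vfiler) == "running")


def rollback_complete(master_status: str, slave_status: str, vfiler: str) -> bool:
    """True if the vFiler runs on the master and is a stopped DR backup on the slave."""
    return (parse_vfiler_status(master_status).get(vfiler) == "running"
            and parse_vfiler_status(slave_status).get(vfiler) == "stopped, DR backup")
```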

All articles in the series:
Part 1: Downloading the Required Components
Part 2: Setting Up the First Simulator
Part 3: Setting Up the Second Simulator
Part 4: Creating an Aggregate and a Volume
Part 5: DNS Configuration
Part 6: Creating a vFiler and Configuring vFiler DR
Part 7: Synchronous vFiler DR
Part 8: Creating Shares on the vFiler
Part 9: Planned Failover
Part 10: Disaster Failover