How shadow set recovery and repair are handled depends on the type of
failure and on the hardware configuration. In general, a device that
becomes inaccessible fails over to another controller whenever possible;
otherwise, it is removed from the shadow set. Errors caused by media
defects can often be repaired automatically by the volume shadowing software.
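The general policy above can be illustrated with a small model. The Python sketch below is hypothetical and simplified; it is not the volume shadowing implementation. The classes `Member` and `ShadowSet`, their methods, and the disk and controller names are invented for illustration. The sketch shows two behaviors: a member that loses a controller fails over when it has another path and is otherwise removed from the shadow set, and a block that the controller could not correct is rewritten from another member, with the failing member removed if no good copy exists. It does not model the cluster-wide synchronization of repairs with application I/O.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: members compare by identity, not by field values
class Member:
    """Hypothetical model of one shadow set member disk."""
    name: str
    controllers: list[str]  # controllers with a path to this disk
    blocks: dict[int, bytes] = field(default_factory=dict)  # LBN -> data
    bad_blocks: set[int] = field(default_factory=set)  # LBNs the controller could not correct


@dataclass
class ShadowSet:
    """Hypothetical model of a shadow set; not the real volume shadowing code."""
    members: list[Member]

    def handle_controller_failure(self, failed: str) -> None:
        """Drop the failed controller from every member's path list.

        A dual-pathed member fails over to its remaining controller;
        a member left with no path is removed from the shadow set.
        """
        for m in list(self.members):  # iterate over a copy: we may remove items
            if failed in m.controllers:
                m.controllers.remove(failed)
                if not m.controllers:
                    self.members.remove(m)

    def repair_block(self, bad: Member, lbn: int) -> bool:
        """Rewrite an uncorrectable block on `bad` from another member.

        Assumption for this sketch: a repair fails only when no other
        member holds a good copy. On failure, `bad` is removed.
        """
        for source in self.members:
            if source is bad or lbn in source.bad_blocks or lbn not in source.blocks:
                continue  # this member cannot supply good data
            bad.blocks[lbn] = source.blocks[lbn]  # write good data to the failing member
            bad.bad_blocks.discard(lbn)
            return True
        self.members.remove(bad)
        return False


# Example: disk1 is dual-pathed, disk2 is reachable only through ctrlA.
disk1 = Member("disk1", ["ctrlA", "ctrlB"], {7: b"good"})
disk2 = Member("disk2", ["ctrlA"], {7: b"good"})
shadow = ShadowSet([disk1, disk2])
shadow.handle_controller_failure("ctrlA")
print([m.name for m in shadow.members])  # ['disk1']

# A new member whose copy of LBN 7 is uncorrectable gets repaired from disk1.
disk3 = Member("disk3", ["ctrlB"], {7: b"????"}, {7})
shadow.members.append(disk3)
print(shadow.repair_block(disk3, 7), disk3.blocks[7])  # True b'good'
```

Iterating over a copy of the member list in `handle_controller_failure` avoids skipping entries while members are being removed. A fuller model would also count a failed write during repair as a repair failure and would serialize each repair against concurrent application I/O.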
Table 1 Types of Failures
|
Type |
Description |
Controller error
|
Results from a failure in the controller. If the failure is recoverable,
processing continues and data availability is not affected. If the failure
is nonrecoverable, the shadow set members connected to the controller are
removed from the shadow set, and processing continues with the remaining
members. In configurations where disks are dual-pathed between two
controllers, if one controller fails, the shadow set members fail over to
the remaining controller and processing continues.
|
Device error
|
Signifies that the mechanics
or electronics in the device failed. If the failure is recoverable,
processing continues. If the failure is nonrecoverable, the node that
detects the error removes the device from the shadow set.
|
Data errors
|
Occurs when a device detects corrupt data. Data errors usually result
from media defects that do not cause the device to be removed from a
shadow set. Depending on the severity of the data error (or the degree
of media deterioration), the controller takes one of the following actions:

- Corrects the error and returns valid data.

- Corrects the data and, depending on the device and
controller implementation, may revector it to a new logical block
number (LBN).

- Returns a parity error status to Volume Shadowing, indicating
that the data cannot be read without error.

When the controller cannot correct the data, volume shadowing attempts
to replace the lost data by reading it from another shadow set member
and writing it to the member with the error. This repair operation is
synchronized within the cluster and with the application I/O stream.
If the repair fails, the member with the error is removed from the
shadow set.
|
Connectivity failures
|
When a connectivity failure occurs, the first node to detect it must
decide how to recover in the way least likely to affect the availability
or consistency of the data. As each node discovers the recoverable device
failure, it determines its course of action as follows:
|