Wednesday, November 9, 2016

Unable to offline RemoteGroup resource cleanly


When VCS tries to take a RemoteGroup resource offline and the target service group monitored by the RemoteGroup resource fails to go offline, the clean operation fails in certain situations.

A RemoteGroup resource can monitor or manage a service group that exists in another cluster.
When an offline operation is initiated on a RemoteGroup resource, or on the service group that contains it, the monitored service group in the remote cluster is taken offline. If the OfflineWaitLimit attribute of the RemoteGroup resource is left at its default value (0), the target service group may not go offline within the timeout period. In this case, the RemoteGroup agent treats the offline operation as failed and calls the clean agent function.

Solution

If the target group is likely to take a long time to go offline, increase the value of the OfflineWaitLimit attribute of the RemoteGroup resource. 
For example, if OfflineWaitLimit is 2 and MonitorInterval is 60, the RemoteGroup agent waits for a maximum of 120 seconds to confirm that the resource has gone offline. If you expect the resource to take 150 seconds to go offline, you must set OfflineWaitLimit to 3 or more.
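
The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name offline_wait_limit is hypothetical and not part of VCS, and it assumes the maximum wait is OfflineWaitLimit multiplied by MonitorInterval, as described above.

```shell
#!/bin/sh
# Hypothetical helper (not a VCS command): print the smallest
# OfflineWaitLimit for which OfflineWaitLimit * MonitorInterval
# is at least the expected offline time, all in seconds.
offline_wait_limit() {
    expected=$1
    interval=$2
    # Integer ceiling division: round up to whole monitor cycles.
    echo $(( (expected + interval - 1) / interval ))
}

offline_wait_limit 150 60   # prints 3
offline_wait_limit 120 60   # prints 2
```

Treat the result as a lower bound. Choosing a larger value gives the target group more time before the agent calls clean.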

The following examples show the main.cf entries for the target cluster and for the host cluster. The RemoteGroup resource is configured in the host cluster.

On system sys1 (target cluster):
group G2 (
    SystemList = {sys1 = 0}
    )
FileOnOff fil1 (
    PathName = "/tmp/fil1"
    )

On system sys2 (host cluster):
group G1 (
    SystemList = {sys2 = 0}
    )
RemoteGroup Remoteres (
    IpAddress = sys1
    Username = admin
    Password = DRJpGRg
    GroupName = G2
    VCSSysName = ANY
    ControlMode = OnOff
    )

To set the OfflineWaitLimit attribute:
1. Set the configuration to read-write.
# haconf -makerw

2. Set OfflineWaitLimit for the RemoteGroup resource in the host cluster, either at the resource type level or by overriding it at the resource level.

- If you set the limit at the resource type level, it applies to all RemoteGroup resources.
# hatype -modify RemoteGroup OfflineWaitLimit 2

- Alternatively, you can override the attribute at the resource level. Use the resource name exactly as it appears in main.cf (Remoteres in the example above).
# hares -override Remoteres OfflineWaitLimit
# hares -modify Remoteres OfflineWaitLimit 3

3. Save the configuration.
# haconf -dump -makero
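
The resource-level procedure can be collected into a reviewable command list. The sketch below only prints the commands and does not run them. The function name print_override_cmds is hypothetical, and the final hares -value line is an added check that assumes your VCS release supports that option. Confirm it against the hares manual page before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not a VCS command): print, without executing,
# the commands that override OfflineWaitLimit for one resource.
print_override_cmds() {
    res=$1      # resource name, exactly as written in main.cf
    limit=$2    # new OfflineWaitLimit value
    cat <<EOF
haconf -makerw
hares -override $res OfflineWaitLimit
hares -modify $res OfflineWaitLimit $limit
haconf -dump -makero
hares -value $res OfflineWaitLimit
EOF
}

print_override_cmds Remoteres 3
```

Review the printed list, then run the commands as root on a node of the host cluster.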
