A concurrency violation indicates that a resource in a non-parallel (failover) service group is online on more than one node in the cluster. When this happens, an error message appears on the console, in the messages/syslog.log file, and in the VCS engine log. The following example is from a two-node cluster (rum and coke).
Error message in the messages/syslog.log file and on the console:
Nov 27 14:45:38 rum WARNING:: VCS Concurrency Violation!!! Group="group1" Hosts=(rum coke)
Corresponding messages in the VCS engine log:
TAG_E 2000/11/27 14:45:38 Resource ip1 is online on rum
TAG_B 2000/11/27 14:45:38 CurrentCount increased above 1 for failover group
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The message on the console or in the messages/syslog.log file provides the following:
1. The service group is group1.
2. The affected hosts are rum and coke.
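As a minimal sketch, the group and host names can be pulled out of such a message with sed. The function name parse_violation is hypothetical, and the pattern assumes the message has exactly the format shown in the example above.

```shell
# Hypothetical helper: reads syslog lines on stdin and prints
# "<group> <host1> <host2> ..." for each concurrency violation message.
# Assumes the message format shown in the example above.
parse_violation() {
    sed -n 's/.*VCS Concurrency Violation!!! Group="\([^"]*\)" Hosts=(\([^)]*\)).*/\1 \2/p'
}
```

It would typically be fed the system log, for example parse_violation < /var/adm/messages, where the log path is an assumption that depends on the platform.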
The VCS engine log provides the group's "CurrentCount" value. The line immediately above the "CurrentCount" line shows the resource that is causing the concurrency violation. In this example, the resource is ip1, which is of the IP resource type.
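The same lookup in the engine log can be sketched with awk: remember the resource named on each "Resource ... is online" line, and print it when the next line reports that CurrentCount increased above 1. The function name find_violating_resource is hypothetical, and the sketch assumes the two log lines are adjacent and worded as in the example above.

```shell
# Hypothetical helper: reads engine log lines on stdin. For each
# "CurrentCount increased above 1" line, prints the resource named on the
# line immediately above it (if that line is "Resource <name> is online ...").
find_violating_resource() {
    awk '
        /CurrentCount increased above 1/ {
            if (prev_res != "") print prev_res
        }
        {
            # Remember the resource only if THIS line reports one online.
            prev_res = ""
            for (i = 1; i < NF; i++)
                if ($i == "Resource" && $(i + 2) == "is" && $(i + 3) == "online")
                    prev_res = $(i + 1)
        }
    '
}
```

The function reads from standard input, so the engine log file is passed to it by redirection.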
Output of hastatus -sum and netstat -i from coke before the concurrency violation took place:
# hastatus -sum
-- SYSTEM STATE
-- System    State      Frozen
A  coke      RUNNING    0
A  rum       RUNNING    0
-- GROUP STATE
-- Group     System    Probed    AutoDisabled    State
B  group1    coke      Y         N               ONLINE
B  group1    rum       Y         N               OFFLINE
# netstat -i
Name    Mtu   Net/Dest   Address     Ipkts    Ierrs  Opkts    Oerrs  Collis  Queue
lo0     8232  loopback   localhost   1634948  0      1634948  0      0       0
hme0    1500  coke       coke        230705   0      79164    0      327     0
hme0:1  1500  10.10.8.0  10.10.10.2  0        0      0        0      0       0
hastatus -sum after the concurrency violation took place:
# hastatus -sum
-- SYSTEM STATE
-- System    State      Frozen
A  coke      RUNNING    0
A  rum       RUNNING    0
-- GROUP STATE
-- Group     System    Probed    AutoDisabled    State
B  group1    coke      Y         N               ONLINE
B  group1    rum       Y         N               PARTIAL
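The comparison between the two hastatus -sum outputs can be sketched as a filter that lists every group whose state is anything other than OFFLINE on more than one system. The function name multi_online_groups is hypothetical, and the sketch assumes the group rows have the six columns shown above. It cannot tell failover groups from parallel groups, so a parallel group that is legitimately online on several nodes would also be listed.

```shell
# Hypothetical helper: reads "hastatus -sum" output on stdin and prints
# "<group>: <system> <system> ..." for each group that is not OFFLINE
# on more than one system.
multi_online_groups() {
    awk '
        # Group rows: B <group> <system> <probed> <autodisabled> <state>
        $1 == "B" && NF >= 6 && $6 !~ /^OFFLINE/ {
            count[$2]++
            systems[$2] = systems[$2] " " $3
        }
        END {
            for (g in count)
                if (count[g] > 1) print g ":" systems[g]
        }
    '
}
```

On a cluster node it would be run as hastatus -sum | multi_online_groups. With the output above it reports group1 on coke and rum, and with the earlier output it prints nothing.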
netstat -i from rum after the violation took place, showing that the virtual IP 10.10.10.2 (resource ip1) is also active on rum:
# netstat -i
Name    Mtu   Net/Dest   Address     Ipkts    Ierrs  Opkts    Oerrs  Collis  Queue
lo0     8232  loopback   localhost   2026601  0      2026601  0      0       0
hme0    1500  rum        rum         1108676  0      1781672  0      91      0
hme0:1  1500  10.10.8.0  10.10.10.2  0        0      0        0      0       0
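To find which interface holds the virtual IP without reading the table by eye, a small awk filter over the netstat -i output may help. The function name iface_for_address is hypothetical, and the sketch assumes the address appears in the fourth column, as in the output above.

```shell
# Hypothetical helper: reads "netstat -i" output on stdin and prints the
# name of each interface whose Address column equals the given address.
# $1: the address exactly as it appears in the Address column.
iface_for_address() {
    awk -v addr="$1" '$4 == addr { print $1 }'
}
```

For example, netstat -i | iface_for_address 10.10.10.2 would print hme0:1 on rum. If the address is shown as a host name instead of a number, netstat -in can be used to keep it numeric.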
To fix this problem, bring the virtual interface down with ifconfig and then clear the resource on the system where the group is in the PARTIAL state. In this case, that system is rum, and the virtual interface is hme0:1 with IP address 10.10.10.2.
# ifconfig hme0:1 inet 0.0.0.0 down
Because the interface was taken offline outside of VCS, the resource state will change from online to faulted. To remove the faulted flag from the resource, execute the following hares command.
# hares -clear ip1