Sunday, November 13, 2016

Finding the resource that is causing a "Concurrency Violation" in a VCS cluster


A Concurrency Violation indicates that a resource from a non-parallel (failover) service group is online on more than one node in the cluster. When this happens, an error message appears on the console, in the messages/syslog.log file, and in the VCS engine log. Here is an example from a two-node cluster (rum and coke).

Error message in the messages/syslog.log file and on the console:

Nov 27 14:45:38 rum WARNING:: VCS Concurrency Violation!!! Group="group1" Hosts=(rum coke)

Corresponding error message in the VCS engine log:

TAG_E 2000/11/27 14:45:38 Resource ip1 is online on rum
TAG_B 2000/11/27 14:45:38 CurrentCount increased above 1 for failover group
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The message on the console and in the messages/syslog.log file provides the following:

1. The service group is group1.
2. The trouble hosts are rum and coke.
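The two items above can be pulled out of the syslog line mechanically. The following is a minimal sketch, assuming the message has exactly the format shown in the example; parse_violation is a hypothetical helper name, not part of VCS.

```shell
# Hypothetical helper: extract the group and hosts from a
# "VCS Concurrency Violation" line read from stdin.
# Prints: <group> <host1> <host2> ...
# Assumes the message format shown in the example above.
parse_violation() {
    sed -n 's/.*VCS Concurrency Violation!!! Group="\([^"]*\)" Hosts=(\([^)]*\)).*/\1 \2/p'
}
```

For example, piping the messages/syslog.log file through parse_violation prints one line per violation, with the group name first and the affected hosts after it.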

The VCS engine log reports that the group's "CurrentCount" increased above 1. The line immediately above the "CurrentCount" entry names the resource that is causing the concurrency violation. In this example, the resource is ip1, which is of the IP resource type.
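The "line immediately above" rule can be sketched as a small awk filter. This assumes the engine log lines look like the two shown in the example (the word "Resource" followed by the resource name); find_violating_resource is a hypothetical helper name.

```shell
# Hypothetical helper: print the resource named on the line immediately
# above each "CurrentCount increased above 1" entry in an engine log
# read from stdin.
find_violating_resource() {
    awk '
        /CurrentCount increased above 1/ {
            # prev holds the preceding log line, e.g.
            # "TAG_E 2000/11/27 14:45:38 Resource ip1 is online on rum"
            n = split(prev, f, " ")
            for (i = 1; i < n; i++)
                if (f[i] == "Resource") { print f[i + 1]; break }
        }
        { prev = $0 }
    '
}
```

Feed it the engine log for your installation on stdin; the log location varies by VCS version, so it is left as an input here rather than hard-coded.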


hastatus -sum and netstat -i from coke before the concurrency violation took place:

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen              

A  coke                 RUNNING              0                    
A  rum                  RUNNING              0                    

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          

B  group1          coke                 Y          N               ONLINE         
B  group1          rum                  Y          N               OFFLINE        

# netstat -i
Name    Mtu   Net/Dest   Address     Ipkts    Ierrs  Opkts    Oerrs  Collis  Queue
lo0     8232  loopback   localhost   1634948  0      1634948  0      0       0
hme0    1500  coke       coke        230705   0      79164    0      327     0
hme0:1  1500  10.10.8.0  10.10.10.2  0        0      0        0      0       0


hastatus -sum after the concurrency violation took place:

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen              

A  coke                 RUNNING              0                    
A  rum                  RUNNING              0                    

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          

B  group1          coke                 Y          N               ONLINE         
B  group1          rum                  Y          N               PARTIAL        
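To compare the before and after states quickly, the group lines can be filtered out of the summary. This is a rough sketch that assumes the "B" line layout shown above (group, system, probed, auto-disabled, state); group_states is a hypothetical helper name.

```shell
# Hypothetical helper: list "<system> <state>" for one group from
# "hastatus -sum" output read from stdin. $1 is the group name.
# Assumes group lines start with "B" and have the columns shown above.
group_states() {
    awk -v grp="$1" '$1 == "B" && $2 == grp { print $3, $6 }'
}
```

Running hastatus -sum | group_states group1 on the output above would list coke as ONLINE and rum as PARTIAL, which is the sign that a failover group has a resource up on a second node.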


netstat -i from rum after the violation took place, showing that the virtual IP 10.10.10.2 (resource ip1) is also active there:

# netstat -i
Name    Mtu   Net/Dest   Address     Ipkts    Ierrs  Opkts    Oerrs  Collis  Queue
lo0     8232  loopback   localhost   2026601  0      2026601  0      0       0
hme0    1500  rum        rum         1108676  0      1781672  0      91      0
hme0:1  1500  10.10.8.0  10.10.10.2  0        0      0        0      0       0
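Checking each node for the virtual IP by eye works for two nodes; for more, a small test helps. The sketch below assumes the Solaris-style netstat -i layout shown above, where the address is the fourth column; has_address is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) if the address given as $1 appears
# in the Address column (field 4) of "netstat -i" output read from stdin.
has_address() {
    awk -v ip="$1" '$4 == ip { found = 1 } END { exit !found }'
}
```

On each node, netstat -i | has_address 10.10.10.2 && echo "virtual IP is up here" reports whether that node currently holds the address. If more than one node reports it, the IP resource is online in two places.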

To fix this problem, bring the virtual interface down on the partial system and then clear the resource. In this case, the partial system is rum and the virtual interface is hme0:1 with IP address 10.10.10.2.

# ifconfig  hme0:1  inet  0.0.0.0  down

Because the interface was taken offline outside of VCS, the resource state changes from online to faulted. To clear the faulted flag from the resource, run the following hares command:

# hares  -clear  ip1
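Since the interface and resource names differ from cluster to cluster, it can help to print the two recovery commands for review before running anything. This is only a sketch that echoes the commands used above; recovery_commands is a hypothetical helper name and it does not execute anything itself.

```shell
# Hypothetical helper: print (do not run) the two recovery commands.
# $1 = virtual interface on the partial node, $2 = VCS resource name.
recovery_commands() {
    printf 'ifconfig %s inet 0.0.0.0 down\n' "$1"
    printf 'hares -clear %s\n' "$2"
}
```

For the example in this post, recovery_commands hme0:1 ip1 prints the same two commands shown above, to be run as root on rum once you have confirmed that rum is the node that should not hold the address.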
