Sunday, November 13, 2016

Resource is causing the "Concurrency Violation" in a VCS cluster


Concurrency Violation indicates that a resource from a non-parallel (failover) service group is online on more than one node in the cluster. When this happens, an error message appears on the console, in the messages/syslog.log file, and in the VCS engine log. Here is an example from a cluster consisting of two nodes (rum and coke).

Similar error message in messages/syslog.log file and on console:

Nov 27 14:45:38 rum WARNING:: VCS Concurrency Violation!!! Group="group1" Hosts=(rum coke)

Similar error message in VCS engine log:

TAG_E 2000/11/27 14:45:38 Resource ip1 is online on rum
TAG_B 2000/11/27 14:45:38 CurrentCount increased above 1 for failover group
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The message on the console or in the messages/syslog.log file provides the following:

1. The service group is group1.
2. The trouble hosts are rum and coke.

The VCS engine log provides the group's "CurrentCount" number.  The line immediately above the "CurrentCount" message shows the resource that is causing the concurrency violation.  In this example, the resource is ip1, which happens to be of the IP resource type.
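That lookup is easy to script. Here is a minimal Python sketch (a hypothetical helper, not part of VCS) that scans engine-log lines for the "CurrentCount increased above 1" message and returns the resource named on the line immediately above it:

```python
def find_violating_resource(engine_log_lines):
    """Return the resource named on the line immediately above the
    'CurrentCount increased above 1' message, or None if no
    concurrency violation is found."""
    previous = ""
    for line in engine_log_lines:
        if "CurrentCount increased above 1" in line:
            # The preceding line has the shape:
            #   "... Resource <name> is online on <host>"
            words = previous.split()
            if "Resource" in words:
                return words[words.index("Resource") + 1]
        previous = line
    return None

log = [
    "TAG_E 2000/11/27 14:45:38 Resource ip1 is online on rum",
    "TAG_B 2000/11/27 14:45:38 CurrentCount increased above 1 for failover group",
]
print(find_violating_resource(log))  # → ip1
```

Feeding it the two engine-log lines from this example returns ip1, the offending resource.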


hastatus -sum from coke before the concurrency violation took place:

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen              

A  coke                 RUNNING              0                    
A  rum                  RUNNING              0                    

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          

B  group1          coke                 Y          N               ONLINE         
B  group1          rum                  Y          N               OFFLINE        

# netstat -i
Name    Mtu   Net/Dest   Address     Ipkts    Ierrs  Opkts    Oerrs  Collis  Queue
lo0     8232  loopback   localhost   1634948  0      1634948  0      0       0
hme0    1500  coke       coke        230705   0      79164    0      327     0
hme0:1  1500  10.10.8.0  10.10.10.2  0        0      0        0      0       0


hastatus -sum after the concurrency violation took place:

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen              

A  coke                 RUNNING              0                    
A  rum                  RUNNING              0                    

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          

B  group1          coke                 Y          N               ONLINE         
B  group1          rum                  Y          N               PARTIAL        


netstat -i from rum after the violation took place, showing the virtual ip 10.10.10.2 (resource ip1) is also active:

# netstat -i
Name    Mtu   Net/Dest   Address     Ipkts    Ierrs  Opkts    Oerrs  Collis  Queue
lo0     8232  loopback   localhost   2026601  0      2026601  0      0       0
hme0    1500  rum        rum         1108676  0      1781672  0      91      0
hme0:1  1500  10.10.8.0  10.10.10.2  0        0      0        0      0       0

To fix this problem, ifconfig the virtual interface down and clear the resource on the system where the group is PARTIAL.  In this case, that is rum, and the virtual interface is hme0:1 with IP address 10.10.10.2.

# ifconfig  hme0:1  inet  0.0.0.0  down

Because the interface was taken offline outside of VCS, the resource state changes from online to faulted. To remove the faulted flag from the resource, execute the following hares command:

# hares  -clear  ip1

A brief note on a VCS Concurrency Violation, based on a real scenario.

The relevant entries in engineA.log are:



2014/12/03 06:46:54 VCS INFO V-16-1-10299 Resource App_saposcol (Owner: Unspecified, Group: sapgtsprd) is online on mapibm625 (Not initiated by VCS)
2014/12/03 06:46:54 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sapgtsprd
2014/12/03 06:46:54 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group sapgtsprd on all nodes
2014/12/03 06:46:55 VCS WARNING V-16-6-15034 (mapibm625) violation:-Offlining group sapgtsprd on system mapibm625
2014/12/03 06:46:55 VCS INFO V-16-1-50135 User root fired command: hagrp -offline sapgtsprd  mapibm625  from localhost
2014/12/03 06:46:55 VCS NOTICE V-16-1-10167 Initiating manual offline of group sapgtsprd on system mapibm625

Explanation
=============

Most service groups are of type failover (the default), meaning the group should run on only one system at any one time and can fail over to another system. If VCS detects that a failover group is running on more than one node, it reports a Concurrency Violation and then tries to offline the group on the node where VCS did not bring it online.
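As a toy illustration (not VCS code), that failover rule can be sketched in Python: a non-parallel group tracks where it is online, and a CurrentCount above 1 triggers the violation message:

```python
class Group:
    """Toy model of a VCS service group's online-count bookkeeping."""

    def __init__(self, name, parallel=False):
        self.name = name
        self.parallel = parallel
        self.online_on = set()   # nodes where the group is currently online

    def report_online(self, node):
        """Record that the group went online on `node`. For a failover
        (non-parallel) group, a CurrentCount above 1 is a violation."""
        self.online_on.add(node)
        current_count = len(self.online_on)
        if not self.parallel and current_count > 1:
            hosts = " ".join(sorted(self.online_on))
            return f'VCS Concurrency Violation!!! Group="{self.name}" Hosts=({hosts})'
        return None  # parallel groups may run on every node; no violation

g = Group("group1")
g.report_online("coke")        # first node online: fine
print(g.report_online("rum"))  # second node online: violation
# → VCS Concurrency Violation!!! Group="group1" Hosts=(coke rum)
```

A group created with parallel=True never reports a violation, mirroring the Parallel = 1 attribute described below.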

So in this situation, resource App_saposcol in group sapgtsprd was brought online on mapibm625 outside of VCS control while the group was already online on another system in the cluster, so VCS took it offline:

2014/12/03 06:46:55 VCS NOTICE V-16-1-10167 Initiating manual offline of group sapgtsprd on system mapibm625

If the group should run on two systems at the same time, configure it as parallel (set the group attribute Parallel = 1); the group will then run on all systems in the cluster at the same time.
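In main.cf, a parallel group looks like the sketch below (the group and system names here are placeholders, assuming a two-node SystemList):

```
group appgrp (
    SystemList = { sysA = 0, sysB = 1 }
    Parallel = 1
    AutoStartList = { sysA, sysB }
)
```

The equivalent CLI change is hagrp -modify appgrp Parallel 1, with the configuration opened read-write via haconf -makerw first.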

Credit to https://www.veritas.com

How to Update UDID in VxVM

Below are the high-level steps for clearing a UDID mismatch in VxVM.

1. Stop any file system access.

2. Deport the disk group:

# vxdg deport <diskgroup>

3. Clear the locks on the disk, using device name:

# vxdisk clearimport <disk_access> ...

4. Update the UDID on the disk:

# vxdisk updateudid <disk_access> ...

5. Import the disk group again, updating the UDID:

# vxdg -o updateid import <diskgroup>

6. Check again:

# vxdisk -o alldgs list

How to create a file system using VxVM, controlled under VERITAS Cluster Server


Following is the algorithm to create a volume, file system and put them under VERITAS Cluster Server (VCS).

1. Create a disk group
2. Create a mount point and file system
3. Deport a disk group
4. Create a service group


Add the following resources and modify their attributes:

Resource        Attributes
1. DiskGroup    DiskGroup (the disk group name)
2. Mount        BlockDevice, FSType, MountPoint

Create dependency between following resources:

1. Mount and disk group

Enable all resources in this service group.

The following example shows how to create a RAID-5 volume with a VxFS file system and put it under VCS control.

Method 1 - Using the command line

1. Create a disk group using Volume Manager with a minimum of 4 disks:

# vxdg init datadg disk01=c1t1d0s2 disk02=c1t2d0s2 disk03=c1t3d0s2 disk04=c1t4d0s2
# vxassist -g datadg make vol01 2g layout=raid5

2. Create a mount point for this volume:

# mkdir /vol01

3. Create a file system on this volume:

# mkfs -F vxfs /dev/vx/rdsk/datadg/vol01

4. Deport this disk group:

# vxdg deport datadg

5. Create a service group:

# haconf -makerw
# hagrp -add newgroup
# hagrp -modify newgroup SystemList <sysa> 0 <sysb> 1
# hagrp -modify newgroup AutoStartList <sysa>

6. Create a disk group resource and modify its attributes:

# hares -add data_dg DiskGroup newgroup
# hares -modify data_dg DiskGroup datadg

7. Create a mount resource and modify its attributes:

# hares -add vol01_mnt Mount newgroup
# hares -modify vol01_mnt BlockDevice /dev/vx/dsk/datadg/vol01
# hares -modify vol01_mnt FSType vxfs
# hares -modify vol01_mnt MountPoint /vol01
# hares -modify vol01_mnt FsckOpt %-y

8. Link the mount resource to the disk group resource:

# hares -link vol01_mnt data_dg

9. Enable the resources and close the configuration:

# hagrp -enableresources newgroup
# haconf -dump -makero



Method 2 - Editing /etc/VRTSvcs/conf/config/main.cf

# hastop -all
# cd /etc/VRTSvcs/conf/config
# haconf -makerw
# vi main.cf


Add the following line to end of this file:

group newgroup (
    SystemList = { sysA = 0, sysB = 1 }
    AutoStartList = { sysA }
)

DiskGroup data_dg (
    DiskGroup = datadg
)

Mount vol01_mnt (
    MountPoint = "/vol01"
    BlockDevice = "/dev/vx/dsk/datadg/vol01"
    FSType = vxfs
)

vol01_mnt requires data_dg


# haconf -dump -makero
# hastart -local

How to add NIC/IP resource in VCS

Here's an actual server I worked on for adding a NIC resource.

# haconf -makerw


# hares -add vvrnic NIC db2inst_grp

VCS NOTICE V-16-1-10242 Resource added. Enabled attribute must be set before agent monitors

# hares -modify vvrnic Device ce3

# hares -modify vvrnic NetworkType ether

# hares -add vvrip IP db2inst_grp

VCS NOTICE V-16-1-10242 Resource added. Enabled attribute must be set before agent monitors

# hares -modify vvrip Device ce3
# hares -modify vvrip Address "10.67.196.191"
# hares -modify vvrip NetMask "255.255.254.0"


# hares -link vvrip vvrnic


# hagrp -enableresources db2inst_grp


# hares -online vvrip -sys server620


# haconf -dump -makero

Veritas Cluster File System (CFS)





CFS allows the same file system to be simultaneously mounted on multiple nodes in the cluster.

CFS is designed with a master/slave architecture. Though any node can initiate an operation to create, delete, or resize data, the master node carries out the actual operation. CFS caches the metadata in memory, typically in the memory buffer cache or the vnode cache. A distributed locking mechanism, called GLM, is used for metadata and cache coherency among the multiple nodes.

The examples here are:

1. Based on VCS 5.x, but they should also work on 4.x.
2. A new 4-node cluster with no resources defined.
3. Disk groups and volumes will be created and shared across all nodes.

Before you configure CFS

1. Make sure you have an established cluster that is running properly.
2. Make sure these packages are installed on all nodes:

VRTScavf Veritas cfs and cvm agents by Symantec
VRTSglm Veritas LOCK MGR by Symantec

3. Make sure you have a license installed for Veritas CFS on all nodes.
4. Make sure the vxfen (fencing) driver is active on all nodes (even if it is running in disabled mode).

Check the status of the cluster

Here are some ways to check the status of your cluster. In these examples, CVM/CFS are not configured yet.


# cfscluster status
  NODE         CLUSTER MANAGER STATE            CVM STATE
serverA        running                        not-running                   
serverB        running                        not-running                   
serverC        running                        not-running                   
serverD        running                        not-running                   

  Error: V-35-41: Cluster not configured for data sharing application

# vxdctl -c mode
mode: enabled: cluster inactive

# /etc/vx/bin/vxclustadm nidmap
Out of cluster: No mapping information available

# /etc/vx/bin/vxclustadm -v nodestate
state: out of cluster

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen             

A  serverA             RUNNING              0                   
A  serverB             RUNNING              0                   
A  serverC             RUNNING              0                   
A  serverD             RUNNING              0


Configure the cluster for CFS

During configuration, Veritas picks up the information already set in your cluster configuration and activates CVM on all the nodes.


# cfscluster config
 
        The cluster configuration information as read from cluster
        configuration file is as follows.
                Cluster : MyCluster
                Nodes   : serverA serverB serverC serverD

 
        You will now be prompted to enter the information pertaining
        to the cluster and the individual nodes.
 
        Specify whether you would like to use GAB messaging or TCP/UDP
        messaging. If you choose gab messaging then you will not have
        to configure IP addresses. Otherwise you will have to provide
        IP addresses for all the nodes in the cluster.
  
        ------- Following is the summary of the information: ------
                Cluster         : MyCluster
                Nodes           : serverA serverB serverC serverD
                Transport       : gab
        -----------------------------------------------------------

 
        Waiting for the new configuration to be added.

        ========================================================

        Cluster File System Configuration is in progress...
        cfscluster: CFS Cluster Configured Successfully


Check the status of the cluster

Now let's check the status of the cluster. Notice that there is now a new service group, cvm. The cvm group is required to be online before any clustered file system can be brought up on the nodes.


# cfscluster status

  Node             :  serverA
  Cluster Manager  :  running
  CVM state        :  running
  No mount point registered with cluster configuration


  Node             :  serverB
  Cluster Manager  :  running
  CVM state        :  running
  No mount point registered with cluster configuration


  Node             :  serverC
  Cluster Manager  :  running
  CVM state        :  running
  No mount point registered with cluster configuration


  Node             :  serverD
  Cluster Manager  :  running
  CVM state        :  running
  No mount point registered with cluster configuration

# vxdctl -c mode
mode: enabled: cluster active - MASTER
master: serverA

# /etc/vx/bin/vxclustadm nidmap
Name                             CVM Nid    CM Nid     State
serverA                         0          0          Joined: Master
serverB                         1          1          Joined: Slave
serverC                         2          2          Joined: Slave
serverD                         3          3          Joined: Slave

# /etc/vx/bin/vxclustadm -v nodestate
state: cluster member
        nodeId=0
        masterId=1
        neighborId=1
        members=0xf
        joiners=0x0
        leavers=0x0
        reconfig_seqnum=0xf0a810
        vxfen=off

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen             

A  serverA             RUNNING              0                   
A  serverB             RUNNING              0                   
A  serverC             RUNNING              0                   
A  serverD             RUNNING              0                   

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State         

B  cvm             serverA             Y          N               ONLINE        
B  cvm             serverB             Y          N               ONLINE        
B  cvm             serverC             Y          N               ONLINE        
B  cvm             serverD             Y          N               ONLINE





Creating a Shared Disk Group and Volumes/Filesystems

This procedure creates a shared disk group for use in a cluster environment. Disks must be placed in disk groups before they can be used by the Volume Manager.

When you place a disk under Volume Manager control, the disk is initialized. Initialization destroys any existing data on the disk.

Before you begin, make sure the disks that you add to the shared disk group are directly attached to all the cluster nodes.

First, make sure you are on the master node:


serverA # vxdctl -c mode
mode: enabled: cluster active - MASTER
master: serverA


Initialize the disks you want to use. Make sure they are attached to all the cluster nodes. You may optionally specify the disk format.


serverA # vxdisksetup -if EMC0_1 format=cdsdisk
serverA # vxdisksetup -if EMC0_2 format=cdsdisk


Create a shared disk group with the disks you just initialized.


serverA # vxdg -s init mysharedg mysharedg01=EMC0_1 mysharedg02=EMC0_2

serverA # vxdg list
mysharedg    enabled,shared,cds   1231954112.163.serverA


Now let's add that new disk group to our cluster configuration, giving all nodes in the cluster the shared-write (sw) activation mode.


serverA # cfsdgadm add mysharedg all=sw
  Disk Group is being added to cluster configuration...


Verify that the cluster configuration has been updated.


serverA # grep mysharedg /etc/VRTSvcs/conf/config/main.cf
                ActivationMode @serverA = { mysharedg = sw }
                ActivationMode @serverB = { mysharedg = sw }
                ActivationMode @serverC = { mysharedg = sw }
                ActivationMode @serverD = { mysharedg = sw }

serverA # cfsdgadm display
  Node Name : serverA
  DISK GROUP              ACTIVATION MODE
    mysharedg                    sw

  Node Name : serverB
  DISK GROUP              ACTIVATION MODE
    mysharedg                    sw

  Node Name : serverC
  DISK GROUP              ACTIVATION MODE
    mysharedg                    sw

  Node Name : serverD
  DISK GROUP              ACTIVATION MODE
    mysharedg                    sw


We can now create volumes and filesystems within the shared diskgroup.


serverA # vxassist -g mysharedg make mysharevol1 100g
serverA # vxassist -g mysharedg make mysharevol2 100g

serverA # mkfs -F vxfs /dev/vx/rdsk/mysharedg/mysharevol1
serverA # mkfs -F vxfs /dev/vx/rdsk/mysharedg/mysharevol2


Then add these volumes/filesystems to the cluster configuration so they can be mounted on any or all nodes. Mountpoints will be automatically created.


serverA # cfsmntadm add mysharedg mysharevol1 /mountpoint1
  Mount Point is being added...
  /mountpoint1 added to the cluster-configuration

serverA # cfsmntadm add mysharedg mysharevol2 /mountpoint2
  Mount Point is being added...
  /mountpoint2 added to the cluster-configuration


Display the CFS mount configurations in the cluster.


serverA # cfsmntadm display -v
  Cluster Configuration for Node: serverA
  MOUNT POINT        TYPE      SHARED VOLUME     DISK GROUP       STATUS        MOUNT OPTIONS
  /mountpoint1    Regular      mysharevol1       mysharedg        NOT MOUNTED   crw
  /mountpoint2    Regular      mysharevol2       mysharedg        NOT MOUNTED   crw



That's it. Check your cluster configuration and try to online the file systems on your nodes.


serverA # hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen             

A  serverA             RUNNING              0                   
A  serverB             RUNNING              0                   
A  serverC             RUNNING              0                   
A  serverD             RUNNING              0                   

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State         

B  cvm             serverA             Y          N               ONLINE        
B  cvm             serverB             Y          N               ONLINE        
B  cvm             serverC             Y          N               ONLINE        
B  cvm             serverD             Y          N               ONLINE
B  vrts_vea_cfs_int_cfsmount1 serverA             Y          N               OFFLINE
B  vrts_vea_cfs_int_cfsmount1 serverB             Y          N               OFFLINE
B  vrts_vea_cfs_int_cfsmount1 serverC             Y          N               OFFLINE
B  vrts_vea_cfs_int_cfsmount1 serverD             Y          N               OFFLINE
B  vrts_vea_cfs_int_cfsmount2 serverA             Y          N               OFFLINE
B  vrts_vea_cfs_int_cfsmount2 serverB             Y          N               OFFLINE
B  vrts_vea_cfs_int_cfsmount2 serverC             Y          N               OFFLINE
B  vrts_vea_cfs_int_cfsmount2 serverD             Y          N               OFFLINE



Each volume gets its own service group, which looks cluttered, so you may want to modify your main.cf file and group them. Be creative!
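As a sketch of that grouping (the group and resource names here are illustrative, and the attributes mirror the CVMVolDg/CFSMount examples used later in this post), both mounts could live in one parallel group:

```
group mysharesg (
    SystemList = { serverA = 0, serverB = 1, serverC = 2, serverD = 3 }
    Parallel = 1
)

    CVMVolDg mysharedg_voldg (
        CVMDiskGroup = mysharedg
        CVMActivation = sw
    )

    CFSMount mount1_mnt (
        MountPoint = "/mountpoint1"
        BlockDevice = "/dev/vx/dsk/mysharedg/mysharevol1"
    )

    CFSMount mount2_mnt (
        MountPoint = "/mountpoint2"
        BlockDevice = "/dev/vx/dsk/mysharedg/mysharevol2"
    )

    mount1_mnt requires mysharedg_voldg
    mount2_mnt requires mysharedg_voldg
```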

How to configure VCS service groups for Oracle RAC



This section describes how to configure the Oracle service group using the CLI.

The following procedure assumes that you have created the database.

To configure the Oracle service group using the CLI

Change the cluster configuration to read-write mode:

# haconf -makerw

Add the service group to the VCS configuration:

# hagrp -add oradb1_grp

Modify the attributes of the service group:

# hagrp -modify oradb1_grp Parallel 1
# hagrp -modify oradb1_grp SystemList galaxy 0  nebula 1
# hagrp -modify oradb1_grp AutoStartList galaxy nebula

Add the CVMVolDg resource for the service group:

# hares -add oradata_voldg CVMVolDg oradb1_grp

Modify the attributes of the CVMVolDg resource for the service group:

# hares -modify oradata_voldg CVMDiskGroup oradatadg
# hares -modify oradata_voldg CVMActivation sw
# hares -modify oradata_voldg CVMVolume oradatavol

Add the CFSMount resource for the service group:

# hares -add oradata_mnt CFSMount oradb1_grp

Modify the attributes of the CFSMount resource for the service group:

# hares -modify oradata_mnt MountPoint "/oradata"
# hares -modify oradata_mnt BlockDevice \
"/dev/vx/dsk/oradatadg/oradatavol"

Add the Oracle RAC database instance to the service group:

# hares -add ora1 Oracle oradb1_grp

Modify the attributes of the Oracle resource for the service group:

# hares -modify ora1 Owner oracle
# hares -modify ora1 Home "/app/oracle/orahome"
# hares -modify ora1 StartUpOpt SRVCTLSTART
# hares -modify ora1 ShutDownOpt SRVCTLSTOP

Localize the Sid attribute for the Oracle resource:

# hares -local ora1 Sid

Set the Sid attributes for the Oracle resource on each system:

# hares -modify ora1 Sid vrts1 -sys galaxy
# hares -modify ora1 Sid vrts2 -sys nebula

Set the dependencies between the CFSMount resource and the CVMVolDg resource for the Oracle service group:

# hares -link oradata_mnt oradata_voldg

Set the dependencies between the Oracle resource and the CFSMount resource for the Oracle service group:

# hares -link ora1 oradata_mnt

Create an online local firm dependency between the oradb1_grp service group and the cvm service group:

# hagrp -link oradb1_grp cvm online local firm

Enable the Oracle service group:

# hagrp -enableresources oradb1_grp

Change the cluster configuration to the read-only mode:

# haconf -dump -makero

Bring the Oracle service group online on all the nodes:

# hagrp -online oradb1_grp -any

Friday, November 11, 2016

How to add a low or high priority heartbeat to the cluster configuration without affecting online VCS resources.




The issue:

An additional heartbeat link is desired for additional redundancy of the cluster heartbeats.



To resolve the issue:

The Low Latency Transport (LLT) reads the /etc/llttab file when started.

Upon LLT startup, the proper devices for Low Latency Transport to monitor are loaded from the /etc/llttab file.

It is possible to change the devices that Low Latency transport monitors while it is active.

To add a new high priority link (private heartbeat) while Low Latency transport is active, use the following command on each node: 

Syntax-

lltconfig -t <alias> -d <device> -b ether



For example-

It is desired to add a private heartbeat link on the interface qfe:4.

The link may be added with the following command on each node.

# lltconfig -t qfe4 -d /dev/qfe:4 -b ether



To add a new low priority link (utilizing a public interface), use the following command on each node:

Syntax-

lltconfig -t <alias> -l -d <device> -b ether



Example using qfe4 again for a low-pri link:

# lltconfig -t qfe4 -l -d /dev/qfe:4 -b ether

Note the "-l" (lowercase letter 'L') after <alias> is the parameter to define the new heartbeat connection as a low-priority connection.

Note also, that "low-priority" connections are not recommended for use in Oracle RAC clusters for data security reasons.



The commands above add a link to the running configuration.

An additional change to the file "/etc/llttab" must be made on each node to make the change a permanent change to the configuration of the cluster.

The process for the permanent change will be explained later in this document.



Once the heartbeat is added, the new link configuration can be verified with the following command: 

# lltstat -vvn 

The newly configured link will show in the output. 



Additionally, to determine / verify whether links are "high priority" (private), or "low-priority" (public), use the following command:

# lltstat -l |grep link

Example output:

# lltstat -l |grep link

link 0 qfe2 on etherfp hipri

link 1 qfe3 on etherfp hipri

link 2 qfe4 on etherfp lowpri



The process to make the change to the cluster configuration permanent follows.

Here is an example /etc/llttab configuration with two Private Heartbeat Links.

# cat /etc/llttab

set-node testnode1

set-cluster 100

link qfe2 /dev/qfe:2 - ether - -

link qfe3 /dev/qfe:3 - ether - -



Use "vi" or a similar editor to perform a manual change to the "/etc/llttab" file on each node.  

If a high priority (private) link on qfe4 is desired, an additional line is added to the end of the links listed in the file, as shown below: 

Again, the example uses qfe4.

# cat /etc/llttab

set-node testnode1

set-cluster 100

link qfe2 /dev/qfe:2 - ether - -

link qfe3 /dev/qfe:3 - ether - -

link qfe4 /dev/qfe:4 - ether - -



If a low priority (public) link on qfe4 is desired, a line would be added as follows in this example:

# cat /etc/llttab

set-node testnode1

set-cluster 100

link qfe2 /dev/qfe:2 - ether - -

link qfe3 /dev/qfe:3 - ether - -

link-lowpri qfe4 /dev/qfe:4 - ether - -

Note the "low-pri" designation for this link.



Upon reboot of the system, LLT will read the additional link that was added to "/etc/llttab" in either of the above steps.

The change to the link configuration is now permanent.

Credit to :-https://www.veritas.com/ 

How to add new NODE to existing cluster



Here are the steps for adding a node to an existing cluster.

Install the VCS software manually on the new server, check the packages, and install the license key:

1. ./installvcs -installonly

Configuring LLT and GAB: Create the LLT and GAB configuration files on the new node and update the files on the existing nodes.

To configure LLT:

1. Create the file /etc/llthosts on the new node. You must also update it on each of the current nodes in the cluster.
For example, suppose you are adding east to a cluster consisting of north and south:

- If the file /etc/llthosts on one of the existing nodes resembles:

0 north
1 south

- Update the file on all nodes, including the new one, so it resembles:

0 north
1 south
2 east

2. Create the file /etc/llttab on the new node, making sure that the line beginning "set-node" specifies the new node.
The file /etc/llttab on an existing node can serve as a guide.
The following example describes a system where node east is the new node on cluster number 2:

set-node east
set-cluster 2
link qfe0 qfe:0 - ether - -
link qfe1 qfe:1 - ether - -

On the new system, run the command:

# /sbin/lltconfig -c

To configure GAB:

1. Create the file /etc/gabtab on the new system.

-  If the /etc/gabtab file on the existing nodes resembles:
/sbin/gabconfig -c

Then the file on the new node should be the same, although it is recommended to use the -c -nN option, where "N" is the number of cluster nodes.

- If the /etc/gabtab file on the existing nodes resembles:
/sbin/gabconfig -c -n2

Then, the file on all nodes, including the new node, should change to reflect the change in the number of cluster nodes.
For example, the new file on each node should resemble:
/sbin/gabconfig -c -n3

The -n flag indicates to GAB the number of nodes that must be ready to form a cluster before VCS starts.

2. On the new node, run the following command to configure GAB:

# /sbin/gabconfig -c

To verify GAB:

1. On the new node, run the command:

# /sbin/gabconfig -a

The output should indicate that Port a membership shows all nodes including the new node. The output should resemble:
GAB Port Memberships
====================================
Port a gen a3640003 membership 012


2. Run the same command on the other nodes (north and south) to verify that the Port "a" membership includes the new node:

# /sbin/gabconfig -a
GAB Port Memberships
====================================
Port a gen a3640003 membership 012
Port h gen fd570002 membership 01
Port h gen fd570002 visible ; 2


** Adding the node to the existing cluster:
Perform these tasks on one of the existing nodes in the cluster.

To add the new node to the existing cluster:

1. Enter the command:

# haconf -makerw

2. Add the new system to the cluster:

# hasys -add east

3. Stop VCS on the new node:

# hastop -sys east

4. Copy the main.cf file from an existing node to the new node:

# rcp /etc/VRTSvcs/conf/config/main.cf east:/etc/VRTSvcs/conf/config/

5. Start VCS on the new node:

# hastart

6. If necessary, modify any new system attributes.

7. Enter the command:

# haconf -dump -makero

Starting VCS and verifying the cluster
Start VCS after adding the new node to the cluster and verify the cluster.

To start VCS and verify the cluster
1. From the new system, start VCS with the new system added to the cluster:
# hastart

2. Run the GAB configuration command on each node to verify that Port "a" and Port "h" include the new node in the membership:
# /sbin/gabconfig -a
GAB Port Memberships
===================================
Port a gen a3640003 membership 012
Port h gen fd570002 membership 012

How to use lsof in Solaris local zones


In a Solaris local zone, lsof does not work. Ideally, every application is configured with a specific port number, but sometimes your application will not start because that port is already occupied by another process. In this situation you will get an error like "unable to bind port <port-number>" when you try to start the application. Traditionally we would use the lsof command to determine which process is using a particular port, but in Solaris 10 local zones you can't use lsof. The script below finds which process is occupying a specific port on Solaris servers and is especially useful in Solaris zones.

Script :

#!/bin/ksh
# Find which process (PID) is using a given port by walking every
# PID's open files with pfiles -- works inside Solaris zones where
# lsof is unavailable.
line='---------------------------------------------'
pids=$(/usr/bin/ps -ef | sed 1d | awk '{print $2}')

# The port may be passed as an argument or entered interactively.
if [ $# -eq 0 ]; then
   read ans?"Enter port you would like to know pid for: "
else
   ans=$1
fi

for f in $pids
do
   # pfiles lists the sockets each process has open; grep for the port.
   /usr/proc/bin/pfiles $f 2>/dev/null | /usr/xpg4/bin/grep -q "port: $ans"
   if [ $? -eq 0 ]; then
      echo $line
      echo "Port: $ans is being used by PID:\c"
      /usr/bin/ps -ef -o pid -o args | egrep -v "grep|pfiles" | grep $f
   fi
done
exit 0

How to use the script ?

1. Copy the script to the Solaris local zone and make the script executable.
# chmod 555 check_port.sh
# ls -lrt check_port.sh
-r-xr-xr-x   1 root     root         517 Jan 29 10:11 check_port.sh
#

2. Execute the script and enter the port number.

# ./check_port.sh
Enter port you would like to know pid for: 8232
---------------------------------------------
Port: 8232 is being used by PID:12864 /usrweb/exe/jlaunch pf=/usr/web/SYS/profile/mydb -SD
---------------------------------------------
#
#

3. Once you have the process ID, it is easy to trace that process.

# ps -ef |grep 12864
  ora1adm 12864  327   0   Jan 19 ?          25:51 /usrweb/exe/jlaunch pf=/usr/web/SYS/profile/mydb -SD
    root 29676 17719   0 10:13:34 pts/49      0:00 grep 12864
#

You can check with the ora1adm user and ask them to stop the process if you want to use port 8232 for another application. If the user is not able to stop the process, you can kill it using pkill <pid> or kill -9 <pid> with proper approvals.

Wednesday, November 9, 2016

Unable to offline RemoteGroup resource cleanly


When VCS tries to take a RemoteGroup resource offline and the target service group monitored by the RemoteGroup resource fails to go offline, the clean operation fails in certain situations.

 A RemoteGroup resource can monitor or manage a service group that exists in another cluster.
When an offline operation is initiated on a RemoteGroup resource or the service group containing this resource, the service group that is being monitored in the remote cluster is taken offline. The target service group may not go offline within the default timeout period, for example, if the OfflineWaitLimit attribute of the RemoteGroup resource is set to the default (0) value. In this case, the RemoteGroup agent considers the offline operation as failed and calls the clean agent function.

Solution

If the target group is likely to take a long time to go offline, increase the value of the OfflineWaitLimit attribute of the RemoteGroup resource. 
For example, if OfflineWaitLimit is 2 and MonitorInterval is 60, the RemoteGroup agent waits for a maximum of 120 seconds to confirm that the resource has gone offline. If you expect the resource to take 150 seconds to go offline, you must set OfflineWaitLimit to 3 or more.
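The arithmetic is simply OfflineWaitLimit × MonitorInterval. A small Python helper (just an illustration, not a VCS API) computes the smallest OfflineWaitLimit for an expected offline time:

```python
import math

def min_offline_wait_limit(expected_offline_seconds, monitor_interval=60):
    """Smallest OfflineWaitLimit value such that the RemoteGroup agent
    waits at least `expected_offline_seconds` (the agent waits up to
    OfflineWaitLimit * MonitorInterval seconds) before calling clean."""
    return math.ceil(expected_offline_seconds / monitor_interval)

print(min_offline_wait_limit(120))  # → 2 (waits up to 120 s)
print(min_offline_wait_limit(150))  # → 3 (waits up to 180 s)
```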

The following examples show the main.cf files of the target cluster and host cluster, where the RemoteGroup resource exists.

On system sys1 (target cluster):

group G2 (
    SystemList = { sys1 = 0 }
)

    FileOnOff fil1 (
        PathName = "/tmp/fil1"
    )

On system sys2 (host cluster):

group G1 (
    SystemList = { sys2 = 0 }
)

    RemoteGroup Remoteres (
        IpAddress = sys1
        Username = admin
        Password = DRJpGRg
        GroupName = G2
        VCSSysName = ANY
        ControlMode = OnOff
    )

To set the OfflineWaitLimit attribute:

1. Set the configuration to read-write:

# haconf -makerw

2. Set OfflineWaitLimit for the RemoteGroup resource in the host cluster at the resource type level, or override it at the resource level.

- If you set the limit at the resource type level, it applies to all RemoteGroup resources:

# hatype -modify RemoteGroup OfflineWaitLimit 2

- Alternatively, you can override the attribute at the resource level:

# hares -override Remoteres OfflineWaitLimit
# hares -modify Remoteres OfflineWaitLimit 3

3. Save the configuration:

# haconf -dump -makero