Saturday, November 26, 2011

Solaris: Useful ok prompt (OBP) commands


The boot process can be aborted from the console. The abort sequence is Stop-A, L1-A or Break (depending on the keyboard type), and it can be emulated over remote consoles (RSC, ALOM, ILOM, XSCF). This drops the system to the ok prompt.
The following commands take effect as soon as they are entered at the ok prompt:
boot - boots to multi user mode
boot cdrom - boots from cdrom
boot -r - reconfiguration boot
boot -a - interactive boot (prompts for boot parameters)
boot -s - single user mode

Solaris: init runlevels

A run level is the operating state that the Solaris OS boots into, or is switched to with the init command.

Run levels that are available on the Solaris OS:
init 0 - shuts down the OS and drops to the OBP (OpenBoot PROM, a.k.a. the ok prompt)
init S/s - single user mode, used for maintenance or for troubleshooting
init 1 - administrative state, all file systems are accessible and user logins are disabled
init 2 - multi user, all daemons running except for the NFS server daemons, so NFS resources are not shared
init 3 - multi user with NFS resources shared, this is where the Solaris OS goes live
init 4 - alternative multi user state, unused unless defined
init 5 - power down, shuts down the OS and automatically powers off if the hardware supports it
init 6 - reboot
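
To see which of these run levels a running system is in, who -r prints the current run level. The following is only a minimal sketch: describe_runlevel and current_runlevel are hypothetical helper names, the descriptions are abbreviated, and the sample line merely imitates the layout of who -r output rather than being captured from a real host.

```shell
#!/bin/sh
# describe_runlevel LEVEL: hypothetical helper that maps a run level to a
# short description.
describe_runlevel() {
    case "$1" in
        0)   echo "shut down to the ok prompt" ;;
        s|S) echo "single user mode" ;;
        1)   echo "administrative state" ;;
        2)   echo "multi user, no NFS server" ;;
        3)   echo "multi user, NFS resources shared" ;;
        4)   echo "alternative multi user (unused unless defined)" ;;
        5)   echo "shut down and power off" ;;
        6)   echo "reboot" ;;
        *)   echo "unknown run level: $1"; return 1 ;;
    esac
}

# current_runlevel: reads "who -r" style output on stdin and prints the
# token that follows the word "run-level".
current_runlevel() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "run-level") { print $(i + 1); exit } }'
}

# Illustrative sample line; on a live system use: who -r | current_runlevel
sample='   .       run-level 3  Nov 26 10:00     3      0  S'
level=`echo "$sample" | current_runlevel`
desc=`describe_runlevel "$level"`
echo "run level $level: $desc"
```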

Wednesday, November 9, 2011

Solaris: Checking which process is bound to which TCP/UDP port number


On Solaris, you'll need lsof, which is not part of the default Solaris installation. It is available from Sunfreeware.

# /usr/local/bin/lsof -i:123
COMMAND PID USER   FD   TYPE        DEVICE SIZE/OFF NODE NAME
xntpd   335 root   19u  IPv4 0x600282a2c40      0t0  UDP *:ntp
xntpd   335 root   20u  IPv4 0x600282a2840      0t0  UDP localhost:ntp
xntpd   335 root   21u  IPv4 0x6002cb81ac0      0t0  UDP scglob20:ntp
xntpd   448 root   19u  IPv4 0x30016a2c800      0t0  UDP *:ntp
xntpd   448 root   20u  IPv4 0x6002f088500      0t0  UDP localhost:ntp
xntpd   448 root   21u  IPv4 0x30016a2ca00      0t0  UDP 161.19.6.146:ntp

Otherwise, you can loop over /proc with pfiles as follows. The loop can report more than one pid.

# port=25
# for i in `ls /proc`; do
>    # match only local socket names whose port is exactly $port
>    pfiles $i 2>/dev/null | grep "sockname: AF_INET" | grep "port: $port\$" >/dev/null 2>&1; portfound=$?
>    if [ $portfound -eq 0 ]; then
>       echo "pid $i found"
>    fi
> done
pid 17175 found
pid 2874 found
pid 29348 found
pid 4482 found
pid 463 found
pid 6696 found

Then look up each pid reported above with ps:
# ps -eo user,pid,comm | grep <pid>
e.g.
# ps -eo user,pid,comm | grep 17175
    root 17175 /usr/lib/sendmail
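
Matching the exact port matters here: a pattern that merely contains the port number would also hit 2500, or a remote peer's port. As a hedged sketch, the same check can be written with awk so that only local socket names are compared numerically. has_local_port is a hypothetical helper, and the sample text only imitates the usual layout of pfiles socket lines; it was not captured from a real system. On Solaris this would need nawk or /usr/xpg4/bin/awk, since the legacy /usr/bin/awk does not accept -v.

```shell
#!/bin/sh
# has_local_port PORT: reads pfiles-style output on stdin and succeeds only
# if a local socket name (sockname) is bound to exactly PORT.
has_local_port() {
    awk -v port="$1" '
        $1 == "sockname:" && $2 ~ /^AF_INET/ && $(NF - 1) == "port:" && $NF == port { found = 1 }
        END { exit !found }
    '
}

# Illustrative sample, assumed to resemble pfiles output for a mail daemon.
sample='   5: S_IFSOCK mode:0666 dev:332,0 ino:4606 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        SOCK_STREAM
        sockname: AF_INET 0.0.0.0  port: 25
   6: S_IFSOCK mode:0666 dev:332,0 ino:4607 uid:0 gid:0 size:0
      O_RDWR
        sockname: AF_INET 10.0.0.5  port: 2500
        peername: AF_INET 10.0.0.9  port: 80'

for port in 25 250 80; do
    if echo "$sample" | has_local_port "$port"; then
        echo "port $port: bound locally"
    else
        echo "port $port: not bound locally"
    fi
done
```

On a live system the function would be fed from pfiles directly, e.g. pfiles 17175 | has_local_port 25.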

AIX: Checking which process is bound to which TCP/UDP port number

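I had no idea how to do this on AIX at first; Linux has a much more convenient way of doing it. Say you want to find out which process is bound to port 8080. netstat -A prints the address of the socket's protocol control block in the first column, and rmsock then reports which process is holding that socket.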

# netstat -Aan | grep 8080
f1000200031b6b98 tcp4       0      0  *.8080             *.*                LISTEN

# rmsock f1000200031b6b98 tcpcb
The socket 0x31b6808 is being held by proccess 1306834 (httpd).
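
Note that grep 8080 would also match a line for port 18080 or a foreign address. A small sketch that picks out the control block address for an exact listening port follows. pcb_for_port is a hypothetical helper name, and the second sample line is made up purely to show the difference.

```shell
#!/bin/sh
# pcb_for_port PORT: reads "netstat -Aan" style output on stdin and prints
# the PCB address (first column) of TCP sockets listening on exactly PORT.
pcb_for_port() {
    awk -v port="$1" '
        $2 ~ /^tcp/ && $NF == "LISTEN" {
            n = split($5, part, ".")        # local address, e.g. *.8080
            if (part[n] == port) print $1
        }
    '
}

# The first line is the netstat output quoted above; the second is a made-up
# line added only to show that port 18080 is not matched.
sample='f1000200031b6b98 tcp4       0      0  *.8080             *.*                LISTEN
f1000200031b7b98 tcp4       0      0  *.18080            *.*                LISTEN'

pcb=`echo "$sample" | pcb_for_port 8080`
echo "rmsock $pcb tcpcb"
```

The last line only prints the rmsock command to run; it does not execute it.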

Solaris: Calculating swap utilization

If you have system monitoring in place, you may need to find out how much swap space is in use. The percentage of swap space utilized on a Solaris system is calculated as:

% swap utilized
= (swap used / swap total) * 100

The swap total should be:

swap total = swap used + swap available

And when you substitute the values from swap -s:

# swap -s
total: 10527072k bytes allocated + 1647056k reserved = 12174128k used, 3572696k available

actual swap total
= 12174128k +  3572696k
=  15746824k

% swap utilized
= (12174128k/15746824k) * 100
= 77.31 % (rounded to two decimal places)

You can conveniently script this using the Bourne or Bourne Again shell.

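# capture the one-line summary from swap -s
swapinfo=`swap -s`

# field 9 is the used figure, field 11 the available figure; strip the k suffix
swapused=`echo $swapinfo | awk '{print $9}' | sed 's/k//'`
swapavail=`echo $swapinfo | awk '{print $11}' | sed 's/k//'`
swaptotal=`echo $swapused+$swapavail | bc`

# the division is truncated to 5 decimal places before multiplying by 100
swapusedpercent=`echo "scale=5; ($swapused/$swaptotal)*100" | bc -l`
echo "Swap utilization is at $swapusedpercent %"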
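
As an alternative sketch, the same arithmetic can be done in a single awk pass, which also takes care of rounding to two decimals. swap_used_percent is a hypothetical helper name, and the field positions are assumed to match the swap -s line shown earlier.

```shell
#!/bin/sh
# swap_used_percent: reads one line of "swap -s" output on stdin and prints
# used / (used + available) * 100, rounded to two decimals.
swap_used_percent() {
    awk '{
        used  = $9 + 0      # adding 0 keeps the leading number, drops the k
        avail = $11 + 0
        total = used + avail
        if (total == 0) { print "0.00"; exit }   # guard: no swap configured
        printf "%.2f\n", used / total * 100
    }'
}

# Sample line copied from the swap -s output above; on a live system use:
#   swap -s | swap_used_percent
sample='total: 10527072k bytes allocated + 1647056k reserved = 12174128k used, 3572696k available'
pct=`echo "$sample" | swap_used_percent`
echo "Swap utilization is at $pct %"
```

A monitoring check could then compare the printed value against a threshold of your choosing.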

AIX: enclosure0 and enclosure1 error

Encountered the following error, as reported by the AIX errpt utility

# errpt | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
BD797922   1108110011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108110011 P H enclosure0     SUBSYSTEM FAILURE
BD797922   1108100011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108100011 P H enclosure0     SUBSYSTEM FAILURE
BD797922   1108093511 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108093511 P H enclosure0     SUBSYSTEM FAILURE
BD797922   1108070011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108070011 P H enclosure0     SUBSYSTEM FAILURE
BD797922   1108060011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108060011 P H enclosure0     SUBSYSTEM FAILURE
AA8AB241   1108050111 T O OPERATOR       OPERATOR NOTIFICATION
AA8AB241   1108050111 T O OPERATOR       OPERATOR NOTIFICATION
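
The TIMESTAMP column is packed as MMDDhhmmYY, so 1108110011 reads as 8 November 2011, 11:00. A small sketch that unpacks it is shown here; errpt_ts is a hypothetical helper name, and the century is assumed to be 20xx.

```shell
#!/bin/sh
# errpt_ts STAMP: converts an errpt summary timestamp (MMDDhhmmYY) to
# "YYYY-MM-DD hh:mm". Assumption: two-digit years belong to 20xx.
# Fails (non-zero status) if STAMP is not 10 characters long.
errpt_ts() {
    echo "$1" | awk '
        length($1) == 10 {
            printf "20%s-%s-%s %s:%s\n",
                substr($1, 9, 2), substr($1, 1, 2), substr($1, 3, 2),
                substr($1, 5, 2), substr($1, 7, 2)
            ok = 1
        }
        END { exit !ok }
    '
}

# Two timestamps taken from the errpt listing above.
errpt_ts 1108110011
errpt_ts 1108050111
```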

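Dig out more information on the errors logged against enclosure0 (-N selects by resource name, whereas -j expects an error identifier such as BD797922):
# errpt -a -N enclosure0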
---------------------------------------------------------------------------
LABEL:          SSA_ENCL_ERR1
IDENTIFIER:     BD797922

Date/Time:       Tue Nov  8 11:00:30 2011
Sequence Number: 5643
Machine Id:      0055617A4C00
Node Id:         riju26
Class:           H
Type:            PERM
Resource Name:   enclosure0
Resource Class:  container
Resource Type:   ses
Location:        USSA33C8
VPD:
        Part Number.................9L1850
        Serial Number...............AC1433C8
        EC Level....................000000R000
        Manufacturer................IBM053
        ROS Level and ID............0020
        Device Specific.(Z0)........DISPLAY=33C8
        Device Specific.(Z1)........BYPASS1_16= 09L5510
        Device Specific.(Z2)........BYPASS4_5= 09L5510
        Device Specific.(Z3)........BYPASS8_9= 09L5510
        Device Specific.(Z4)........BYPASS12_13= 09L5510
        Device Specific.(Z5)........FAN1=09L2794
        Device Specific.(Z6)........FAN2=09L2794
        Device Specific.(Z7)........FAN3=09L2794
        Device Specific.(Z8)........PSU1=
        Device Specific.(Z9)........PSU2=09L4299
        Device Specific.(ZA)........CTRL= 34L3820
        Device Specific.(ZB)........OPERATOR= 08L7924

Description
SUBSYSTEM FAILURE

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0802 2100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------

And on enclosure1
---------------------------------------------------------------------------
LABEL:          SSA_ENCL_ERR1
IDENTIFIER:     BD797922

Date/Time:       Tue Nov  8 11:00:30 2011
Sequence Number: 5644
Machine Id:      0055617A4C00
Node Id:         riju26
Class:           H
Type:            PERM
Resource Name:   enclosure1
Resource Class:  container
Resource Type:   ses
Location:        USSA56E7
VPD:
        Part Number.................9L1850
        Serial Number...............292C56E7
        EC Level....................000000R000
        Manufacturer................IBM053
        ROS Level and ID............0020
        Device Specific.(Z0)........DISPLAY=56E7
        Device Specific.(Z1)........BYPASS1_16= 09L5580
        Device Specific.(Z2)........BYPASS4_5= 09L5580
        Device Specific.(Z3)........BYPASS8_9= 09L5580
        Device Specific.(Z4)........BYPASS12_13= 09L5580
        Device Specific.(Z5)........FAN1=09L2794
        Device Specific.(Z6)........FAN2=09L2794
        Device Specific.(Z7)........FAN3=09L2794
        Device Specific.(Z8)........PSU1=09L4299
        Device Specific.(Z9)........PSU2=
        Device Specific.(ZA)........CTRL= 27H0708
        Device Specific.(ZB)........OPERATOR= 08L7924

Description
SUBSYSTEM FAILURE

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0802 2200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------

Notice this line for enclosure0:
        Device Specific.(Z8)........PSU1=
and this one for enclosure1:
        Device Specific.(Z9)........PSU2=

These empty fields look rather suspicious, don't they? My obvious lack of experience with AIX and the IBM RS/6000 (7026-6H1) prompted me to look up the 7133 Models D40 and T40 Serial Disk Systems Service Guide. It took me a while to get there, as I wasn't sure which documentation to refer to: the one for the IBM RS/6000 7026-6H1 or the one for the IBM SSA 160 SerialRAID adapter. In the end, I found what I was looking for based on the SRNs (Service Request Numbers) generated for enclosure0 and enclosure1.

# ssa_ela
enclosure0 SRN 80221
enclosure1 SRN 80222

Oh, and the enclosure0 and enclosure1 mentioned here are the disk enclosures.

Earlier on, I mentioned the suspicious PSU1 and PSU2 entries on enclosure0 and enclosure1 respectively. Apparently those power supplies are missing. It takes a lot of guesswork to perform hardware diagnostics remotely on a server that is halfway around the world.

# ssaencl -l enclosure0 -p
enclosure enclosure0
component PSU_1
present   FALSE

enclosure enclosure0
component PSU_2
present   TRUE
fault     FALSE
exchanged FALSE

# ssaencl -l enclosure1 -p
enclosure enclosure1
component PSU_1
present   TRUE
fault     FALSE
exchanged FALSE

enclosure enclosure1
component PSU_2
present   FALSE
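
With more enclosures to check, scanning this output by eye gets tedious. As a hedged sketch, the component listing can be filtered down to the parts reported as not present. missing_components is a hypothetical helper name, and it assumes the three-keyword layout (enclosure, component, present) shown in the output above.

```shell
#!/bin/sh
# missing_components: reads "ssaencl -l <enclosure> -p" style output on
# stdin and prints "<enclosure> <component>" for each component whose
# "present" field is FALSE.
missing_components() {
    awk '
        $1 == "enclosure" { encl = $2 }
        $1 == "component" { comp = $2 }
        $1 == "present" && $2 == "FALSE" { print encl, comp }
    '
}

# Sample text copied from the ssaencl output for enclosure0 above.
sample='enclosure enclosure0
component PSU_1
present   FALSE

enclosure enclosure0
component PSU_2
present   TRUE
fault     FALSE
exchanged FALSE'

echo "$sample" | missing_components
```

On the live system this would be fed directly, e.g. ssaencl -l enclosure1 -p | missing_components.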