Wednesday, November 9, 2011

AIX: enclosure0 and enclosure1 error

I encountered the following errors, as reported by the AIX errpt utility:

# errpt | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
BD797922   1108110011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108110011 P H enclosure0     SUBSYSTEM FAILURE
BD797922   1108100011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108100011 P H enclosure0     SUBSYSTEM FAILURE
BD797922   1108093511 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108093511 P H enclosure0     SUBSYSTEM FAILURE
BD797922   1108070011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108070011 P H enclosure0     SUBSYSTEM FAILURE
BD797922   1108060011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108060011 P H enclosure0     SUBSYSTEM FAILURE
AA8AB241   1108050111 T O OPERATOR       OPERATOR NOTIFICATION
AA8AB241   1108050111 T O OPERATOR       OPERATOR NOTIFICATION
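
When a listing like the one above gets long, it can help to tally it offline. The following is only a rough sketch, assuming the summary has been pasted into a Python string; the helper name parse_errpt_summary and the shortened SAMPLE are made up for illustration. It relies on errpt printing the TIMESTAMP column as MMDDhhmmYY, so 1108110011 reads as Nov 8, 11:00, 2011.

```python
from collections import Counter
from datetime import datetime

# A shortened copy of the summary listing above (illustrative only).
SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
BD797922   1108110011 P H enclosure1     SUBSYSTEM FAILURE
BD797922   1108110011 P H enclosure0     SUBSYSTEM FAILURE
AA8AB241   1108050111 T O OPERATOR       OPERATOR NOTIFICATION
"""


def parse_errpt_summary(text):
    """Return one dict per data row of an errpt summary listing."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        # Skip blank lines and the header row.
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue
        ident, stamp, err_type, err_class, resource, description = fields
        entries.append({
            "id": ident,
            # The TIMESTAMP column is MMDDhhmmYY.
            "when": datetime.strptime(stamp, "%m%d%H%M%y"),
            "type": err_type,
            "class": err_class,
            "resource": resource,
            "description": description.strip(),
        })
    return entries


entries = parse_errpt_summary(SAMPLE)
# Count hardware-class (H) entries per resource.
hardware = Counter(e["resource"] for e in entries if e["class"] == "H")
for resource, count in sorted(hardware.items()):
    print(resource, count)
```

Run against the full listing, this makes the pattern easier to see: both enclosures log the same error together, over and over.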

Digging out more information on the error on enclosure0:
# errpt -a -N enclosure0
---------------------------------------------------------------------------
LABEL:          SSA_ENCL_ERR1
IDENTIFIER:     BD797922

Date/Time:       Tue Nov  8 11:00:30 2011
Sequence Number: 5643
Machine Id:      0055617A4C00
Node Id:         riju26
Class:           H
Type:            PERM
Resource Name:   enclosure0
Resource Class:  container
Resource Type:   ses
Location:        USSA33C8
VPD:
        Part Number.................9L1850
        Serial Number...............AC1433C8
        EC Level....................000000R000
        Manufacturer................IBM053
        ROS Level and ID............0020
        Device Specific.(Z0)........DISPLAY=33C8
        Device Specific.(Z1)........BYPASS1_16= 09L5510
        Device Specific.(Z2)........BYPASS4_5= 09L5510
        Device Specific.(Z3)........BYPASS8_9= 09L5510
        Device Specific.(Z4)........BYPASS12_13= 09L5510
        Device Specific.(Z5)........FAN1=09L2794
        Device Specific.(Z6)........FAN2=09L2794
        Device Specific.(Z7)........FAN3=09L2794
        Device Specific.(Z8)........PSU1=
        Device Specific.(Z9)........PSU2=09L4299
        Device Specific.(ZA)........CTRL= 34L3820
        Device Specific.(ZB)........OPERATOR= 08L7924

Description
SUBSYSTEM FAILURE

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0802 2100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------

And the same for enclosure1:
---------------------------------------------------------------------------
LABEL:          SSA_ENCL_ERR1
IDENTIFIER:     BD797922

Date/Time:       Tue Nov  8 11:00:30 2011
Sequence Number: 5644
Machine Id:      0055617A4C00
Node Id:         riju26
Class:           H
Type:            PERM
Resource Name:   enclosure1
Resource Class:  container
Resource Type:   ses
Location:        USSA56E7
VPD:
        Part Number.................9L1850
        Serial Number...............292C56E7
        EC Level....................000000R000
        Manufacturer................IBM053
        ROS Level and ID............0020
        Device Specific.(Z0)........DISPLAY=56E7
        Device Specific.(Z1)........BYPASS1_16= 09L5580
        Device Specific.(Z2)........BYPASS4_5= 09L5580
        Device Specific.(Z3)........BYPASS8_9= 09L5580
        Device Specific.(Z4)........BYPASS12_13= 09L5580
        Device Specific.(Z5)........FAN1=09L2794
        Device Specific.(Z6)........FAN2=09L2794
        Device Specific.(Z7)........FAN3=09L2794
        Device Specific.(Z8)........PSU1=09L4299
        Device Specific.(Z9)........PSU2=
        Device Specific.(ZA)........CTRL= 27H0708
        Device Specific.(ZB)........OPERATOR= 08L7924

Description
SUBSYSTEM FAILURE

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0802 2200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------

Notice that one power supply field is blank in each report. For enclosure0:
        Device Specific.(Z8)........PSU1=
and for enclosure1:
        Device Specific.(Z9)........PSU2=
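
Spotting a blank field by eye in a long VPD dump is easy to get wrong, so here is a small sketch of how one might pick them out automatically. It assumes the "Device Specific" lines have been copied into a string; the regular expression and the helper name empty_vpd_slots are my own, not part of any AIX tool.

```python
import re

# A few VPD lines copied from the enclosure0 report (illustrative only).
VPD_SAMPLE = """\
        Device Specific.(Z7)........FAN3=09L2794
        Device Specific.(Z8)........PSU1=
        Device Specific.(Z9)........PSU2=09L4299
        Device Specific.(ZA)........CTRL= 34L3820
"""

# Matches lines such as "Device Specific.(Z8)........PSU1=09L4299".
# Groups: field id (Z8), component name (PSU1), value (may be empty).
VPD_LINE = re.compile(r"Device Specific\.\((\w+)\)\.+(\w+)=\s*(\S*)")


def empty_vpd_slots(text):
    """Return component names whose value after '=' is blank."""
    empty = []
    for line in text.splitlines():
        match = VPD_LINE.search(line)
        if match and not match.group(3):
            empty.append(match.group(2))
    return empty


print(empty_vpd_slots(VPD_SAMPLE))
```

A blank value does not prove that a part is absent; it only flags a field worth checking with the enclosure tools.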

That looked rather suspicious. Given my obvious lack of experience with AIX and the IBM RS/6000 (7026-6H1), it prompted me to look up the 7133 Models D40 and T40 Serial Disk Systems Service Guide. It took me a while to get there, as I wasn't sure which documentation to refer to: that of the IBM RS/6000 7026-6H1 or that of the IBM SSA 160 SerialRAID adapter. In the end, I found what I was looking for based on the SRNs generated for enclosure0 and enclosure1.

# ssa_ela
enclosure0 SRN 80221
enclosure1 SRN 80222

Oh, and the enclosure0 and enclosure1 mentioned here are the disk enclosures.

Earlier I mentioned the suspicious PSU1 and PSU2 fields on enclosure0 and enclosure1 respectively. Apparently those power supplies are missing. It takes a lot of guesswork to perform hardware diagnostics remotely on a server that is halfway around the world.

# ssaencl -l enclosure0 -p
enclosure enclosure0
component PSU_1
present   FALSE

enclosure enclosure0
component PSU_2
present   TRUE
fault     FALSE
exchanged FALSE

# ssaencl -l enclosure1 -p
enclosure enclosure1
component PSU_1
present   TRUE
fault     FALSE
exchanged FALSE

enclosure enclosure1
component PSU_2
present   FALSE
