This 'ere is a little something I picked up while attending to one of Oracle's Field Change Orders for the Oracle Database Appliance, where I was tasked with replacing ten 600GB SAS hard disk drives.
As it turns out, the disks are managed by Oracle ASM. I had never worked with it before, and before I knew it I was googling away.
Basically the only information I had was the serial number of the Oracle Database Appliance (which I will refer to as the ODA henceforth), the number of disks to be replaced, their serial numbers, and, well, that's pretty much it.
Armed with some knowledge, the equipment make and model, plus the disks (make and model too), I asked the chap, well, the customer who had been tasked by his superiors to look after this ODA. A really nice Malay chap. He first helped me out by pulling out one of those KVM consoles used to access the ODA. Once he had logged in, I could see that it was running Oracle's OEL 5, update 7 to be precise.
The first thing I had to figure out was how to match the physical disks against the list of serial numbers I had in hand. As it turns out, the ODA has a nifty tool for interacting with OAK. OAK, as of writing, is the Oracle Appliance Manager. So far I only know that it handles all the nitty-gritty disk stuff: it manages the disks discovered by the OS, invokes gparted (apparently), and names the disks according to their enclosure and slot numbers. Pretty neat, eh?
Alright, so in order to find out which disk corresponds to which serial number, I invoked:
grid@cs-oda1 $ /u01/app/oracle/grid/bin/oak show disk | awk '{print $1}'
pd_00
pd_01
...
..
pd_23
This particular ODA has two slots at the back for two SATA disks, which hold the Oracle OEL installation. At the front there are 24 slots designated for the data disks.
To find out the details of a disk, such as its status, disk name, multipath names, and current active device name, I invoked the following:
grid@cs-oda1 $ oakcli show disk pd_00 | egrep -i "diskid|multipathlist|serial|slotnum|prevusrdevname|usrdevname|state"
DiskId : 35000cca02aae9414
IState : 0
MultiPathList : |/dev/sdan||/dev/sdd|
PrevState : 0
PrevUsrDevName :
SerialNum : 001238K30BJL
SlotNum : 4
State : Online
StateChangeTs : 1372319243
StateDetails : Good
UsrDevName : HDD_E0_S04_716084244
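Since the output is a plain "Name : value" listing, a single field can be pulled out with a tiny awk filter. This is only a sketch: the function name disk_field is my own, and it assumes the "Name : value" layout shown above, with the colon as its own whitespace-separated column.

```shell
# disk_field FIELD
# Reads "oakcli show disk <name>"-style output on stdin and prints the value
# of FIELD. The name is compared case-insensitively against the whole first
# column, so "State" does not also match "PrevState" or "StateDetails".
disk_field() {
    awk -v want="$1" '
        tolower($1) == tolower(want) && $2 == ":" { print $3; exit }
    '
}
```

Usage would be along the lines of: oakcli show disk pd_00 | disk_field SerialNum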
Armed with this knowledge I wrote a little script to find the disks that need to be replaced, based on the list of serial numbers I had. Let's call it findserial.sh:
#!/bin/sh
# Find the ODA disks whose serial numbers are listed in serials.txt.
ORACLE_HOME=/u01/app/oracle/grid
ORACLE_SID=+ASM1
PATH=$ORACLE_HOME/bin:$PATH
export ORACLE_HOME ORACLE_SID PATH
# Names of all the disks reported by the tool (pd_00 ... pd_23).
disks=`oak show disk | awk '{print $1}'`
for serial in `cat serials.txt`; do
    for disk in $disks; do
        disk_serial=`oak show disk $disk | grep -i serialnum | awk '{print $3}'`
        # Report the disk when its serial number matches exactly.
        if [ "$serial" = "$disk_serial" ]; then
            echo "Found $disk:$disk_serial"
        fi
    done
done
I also created a text file, serials.txt, that contains the list of serial numbers.
grid@cs-oda1 # cat serials.txt
1238K2UW2L
1238K302JL
...
..
So when I ran the script the output would look something like the following:
grid@cs-oda1 $ ./findserial.sh
Found pd_10:1238K2UW2L
Found pd_18:1238K302JL
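The nested loop asks the tool for every disk's details once per serial number, which gets slow with 24 disks and 10 serials. A possible alternative, sketched below, is to dump the "disk serial" pairs to a file once and let awk do the matching. The function name match_serials and the file pairs.txt are made up for illustration.

```shell
# match_serials PAIRS_FILE SERIALS_FILE
# PAIRS_FILE holds one "disk serial" pair per line; SERIALS_FILE holds one
# wanted serial per line. Prints "Found disk:serial" for every pair whose
# serial is listed in SERIALS_FILE.
#
# PAIRS_FILE could be produced once with something like:
#   for d in $disks; do
#       echo "$d `oak show disk $d | grep -i serialnum | awk '{print $3}'`"
#   done > pairs.txt
match_serials() {
    awk '
        NR == FNR { if (NF) wanted[$1] = 1; next }   # first file: wanted serials
        ($2 in wanted) { print "Found " $1 ":" $2 }  # second file: disk/serial pairs
    ' "$2" "$1"
}
```

With this, each disk is queried once no matter how many serial numbers are on the list.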
Another good feature of the OAK CLI is that it can light up the amber LED on the corresponding disk:
grid@cs-oda1 $ oak locate disk pd_00 on
The locate disk command can be integrated into the findserial.sh script mentioned above by adding it right after the echo line:
echo "Found $disk:$disk_serial"
oakcli locate disk $disk on
Fire it up and voila, the disks whose serial numbers are listed in serials.txt will be lit up.
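Once the disks have been swapped, the LEDs need to be switched off again. The sketch below turns saved "Found ..." lines into the matching commands. It assumes the locate command also accepts "off" as the counterpart of the "on" used above (I have not shown that here, so check it on your appliance), and the function name locate_off_commands is my own.

```shell
# locate_off_commands
# Reads findserial.sh output on stdin and prints one
# "oakcli locate disk <disk> off" command per "Found <disk>:<serial>" line.
# Lines that do not start with "Found " are ignored.
# Nothing is executed; the commands are only printed for review.
locate_off_commands() {
    sed -n 's/^Found \([^:]*\):.*$/oakcli locate disk \1 off/p'
}
```

For example, save the script output with ./findserial.sh > found.txt, then review the output of locate_off_commands < found.txt before running any of it.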
Friday, June 28, 2013
Tuesday, June 18, 2013
MegaRAID
Just a little something a bud and I went through with the cryptic MegaRAID CLI commands, handy if you happen to hate the MegaRAID WebBIOS like most people do, myself included.
MegaCli64 important parameters
-aX or -aALL, where X is the adapter number (0, 1, etc.)
-PhysDrv [E:S] refers to a particular physical disk, where E is the enclosure ID and S is the slot number of the physical disk
To find out the enclosure ID, refer to the enclosure info output
e.g.
Suppose there is a single LSI MegaRAID adapter, the enclosure ID is 252, and the disk to be replaced is in slot 5 (physically the sixth slot, since slot numbers start from 0). The [E:S] value is then [252:5].
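The off-by-one between "the sixth slot" and "slot 5" is easy to get wrong at 2 a.m., so here is a minimal sketch of a helper that builds the [E:S] value from the enclosure ID and the 1-based physical position. The name physdrv_arg is hypothetical, and it assumes you are counting bays from 1.

```shell
# physdrv_arg ENCLOSURE_ID POSITION
# POSITION is the 1-based physical position of the bay (6 for the sixth bay).
# Prints the [E:S] value, where S is the 0-based slot number.
physdrv_arg() {
    printf '[%s:%s]\n' "$1" "$(( $2 - 1 ))"
}
```

For the example above, physdrv_arg 252 6 gives [252:5].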
- To view adapter info; this will tell you how many LSI MegaRAID adapters are installed
MegaCli64 -AdpAllInfo -aAll
- To view the enclosure info
MegaCli64 -EncInfo -aALL
- To view the list of all virtual disks that are configured on the LSI MegaRAID adapters
MegaCli64 -LDInfo -Lall -aALL
- To view the list of all physical disks that are connected to the LSI MegaRAID adapters
MegaCli64 -PDList -aALL
- To view the information of a particular physical disk
MegaCli64 -PDInfo -PhysDrv [E:S] -aALL
- To set the state of a particular disk to offline
MegaCli64 -PDOffline -PhysDrv [E:S] -aX
- To set the state of a particular disk to online
MegaCli64 -PDOnline -PhysDrv [E:S] -aX
- To mark a physical disk as missing, used specifically for disk replacement
MegaCli64 -PDMarkMissing -PhysDrv [E:S] -aX
- To prepare a physical disk for removal
MegaCli64 -PDPrpRmv -PhysDrv [E:S] -aX
- To replace a missing drive
MegaCli64 -PDReplaceMissing -PhysDrv [E:S] -ArrayM -rowN -aX
- To rebuild a drive
MegaCli64 -PDRbld -Start -PhysDrv [E:S] -aX
MegaCli64 -PDRbld -Stop -PhysDrv [E:S] -aX
MegaCli64 -PDRbld -ShowProg -PhysDrv [E:S] -aX
- To change the status of a particular disk from bad to good, i.e. from Unconfigured-Bad to Unconfigured-Good. This is especially useful when a disk has been abruptly pulled out of its slot without first being properly removed from its virtual drive
MegaCli64 -PDMakeGood -PhysDrv [E:S] -aX
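The -PDList output is long, so a short filter that boils it down to one line per disk helps when hunting for the failed one. This is a sketch only: it assumes the listing contains lines beginning with "Enclosure Device ID:", "Slot Number:" and "Firmware state:", which is what I would expect from MegaCli but may differ between versions. The function name pd_summary is my own.

```shell
# pd_summary
# Reads "MegaCli64 -PDList -aALL"-style output on stdin and prints one
# "[E:S] <firmware state>" line per physical disk.
# Assumes each disk's block lists the enclosure ID and slot number before
# its firmware state.
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { print "[" enc ":" slot "] " $2 }
    '
}
```

Usage would be something like: MegaCli64 -PDList -aALL | pd_summary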
To change/replace a drive that is part of a mirrored logical disk
- In the disk replacement we did at RHB, the faulty disk was in slot 5 (the 6th slot) of the enclosure with ID 252 on adapter 0
1. Set the drive offline, if it is not already offline
MegaCli64 -PDOffline -PhysDrv [252:5] -a0
2. Mark the drive as missing
MegaCli64 -PDMarkMissing -PhysDrv [252:5] -a0
3. Prepare drive for removal
MegaCli64 -PDPrpRmv -PhysDrv [252:5] -a0
4. Physically remove the disk from the slot, and plug in the new disk
5. Add the new disk back to the affected virtual drive and start rebuilding.
MegaCli64 -PDReplaceMissing -PhysDrv [252:5] -Array2 -row1 -a0
MegaCli64 -PDRbld -Start -PhysDrv [252:5] -a0
6. To view the progress of the rebuild
MegaCli64 -PDRbld -ShowProg -PhysDrv [252:5] -a0
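Because the same [E:S] and adapter values are repeated in every step, it may help to generate the whole sequence from one set of parameters and read it over before typing anything into a production box. The sketch below only prints the commands from the steps above and runs nothing. The function name replace_plan is made up for illustration.

```shell
# replace_plan ENCLOSURE SLOT ARRAY ROW ADAPTER
# Prints the replacement sequence for one disk as a dry run.
replace_plan() {
    pd="[$1:$2]"
    echo "MegaCli64 -PDOffline -PhysDrv $pd -a$5"
    echo "MegaCli64 -PDMarkMissing -PhysDrv $pd -a$5"
    echo "MegaCli64 -PDPrpRmv -PhysDrv $pd -a$5"
    echo "# physically swap the disk now"
    echo "MegaCli64 -PDReplaceMissing -PhysDrv $pd -Array$3 -row$4 -a$5"
    echo "MegaCli64 -PDRbld -Start -PhysDrv $pd -a$5"
    echo "MegaCli64 -PDRbld -ShowProg -PhysDrv $pd -a$5"
}
```

Running replace_plan 252 5 2 1 0 prints the same commands as the steps above. One thing to keep in mind when pasting them into a shell: an unquoted [252:5] is a glob pattern, so quoting it as '[252:5]' avoids surprises if a matching file name happens to exist in the current directory.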
A note on step 5: how do you find out which Array number was affected by the bad disk? You can find this out by viewing the status of all the logical disks configured on the HBA card (by running MegaCli64 -LDInfo -Lall -aALL); the affected logical drive should be marked as degraded.
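To avoid eyeballing the whole -LDInfo listing, a small filter can pick out the degraded logical drives. This is a sketch under assumptions: it expects each logical drive to start with a "Virtual Drive: N ..." (or "Virtual Disk: N ...") line followed by a "State : ..." line, which may vary by MegaCli version. The function name degraded_lds is my own. The number printed is the virtual drive number, which is not guaranteed to be the same as the array index, so treat it as a starting point.

```shell
# degraded_lds
# Reads "MegaCli64 -LDInfo -Lall -aALL"-style output on stdin and prints the
# number of every virtual drive whose State line mentions "Degraded".
degraded_lds() {
    awk '
        /^Virtual (Drive|Disk):/      { ld = $3 }
        /^State[ \t]*:/ && /Degraded/ { print ld }
    '
}
```

Usage would be something like: MegaCli64 -LDInfo -Lall -aALL | degraded_lds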
Another note on step 5: how do you know which row the disk is logically at (the row number does not correspond to the slot number)? You can find this out by running MegaCli64 -PdGetMissing -a0; hopefully you won't have more than one disk missing or manually removed by your good self.
To replace a drive that is the designated hotspare on a RAID-5 logical disk, well maybe next time.