R/Evolution 2000 Series Troubleshooting Manual

2000 Series

Troubleshooting Guide

P/N 83-00004287-12

Revision A

May 2008

Summary of Contents for R/Evolution 2000 Series

Page 1 2000 Series Troubleshooting Guide P/N 83-00004287-12 Revision A May 2008...
Page 2 Copyright Protected Material 2002-2008. All rights reserved. R/Evolution and the R/Evolution logo are trademarks of Dot Hill Systems Corp. All other trademarks and registered trademarks are proprietary to their respective owners. The material in this document is for information only and is subject to change without notice. While reasonable efforts have been made in the preparation of this document to assure its accuracy, changes in the product design can be made without reservation and without notification to its users.
Page 3: Table Of Contents
Contents Preface ............9 System Architecture .
Page 4 ........54 R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 5 Resetting Expander Error Counters ........55 Disabling or Enabling a PHY .
Page 7 Updating Disk Drive Firmware ........101 Removing and Replacing a Drive Module .
Page 9: Preface
Preface This guide describes how to diagnose and troubleshoot a R/Evolution™ storage system, and how to identify, remove, and replace field-replaceable units (FRUs). It also describes critical, warning, and informational events that can occur during system operation. This guide applies to the following enclosures: 2730 FC Controller Enclosure ■...
Page 10: Typographic Conventions
83-00004289 Using the command-line interface R/Evolution 2000 Series CLI Reference Guide 83-00004288 (CLI) Recommendations for maximizing R/Evolution 2000 Series Best Practices Guide 83-00004286 reliability, accessibility, and (FC and iSCSI only) serviceability 10 R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 11: System Architecture
CHAPTER System Architecture
This chapter describes the R/Evolution™ storage system architecture. Prior to troubleshooting any system, it is important to understand the architecture, including each of the system components, how they relate to each other, and how data passes through the system.
Page 12: Enclosure Chassis And Midplane
FRUs plug into this board. Drive modules plug into the front of the midplane. Power-and-cooling modules and I/O modules (controller modules or expansion modules) plug into the back of the midplane.
Page 13: Enclosure Id Display
Enclosure ID Display The enclosure ID (EID) display provides a visual single-digit identifier for each enclosure in a storage system. The EID display is located on the left ear, as viewed from the front of the chassis. For a storage system that includes a controller module, EID values are set by the RAID controller.
Page 14: Drive Modules
In addition, each enclosure can be populated with disks of various capacities. To ensure the full use of a disk’s capacity, construct all virtual disks with disks of the same capacity. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 15: Controller Modules
Controller Modules A controller module is a FRU that contains two connected circuit boards: a RAID I/O module and a host interface module (HIM). The RAID I/O module is a hot-pluggable board that mates with the enclosure midplane and provides all RAID controller functions and SAS/SATA disk channels. The HIM provides the host-side interface and contains dual-port, host target channels for connection to host systems.
Page 16: Power Supply Unit
This chamber is commonly evacuated by all of the fans. In this way the amount of mass flow through each drive slot is controlled to be the same slot to slot. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 17: Airflow
Airflow is controlled and optimized over the power supply by using the power supply chassis as the air-duct for the power supply, ensuring that there are no dead air spaces in the power supply core and increasing the velocity flow (LFM) by controlling the cross sectional area that the mass flow travels through.
Page 19: Fault Isolation Methodology
CHAPTER Fault Isolation Methodology
The R/Evolution storage system provides many ways to isolate faults within the system. This chapter presents the basic methodology used to locate faults and the associated FRUs. The basic fault isolation steps are: Gather fault information ■...
Page 20: Review The Event Logs
SFP, cable, switch, or data host. For more information about isolating faults, see “Troubleshooting Using System LEDs” on page 21. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 21: Troubleshooting Using System Leds
CHAPTER Troubleshooting Using System LEDs
The first step in troubleshooting your storage system is to check the status of its LEDs. System LEDs can help you identify the FRU that is experiencing a fault. This chapter includes the following topics: “LED Names and Locations”...
Page 22 [Figure 3-4: 2530 Controller Module LEDs. Panel labels: LINK, Port 0, Port 1, DIRTY, CLEAN, CACHE, ACTIVITY, 10/100 BASE-T, STATUS. Callouts: Cache status, Expansion port status, FRU OK, Fault/Service Required, OK to Remove, Unit Locator.]
Page 23: Using Leds To Check System Status
[Figure 3-5: Expansion Module LEDs. Callouts: Service, SAS In port status, SAS Out port status, FRU OK, Unit Locator, Fault/Service Required, OK to Remove.]
[Figure 3-6: Power-and-Cooling Module LEDs. Callouts: AC Power Good, DC Voltage/Fan Fault/Service Required.]
Using LEDs to Check System Status Check the enclosure status LEDs periodically or after you have received an error notification.
Page 24: Using Enclosure Status Leds
However, if the drive has failed and the failure is such that the controller cannot communicate with the drive, this LED is off. Caution – Do not remove a drive that is rebuilding. Removing a drive may terminate the current operation and cause data loss. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 25: Using Controller Module Host Port Leds
Using Controller Module Host Port LEDs During normal operation, when a controller module host port is connected to a data host, the port’s host link status LED and host link activity LED are green. For FC, if the link speed is set to 2 Gbit/sec the host link speed LED is off; for 4 Gbit/sec, it is green.
Page 26 No – The controller module’s port has failed. Replace the controller module. ■ Yes – Monitor the connection for a period of time. It may be an intermittent ■ problem, which can occur with SFPs, damaged cables, and HBAs. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 27 Isolating a Host-Side Connection Fault on a SAS Storage System During normal operation, when a controller module host port is connected to a data host, the port’s host link status LED and host link activity LED are green. If there is I/O activity, the host activity LED blinks green.
Page 28 No – The controller module’s port has failed. Replace the controller module. ■ Yes – Monitor the connection for a period of time. It may be an intermittent ■ problem, which can occur with damaged cables and HBAs. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 29 Isolating a Host-Side Connection Fault on an iSCSI Storage System This procedure requires scheduled downtime. Note – Do not perform more than one step at a time. Changing more than one variable at a time can complicate the troubleshooting process. 1.
Page 30: Using The Controller Module Expansion Port Led
Yes – Monitor the status to ensure there is no intermittent error present. If the ■ fault occurs again, clean the connections to ensure that a dirty connector is not interfering with the data path. No – Proceed to Step 4. ■ R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 31: Using Ethernet Management Port Leds
4. Move the expansion cable to a port on the RAID enclosure with a known good link status. This step isolates the problem to the expansion cable or to the controller module’s expansion port. Is the expansion port status LED on? Yes –...
Page 32: Using Controller Module Status Leds
Hardware-controlled power-up error ■ Cache flush error ■ Cache self-refresh error ■ If the OK to Remove LED is blue, the controller module is prepared for removal. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 33: Using Power-And-Cooling Module Leds
Using Power-and-Cooling Module LEDs During normal operation, the AC Power Good LED is green. If the AC Power Good LED is off, the module is not receiving adequate power. Verify that the power cord is properly connected and check the power source it is connected to.
Page 35: Troubleshooting Using Raidar
CHAPTER Troubleshooting Using RAIDar
This chapter describes how to use RAIDar to troubleshoot your storage system and its FRUs. It also describes solutions to problems you might experience when using RAIDar. Topics covered in this chapter include: “Problems Using RAIDar to Access a Storage System”...
Page 36: Problems Using Raidar To Access A Storage System
(#). No password is required because the local host is expected to be secure. 3. Use the create user command to create new users. For information about using the command, enter help create user or see the CLI reference guide. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 37: Determining Storage System Status And Verifying Faults
Determining Storage System Status and Verifying Faults The System Summary page shows you the overall status of the storage system. To view storage system status: 1. Select Monitor > Status > Status Summary. 2. Check the status icon at the upper left corner of each panel. A green icon indicates that components associated with that panel are ■...
Page 38: Stopping I/O
To use the Overall Rate Stats page to ensure that all I/O has ceased on a remote system: 1. Quiesce host applications that access the storage system. 2. Select Monitor > Statistics > Overall Rate Stats. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 39: Clearing Metadata From Leftover Disk Drives
3. Click your browser’s refresh button to ensure that current data is displayed. 4. In the Host-Generated I/O & Bandwidth Totals for All Virtual Disks panel, verify that both indicators display 0 (no activity). Clearing Metadata From Leftover Disk Drives A drive becomes a “leftover”...
Page 40: Isolating Faulty Disk Drives
3. Click Update LED Illumination. The lower LED on the selected drive starts blinking yellow. For more information about viewing drive information, see the reference guide. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 41: Reviewing Disk Drive Error Statistics
Reviewing Disk Drive Error Statistics The Disk Error Stats page provides specific drive fault information. It shows a graphical representation of the enclosures and disks installed in the system. The Disk Error Stats page can be used to gather drive information and to identify specific drive errors.
Page 42 3. To view the error statistics, select the suspected drive and click Show Disk Drive Error Statistics. 4. Review the Disk Drive Error Statistics panel for drive errors. The Disk Drive Error Statistics panel enables you to review errors from each of the two ports. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 43: Reviewing The Event Logs
Reviewing the Event Logs If all the steps in “Identifying a Faulty Disk Drive” on page 40 and “Reviewing Disk Drive Error Statistics” on page 41 have been performed, you have determined the following: A disk drive has encountered a fault ■...
Page 44 Reconstruction can take hours or days to complete, depending on the virtual disk RAID level and size, drive speed, utility priority, and other processes running on the storage system. You can stop reconstruction only by deleting the virtual disk. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 45: Isolating Data Path Faults
Isolating Data Path Faults When isolating data path faults, you must first isolate the fault to an internal data path or an external data path. This will help to target your troubleshooting efforts. Internal data paths include the following: Controller to disk connectivity ■...
Page 46 Phy Detail panel. This panel shows information about each PHY in the internal data paths between the Storage Controller, Expander Controller, drives, and expansion ports. By reviewing this page you can quickly locate the internal data path that has a fault. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 47 Checking PHY Status RAIDar's Expander Status page includes an Expander Controller PHY Detail panel. This panel shows the internal data paths between the Storage Controller, Expander Controller, disks, and expansion ports. Review this page to locate an internal data path that has a fault.
Page 48 PHY, not including those received during Link Reset sequences. A running disparity error occurs when positive and negative values in a signal don't alternate. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
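The running disparity rule above can be illustrated with a small sketch. It assumes 10-bit code groups of the kind used by 8b/10b-style serial links and tracks disparity per whole group, which is a simplification (real encoders evaluate sub-blocks). The function name and sample values are hypothetical and are not taken from the storage system firmware.

```python
def count_disparity_errors(code_groups, initial_rd=-1):
    """Count running disparity violations in a stream of 10-bit code groups.

    Simplified model: each group has disparity (ones minus zeros) of
    -2, 0, or +2, and the running disparity must stay at -1 or +1, so
    non-neutral groups have to alternate in sign.
    """
    rd = initial_rd
    errors = 0
    for group in code_groups:
        ones = bin(group & 0x3FF).count("1")
        disparity = ones - (10 - ones)
        new_rd = rd + disparity
        if disparity not in (-2, 0, 2) or new_rd not in (-1, 1):
            errors += 1
            # Resynchronise on the sign of the offending group.
            if disparity > 0:
                rd = 1
            elif disparity < 0:
                rd = -1
        else:
            rd = new_rd
    return errors


POSITIVE = 0b1101010101  # six ones, disparity +2 (illustrative value)
NEGATIVE = 0b0001010101  # four ones, disparity -2 (illustrative value)
NEUTRAL = 0b0101010101   # five ones, disparity 0 (illustrative value)

alternating = count_disparity_errors([POSITIVE, NEUTRAL, NEGATIVE, POSITIVE])
repeated = count_disparity_errors([POSITIVE, POSITIVE])
```

In this model the alternating stream produces no errors, while two positive groups in a row produce one.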
Page 49 ■ CRC Error Count – In a sequence of SAS transfers (frames), the data is protected by a cyclic redundancy check (CRC) value. This error count specifies the number of times the computed CRC does not match the CRC stored in the frame, which indicates that the frame might have been corrupted in transit.
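A minimal sketch of the comparison behind a CRC error counter follows. It uses Python's zlib.crc32 purely for illustration; the exact CRC polynomial handling, bit ordering, and frame layout used on a SAS link are not reproduced here, and the helper names are hypothetical.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Return the payload followed by its 4-byte CRC-32 (big-endian)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")


def crc_matches(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored value."""
    if len(frame) < 4:
        return False
    payload, stored = frame[:-4], int.from_bytes(frame[-4:], "big")
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored


good = append_crc(b"example frame payload")
corrupted = bytes([good[0] ^ 0x01]) + good[1:]  # one bit flipped in transit

crc_error_count = 0
for frame in (good, corrupted):
    if not crc_matches(frame):
        crc_error_count += 1
```

Only the corrupted frame fails the comparison, so the counter ends at 1.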
Page 50 4. Periodically examine the Expander Status page to see if the fault isolation firmware disables the same PHY again. If it does: a. Replace the appropriate cable. b. Reset the affected controller or power-cycle the enclosure. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 51: Isolating External Data Path Faults On An Fc Storage System
Isolating External Data Path Faults on an FC Storage System To troubleshoot external data path faults, perform the following steps: 1. Select Monitor > Status > Advanced Settings > Host Port Status. This page provides a graphical representation of controller host port status and port details.
Page 52: Isolating External Data Path Faults On An Iscsi Storage System
IP Address – Port IP address ■ IP Mask – Port IP subnet mask ■ IP Gateway – Port gateway IP address ■ Service Port – iSCSI port number ■ Hardware Address – Port MAC address ■ R/Evolution 2000 Series Troubleshooting Guide • May 2008...
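When reviewing these fields, one common thing to check is whether the gateway lies inside the subnet defined by the port's IP address and mask. The sketch below does this with Python's standard ipaddress module; the function name and the sample addresses are hypothetical, not values from a real system.

```python
import ipaddress


def gateway_in_port_subnet(ip: str, mask: str, gateway: str) -> bool:
    """Return True if the gateway address falls inside the port's subnet."""
    port = ipaddress.ip_interface(f"{ip}/{mask}")
    return ipaddress.ip_address(gateway) in port.network


# Hypothetical values as they might be read from the Host Port Status page.
ok = gateway_in_port_subnet("10.0.0.5", "255.255.255.0", "10.0.0.1")
bad = gateway_in_port_subnet("10.0.0.5", "255.255.255.0", "10.0.1.1")
```

A False result suggests rechecking the port's IP Mask or IP Gateway setting before suspecting cables or switches.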
Page 53: Isolating External Data Path Faults On A Sas Storage System
Isolating External Data Path Faults on a SAS Storage System To troubleshoot external data path faults, perform the following steps: 1. Select Monitor > Status > Advanced Settings > Host Port Status. This page provides a graphical representation of controller host port status and port details.
Page 54: Resetting A Host Channel On An Fc Storage System
PHYs, and disable or enable PHY fault isolation. Use of the Expander Status page is described in “Checking PHY Status” on page 47 and in the reference guide. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 55: Resetting Expander Error Counters
Resetting Expander Error Counters If PHYs have errors, you can reset expander error counters and then observe error activity during normal operation. If a PHY continues to accumulate errors you can disable it in the Expander Controller Phy Detail panel. To reset expander error counters: ●...
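The "reset, then observe" approach amounts to comparing two snapshots of the per-PHY error counts. The sketch below assumes the counts have been copied by hand into dictionaries keyed by PHY number; the function name and the sample numbers are hypothetical.

```python
def phys_still_accumulating(before: dict, after: dict) -> list:
    """Return the PHY ids whose error count grew between two snapshots."""
    return sorted(
        phy for phy, count in after.items() if count > before.get(phy, 0)
    )


# Hypothetical snapshots: right after the reset, and after a period of normal I/O.
after_reset = {0: 0, 1: 0, 2: 0}
after_io = {0: 0, 1: 7, 2: 0}

suspects = phys_still_accumulating(after_reset, after_io)
```

Here PHY 1 is the only one that kept accumulating errors, which makes it the candidate for disabling.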
Page 56: Using Recovery Utilities
Then, if spares of the appropriate size are available, reconstruction begins. Note – After you dequarantine the virtual disk, make sure that a spare drive is available to let the virtual disk reconstruct. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 57: Trusting A Virtual Disk For Disaster Recovery
Caution – If the virtual disk does not have enough drives to continue operation, when a dequarantine is done, the virtual disk goes offline and its data cannot be recovered. To remove a virtual disk from quarantine: 1. Select Manage > Utilities > Recovery Utilities > Vdisk Quarantine. For each virtual disk, the virtual disk panel shows a status icon;...
Page 58 If the virtual disk does not come back online, it might be that too many drives are offline or the virtual disk might have additional failures on the bus or enclosure that Trust Virtual Disk cannot fix. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 59: Problems Scheduling Tasks
Problems Scheduling Tasks If your task does not run at the times you specified, check the schedule specifications. It is possible to create conflicting specifications. Start time is the first time the task will run. ■ If you use the Between option, the starting date/time must be in the Between ■...
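The rule that the start time must fall inside the Between range can be checked before a schedule is created. This is a sketch only: it assumes the range is inclusive at both ends, and the function name and dates are hypothetical rather than part of RAIDar or the CLI.

```python
from datetime import datetime


def start_within_between(start: datetime, range_from: datetime, range_to: datetime) -> bool:
    """Return True if the start time lies inside the Between range (assumed inclusive)."""
    return range_from <= start <= range_to


range_from = datetime(2008, 5, 1, 0, 0)
range_to = datetime(2008, 5, 31, 23, 59)

valid = start_within_between(datetime(2008, 5, 10, 2, 0), range_from, range_to)
conflicting = start_within_between(datetime(2008, 6, 1, 2, 0), range_from, range_to)
```

A False result corresponds to a conflicting specification of the kind described above.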
Page 60: Effect Of Changing The Date And Time
The following table describes error messages associated with scheduling tasks.
Table 4-2 Errors Associated with Scheduling Tasks
Error Message | Solution
Task Already Exists | Select a different name for the task.
Schedule Already Exists | Select a different name for the schedule.
Page 61: Selecting Individual Events For Notification
Selecting Individual Events for Notification As described in the reference guide, you can configure how and under what conditions the storage system alerts you when specific events occur. In addition to selecting event categories, as a Diagnostic Manage user you can select individual events that you want to be notified of.
Page 62: Selecting Or Clearing All Events For Notification
2. Click Set All Individual Events. To clear all events: 1. In the Clear All Individual Events panel, select the checkbox for each notification type you don’t want to use. 2. Click Clear All Individual Events. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 63: Correcting Enclosure Ids
Correcting Enclosure IDs When installing a system with drive enclosures attached, the enclosure IDs might differ from the physical cabling order. This is because the controller might have been previously attached to some of the same enclosures and it attempts to preserve the previous enclosure IDs if possible.
Page 65: Troubleshooting Using Event Logs
CHAPTER Troubleshooting Using Event Logs
Event logs capture reported events from components throughout the storage system. Each event consists of an event code, the date and time the event occurred, which controller reported the event, and a description of what occurred. This chapter includes the following topics: “Event Severities”...
Page 66: Viewing The Event Log In Raidar
1. Do one of the following: In the System Panel, click the icon. ■ In the menu, select Monitor > Status > View Event Log. ■ The event log page is displayed. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 67 2. Click one of the following buttons in the Select Event Table To View panel to see the corresponding events. For a dual-controller system: Button Description Controller A & B Events Shows all events for both controllers. This is the default. Controller A &...
Page 68: Viewing An Event Log Saved From Raidar
Severity Level column in RAIDar. Ctrlr – A or B indicates which controller logged the event. ■ Description – Information about the event. This corresponds to the Message ■ column in RAIDar. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 69: Reviewing Event Logs
For example: Event SN Date/Time Code Severity Controller Description A29856 08-06 09:35:07 Time/date has been changed A29809 08-04 12:12:05 Uncorrectable ECC error in buffer memory address 0x0 on bootup Reviewing Event Logs When reviewing events, do the following: 1. Review the critical/warning events. Identify the primary events and any that might be the cause of the primary event.
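The first review step, looking at critical and warning events before anything else, can be sketched as a simple filter and sort. The Event fields below mirror the columns named above, but the record layout is an assumption, and the severity and controller values on the two sample events are assumed for illustration because the excerpt does not show them.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"critical": 0, "warning": 1, "informational": 2}


@dataclass
class Event:
    serial: str
    timestamp: str   # "MM-DD HH:MM:SS", as in the example above
    severity: str    # assumed values: critical, warning, informational
    controller: str  # assumed values: A or B
    description: str


def triage(events):
    """Return critical and warning events, most severe first, then oldest first."""
    urgent = [e for e in events if e.severity in ("critical", "warning")]
    return sorted(urgent, key=lambda e: (SEVERITY_ORDER[e.severity], e.timestamp))


sample_events = [
    # Severity and controller are assumptions for illustration.
    Event("A29856", "08-06 09:35:07", "informational", "A", "Time/date has been changed"),
    Event("A29809", "08-04 12:12:05", "critical", "A",
          "Uncorrectable ECC error in buffer memory address 0x0 on bootup"),
]

to_review_first = triage(sample_events)
```

With these assumed severities, only the ECC error is returned for immediate review.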
Page 70: Saving Log Information To A File
When processing is complete, a summary page is displayed. 5. Review the summary of contact information, comments, and selected logs. 6. Click Download Selected Logs To File. 7. If prompted to open or save the file, click Save. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 71: Configuring The Debug Log
8. If prompted to specify the file location and name, do so using a .logs extension. The default file name is store.logs. If you intend to capture multiple event logs, be sure to name the files appropriately so that they can be identified later. 9.
Page 72 4. If instructed by service personnel, click Advanced Debug Logging Setup Options and select one or more additional types of events. Under normal conditions, none of these options should be selected because they have a slight impact on read/write performance. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 73: Voltage And Temperature Warnings
CHAPTER Voltage and Temperature Warnings
The storage system provides voltage and temperature warnings, which generally indicate input or environmental conditions. Voltage warnings can occur if the input voltage is too low or if a FRU is receiving too little or too much power from the power-and-cooling module....
Page 74: Sensor Locations
4000 to 6000 RPM. When a fan’s speed drops below 4000 RPM, the EMP considers it a failure and posts an alarm in the storage system’s event log. The following table lists the description, location, and alarm condition for each fan. If R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 75: Temperature Sensors
the fan speed remains under the 4000 RPM threshold, the internal enclosure temperature may continue to rise. Replace the power-and-cooling module reporting the fault. Table 6-2 Cooling Fan Sensor Descriptions Event/Fault ID Description Location LED Condition Fan 0 Power-and-cooling module 0 <...
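The fan failure rule described above (a speed below 4000 RPM is treated as a failure) can be expressed as a short check. This is a sketch, not the EMP's logic: the function name, the reading format, and the sample values are hypothetical; only the 4000 RPM threshold comes from this guide.

```python
FAN_FAILURE_RPM = 4000  # threshold stated in this guide


def failed_fans(readings: dict) -> list:
    """Return the names of fans whose speed is below the failure threshold."""
    return [name for name, rpm in readings.items() if rpm < FAN_FAILURE_RPM]


# Hypothetical readings copied from a status page.
readings = {"Fan 0": 5200, "Fan 1": 3800, "Fan 2": 4000, "Fan 3": 6000}
faulty = failed_fans(readings)
```

A reading of exactly 4000 RPM is not flagged, because the guide describes a failure as dropping below that speed.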
Page 76 To view the controller enclosure’s temperature status, in RAIDar, as an Advanced Manage user: ● Select Monitor > Status > Advanced Settings > Temperature Status. For more information see RAIDar help or the reference guide. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 77: Power-And-Cooling Module Voltage Sensors
Power-and-Cooling Module Voltage Sensors Power supply voltage sensors ensure that an enclosure’s power supply voltage is within normal ranges. There are three voltage sensors per power-and-cooling module. Table 6-5 Voltage Sensor Descriptions Sensor Event/Fault ID LED Condition Power Supply 1 Voltage, 12V <...
Page 79: Troubleshooting And Replacing Frus
CHAPTER Troubleshooting and Replacing FRUs
This chapter describes how to troubleshoot and replace field-replaceable units. A field-replaceable unit (FRU) is a system component that is designed to be replaced onsite. This chapter contains the following sections: “Static Electricity Precautions”...
Page 80: Static Electricity Precautions
Note – When troubleshooting, ensure that you review the reported events carefully. The controller module is often the FRU reporting faults, but is not always the FRU where the fault is occurring. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 81 Table 7-1 lists the faults you might encounter with a controller module or expansion module. Table 7-1 Controller Module or Expansion Module Faults Problem Solution FRU OK LED is off • Verify that the controller module is properly seated in the slot and latched.
Page 82: Removing And Replacing A Controller Or Expansion Module
LAN configuration settings ■ Host port configuration settings ■ Enclosure management settings ■ Disk configuration settings ■ Services security settings ■ System information settings ■ System preference settings ■ Event notification settings ■ R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 83 The configuration file does not include configuration data for virtual disks and volumes. You do not need to save this data before replacing a controller or expansion module because the data is saved as metadata in the first sectors of associated disk drives.
Page 84: Shutting Down A Controller Module
4. Confirm the operation by clicking OK. Note – If the storage system is connected to a Microsoft Windows host, the following event is recorded in the Windows event log: Initiator failed to connect to the target. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 85: Removing A Controller Module Or Expansion Module
Removing a Controller Module or Expansion Module As long as the other module in the enclosure you are removing remains online and active, you can remove a module without powering down the enclosure; however you must shut down a controller module as described in “Shutting Down a Controller Module”...
Page 86 In a single-controller configuration, you must shut down the controller to prevent the virtual disks from going offline. 7. Turn the thumbscrews until the screws disengage from the module. 8. Press both latches downward to disconnect the module from the midplane. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 87: Replacing A Controller Module Or Expansion Module
9. Pull the module straight out of the enclosure. Replacing a Controller Module or Expansion Module You can install a controller module or expansion module into an enclosure that is powered on. Caution – When replacing a controller module, ensure that less than 10 seconds elapse between inserting the module into a slot and fully latching it in place.
Page 88 The FRU OK LED illuminates green when the module completes initializing and is online. If the enclosure’s Unit Locator LED is blinking, use RAIDar to stop it: 1. Select Manage > General Config > Enclosure Management. 2. Click Turn Off Locator LED. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 89: Moving A Set Of Expansion Modules
Fault/Service Required If the Fault/Service Required yellow LED is illuminated, the module has not gone online and likely failed its self-test. Try to put the module online (see “Shutting Down a Controller Module” on page 84) or check for errors that were generated in the event log from RAIDar.
Page 90: Updating Firmware
(partner firmware upgrade). If told to do so by a service technician, you can disable the partner firmware upgrade function using RAIDar. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 91: Updating Firmware Using Raidar
Disabling Partner Firmware Upgrade The partner firmware upgrade option is enabled by default in RAIDar. Only disable this function if told to do so by a service technician. 1. Select Manage > General Config > System Configuration. 2. For Partner Firmware Upgrade, select Disable. Updating Firmware Using RAIDar RAIDar enables you to upgrade the firmware in your storage system when new releases are available.
Page 92: Identifying Sfp Module Faults
SFP include intermittent errors and no link. To identify a faulty SFP, utilize the link LED and perform the troubleshooting procedure described in “Using Controller Module Host Port LEDs” on page 25. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 93: Removing And Replacing An Sfp Module
Removing and Replacing an SFP Module This section provides steps to remove and replace an SFP module. Caution – Mishandling fiber-optic cables can degrade performance. Do not twist, fold, pinch, or step on fiber-optic cables. Do not bend the fiber-optic cables tighter than a 2-inch radius.
Page 94: Installing An Sfp Module
1. If the SFP has a plug, remove it and slide the SFP into the port until it locks into place. 2. Flip the actuator down, and connect the fiber-optic interface cable into the duplex jack at the end of the SFP. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 95: Identifying Cable Faults
Identifying Cable Faults When identifying cable faults you must remember that there are two sides of the controller: the input/output to the host and the input/output to the drive enclosures. Identifying a cable fault can also be difficult because the data paths are made up of multiple components, any of which could be the cause of the fault....
Page 96: Identifying Drive Module Faults
The event log includes errors reported by the enclosure management processors (EMPs) and disk drives in your storage system. If you see these errors in the event log, the following information will help you understand the errors. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 97 When a disk detects an error, it reports it to the controller by returning a SCSI sense key, and if appropriate, additional information. This information is recorded in the RAIDar event log. Table 7-2 lists some of the most common SCSI sense key descriptions (in hexadecimal).
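When reading sense keys in the event log, a small lookup table can help. The names below are the general sense key names from the SCSI Primary Commands standard; they are offered as a reading aid and may not match the wording or selection in Table 7-2. The function name is hypothetical.

```python
# Sense key names as defined by the SCSI standard (not copied from Table 7-2).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0x8: "Blank Check",
    0x9: "Vendor Specific",
    0xA: "Copy Aborted",
    0xB: "Aborted Command",
    0xD: "Volume Overflow",
    0xE: "Miscompare",
}


def describe_sense_key(key: int) -> str:
    """Return the standard name for a SCSI sense key, or a fallback string."""
    return SENSE_KEYS.get(key, f"Unknown sense key {key:#x}")


medium = describe_sense_key(0x3)
hardware = describe_sense_key(int("4", 16))  # hex digit as shown in a log entry
```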
Page 98: Disk Drive Errors
Each of these events may result in a warning or critical notification in RAIDar and the event log. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 99: Disk Channel Errors
Disk Channel Errors Disk channel errors are similar to disk-detected errors, except they are detected by the controllers instead of the disk drive. Some disk channel errors are displayed as text strings. Others are displayed as hexadecimal codes. If the error is a critical error, perform the steps in “Disk Drive Errors” on page 98. Table 7-4 lists the descriptions for disk channel errors.
Page 100: Identifying Faulty Drive Modules
Note – Step 8 requires that you schedule downtime for the system. If the drive fails again, the midplane may have an intermittent fault or a dirty connector; replace the enclosure.
Page 101: Updating Disk Drive Firmware
Updating Disk Drive Firmware You can update disk drive firmware by loading a firmware update file obtained from the disk drive manufacturer or your reseller. Note – Updating the firmware of disk drives in a virtual disk risks the loss of data and causes the drives to be temporarily inaccessible.
Page 102 If more than two drives are listed, a Select All check box is displayed. 4. Select the disk drives to update. 5. Click Continue. 6. Click Browse to select the firmware update file. 7. Click Load Device Firmware File. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 103 8. To start the firmware update, click Start Firmware Update. To cancel the firmware update, click Cancel. The file is transferred to the controller where it is temporarily stored prior to download to the disk drives. Once the firmware update process has started, the Drive Firmware Loading Progress page provides the update progress of each disk drive, including when the firmware update completes successfully.
Page 104: Removing And Replacing A Drive Module
Wait until the rebuild process is completed, and then replace the defective drive ■ module. The benefit is that the virtual disk is fully restored before you replace the defective drive. This eliminates the possibility of lost data if the wrong drive is removed. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 105: Identifying The Location Of A Faulty Drive Module
Replace the defective drive and make the new drive a global spare while the ■ rebuilding process continues. This procedure installs the new drive and assigns it as a global spare so that an automatic rebuild can occur if a drive module fails on another virtual disk.
Page 106: Removing A Drive Module
Precautions” on page 80. 2. Squeeze the release on the left edge of the drive ejector handle. 3. Rotate the handle toward the right to disengage the drive module from the enclosure’s internal connector. R/Evolution 2000 Series Troubleshooting Guide • May 2008...
Page 107: Installing A Drive Module
[Figure callouts: Squeeze, Drive ejector handle] 4. Wait 20 seconds for the internal disks to stop spinning. 5. Pull the drive module out of the enclosure. Installing a Drive Module To install a drive module, perform the following steps: 1. Follow all static electricity precautions as described in “Static Electricity Precautions”...
Page 108 Manage > Utilities > Recovery Utilities > Vdisk Quarantine. • Add a new drive as a vdisk spare by selecting Manage > Virtual Disk Config > Vdisk Configuration > Add Vdisk Spares.
Page 109: Verify That The Correct Power-On Sequence Was Performed
Table 7-5 Disk Drive Status (Continued) Status: Quarantined. The vdisk is offline and has been quarantined because some drives are missing. Action: Wait for the missing drive to come online. If it doesn’t, create another vdisk and perform a restore from the latest backed up copy.
Page 110: Installing An Air Management Module
You cannot stop the expansion once it is started. • If you have an immediate need, create a new virtual disk of the size you want, transfer your data to the new virtual disk, and delete the old virtual disk.
Page 111 Table 7-6 Virtual Disk Faults (Continued) Problem: Failover causes a virtual disk to become critical when one of its drives “disappears.” Solution: • In general, controller failover is not supported if a disk drive is in a drive enclosure that is connected with only one cable to the...
Page 112: Clearing Metadata From A Disk Drive
When a power supply fails, the fans of the module continue to operate because they draw power from the power bus located on the midplane. Once a fault is identified in the power-and-cooling module, you need to replace the entire module.
Page 113 Caution – Because removing the power-and-cooling module significantly disrupts the enclosure’s airflow, do not remove the power-and-cooling module until you have the replacement module. Table 7-7 lists possible power-and-cooling module faults. Table 7-7 Power-and-Cooling Module Faults Fault Solution Power supply fan warning or failure, or •...
Page 114: Removing And Replacing A Power-And-Cooling Module
3. Rotate the latch downward to disconnect the internal connector, and slide the module out. Note – Do not lift the power-and-cooling module by the latch. This could break the latch. Hold the power-and-cooling module by the metal casing.
Page 115: Installing A Power-And-Cooling Module
[Figure callouts: Thumbscrew, Latch] Installing a Power-and-Cooling Module To install a power-and-cooling module, perform the following steps: 1. Slide the module into the slot as far as it will go. 2. Press the latch upward to engage the module; turn the thumbscrews finger-tight. 3.
Page 116: Replacing An Enclosure
When you replace the enclosure, you need to reset the IP address as described in the getting started guide. Caution – If connected data hosts are not inactive during this replacement procedure, data loss could occur.
Page 117: Troubleshooting Using The Cli
Appendix: Troubleshooting Using the CLI This appendix briefly describes CLI commands that are useful for troubleshooting storage system problems. For detailed information about command syntax and using the CLI, see the CLI reference guide. Topics covered in this appendix include: “Viewing Command Help”...
Page 118: Viewing Command Help
ERROR as shown by the show expander-status command. For details about using clear expander-status, see the CLI reference guide.
Page 119: Ping
ping Tests communication with a remote host. The remote host is specified by IP address. Ping sends ICMP echo request packets and waits for replies. For details about using ping, see the CLI reference guide. rescan When installing a system with drive enclosures attached, the enclosure IDs might not agree with the physical cabling order.
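As general background on the ICMP echo mechanism that ping relies on, the following is a minimal Python sketch of how an echo request packet is laid out and checksummed under the standard ICMP format (type 8, code 0, Internet checksum). It does not show how the storage system's own ping command is implemented; the function names, identifier, sequence number, and payload are illustrative assumptions.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        # fold any carry bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


# Hypothetical identifier, sequence number, and payload.
packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
```

A receiver can validate such a packet by running the same checksum over the whole message; a correctly formed packet yields zero.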
Page 120: Restore Defaults
Sets the types of debug messages to include in the Storage Controller debug log. If multiple types are specified, use spaces to separate them and enclose the list in quotation marks ("). For details about using set debug-log-parameters, see the CLI reference guide.
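The quoting rule for multiple debug message types can be sketched as a small helper that assembles the command line. This is an illustration only: the helper name and the type names type-a and type-b are hypothetical placeholders, and the assumption that the list directly follows the command name is not confirmed by this guide. See the CLI reference guide for the actual type names and syntax.

```python
def build_debug_log_command(types):
    """Assemble a set debug-log-parameters command line (assumed layout)."""
    cleaned = [t.strip() for t in types if t.strip()]
    if not cleaned:
        raise ValueError("at least one debug message type is required")
    for name in cleaned:
        if " " in name or '"' in name:
            raise ValueError(f"invalid type name: {name!r}")
    argument = " ".join(cleaned)
    if len(cleaned) > 1:
        # Multiple types: space-separated list enclosed in quotation marks.
        argument = f'"{argument}"'
    return f"set debug-log-parameters {argument}"


# Hypothetical type names, for illustration only.
command = build_debug_log_command(["type-a", "type-b"])
```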
Page 121: Set Expander-Fault-Isolation
set expander-fault-isolation When fault isolation is enabled, the Expander Controller will isolate PHYs that fail to meet certain criteria. When fault isolation is disabled, the errors are noted in the logs but the PHYs are not isolated. For details about using set expander-fault-isolation, see the CLI reference guide.
Page 122: Show Debug-Log
Shows the status of system enclosures and their components. For each attached enclosure, the command shows general SCSI Enclosure Services (SES) information followed by component-specific information. For details about using show enclosure-status, see the CLI reference guide.
Page 123: Show Events
show events Shows events for an enclosure, including events from each Management Controller and each Storage Controller. A separate set of event numbers is maintained for each controller module. Each event number is prefixed with a letter identifying the controller module that logged the event. If SNMP is configured, events can be sent to SNMP traps.
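Because each controller module keeps its own set of event numbers, it can help to group events by their letter prefix when reviewing a log. The following Python sketch assumes an event ID is a single letter followed by digits (for example A12); that exact format, the sample IDs, and the function names are assumptions for illustration rather than details taken from this guide.

```python
import re
from collections import defaultdict

# Assumed format: one letter (controller module) followed by the event number.
EVENT_ID = re.compile(r"^([A-Za-z])(\d+)$")


def split_event_id(event_id):
    """Split an event ID into (controller letter, event number)."""
    match = EVENT_ID.match(event_id.strip())
    if match is None:
        raise ValueError(f"unrecognized event ID: {event_id!r}")
    return match.group(1).upper(), int(match.group(2))


def group_by_controller(event_ids):
    """Group event numbers by the controller module that logged them."""
    grouped = defaultdict(list)
    for event_id in event_ids:
        controller, number = split_event_id(event_id)
        grouped[controller].append(number)
    return {controller: sorted(numbers) for controller, numbers in grouped.items()}


# Hypothetical event IDs, for illustration only.
events = group_by_controller(["A12", "B3", "A7"])
```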
Page 124: Show Redundancy-Mode
Using a trusted virtual disk is only a disaster-recovery measure; the virtual disk has no tolerance for any additional failures. For details about using trust, see the CLI reference guide.
Page 125: Problems Scheduling Tasks
Problems Scheduling Tasks There are two parts to scheduling tasks: you must create the task and then create the schedule to run the task. Create the Task There are three tasks you can create: TakeSnapshot, ResetSnapshot, and VolumeCopy. Perform the operation directly to ensure the command syntax is correct. For example, if you want to schedule taking a snapshot, first issue a command to take the snapshot and verify that it runs.
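The two-part workflow (create the task, then create a schedule that runs it) can be modeled as a small Python sketch. The TaskScheduler class, its method names, and the sample task and schedule names are hypothetical and exist only to show the ordering constraint; they do not correspond to actual CLI commands.

```python
# The three task types named in this guide.
TASK_TYPES = {"TakeSnapshot", "ResetSnapshot", "VolumeCopy"}


class TaskScheduler:
    """Toy model: a schedule can only refer to a task that already exists."""

    def __init__(self):
        self.tasks = {}      # task name -> task type
        self.schedules = {}  # schedule name -> task name

    def create_task(self, name, task_type):
        if task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {task_type}")
        self.tasks[name] = task_type

    def create_schedule(self, schedule_name, task_name):
        if task_name not in self.tasks:
            raise LookupError(f"create task {task_name!r} before scheduling it")
        self.schedules[schedule_name] = task_name


scheduler = TaskScheduler()
scheduler.create_task("snap1", "TakeSnapshot")  # part 1: create the task
scheduler.create_schedule("nightly", "snap1")   # part 2: schedule it
```

Attempting the second step before the first fails in this model, which mirrors the advice to get the underlying operation working before scheduling it.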
Page 126: Errors Associated With Scheduling Tasks
For example, this problem would occur if you tried to create a virtual disk with an ambiguous name without specifying the assigned-to parameter. To use a name that the CLI could interpret as an optional parameter, you must specify that parameter before the name parameter.
Page 127: Index
Index selecting to monitor, 61 critical state, virtual disk air management module, installing, 110 preventing, 56 architecture, system overview, 11 data paths bad block isolating faults, 45 list size, displaying, 42 debug log, 71 reassignments, displaying, 42 setting up, 71 boot handshake, 89 viewing, 122 debug log parameters...
Page 128
12 configuring notification, 61 types, 65 events, showing, 123 host channel link, resetting, 119 expander fault isolation, enabling or disabling, 121 host channels, resetting, 54 expander PHYs, enabling or disabling, 121
Page 129 rescan disks, 46 physical layer interface. See PHY, 45 checking status, 38 pinging a remote host, 119 displaying timeout count, 41 power-and-cooling module icons, system status, 37 architecture, 16 identifying faults, 112 informational events, 65 installing, 115 enabling, 65 removing, 114 selecting to monitor, 61 replacing, 114, 115 installing...
Page 130 124 virtual disk reconstructing, 43 trusting an offline, 124 virtual disks dequarantining, 57 disaster recovery, 57 identifying faults, 110 preventing critical state, 56 redundant reconstructing, 43 voltage warnings, resolving, 73

R/Evolution 2000 Series Troubleshooting Manual

Preface

1 System Architecture

2 Fault Isolation Methodology

3 Troubleshooting Using System Leds

4 Troubleshooting Using Raidar

5 Troubleshooting Using Event Logs

6 Voltage and Temperature Warnings

7 Troubleshooting and Replacing Frus

Troubleshooting Using the CLI

Index
