   Linux Driver for Mylex DAC960/AcceleRAID/eXtremeRAID PCI RAID Controllers

			Version 2.2.11 for Linux 2.2.19
			Version 2.4.11 for Linux 2.4.12

			      PRODUCTION RELEASE

				11 October 2001

			      Leonard N. Zubkoff
			       Dandelion Digital
			       lnz@dandelion.com

	 Copyright 1998-2001 by Leonard N. Zubkoff <lnz@dandelion.com>


INTRODUCTION

Mylex, Inc. designs and manufactures a variety of high performance PCI RAID
controllers.  Mylex Corporation is located at 34551 Ardenwood Blvd., Fremont,
California 94555, USA and can be reached at 510.796.6100 or on the World Wide
Web at http://www.mylex.com.  Mylex Technical Support can be reached by
electronic mail at mylexsup@us.ibm.com, by voice at 510.608.2400, or by FAX at
510.745.7715.  Contact information for offices in Europe and Japan is available
on their Web site.

The latest information on Linux support for DAC960 PCI RAID Controllers, as
well as the most recent release of this driver, will always be available from
my Linux Home Page at URL "http://www.dandelion.com/Linux/".  The Linux DAC960
driver supports all current Mylex PCI RAID controllers, including the new
eXtremeRAID 2000/3000 and AcceleRAID 352/170/160 models, which have an entirely
new firmware interface from the older eXtremeRAID 1100, AcceleRAID 150/200/250,
and DAC960PJ/PG/PU/PD/PL.  See below for a complete controller list as well as
minimum firmware version requirements.  For simplicity, in most places this
documentation refers to DAC960 generically rather than explicitly listing all
the supported models.

Driver bug reports should be sent via electronic mail to "lnz@dandelion.com".
Please include with the bug report the complete configuration messages reported
by the driver at startup, along with any subsequent system messages relevant to
the controller's operation, and a detailed description of your system's
hardware configuration.  Driver bugs are actually quite rare; if you encounter
problems with disks being marked offline, for example, please contact Mylex
Technical Support, as the problem is related to the hardware configuration
rather than the Linux driver.

Please consult the RAID controller documentation for detailed information
regarding installation and configuration of the controllers.  This document
primarily provides information specific to the Linux support.


DRIVER FEATURES

The DAC960 RAID controllers are supported solely as high performance RAID
controllers, not as interfaces to arbitrary SCSI devices.  The Linux DAC960
driver operates at the block device level, the same level as the SCSI and IDE
drivers.  Unlike other RAID controllers currently supported on Linux, the
DAC960 driver is not dependent on the SCSI subsystem, and hence avoids all the
complexity and unnecessary code that would be associated with an implementation
as a SCSI driver.  The DAC960 driver is designed for as high a performance as
possible with no compromises or extra code for compatibility with lower
performance devices.  The DAC960 driver includes extensive error logging and
online configuration management capabilities.  Except for initial configuration
of the controller and adding new disk drives, almost everything can be handled
from Linux while the system is operational.

The DAC960 driver is architected to support up to 8 controllers per system.
Each DAC960 parallel SCSI controller can support up to 15 disk drives per
channel, for a maximum of 60 drives on a four channel controller; the fibre
channel eXtremeRAID 3000 controller supports up to 125 disk drives per loop for
a total of 250 drives.  The drives installed on a controller are divided into
one or more "Drive Groups", and then each Drive Group is subdivided further
into 1 to 32 "Logical Drives".  Each Logical Drive has a specific RAID Level
and caching policy associated with it, and it appears to Linux as a single
block device.  Logical Drives are further subdivided into up to 7 partitions
through the normal Linux and PC disk partitioning schemes.  Logical Drives are
also known as "System Drives", and Drive Groups are also called "Packs".  Both
terms are in use in the Mylex documentation; I have chosen to standardize on
the more generic "Logical Drive" and "Drive Group".

DAC960 RAID disk devices are named in the style of the obsolete Device File
System (DEVFS).  The device corresponding to Logical Drive D on Controller C
is referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
through /dev/rd/cCdDp7.  For example, partition 3 of Logical Drive 5 on
Controller 2 is referred to as /dev/rd/c2d5p3.  Note that unlike with SCSI
disks the device names will not change in the event of a disk drive failure.
The DAC960 driver is assigned major numbers 48 - 55 with one major number per
controller.  The 8 bits of minor number are divided into 5 bits for the Logical
Drive and 3 bits for the partition.
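As an illustration of this numbering scheme, the device node for partition 3
of Logical Drive 5 on Controller 2 could be created by hand as follows.  This
is only a sketch; the "make_rd" script mentioned under DRIVER INSTALLATION
below normally creates all of these nodes for you:

  # Controller 2, Logical Drive 5, partition 3 => /dev/rd/c2d5p3
  C=2; D=5; P=3
  MAJOR=$((48 + C))        # major numbers 48-55, one per controller
  MINOR=$((D * 8 + P))     # 5 bits of Logical Drive, 3 bits of partition
  mknod /dev/rd/c${C}d${D}p${P} b $MAJOR $MINOR    # => b 50 43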
SUPPORTED DAC960/AcceleRAID/eXtremeRAID PCI RAID CONTROLLERS

The following list comprises the supported DAC960, AcceleRAID, and eXtremeRAID
PCI RAID Controllers as of the date of this document.  It is recommended that
anyone purchasing a Mylex PCI RAID Controller not in the following table
contact the author beforehand to verify that it is or will be supported.

eXtremeRAID 3000
	    1 Wide Ultra-2/LVD SCSI channel
	    2 External Fibre FC-AL channels
	    233MHz StrongARM SA 110 Processor
	    64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
	    32MB/64MB ECC SDRAM Memory

eXtremeRAID 2000
	    4 Wide Ultra-160 LVD SCSI channels
	    233MHz StrongARM SA 110 Processor
	    64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
	    32MB/64MB ECC SDRAM Memory

AcceleRAID 352
	    2 Wide Ultra-160 LVD SCSI channels
	    100MHz Intel i960RN RISC Processor
	    64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
	    32MB/64MB ECC SDRAM Memory

AcceleRAID 170
	    1 Wide Ultra-160 LVD SCSI channel
	    100MHz Intel i960RM RISC Processor
	    16MB/32MB/64MB ECC SDRAM Memory

AcceleRAID 160 (AcceleRAID 170LP)
	    1 Wide Ultra-160 LVD SCSI channel
	    100MHz Intel i960RS RISC Processor
	    Built in 16M ECC SDRAM Memory
	    PCI Low Profile Form Factor - fit for 2U height

eXtremeRAID 1100 (DAC1164P)
	    3 Wide Ultra-2/LVD SCSI channels
	    233MHz StrongARM SA 110 Processor
	    64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
	    16MB/32MB/64MB Parity SDRAM Memory with Battery Backup

AcceleRAID 250 (DAC960PTL1)
	    Uses onboard Symbios SCSI chips on certain motherboards
	    Also includes one onboard Wide Ultra-2/LVD SCSI Channel
	    66MHz Intel i960RD RISC Processor
	    4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

AcceleRAID 200 (DAC960PTL0)
	    Uses onboard Symbios SCSI chips on certain motherboards
	    Includes no onboard SCSI Channels
	    66MHz Intel i960RD RISC Processor
	    4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

AcceleRAID 150 (DAC960PRL)
	    Uses onboard Symbios SCSI chips on certain motherboards
	    Also includes one onboard Wide Ultra-2/LVD SCSI Channel
	    33MHz Intel i960RP RISC Processor
	    4MB Parity EDO Memory

DAC960PJ    1/2/3 Wide Ultra SCSI-3 Channels
	    66MHz Intel i960RD RISC Processor
	    4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

DAC960PG    1/2/3 Wide Ultra SCSI-3 Channels
	    33MHz Intel i960RP RISC Processor
	    4MB/8MB ECC EDO Memory

DAC960PU    1/2/3 Wide Ultra SCSI-3 Channels
	    Intel i960CF RISC Processor
	    4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960PD    1/2/3 Wide Fast SCSI-2 Channels
	    Intel i960CF RISC Processor
	    4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960PL    1/2/3 Wide Fast SCSI-2 Channels
	    Intel i960 RISC Processor
	    2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960P     1/2/3 Wide Fast SCSI-2 Channels
	    Intel i960 RISC Processor
	    2MB/4MB/8MB/16MB/32MB DRAM Memory

For the eXtremeRAID 2000/3000 and AcceleRAID 352/170/160, firmware version
6.00-01 or above is required.

For the eXtremeRAID 1100, firmware version 5.06-0-52 or above is required.

For the AcceleRAID 250, 200, and 150, firmware version 4.06-0-57 or above is
required.

For the DAC960PJ and DAC960PG, firmware version 4.06-0-00 or above is required.

For the DAC960PU, DAC960PD, DAC960PL, and DAC960P, either firmware version
3.51-0-04 or above is required (for dual Flash ROM controllers), or firmware
version 2.73-0-00 or above is required (for single Flash ROM controllers).
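Once the driver is running, the firmware version actually present on each
controller can be read back from the driver's status files, which are
described under CONTROLLER CONFIGURATION AND STATUS MONITORING below.  A
minimal sketch:

  # Report the firmware version of every controller the driver found.
  for status in /proc/rd/c[0-7]/initial_status; do
    [ -f "$status" ] && grep "Firmware Version" "$status"
  done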
Please note that not all SCSI disk drives are suitable for use with DAC960
controllers, and only particular firmware versions of any given model may
actually function correctly.  Similarly, not all motherboards have a BIOS that
properly initializes the AcceleRAID 250, AcceleRAID 200, AcceleRAID 150,
DAC960PJ, and DAC960PG, because the Intel i960RD/RP is a multi-function device.
If in doubt, contact Mylex RAID Technical Support (mylexsup@us.ibm.com) to
verify compatibility.  Mylex makes available a hard disk compatibility list at
http://www.mylex.com/support/hdcomp/hd-lists.html.


DRIVER INSTALLATION

This distribution was prepared for Linux kernel version 2.2.19 or 2.4.12.

To install the DAC960 RAID driver, you may use the following commands,
replacing "/usr/src" with wherever you keep your Linux kernel source tree:

  cd /usr/src
  tar -xvzf DAC960-2.2.11.tar.gz (or DAC960-2.4.11.tar.gz)
  mv README.DAC960 linux/Documentation
  mv DAC960.[ch] linux/drivers/block
  patch -p0 < DAC960.patch (if DAC960.patch is included)
  cd linux
  make config
  make bzImage (or zImage)

Then install "arch/x86/boot/bzImage" or "arch/x86/boot/zImage" as your
standard kernel, run lilo if appropriate, and reboot.

To create the necessary devices in /dev, the "make_rd" script included in
"DAC960-Utilities.tar.gz" from http://www.dandelion.com/Linux/ may be used.
LILO 21 and FDISK v2.9 include DAC960 support; also included in this archive
are patches to LILO 20 and FDISK v2.8 that add DAC960 support, along with
statically linked executables of LILO and FDISK.  This modified version of LILO
will allow booting from a DAC960 controller and/or mounting the root file
system from a DAC960.

Red Hat Linux 6.0 and SuSE Linux 6.1 include support for Mylex PCI RAID
controllers.  Installing directly onto a DAC960 may be problematic from other
Linux distributions until their installation utilities are updated.


INSTALLATION NOTES

Before installing Linux or adding DAC960 logical drives to an existing Linux
system, the controller must first be configured to provide one or more logical
drives using the BIOS Configuration Utility or DACCF.  Please note that since
there are only at most 6 usable partitions on each logical drive, systems
requiring more partitions should subdivide a drive group into multiple logical
drives, each of which can have up to 6 usable partitions.  Also, note that with
large disk arrays it is advisable to enable the 8GB BIOS Geometry (255/63)
rather than accepting the default 2GB BIOS Geometry (128/32); failing to do so
will cause the logical drive geometry to have more than 65535 cylinders, which
will make it impossible for FDISK to be used properly.  The 8GB BIOS Geometry
can be enabled by configuring the DAC960 BIOS, which is accessible via Alt-M
during the BIOS initialization sequence.

For maximum performance and the most efficient E2FSCK performance, it is
recommended that EXT2 file systems be built with a 4KB block size and 16 block
stride to match the DAC960 controller's 64KB default stripe size.  The command
"mke2fs -b 4096 -R stride=16 <device>" is appropriate.  Unless there will be a
large number of small files on the file systems, it is also beneficial to add
the "-i 16384" option to increase the bytes per inode parameter, thereby
reducing the file system metadata.  Finally, on systems that will only be run
with Linux 2.2 or later kernels it is beneficial to enable sparse superblocks
with the "-s 1" option.
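Putting those recommendations together, a file system on a DAC960 partition
might be created as follows.  This is a sketch only; /dev/rd/c0d0p1 is just an
example device, and the stride of 16 is simply the 64KB stripe size divided by
the 4KB block size:

  mke2fs -b 4096 -R stride=16 -i 16384 -s 1 /dev/rd/c0d0p1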
DAC960 ANNOUNCEMENTS MAILING LIST

The DAC960 Announcements Mailing List provides a forum for informing Linux
users of new driver releases and other announcements regarding Linux support
for DAC960 PCI RAID Controllers.  To join the mailing list, send a message to
"dac960-announce-request@dandelion.com" with the line "subscribe" in the
message body.
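For example, using the standard mail(1) command (a sketch; any mailer that
lets you supply the message body will do):

  echo "subscribe" | mail dac960-announce-request@dandelion.com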
CONTROLLER CONFIGURATION AND STATUS MONITORING

The DAC960 RAID controllers running firmware 4.06 or above include a Background
Initialization facility so that system downtime is minimized both for initial
installation and subsequent configuration of additional storage.  The BIOS
Configuration Utility (accessible via Alt-R during the BIOS initialization
sequence) is used to quickly configure the controller, and then the logical
drives that have been created are available for immediate use even while they
are still being initialized by the controller.  The primary need for online
configuration and status monitoring is then to avoid system downtime when disk
drives fail and must be replaced.  Mylex's online monitoring and configuration
utilities are being ported to Linux and will become available at some point in
the future.  Note that with a SAF-TE (SCSI Accessed Fault-Tolerant Enclosure)
enclosure, the controller is able to rebuild failed drives automatically as
soon as a drive replacement is made available.

The primary interfaces for controller configuration and status monitoring are
special files created in the /proc/rd/... hierarchy along with the normal
system console logging mechanism.  Whenever the system is operating, the DAC960
driver queries each controller for status information every 10 seconds, and
checks for additional conditions every 60 seconds.  The initial status of each
controller is always available for controller N in /proc/rd/cN/initial_status,
and the current status as of the last status monitoring query is available in
/proc/rd/cN/current_status.  In addition, status changes are also logged by the
driver to the system console and will appear in the log files maintained by
syslog.  The progress of asynchronous rebuild or consistency check operations
is also available in /proc/rd/cN/current_status, and progress messages are
logged to the system console at most every 60 seconds.

Starting with the 2.2.3/2.0.3 versions of the driver, the status information
available in /proc/rd/cN/initial_status and /proc/rd/cN/current_status has been
augmented to include the vendor, model, revision, and serial number (if
available) for each physical device found connected to the controller:

***** DAC960 RAID Driver Version 2.2.3 of 19 August 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PRL PCI RAID Controller
  Firmware Version: 4.07-0-07, Channels: 1, Memory Size: 16MB
  PCI Bus: 1, Device: 4, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFE300000 mapped at 0xA0800000, IRQ Channel: 21
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  SAF-TE Enclosure Management Enabled
  Physical Devices:
    0:0  Vendor: IBM       Model: DRVS09D   Revision: 0270
         Serial Number: 68016775HA
         Disk Status: Online, 17928192 blocks
    0:1  Vendor: IBM       Model: DRVS09D   Revision: 0270
         Serial Number: 68004E53HA
         Disk Status: Online, 17928192 blocks
    0:2  Vendor: IBM       Model: DRVS09D   Revision: 0270
         Serial Number: 13013935HA
         Disk Status: Online, 17928192 blocks
    0:3  Vendor: IBM       Model: DRVS09D   Revision: 0270
         Serial Number: 13016897HA
         Disk Status: Online, 17928192 blocks
    0:4  Vendor: IBM       Model: DRVS09D   Revision: 0270
         Serial Number: 68019905HA
         Disk Status: Online, 17928192 blocks
    0:5  Vendor: IBM       Model: DRVS09D   Revision: 0270
         Serial Number: 68012753HA
         Disk Status: Online, 17928192 blocks
    0:6  Vendor: ESG-SHV   Model: SCA HSBP M6   Revision: 0.61
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 89640960 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

To simplify the monitoring process for custom software, the special file
/proc/rd/status returns "OK" when all DAC960 controllers in the system are
operating normally and no failures have occurred, or "ALERT" if any logical
drives are offline or critical or any non-standby physical drives are dead.
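A trivial monitor built on this file might look as follows.  This is only a
sketch; the 60 second poll interval and the use of logger(1) are arbitrary
choices:

  #!/bin/sh
  # Poll the driver's global status file and raise an alert once
  # if it ever leaves the "OK" state.
  while sleep 60; do
	if [ "`cat /proc/rd/status`" != "OK" ]; then
		logger -p daemon.alert "DAC960 RAID status is ALERT"
		break
	fi
  done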
Configuration commands for controller N are available via the special file
/proc/rd/cN/user_command.  A human readable command can be written to this
special file to initiate a configuration operation, and the results of the
operation can then be read back from the special file in addition to being
logged to the system console.  The shell command sequence

  echo "<configuration-command>" > /proc/rd/c0/user_command
  cat /proc/rd/c0/user_command

is typically used to execute configuration commands.  The configuration
commands are:

  flush-cache

    The "flush-cache" command flushes the controller's cache.  The system
    automatically flushes the cache at shutdown or if the driver module is
    unloaded, so this command is only needed to be certain a write back cache
    is flushed to disk before the system is powered off by a command to a UPS.
    Note that the flush-cache command also stops an asynchronous rebuild or
    consistency check, so it should not be used except when the system is
    being halted.

  kill <channel>:<target-id>

    The "kill" command marks the physical drive <channel>:<target-id> as DEAD.
    This command is provided primarily for testing, and should not be used
    during normal system operation.

  make-online <channel>:<target-id>

    The "make-online" command changes the physical drive <channel>:<target-id>
    from status DEAD to status ONLINE.  In cases where multiple physical
    drives have been killed simultaneously, this command may be used to bring
    all but one of them back online, after which a rebuild to the final drive
    is necessary.

    Warning: make-online should only be used on a dead physical drive that is
    an active part of a drive group, never on a standby drive.  The command
    should never be used on a dead drive that is part of a critical logical
    drive; rebuild should be used if only a single drive is dead.

  make-standby <channel>:<target-id>

    The "make-standby" command changes physical drive <channel>:<target-id>
    from status DEAD to status STANDBY.  It should only be used in cases where
    a dead drive was replaced after an automatic rebuild was performed onto a
    standby drive.  It cannot be used to add a standby drive to the controller
    configuration if one was not created initially; the BIOS Configuration
    Utility must be used for that currently.

  rebuild <channel>:<target-id>

    The "rebuild" command initiates an asynchronous rebuild onto physical
    drive <channel>:<target-id>.  It should only be used when a dead drive has
    been replaced.

  check-consistency <logical-drive-number>

    The "check-consistency" command initiates an asynchronous consistency
    check of <logical-drive-number> with automatic restoration.  It can be
    used whenever it is desired to verify the consistency of the redundancy
    information.

  cancel-rebuild
  cancel-consistency-check

    The "cancel-rebuild" and "cancel-consistency-check" commands cancel any
    rebuild or consistency check operations previously initiated.
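Since every command follows the same write-then-read pattern, a small shell
helper makes scripted use less error prone.  This is a sketch; the function
name is ours, not part of the driver:

  # Usage: dac960_cmd <controller-number> "<configuration-command>"
  # e.g.:  dac960_cmd 0 "check-consistency 1"
  dac960_cmd () {
	echo "$2" > /proc/rd/c$1/user_command
	cat /proc/rd/c$1/user_command
  }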
EXAMPLE I - DRIVE FAILURE WITHOUT A STANDBY DRIVE

The following annotated logs demonstrate the controller configuration and
online status monitoring capabilities of the Linux DAC960 Driver.  The test
configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a
DAC960PJ controller.  The physical drives are configured into a single drive
group without a standby drive, and the drive group has been configured into
two logical drives, one RAID-5 and one RAID-6.  Note that these logs are from
an earlier version of the driver and the messages have changed somewhat with
newer releases, but the functionality remains similar.  First, here is the
current status of the RAID configuration:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

gwynedd:/u/lnz# cat /proc/rd/status
OK

The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system.  For demonstration purposes, while I/O is active Physical Drive
1:1 is now disconnected, simulating a drive failure.  The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages indicating that Logical
Drives 0 and 1 are now CRITICAL as a result of Physical Drive 1:1 being DEAD:

DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:1 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:1 is now DEAD
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL

The Sense Keys logged here are just Check Condition / Unit Attention conditions
arising from a SCSI bus reset that is forced by the controller during its error
recovery procedures.  Concurrently with the above, the driver status available
from /proc/rd also reflects the drive failure.  The status message in
/proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Dead, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

Since there are no standby drives configured, the system can continue to access
the logical drives in a performance degraded mode until the failed drive is
replaced and a rebuild operation completed to restore the redundancy of the
logical drives.  Once Physical Drive 1:1 is replaced with a properly
functioning drive, or if the physical drive was killed without having failed
(e.g., due to electrical problems on the SCSI bus), the user can instruct the
controller to initiate a rebuild operation onto the newly replaced drive:

gwynedd:/u/lnz# echo "rebuild 1:1" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Rebuild of Physical Drive 1:1 Initiated

The echo command instructs the controller to initiate an asynchronous rebuild
operation onto Physical Drive 1:1, and the status message that results from the
operation is then available for reading from /proc/rd/c0/user_command, as well
as being logged to the console by the driver.

Within 10 seconds of this command the driver logs the initiation of the
asynchronous rebuild operation:

DAC960#0: Rebuild of Physical Drive 1:1 Initiated
DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:1 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 1% completed

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Write-Only, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 6% completed

As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Write-Only, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 15% completed

and every minute a progress message is logged to the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 32% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 63% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 94% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 94% completed

Finally, the rebuild completes successfully.  The driver logs the status of the
logical and physical drives and the rebuild completion:

DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:1 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE

/proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
  Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK


EXAMPLE II - DRIVE FAILURE WITH A STANDBY DRIVE

The following annotated logs demonstrate the controller configuration and
online status monitoring capabilities of the Linux DAC960 Driver.  The test
configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a
DAC960PJ controller.  The physical drives are configured into a single drive
group with a standby drive, and the drive group has been configured into two
logical drives, one RAID-5 and one RAID-6.  Note that these logs are from an
earlier version of the driver and the messages have changed somewhat with
newer releases, but the functionality remains similar.  First, here is the
current status of the RAID configuration:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Standby, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

gwynedd:/u/lnz# cat /proc/rd/status
OK

The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system.  For demonstration purposes, while I/O is active Physical Drive
1:2 is now disconnected, simulating a drive failure.  The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages:

DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:2 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:2 is now DEAD
DAC960#0: Physical Drive 1:2 killed because it was removed
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL

Since a standby drive is configured, the controller automatically begins
rebuilding onto the standby drive:

DAC960#0: Physical Drive 1:3 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

Concurrently with the above, the driver status available from /proc/rd also
reflects the drive failure and automatic rebuild.  The status message in
/proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Write-Only, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Write-Only, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed

and every minute a progress message is logged on the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 76% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 66% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 84% completed

Finally, the rebuild completes successfully.  The driver logs the status of the
logical and physical drives and the rebuild completion:

DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:3 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE

/proc/rd/c0/current_status is updated:

***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK

Note that the absence of a viable standby drive does not create an "ALERT"
status.  Once dead Physical Drive 1:2 has been replaced, the controller must be
told that this has occurred and that the newly replaced drive should become the
new standby drive:

gwynedd:/u/lnz# echo "make-standby 1:2" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Make Standby of Physical Drive 1:2 Succeeded

The echo command instructs the controller to make Physical Drive 1:2 into a
standby drive, and the status message that results from the operation is then
available for reading from /proc/rd/c0/user_command, as well as being logged to
the console by the driver.  Within 60 seconds of this command the driver logs:

DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:2 is now STANDBY
DAC960#0: Make Standby of Physical Drive 1:2 Succeeded

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Standby, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  Rebuild Completed Successfully