From: "Денис Ягофаров" <denyago@rambler.ru>
To: ALT Linux sysadmin discuss <sysadmins@lists.altlinux.org>
Subject: Re: [Sysadmins] Reassembling a RAID after a failure and reboot
Date: Tue, 28 Oct 2008 12:17:16 +0200
Message-ID: <4906E6AC.8020109@rambler.ru>
In-Reply-To: <4905D02A.5000702@solin.spb.ru>

Let's run an experiment. We will simulate a disk "failure" in a raid5 array and see which UUID the array comes back with.

This is how things look before the failure:

# blkid
/dev/sda1: UUID="4aa4c1e1-a3e5-464c-8305-a2be835250b6" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdc1: UUID="b65e9862-75d3-9712-f54c-3a9253a25457" TYPE="mdraid"
/dev/sdd1: UUID="b65e9862-75d3-9712-f54c-3a9253a25457" TYPE="mdraid"
/dev/sdb1: UUID="b65e9862-75d3-9712-f54c-3a9253a25457" TYPE="mdraid"
/dev/md0: UUID="4aa4c1e1-a3e5-464c-8305-a2be835250b6" SEC_TYPE="ext2" TYPE="ext3"

# cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      2197715712 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Fri Oct 24 16:29:15 2008
     Raid Level : raid5
     Array Size : 2197715712 (2095.91 GiB 2250.46 GB)
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Oct 25 07:09:34 2008
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 62985eb6:1297d375:923a4cf5:5754a253
         Events : 0.2

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

Suppose one of the hard disks has died completely and no longer responds to anything.

# echo "scsi remove-single-device 2 0 0 0" > /proc/scsi/scsi
# ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sdb  /dev/sdb1  /dev/sdd  /dev/sdd1  /dev/sde

In the logs:
Oct 27 11:37:22 localhost kernel: ata3.00: disabled

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Fri Oct 24 16:29:15 2008
     Raid Level : raid5
     Array Size : 2197715712 (2095.91 GiB 2250.46 GB)
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Oct 25 07:09:34 2008
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 62985eb6:1297d375:923a4cf5:5754a253
         Events : 0.2

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync
       3       8       49        3      active sync   /dev/sdd1

Oddly enough, the disk is still not marked as failed. Should it be marked as failed manually?

# cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      2197715712 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

Let's try formatting /dev/md0, for example... although any other read or write operation should do as well.

Personalities : [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[4](F) sdb1[1] sda1[0]
      2197715712 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]

In the logs:
Oct 27 11:40:21 localhost kernel: scsi 2:0:0:0: rejecting I/O to dead device
Oct 27 11:40:21 localhost mdmonitor: Fail event on /dev/md0
Oct 27 11:40:21 localhost kernel: scsi 2:0:0:0: rejecting I/O to dead device
Oct 27 11:40:21 localhost kernel: scsi 2:0:0:0: rejecting I/O to dead device
Oct 27 11:40:21 localhost kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 3 devices
Oct 27 11:40:21 localhost kernel: RAID5 conf printout:
Oct 27 11:40:21 localhost kernel:  --- rd:4 wd:3 fd:1
Oct 27 11:40:21 localhost kernel:  disk 0, o:1, dev:sda1
Oct 27 11:40:21 localhost kernel:  disk 1, o:1, dev:sdb1
Oct 27 11:40:21 localhost kernel:  disk 2, o:0, dev:sdc1
Oct 27 11:40:21 localhost kernel:  disk 3, o:1, dev:sdd1
Oct 27 11:40:21 localhost kernel: RAID5 conf printout:
Oct 27 11:40:21 localhost kernel:  --- rd:4 wd:3 fd:1
Oct 27 11:40:21 localhost kernel:  disk 0, o:1, dev:sda1
Oct 27 11:40:21 localhost kernel:  disk 1, o:1, dev:sdb1
Oct 27 11:40:21 localhost kernel:  disk 3, o:1, dev:sdd1

If a dead but still spinning hard disk has to be disconnected, is there a way to signal it to spin down completely and power off? Or will the backplane take care of everything?

We "came running" with a new hard disk, created a partition on it (or perhaps it had already been created in advance?) and connected it...

# echo "scsi add-single-device 2 0 0 0" > /proc/scsi/scsi

In the logs:
Oct 27 11:42:59 localhost kernel: ata3: soft resetting port
Oct 27 11:42:59 localhost kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 27 11:42:59 localhost kernel: ata3.00: ATA-7, max UDMA/133, 1465149168 sectors: LBA48 NCQ (depth 31/32)
Oct 27 11:42:59 localhost kernel: ata3.00: configured for UDMA/133
Oct 27 11:42:59 localhost kernel: ata3: EH complete
Oct 27 11:42:59 localhost kernel:   Vendor: ATA       Model: ST3750640AS       Rev: 3.AA
Oct 27 11:42:59 localhost kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Oct 27 11:42:59 localhost kernel: SCSI device sdf: 1465149168 512-byte hdwr sectors (750156 MB)
Oct 27 11:42:59 localhost kernel: sdf: Write Protect is off
Oct 27 11:42:59 localhost kernel: SCSI device sdf: drive cache: write back
Oct 27 11:42:59 localhost kernel: SCSI device sdf: 1465149168 512-byte hdwr sectors (750156 MB)
Oct 27 11:42:59 localhost kernel: sdf: Write Protect is off
Oct 27 11:42:59 localhost kernel: SCSI device sdf: drive cache: write back
Oct 27 11:42:59 localhost kernel:  sdf: sdf1
Oct 27 11:42:59 localhost kernel: sd 2:0:0:0: Attached scsi disk sdf

Removing the old disk from the array fails, which makes sense: its device node is already gone. Perhaps this should have been done earlier?

# mdadm /dev/md0 --remove /dev/sdc1
mdadm: cannot find /dev/sdc1: No such file or directory

Adding the new disk...

# mdadm /dev/md0 --add /dev/sdf1
mdadm: re-added /dev/sdf1

In the logs:
Oct 27 11:44:30 localhost kernel: md: bind<sdf1>
Oct 27 11:44:30 localhost kernel: RAID5 conf printout:
Oct 27 11:44:30 localhost kernel:  --- rd:4 wd:3 fd:1
Oct 27 11:44:30 localhost kernel:  disk 0, o:1, dev:sda1
Oct 27 11:44:30 localhost kernel:  disk 1, o:1, dev:sdb1
Oct 27 11:44:30 localhost kernel:  disk 2, o:1, dev:sdf1
Oct 27 11:44:30 localhost kernel:  disk 3, o:1, dev:sdd1
Oct 27 11:44:30 localhost kernel: md: recovery of RAID array md0
Oct 27 11:44:30 localhost kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Oct 27 11:44:30 localhost kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Oct 27 11:44:30 localhost kernel: md: using 128k window, over a total of 732571904 blocks.
Oct 27 11:44:30 localhost mdmonitor: RebuildStarted event on /dev/md0

Now we wait 700 minutes... (about 12 hours)

# cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sdf1[2] sdd1[3] sdc1[4](F) sdb1[1] sda1[0]
      2197715712 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [>....................]  recovery =  0.1% (1136640/732571904) finish=717.3min speed=16992K/sec

After the rebuild:

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Fri Oct 24 16:29:15 2008
     Raid Level : raid5
     Array Size : 2197715712 (2095.91 GiB 2250.46 GB)
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Oct 27 22:53:14 2008
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 62985eb6:1297d375:923a4cf5:5754a253
         Events : 0.8

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       81        2      active sync   /dev/sdf1
       3       8       49        3      active sync   /dev/sdd1

       4       8       33        -      faulty spare

As we can see, the UUID has not changed... so we add it to the config:

DEVICE /dev/sd[a,b,c,d]1
ARRAY /dev/md0 UUID=62985eb6:1297d375:923a4cf5:5754a253
MAILADDR root
PROGRAM /sbin/mdadm-syslog-events

Let's try rebooting...
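
A side note on the /proc/mdstat snapshot taken during the rebuild: the "about 12 hours" estimate can be reproduced from the numbers on the recovery line. The following is only a rough Python sketch. The function names and regular expressions are made up for this illustration, and it assumes the line format shown in this message, with blocks counted in KiB and speed in K/sec.

```python
import re

# Text copied from the /proc/mdstat output in this message.
SAMPLE = """\
md0 : active raid5 sdf1[2] sdd1[3] sdc1[4](F) sdb1[1] sda1[0]
      2197715712 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [>....................]  recovery =  0.1% (1136640/732571904) finish=717.3min speed=16992K/sec
"""

# Array status such as "[4/3] [UU_U]" ("_" marks a missing member).
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Progress such as "recovery = 0.1% (done/total) ... speed=16992K/sec".
RECOVERY_RE = re.compile(
    r"recovery\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\).*?speed=(\d+)K/sec"
)


def missing_slots(mdstat_text):
    """Return the RaidDevice indexes shown as '_' in the status field."""
    match = STATUS_RE.search(mdstat_text)
    if match is None:
        raise ValueError("no [n/m] [UU_U] status field found")
    return [i for i, flag in enumerate(match.group(3)) if flag == "_"]


def rebuild_minutes_left(mdstat_text):
    """Estimate remaining rebuild time as (total - done) / speed, in minutes."""
    match = RECOVERY_RE.search(mdstat_text)
    if match is None:
        raise ValueError("no recovery line found")
    done, total, speed = (int(g) for g in match.groups())
    return (total - done) / speed / 60


if __name__ == "__main__":
    print("missing slots:", missing_slots(SAMPLE))
    print("minutes left: %.1f" % rebuild_minutes_left(SAMPLE))
```

For the sample above the estimate comes out within a minute of the finish=717.3min value printed by the kernel. The small difference presumably comes from how the kernel rounds its own calculation.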

As we can see, everything came back:

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Fri Oct 24 16:29:15 2008
     Raid Level : raid5
     Array Size : 2197715712 (2095.91 GiB 2250.46 GB)
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Oct 27 22:53:14 2008
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 62985eb6:1297d375:923a4cf5:5754a253
         Events : 0.8

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      2197715712 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

# blkid
/dev/sda1: UUID="b65e9862-75d3-9712-f54c-3a9253a25457" TYPE="mdraid"
/dev/sdd1: UUID="b65e9862-75d3-9712-f54c-3a9253a25457" TYPE="mdraid"
/dev/sdb1: UUID="b65e9862-75d3-9712-f54c-3a9253a25457" TYPE="mdraid"
/dev/sdc1: UUID="b65e9862-75d3-9712-f54c-3a9253a25457" TYPE="mdraid"
/dev/md0: UUID="6201837c-d0db-4e0e-a7ae-d3490c47cc46" SEC_TYPE="ext2" TYPE="ext3"

Conclusion: do not confuse

           UUID : 62985eb6:1297d375:923a4cf5:5754a253

from mdadm -D /dev/md0 with

/dev/md0: UUID="6201837c-d0db-4e0e-a7ae-d3490c47cc46" SEC_TYPE="ext2" TYPE="ext3"

from blkid, as I did. The first is the UUID of the md array; the second is the UUID of the ext3 filesystem on top of it.

[-- Attachment #2: denyago.vcf --]
[-- Type: text/x-vcard, Size: 281 bytes --]

begin:vcard
fn:Denis Timurovich Yagofarov
n:Yagofarov;Denis Timurovich
org:ITGIS NASU
adr:room 615;;Chokolovski blvdr., 13;Kiev;;03151;Ukraine
email;internet:denyago@rambler.ru
title:system administrator
tel;work:80442480755
x-mozilla-html:FALSE
version:2.1
end:vcard
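
P.S. on the conclusion: in the listings in this message, the UUID that blkid prints for the member partitions (TYPE="mdraid") appears to be the same 16 bytes as the array UUID from mdadm -D, with the bytes inside each 32-bit word reversed and the digits regrouped with dashes. Below is a small Python sketch of that conversion. The function name is made up for this example, and the byte-swap rule is inferred only from the values shown here, so other blkid or mdadm versions may print these UUIDs differently.

```python
def md_uuid_to_blkid_style(md_uuid):
    """Convert 'xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx' (mdadm -D) to the
    dashed 8-4-4-4-12 form that blkid printed for the member disks here."""
    words = md_uuid.strip().lower().split(":")
    if len(words) != 4 or any(len(w) != 8 for w in words):
        raise ValueError("expected four colon-separated 8-digit hex words")
    # Reverse the byte order inside each 32-bit word.
    swapped = "".join(bytes.fromhex(w)[::-1].hex() for w in words)
    # Regroup the 32 hex digits as 8-4-4-4-12.
    return "-".join(
        (swapped[0:8], swapped[8:12], swapped[12:16], swapped[16:20], swapped[20:32])
    )


if __name__ == "__main__":
    array_uuid = "62985eb6:1297d375:923a4cf5:5754a253"  # from mdadm -D /dev/md0
    print(md_uuid_to_blkid_style(array_uuid))
```

Applied to the array UUID from this message, the result matches the UUID blkid shows for /dev/sd[abcd]1, and it differs from the filesystem UUID on /dev/md0, which is a separate identifier written by mkfs.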