* [Hardware] SMART errors
From: Vladimir Karpinsky @ 2008-03-23 9:11 UTC
To: Hardware
[-- Attachment #1: Type: text/plain, Size: 544 bytes --]
I installed ALD on new disks and found disk errors in the log, so I
looked at the SMART output (see attachments). With sdc everything seems
clear: it already has reallocated sectors. But I am surprised and
alarmed by the huge numbers in Raw_Read_Error_Rate and Seek_Error_Rate
on all of the disks. And these are NEW, SERVER-grade disks! Could you
please tell me what might cause this: could it be some kind of artifact,
or should I rush not only sdc but all the others back for warranty
replacement?
Best regards,
[-- Attachment #2: smart_sdd.log --]
[-- Type: text/plain, Size: 4370 bytes --]
smartctl version 5.36 [i586-alt-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device Model: ST3500630NS
Serial Number: 9QG58L17
Firmware Version: 3.AEK
User Capacity: 500,107,862,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Mar 22 08:54:19 2008 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 163) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 110 099 006 Pre-fail Always - 154242194
3 Spin_Up_Time 0x0003 094 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 9
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 10054340
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 216
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 9
187 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
189 Unknown_Attribute 0x003a 100 100 000 Old_age Always - 0
190 Temperature_Celsius 0x0022 057 053 045 Old_age Always - 791150635
194 Temperature_Celsius 0x0022 043 047 000 Old_age Always - 43 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered 0x001a 061 060 000 Old_age Always - 178601827
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 208 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[-- Attachment #3: smart_sda.log --]
[-- Type: text/plain, Size: 4370 bytes --]
smartctl version 5.36 [i586-alt-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device Model: ST3500630NS
Serial Number: 9QG54Z9N
Firmware Version: 3.AEK
User Capacity: 500,107,862,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Mar 22 08:54:05 2008 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 163) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 100 006 Pre-fail Always - 155132431
3 Spin_Up_Time 0x0003 093 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 8
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 070 060 030 Pre-fail Always - 11288649
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 216
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 8
187 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
189 Unknown_Attribute 0x003a 100 100 000 Old_age Always - 0
190 Temperature_Celsius 0x0022 057 053 045 Old_age Always - 790102059
194 Temperature_Celsius 0x0022 043 047 000 Old_age Always - 43 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered 0x001a 062 060 000 Old_age Always - 199192249
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 208 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[-- Attachment #4: smart_sdb.log --]
[-- Type: text/plain, Size: 4447 bytes --]
smartctl version 5.36 [i586-alt-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device Model: ST3500630NS
Serial Number: 9QG5DAQR
Firmware Version: 3.AEK
User Capacity: 500,107,862,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Mar 22 08:54:12 2008 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 163) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 112 099 006 Pre-fail Always - 168967211
3 Spin_Up_Time 0x0003 094 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 9
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 6932095
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 88
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 9
187 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
189 Unknown_Attribute 0x003a 100 100 000 Old_age Always - 0
190 Temperature_Celsius 0x0022 059 055 045 Old_age Always - 756875305
194 Temperature_Celsius 0x0022 041 045 000 Old_age Always - 41 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered 0x001a 066 058 000 Old_age Always - 171976256
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 81 -
# 2 Extended offline Aborted by host 70% 79 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[-- Attachment #5: smart_sdc.log --]
[-- Type: text/plain, Size: 9077 bytes --]
smartctl version 5.36 [i586-alt-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device Model: ST3500630NS
Serial Number: 9QG5CVLW
Firmware Version: 3.AEK
User Capacity: 500,107,862,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Mar 22 08:53:19 2008 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 163) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 099 006 Pre-fail Always - 224749369
3 Spin_Up_Time 0x0003 093 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 8
5 Reallocated_Sector_Ct 0x0033 097 097 036 Pre-fail Always - 142
7 Seek_Error_Rate 0x000f 070 060 030 Pre-fail Always - 12252717
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 216
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 8
187 Unknown_Attribute 0x0032 001 001 000 Old_age Always - 126
189 Unknown_Attribute 0x003a 092 092 000 Old_age Always - 8
190 Temperature_Celsius 0x0022 062 060 045 Old_age Always - 672661542
194 Temperature_Celsius 0x0022 038 040 000 Old_age Always - 38 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered 0x001a 064 060 000 Old_age Always - 5907451
197 Current_Pending_Sector 0x0012 088 087 000 Old_age Always - 264
198 Offline_Uncorrectable 0x0010 088 087 000 Old_age Offline - 264
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 127 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 127 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 02 34 c8 e0 Error: UNC at LBA = 0x00c83402 = 13120514
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 78 5b 33 c8 e0 00 00:18:16.998 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:16.997 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:16.996 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:13.485 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:13.484 READ DMA EXT
Error 126 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 02 34 c8 e0 Error: UNC at LBA = 0x00c83402 = 13120514
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 78 5b 33 c8 e0 00 00:18:16.998 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:16.997 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:16.996 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:13.485 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:13.484 READ DMA EXT
Error 125 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 02 34 c8 e0 Error: UNC at LBA = 0x00c83402 = 13120514
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 78 5b 33 c8 e0 00 00:18:16.998 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:16.997 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:16.996 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:13.485 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:13.484 READ DMA EXT
Error 124 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 02 34 c8 e0 Error: UNC at LBA = 0x00c83402 = 13120514
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 78 5b 33 c8 e0 00 00:18:16.998 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:16.997 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:16.996 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:13.485 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:13.484 READ DMA EXT
Error 123 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 02 34 c8 e0 Error: UNC at LBA = 0x00c83402 = 13120514
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 78 5b 33 c8 e0 00 00:18:16.998 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:18:16.997 IDENTIFY DEVICE
25 00 78 5b 33 c8 e0 00 00:18:16.996 READ DMA EXT
ca 00 18 e8 ff 0f e0 00 00:18:13.485 WRITE DMA
ec 00 00 00 00 00 a0 00 00:18:13.484 IDENTIFY DEVICE
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 216 935867394
# 2 Extended offline Completed: read failure 90% 206 935867394
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
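The LBA that smartctl prints in the sdc error log can be rebuilt by hand from the register dump. The sketch below assumes the usual 28-bit layout (low four bits of DH, then CH, CL, SN) and the register order ER ST SC SN CL CH DH given in the legend of the log; the function name is made up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    Only the low four bits of the Device/Head register carry LBA bits.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register dump copied from the sdc error log: ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in "40 51 00 02 34 c8 e0".split())

lba = lba28_from_registers(sn, cl, ch, dh)
unc = bool(er & 0x40)  # bit 6 of the Error register is UNC (uncorrectable data)

print(hex(lba), lba, unc)
```

This agrees with the "UNC at LBA = 0x00c83402 = 13120514" line in the log, so all five logged errors point at the same sector.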
* Re: [Hardware] SMART errors
From: Andrey Rahmatullin @ 2008-03-23 9:19 UTC
To: hardware
[-- Attachment #1: Type: text/plain, Size: 1205 bytes --]
On Sun, Mar 23, 2008 at 12:11:57PM +0300, Vladimir Karpinsky wrote:
> I am surprised and alarmed by the huge numbers in
> Raw_Read_Error_Rate and Seek_Error_Rate on all of the disks.
It's a Seagate bug-feature; pay no attention to it.
WBR, wRAR (ALT Linux Team)
Powered by the ALT Linux fortune(8):
<roman> "I can just see it: a boy in a red necktie walks out of a building. The sign
on the building reads 'House of the Young Linuxoid'. :-)"
<Lost> roman: in a little red hat, that's a young RHEL user
<Lost> in a red fedora, a young Fedora user
<Lost> bent into a red letter "Zyu", an old Debian user
<sadeness_> a Slackware user, with red eyes
<Lost> a Gentoo user with red eyes
<swi> and a ceaselessly laughing ALT user..
<dottedmag> a Slackware user, carrying a banner with Patrick on it
<Lost> ALT folks won't march with them, we're technosnobs :)
<dottedmag> Lost: well, if technosnobs, then at the rear, on an APC.
<Lost> dottedmag: yep, on an APC with wings, a jet engine and an old
steam engine, upgraded to the hilt
<dottedmag> Lost: a steam engine is too modern. With automatic
oars made of titanium alloy.
<Lost> the jet engine drives the caterpillar tracks, and the steam engine drives the
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
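One way to see why the large raw numbers need not be alarming: a commonly repeated but unofficial reading (an assumption here, not something stated in this thread or documented by the vendor) is that Seagate packs two counters into the 48-bit raw value of Seek_Error_Rate, with the upper 16 bits counting errors and the lower 32 bits counting total seeks. A minimal sketch under that assumption, using the Seek_Error_Rate raw values from the four attached logs:

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits).

    ASSUMPTION: upper field = error count, lower field = operation count.
    This layout is folklore, not a published specification.
    """
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF


# Seek_Error_Rate raw values copied from the attachments in this thread.
seek_raw = {"sdd": 10054340, "sda": 11288649, "sdb": 6932095, "sdc": 12252717}

for disk, raw in seek_raw.items():
    errors, seeks = split_seagate_raw(raw)
    print(disk, "errors:", errors, "seeks:", seeks)
```

Under that reading every disk shows zero in the upper field, which fits the normalized values staying well above the threshold of 030. It says nothing about sdc's reallocated and pending sectors, which are a separate matter.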
* Re: [Hardware] SMART errors
From: Michael Shigorin @ 2008-03-23 14:52 UTC
To: Hardware
On Sun, Mar 23, 2008 at 12:11:57PM +0300, Vladimir Karpinsky wrote:
> And these are NEW, SERVER-grade disks! Could you please tell
> me what might cause this: could it be some kind of artifact,
> or should I rush not only sdc but all the others back for
> warranty replacement?
Anything suspicious is best replaced right away.
On clusters, new disks are checked with mirrors, among other things:
if RAID1 falls apart at boot, the disk goes back for replacement, no questions asked.
PS: I once warned ldv@ about Seagates; he didn't believe me,
and then caught it himself with ST3750640NS (two out of four):
http://lists.altlinux.org/pipermail/hardware/2007-April/010325.html
http://lists.altlinux.org/pipermail/hardware/2007-May/010429.html
---- WBR, Michael Shigorin <mike@altlinux.ru>
------ Linux.Kiev http://www.linux.kiev.ua/
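As a rough way to decide which disk counts as "suspicious", the attribute rows can be screened mechanically. The sketch below is only a heuristic of my own choosing, not a smartmontools feature: it flags a nonzero raw value in the reallocated, pending and uncorrectable counters, and any attribute whose normalized VALUE has dropped to its THRESH. The sample rows are copied from smart_sdc.log.

```python
# Attribute rows copied from the smart_sdc.log attachment in this thread.
SDC_ROWS = """\
  5 Reallocated_Sector_Ct 0x0033 097 097 036 Pre-fail Always - 142
197 Current_Pending_Sector 0x0012 088 087 000 Old_age Always - 264
198 Offline_Uncorrectable 0x0010 088 087 000 Old_age Offline - 264
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
"""

# Heuristic choice (an assumption): counters where any nonzero raw value is a warning.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def suspicious_attributes(text):
    """Return {attribute name: raw value} for rows that look worrying."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, value, thresh, raw = fields[1], int(fields[3]), int(fields[5]), int(fields[9])
        if (name in WATCHED and raw > 0) or (thresh > 0 and value <= thresh):
            found[name] = raw
    return found


print(suspicious_attributes(SDC_ROWS))
```

Run against the rows of sdd, sda and sdb from the attachments, the same function returns an empty dict, which matches the impression that only sdc is clearly bad.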
* Re: [Hardware] SMART errors
From: Vladimir Karpinsky @ 2008-03-23 17:37 UTC
To: hardware, shigorin
Michael Shigorin wrote:
> On Sun, Mar 23, 2008 at 12:11:57PM +0300, Vladimir Karpinsky wrote:
>> And these are NEW, SERVER-grade disks! Could you please tell
>> me what might cause this: could it be some kind of artifact,
>> or should I rush not only sdc but all the others back for
>> warranty replacement?
> Anything suspicious is best replaced right away.
> On clusters, new disks are checked with mirrors, among other things:
> if RAID1 falls apart at boot, the disk goes back for replacement, no questions asked.
What is the best way to test RAID 1 and 5? Badblocks, or is there
something designed specifically for RAID? One disk there is clearly bad,
but there is no reason yet to replace the others; they need some serious
testing before being put into service.
> PS: I once warned ldv@ about Seagates; he didn't believe me,
> and then caught it himself with ST3750640NS (two out of four):
> http://lists.altlinux.org/pipermail/hardware/2007-April/010325.html
> http://lists.altlinux.org/pipermail/hardware/2007-May/010429.html
Somehow I missed that, otherwise I would never have bought Seagates :-(.
Best regards,
* Re: [Hardware] SMART errors
From: Michael Shigorin @ 2008-03-23 18:49 UTC
To: hardware
On Sun, Mar 23, 2008 at 08:37:38PM +0300, Vladimir Karpinsky wrote:
> >Anything suspicious is best replaced right away.
> >On clusters, new disks are checked with mirrors, among other things:
> >if RAID1 falls apart at boot, the disk goes back for replacement, no questions asked.
> What is the best way to test RAID 1 and 5?
watch cat /proc/mdstat ;-)
> One disk there is clearly bad, but there is no reason yet to
> replace the others; they need some serious testing before being
> put into service.
bonnie++ on the filesystem.
---- WBR, Michael Shigorin <mike@altlinux.ru>
------ Linux.Kiev http://www.linux.kiev.ua/
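Watching /proc/mdstat by eye can also be scripted. The sketch below scans mdstat-style text for a member status map such as [U_], where an underscore marks a missing or failed member. The sample text is hypothetical and written for illustration; it is not output from the machine discussed in this thread.

```python
import re

# Hypothetical /proc/mdstat contents, for illustration only.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      488383936 blocks [2/2] [UU]

md1 : active raid1 sdd1[1] sdc1[0](F)
      488383936 blocks [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status map contains '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE))
```

On a live system the same function could be fed `open("/proc/mdstat").read()` while the load test is running.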
* Re: [Hardware] SMART errors
From: Vladimir Karpinsky @ 2008-04-02 13:26 UTC
To: hardware, shigorin
Michael Shigorin wrote:
>> One disk there is clearly bad, but there is no reason yet to
>> replace the others; they need some serious testing before being
>> put into service.
> bonnie++ on the filesystem.
I don't quite understand how to run it: it refuses to run as root and
demands -u user. But as a regular user it lacks permission to write,
for example, to /.
Best regards,
* Re: [Hardware] SMART errors
From: Michael Shigorin @ 2008-04-02 13:32 UTC
To: hardware
On Wed, Apr 02, 2008 at 05:26:55PM +0400, Vladimir Karpinsky wrote:
> >>One disk there is clearly bad, but there is no reason yet to
> >>replace the others; they need some serious testing before being
> >>put into service.
> >bonnie++ on the filesystem.
> I don't quite understand how to run it: it refuses to run
> as root and demands -u user. But as a regular user it lacks
> permission to write, for example, to /.
Make a directory and hand it over to the user :)
---- WBR, Michael Shigorin <mike@altlinux.ru>
------ Linux.Kiev http://www.linux.kiev.ua/
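Spelled out, that advice amounts to creating a scratch directory on the filesystem under test, giving it to an unprivileged account, and pointing bonnie++ at it with -d and -u. A minimal sketch follows; the path and the account name are placeholders I made up, and the directory preparation step has to be run as root.

```python
import os
import shutil


def prepare_directory(directory, user):
    """Create the scratch directory and hand it to `user` (needs root)."""
    os.makedirs(directory, exist_ok=True)
    shutil.chown(directory, user=user)


def bonnie_command(directory, user, size_mb=None):
    """Build the argv list for a bonnie++ run inside `directory` as `user`."""
    argv = ["bonnie++", "-d", directory, "-u", user]
    if size_mb is not None:
        argv += ["-s", str(size_mb)]  # -s: test file size in megabytes
    return argv


# Hypothetical values: both the path and the account are placeholders.
cmd = bonnie_command("/mnt/test/bonnie", "tester", size_mb=6144)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` once `prepare_directory` has been called for the same directory and user.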
* Re: [Hardware] SMART errors
From: Vladimir Karpinsky @ 2008-04-02 14:12 UTC
To: hardware, shigorin
Michael Shigorin wrote:
> On Wed, Apr 02, 2008 at 05:26:55PM +0400, Vladimir Karpinsky wrote:
>>>> One disk there is clearly bad, but there is no reason yet to replace the others; they need some serious testing before being put into service.
>>> bonnie++ on the filesystem.
>> I don't quite understand how to run it: it refuses to run as root and demands -u user. But as a regular user it lacks permission to write,
>> for example, to /.
> Make a directory and hand it over to the user :)
Ah, so it doesn't crawl over the whole disk but works only inside a specially designated area!
Somehow I didn't catch that from the man page. Thanks!
Its output format is not described very clearly; try making sense of this:
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
newseismix       6G 47842  90 78127  19 36784  13 52300  92 137918  26 183.8   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
newseismix,6G,47842,90,78127,19,36784,13,52300,92,137918,26,183.8,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
I don't understand what these plus signs mean, or why sometimes (when the ram size is changed)
numbers appear in place of some of them (the meaning of the numbers seems clear enough) and
vice versa. Please help.
Is there some empirical procedure for using bonnie++ specifically as a load test?
Or do you just run it for a while and then go look at SMART?
Best regards,
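On the plus signs: as far as I know from the bonnie++ documentation, a field is printed as pluses when that test finished too quickly (under about half a second) for the figure to be meaningful, so raising the file count with -n is the usual way to get numbers there. The comma-separated last line is the easier one to process. The sketch below maps it onto field names that I chose myself from the column headings of the table; they are not bonnie++'s own identifiers, and the 27-field layout is assumed from this 1.03 output.

```python
# Field names are my own labels, derived from the table headings above.
FIELDS = [
    "machine", "size",
    "out_char_kps", "out_char_cpu", "out_block_kps", "out_block_cpu",
    "rewrite_kps", "rewrite_cpu",
    "in_char_kps", "in_char_cpu", "in_block_kps", "in_block_cpu",
    "seeks_ps", "seeks_cpu",
    "files",
    "seq_create_ps", "seq_create_cpu", "seq_read_ps", "seq_read_cpu",
    "seq_delete_ps", "seq_delete_cpu",
    "rand_create_ps", "rand_create_cpu", "rand_read_ps", "rand_read_cpu",
    "rand_delete_ps", "rand_delete_cpu",
]

# CSV line copied from the message above.
CSV_LINE = (
    "newseismix,6G,47842,90,78127,19,36784,13,52300,92,137918,26,183.8,0,16,"
    "+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++"
)


def parse_bonnie_csv(line):
    """Parse one bonnie++ 1.03 CSV line; all-plus fields become None."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    result = {}
    for name, value in zip(FIELDS, values):
        if name in ("machine", "size"):
            result[name] = value
        elif value and set(value) == {"+"}:
            result[name] = None  # too fast to measure
        else:
            result[name] = float(value)
    return result


result = parse_bonnie_csv(CSV_LINE)
print(result["in_block_kps"], result["seeks_ps"], result["seq_create_ps"])
```

For load testing, one simple routine is to save such lines from repeated runs and compare the SMART attribute tables taken before and after.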
Thread overview: 9+ messages
2008-03-23 9:11 [Hardware] SMART errors Vladimir Karpinsky
2008-03-23 9:19 ` Andrey Rahmatullin
2008-03-23 9:23 ` Vladimir Karpinsky
2008-03-23 14:52 ` Michael Shigorin
2008-03-23 17:37 ` Vladimir Karpinsky
2008-03-23 18:49 ` Michael Shigorin
2008-04-02 13:26 ` Vladimir Karpinsky
2008-04-02 13:32 ` Michael Shigorin
2008-04-02 14:12 ` Vladimir Karpinsky
ALT Linux hardware support