ALT Linux hardware support
* [Hardware] SMART errors
@ 2008-03-23  9:11 Vladimir Karpinsky
  2008-03-23  9:19 ` Andrey Rahmatullin
  2008-03-23 14:52 ` Michael Shigorin
  0 siblings, 2 replies; 9+ messages in thread
From: Vladimir Karpinsky @ 2008-03-23  9:11 UTC (permalink / raw)
  To: Hardware

[-- Attachment #1: Type: text/plain, Size: 544 bytes --]

Hello!

I installed ALD on new disks and found disk errors in the log, so I looked at
the SMART output (see attachments). With sdc things seem clear enough: it
already has reallocated sectors. What surprises and worries me are the huge
raw values of Raw_Read_Error_Rate and Seek_Error_Rate on all the disks. And
these are NEW, SERVER-GRADE disks! Could you please tell me what might cause
this: is it some kind of artifact, or should I rush not only sdc but all the
other disks back under warranty?

-- 
	Best regards,
		Vladimir.

[-- Attachment #2: smart_sdd.log --]
[-- Type: text/plain, Size: 4370 bytes --]

smartctl version 5.36 [i586-alt-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3500630NS
Serial Number:    9QG58L17
Firmware Version: 3.AEK
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Mar 22 08:54:19 2008 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 ( 430) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       154242194
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       9
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       10054340
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       216
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       9
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   057   053   045    Old_age   Always       -       791150635
194 Temperature_Celsius     0x0022   043   047   000    Old_age   Always       -       43 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered  0x001a   061   060   000    Old_age   Always       -       178601827
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       208         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


[-- Attachment #3: smart_sda.log --]
[-- Type: text/plain, Size: 4370 bytes --]

smartctl version 5.36 [i586-alt-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3500630NS
Serial Number:    9QG54Z9N
Firmware Version: 3.AEK
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Mar 22 08:54:05 2008 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 ( 430) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   100   006    Pre-fail  Always       -       155132431
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   070   060   030    Pre-fail  Always       -       11288649
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       216
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       8
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   057   053   045    Old_age   Always       -       790102059
194 Temperature_Celsius     0x0022   043   047   000    Old_age   Always       -       43 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered  0x001a   062   060   000    Old_age   Always       -       199192249
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       208         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


[-- Attachment #4: smart_sdb.log --]
[-- Type: text/plain, Size: 4447 bytes --]

smartctl version 5.36 [i586-alt-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3500630NS
Serial Number:    9QG5DAQR
Firmware Version: 3.AEK
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Mar 22 08:54:12 2008 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 ( 430) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   099   006    Pre-fail  Always       -       168967211
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       9
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   068   060   030    Pre-fail  Always       -       6932095
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       88
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       9
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   059   055   045    Old_age   Always       -       756875305
194 Temperature_Celsius     0x0022   041   045   000    Old_age   Always       -       41 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered  0x001a   066   058   000    Old_age   Always       -       171976256
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        81         -
# 2  Extended offline    Aborted by host               70%        79         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


[-- Attachment #5: smart_sdc.log --]
[-- Type: text/plain, Size: 9077 bytes --]

smartctl version 5.36 [i586-alt-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3500630NS
Serial Number:    9QG5CVLW
Firmware Version: 3.AEK
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Mar 22 08:53:19 2008 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline 
data collection: 		 ( 430) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       224749369
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       142
  7 Seek_Error_Rate         0x000f   070   060   030    Pre-fail  Always       -       12252717
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       216
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       8
187 Unknown_Attribute       0x0032   001   001   000    Old_age   Always       -       126
189 Unknown_Attribute       0x003a   092   092   000    Old_age   Always       -       8
190 Temperature_Celsius     0x0022   062   060   045    Old_age   Always       -       672661542
194 Temperature_Celsius     0x0022   038   040   000    Old_age   Always       -       38 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered  0x001a   064   060   000    Old_age   Always       -       5907451
197 Current_Pending_Sector  0x0012   088   087   000    Old_age   Always       -       264
198 Offline_Uncorrectable   0x0010   088   087   000    Old_age   Offline      -       264
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 127 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 127 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 02 34 c8 e0  Error: UNC at LBA = 0x00c83402 = 13120514

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 78 5b 33 c8 e0 00      00:18:16.998  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:16.997  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:16.996  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:13.485  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:13.484  READ DMA EXT

Error 126 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 02 34 c8 e0  Error: UNC at LBA = 0x00c83402 = 13120514

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 78 5b 33 c8 e0 00      00:18:16.998  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:16.997  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:16.996  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:13.485  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:13.484  READ DMA EXT

Error 125 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 02 34 c8 e0  Error: UNC at LBA = 0x00c83402 = 13120514

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 78 5b 33 c8 e0 00      00:18:16.998  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:16.997  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:16.996  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:13.485  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:13.484  READ DMA EXT

Error 124 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 02 34 c8 e0  Error: UNC at LBA = 0x00c83402 = 13120514

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 78 5b 33 c8 e0 00      00:18:16.998  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:16.997  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:16.996  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:13.485  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:13.484  READ DMA EXT

Error 123 occurred at disk power-on lifetime: 186 hours (7 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 02 34 c8 e0  Error: UNC at LBA = 0x00c83402 = 13120514

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 78 5b 33 c8 e0 00      00:18:16.998  READ DMA EXT
  ec 00 00 00 00 00 a0 00      00:18:16.997  IDENTIFY DEVICE
  25 00 78 5b 33 c8 e0 00      00:18:16.996  READ DMA EXT
  ca 00 18 e8 ff 0f e0 00      00:18:13.485  WRITE DMA
  ec 00 00 00 00 00 a0 00      00:18:13.484  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       216         935867394
# 2  Extended offline    Completed: read failure       90%       206         935867394

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



* Re: [Hardware] SMART errors
  2008-03-23  9:11 [Hardware] SMART errors Vladimir Karpinsky
@ 2008-03-23  9:19 ` Andrey Rahmatullin
  2008-03-23  9:23   ` Vladimir Karpinsky
  2008-03-23 14:52 ` Michael Shigorin
  1 sibling, 1 reply; 9+ messages in thread
From: Andrey Rahmatullin @ 2008-03-23  9:19 UTC (permalink / raw)
  To: hardware

[-- Attachment #1: Type: text/plain, Size: 1205 bytes --]

On Sun, Mar 23, 2008 at 12:11:57PM +0300, Vladimir Karpinsky wrote:
> What surprises and worries me are the huge raw values of
> Raw_Read_Error_Rate and Seek_Error_Rate on all the disks.
That's a Seagate bug-feature; don't worry about it.
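
Some context on why these raw values look alarming. A commonly cited reading (not something the thread or Seagate confirms, so treat it as an assumption) is that Seagate drives pack two counters into the 48-bit raw value of Raw_Read_Error_Rate and Seek_Error_Rate: the number of operations in the low 32 bits and the actual error count in the high 16 bits. A minimal Python sketch under that assumption, with an illustrative function name, applied to raw values from the logs above:

```python
def split_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw value into (errors, operations).

    Assumption (not confirmed in this thread): the high 16 bits hold
    the error count and the low 32 bits hold the operation count.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


# Raw values copied from the smartctl logs in this thread.
samples = [
    ("sdd Raw_Read_Error_Rate", 154242194),
    ("sdc Raw_Read_Error_Rate", 224749369),
    ("sdc Seek_Error_Rate", 12252717),
]

for name, raw in samples:
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: errors={errors} operations={operations}")
```

Every raw value in the posted logs is below 2^32, so under this reading the error part is zero for all four disks. The normalized VALUE column also stays above THRESH in each log.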

-- 
WBR, wRAR (ALT Linux Team)
Powered by the ALT Linux fortune(8):

<roman> "I can just see it: a boy in a red neckerchief walks out of a building. The sign
        on the building says 'House of the Young Linuxoid'. :-)"
<Lost> roman: in a red hat, that's a young RHEL-er
<Lost> in a red fedora, a young Fedoran
<Lost> bent into a red letter "Zyu", an old Debianite
<sadeness_> a Slackware user, with red eyes
<Lost> a Gentoo user with red eyes
<swi> and an ALT user laughing incessantly..
<dottedmag> a Slackware user, carrying a banner with Patrick on it
<Lost> ALT users won't march with them: we're technosnobs :)
<dottedmag> Lost: well, if you're technosnobs, then at the back, on an APC.
<Lost> dottedmag: yep, on an APC with wings, a jet engine and an old
       steam engine upgraded to the hilt
<dottedmag> Lost: a steam engine is too modern. With automatic
       oars made of titanium alloy.
<Lost> the jet engine drives the caterpillar tracks, and the steam engine drives
       the wings

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]


* Re: [Hardware] SMART errors
  2008-03-23  9:19 ` Andrey Rahmatullin
@ 2008-03-23  9:23   ` Vladimir Karpinsky
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Karpinsky @ 2008-03-23  9:23 UTC (permalink / raw)
  To: hardware

Andrey Rahmatullin wrote:
> On Sun, Mar 23, 2008 at 12:11:57PM +0300, Vladimir Karpinsky wrote:
>> What surprises and worries me are the huge raw values of
>> Raw_Read_Error_Rate and Seek_Error_Rate on all the disks.
> That's a Seagate bug-feature; don't worry about it.

Thanks for the good news!

-- 
	Best regards,
		Vladimir.



* Re: [Hardware] SMART errors
  2008-03-23  9:11 [Hardware] SMART errors Vladimir Karpinsky
  2008-03-23  9:19 ` Andrey Rahmatullin
@ 2008-03-23 14:52 ` Michael Shigorin
  2008-03-23 17:37   ` Vladimir Karpinsky
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Shigorin @ 2008-03-23 14:52 UTC (permalink / raw)
  To: Hardware

On Sun, Mar 23, 2008 at 12:11:57PM +0300, Vladimir Karpinsky wrote:
> And these are NEW, SERVER-GRADE disks! Could you please tell me
> what might cause this: is it some kind of artifact, or should I
> rush not only sdc but all the other disks back under
> warranty?

Anything suspicious is best replaced right away.
On clusters, new disks are checked with mirrors, among other things:
if a RAID1 falls apart at boot, the disk goes back for replacement, no questions asked.

PS: I once warned ldv@ about Seagates; he didn't believe me,
and then ran into it himself with ST3750640NS drives (two out of four):

http://lists.altlinux.org/pipermail/hardware/2007-April/010325.html
http://lists.altlinux.org/pipermail/hardware/2007-May/010429.html

-- 
 ---- WBR, Michael Shigorin <mike@altlinux.ru>
  ------ Linux.Kiev http://www.linux.kiev.ua/



* Re: [Hardware] SMART errors
  2008-03-23 14:52 ` Michael Shigorin
@ 2008-03-23 17:37   ` Vladimir Karpinsky
  2008-03-23 18:49     ` Michael Shigorin
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Karpinsky @ 2008-03-23 17:37 UTC (permalink / raw)
  To: hardware, shigorin

Hello!

Michael Shigorin wrote:
> On Sun, Mar 23, 2008 at 12:11:57PM +0300, Vladimir Karpinsky wrote:
>> And these are NEW, SERVER-GRADE disks! Could you please tell me
>> what might cause this: is it some kind of artifact, or should I
>> rush not only sdc but all the other disks back under
>> warranty?
> 
> Anything suspicious is best replaced right away.
> On clusters, new disks are checked with mirrors, among other things:
> if a RAID1 falls apart at boot, the disk goes back for replacement, no questions asked.

What is the best way to test RAID 1 and RAID 5? badblocks, or is there
something made specifically for RAID? One disk there is clearly bad, but
there is no reason yet to replace the others; they need some serious
testing before they go into production.

> PS: I once warned ldv@ about Seagates; he didn't believe me,
> and then ran into it himself with ST3750640NS drives (two out of four):
> 
> http://lists.altlinux.org/pipermail/hardware/2007-April/010325.html
> http://lists.altlinux.org/pipermail/hardware/2007-May/010429.html

Somehow I missed that, otherwise I would never have bought Seagates :-(.

-- 
	Best regards,
		Vladimir.



* Re: [Hardware] SMART errors
  2008-03-23 17:37   ` Vladimir Karpinsky
@ 2008-03-23 18:49     ` Michael Shigorin
  2008-04-02 13:26       ` Vladimir Karpinsky
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Shigorin @ 2008-03-23 18:49 UTC (permalink / raw)
  To: hardware

On Sun, Mar 23, 2008 at 08:37:38PM +0300, Vladimir Karpinsky wrote:
> >Anything suspicious is best replaced right away.
> >On clusters, new disks are checked with mirrors, among other things:
> >if a RAID1 falls apart at boot, the disk goes back for replacement, no questions asked.
> What is the best way to test RAID 1 and RAID 5?

watch cat /proc/mdstat ;-)
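
Watching /proc/mdstat by eye can also be automated. The sketch below is a minimal Python example, not a tool mentioned in the thread: it looks for md arrays whose member map (the `[UU]` part) contains `_`, which marks a missing or failed member. The sample text is hypothetical and only imitates the usual /proc/mdstat layout; the function name is illustrative.

```python
import re


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return names of md arrays whose member map contains '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line, e.g. "104320 blocks [2/2] [UU]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Hypothetical sample, not taken from the poster's machine.
SAMPLE = """\
Personalities : [raid1] [raid5]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid5 sdd2[3] sdb2[1] sda2[0]
      976558976 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # prints ['md1']
```

On a real machine the same function could be fed the contents of /proc/mdstat, for example after each reboot or stress run.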

> One disk there is clearly bad, but there is no reason yet to
> replace the others; they need some serious testing before
> they go into production.

bonnie++ on the filesystem.

-- 
 ---- WBR, Michael Shigorin <mike@altlinux.ru>
  ------ Linux.Kiev http://www.linux.kiev.ua/



* Re: [Hardware] SMART errors
  2008-03-23 18:49     ` Michael Shigorin
@ 2008-04-02 13:26       ` Vladimir Karpinsky
  2008-04-02 13:32         ` Michael Shigorin
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Karpinsky @ 2008-04-02 13:26 UTC (permalink / raw)
  To: hardware, shigorin

Hello!

Michael Shigorin wrote:
>> One disk there is clearly bad, but there is no reason yet to
>> replace the others; they need some serious testing before
>> they go into production.
> 
> bonnie++ on the filesystem.
> 
I don't quite understand how to run it: it refuses to run as root and
demands -u user. But as an ordinary user it lacks write permission, for
example, to /.


-- 
	Best regards,
		Vladimir.



* Re: [Hardware] SMART errors
  2008-04-02 13:26       ` Vladimir Karpinsky
@ 2008-04-02 13:32         ` Michael Shigorin
  2008-04-02 14:12           ` Vladimir Karpinsky
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Shigorin @ 2008-04-02 13:32 UTC (permalink / raw)
  To: hardware

On Wed, Apr 02, 2008 at 05:26:55PM +0400, Vladimir Karpinsky wrote:
> >>One disk there is clearly bad, but there is no reason yet to
> >>replace the others; they need some serious testing before
> >>they go into production.
> >bonnie++ on the filesystem.
> I don't quite understand how to run it: it refuses to run
> as root and demands -u user. But as an ordinary user it lacks
> write permission, for example, to /.

Create a directory and hand it over to the user :)
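
The "directory plus user" advice can be sketched as follows. This is a minimal Python example under stated assumptions: the mount point `/mnt/test/bonnie` and the user name `tester` are hypothetical, and only the bonnie++ options `-d` (working directory) and `-u` (user to run as) are used. `prepare_scratch_dir` needs root privileges, so it is defined but not called here.

```python
import os
import shutil


def prepare_scratch_dir(path: str, user: str) -> None:
    """Create the scratch directory and give it to the user (run as root)."""
    os.makedirs(path, exist_ok=True)
    shutil.chown(path, user=user)


def bonnie_command(scratch_dir: str, user: str) -> list[str]:
    """Build a bonnie++ invocation that writes only inside scratch_dir."""
    return ["bonnie++", "-d", scratch_dir, "-u", user]


# Hypothetical directory on the filesystem under test, hypothetical user.
cmd = bonnie_command("/mnt/test/bonnie", "tester")
print(" ".join(cmd))  # prints: bonnie++ -d /mnt/test/bonnie -u tester
```

The directory should live on the filesystem being tested, since bonnie++ only exercises the place where its files are created.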

-- 
 ---- WBR, Michael Shigorin <mike@altlinux.ru>
  ------ Linux.Kiev http://www.linux.kiev.ua/



* Re: [Hardware] SMART errors
  2008-04-02 13:32         ` Michael Shigorin
@ 2008-04-02 14:12           ` Vladimir Karpinsky
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Karpinsky @ 2008-04-02 14:12 UTC (permalink / raw)
  To: hardware, shigorin

Michael Shigorin wrote:
> On Wed, Apr 02, 2008 at 05:26:55PM +0400, Vladimir Karpinsky wrote:
>>>> One disk there is clearly bad, but there is no reason yet to replace the others; they need some serious testing before they go into production.
>>> bonnie++ on the filesystem.
>> I don't quite understand how to run it: it refuses to run as root and demands -u user. But as an ordinary user it lacks write permission,
>> for example, to /.
> 
> Create a directory and hand it over to the user :)

A-ah, so it doesn't crawl over the whole disk, it only works inside a specially designated area!
Somehow I didn't pick that up from the man page. Thanks!

Its output format isn't described very clearly; try making sense of this:

Version  1.03       ------Sequential Output------ --Sequential Input---Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
newseismix       6G 47842  90 78127  19 36784  13 52300  92 137918  26 183.8   0
                     ------Sequential Create------ --------Random Create--------
                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
newseismix,6G,47842,90,78127,19,36784,13,52300,92,137918,26,183.8,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

I don't understand what these plus signs mean, or why sometimes (when the ram size changes)
numbers appear in place of some of them (the meaning of the numbers seems clear) and vice versa.
Please help me figure it out.
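
On the plus signs: the bonnie++ documentation says that a test which completes in under roughly 500 ms is shown as plus signs, because a rate computed from such a short run would not be accurate. Timings that land on either side of that cutoff would then appear as numbers in one run and as plus signs in another. The sketch below is a minimal Python example that reads the CSV summary line and turns plus-sign fields into `None`. The field names are made up for illustration and simply follow the column order of the 1.03 table above.

```python
# Illustrative field names, in the column order of the bonnie++ 1.03 table.
FIELDS = [
    "machine", "size",
    "out_chr_kps", "out_chr_cpu", "out_blk_kps", "out_blk_cpu",
    "rewrite_kps", "rewrite_cpu",
    "in_chr_kps", "in_chr_cpu", "in_blk_kps", "in_blk_cpu",
    "seeks_ps", "seeks_cpu", "files",
    "seq_create", "seq_create_cpu", "seq_read", "seq_read_cpu",
    "seq_delete", "seq_delete_cpu",
    "rnd_create", "rnd_create_cpu", "rnd_read", "rnd_read_cpu",
    "rnd_delete", "rnd_delete_cpu",
]


def parse_bonnie_csv(line: str) -> dict:
    """Parse one bonnie++ CSV line; plus-sign fields become None."""
    values = [v.strip() for v in line.split(",")]
    result = {}
    for name, value in zip(FIELDS, values):
        if name in ("machine", "size"):
            result[name] = value
        elif value and set(value) == {"+"}:
            result[name] = None  # test finished too quickly to measure
        else:
            result[name] = float(value)
    return result


# CSV line from the run posted in this message.
LINE = (
    "newseismix,6G,47842,90,78127,19,36784,13,52300,92,137918,26,183.8,0,16,"
    "+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++"
)

parsed = parse_bonnie_csv(LINE)
unmeasured = [name for name, value in parsed.items() if value is None]
print("block read K/sec:", parsed["in_blk_kps"])
print("too fast to measure:", unmeasured)
```

In this run all twelve file-creation figures are unmeasured, while the sequential and seek figures are ordinary numbers.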

Is there some empirical procedure for using bonnie++ specifically as a load test?
Or do you just run it for a while and then go look at SMART?
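
The "run it, then look at SMART" approach can be made a little more systematic by saving the attribute table (for example from `smartctl -A`) before and after a load run and comparing the raw counters that indicate surface damage. The sketch below is a minimal Python example, not an established procedure: the choice of attributes 5, 197 and 198 is an assumption, the function names are illustrative, and the sample rows are copied from the sda and sdc logs in this thread purely to exercise the parser. A real check would compare two snapshots of the same disk.

```python
# Attributes assumed to matter for a "did the disk get worse" check.
WATCHED = (5, 197, 198)


def raw_values(smartctl_output: str) -> dict[int, int]:
    """Map attribute ID -> raw value for rows of the attribute table."""
    values = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Row layout: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(parts) >= 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            if parts[9].isdigit():
                values[int(parts[0])] = int(parts[9])
    return values


def grown(before: dict[int, int], after: dict[int, int]) -> dict[int, tuple[int, int]]:
    """Return watched attributes whose raw value increased, as (old, new)."""
    changes = {}
    for attr_id in WATCHED:
        old = before.get(attr_id, 0)
        new = after.get(attr_id, 0)
        if new > old:
            changes[attr_id] = (old, new)
    return changes


# Stand-in snapshots: rows from the sda log (clean) and the sdc log (failing).
BEFORE = """
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""
AFTER = """
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       142
197 Current_Pending_Sector  0x0012   088   087   000    Old_age   Always       -       264
198 Offline_Uncorrectable   0x0010   088   087   000    Old_age   Offline      -       264
"""

print(grown(raw_values(BEFORE), raw_values(AFTER)))
```

An empty result after a long stress run would be a reasonable, though not conclusive, sign that the disk did not develop new bad sectors during the test.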

-- 
	Best regards,
		Vladimir.

