Introduction

Want to know how to find what file is associated with a sector and what inode it has?

How do you get rid of those "pending sector" errors that smartd reports when they never seem to go away?

What the hell is a pending sector anyway?

There are two primary means of discovering the specific bad sector(s) on a drive using SMART: the SMART error log and the SMART self-test log. Once you know the sector, you may want to write to it, or work out which file occupies it and may be corrupt. If you can identify the file, you can attempt to move it and then nuke the sector from orbit.

This article summarizes how to do this. It is a condensation of the information in these two articles:

http://smartmontools.sourceforge.net/badblockhowto.html

http://www2.uic.edu/~aciani1/sector_blues.html

For more information on SMART in general, see the article SMART Drive Monitoring

SMART Attributes: Pending Sectors

If a read error occurs on a sector, the error is recorded in the SMART log and the sector is marked pending in the attribute list. It will remain pending until the sector is written again; if that write fails, the drive reallocates it to a spare sector.

The pending sectors can generate error messages, etc. You can force a write to them and reallocate them; however, if you do, it will nuke the file in that sector. Hmmm . . . a conundrum.

The sector number of the last read errors will be given in the error section of the smart logs, for example:

Error 3 occurred at disk power-on lifetime: 14511 hours (604 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 48 b8 e2 e0  Error: UNC 8 sectors at LBA = 0x00e2b848 = 14858312

(Some drives will only give the value in hex; you would then have to convert it to decimal.)
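If all you have is the hex value, the shell can do the conversion for you. This is a small sketch using plain shell arithmetic; hex_to_dec is just an illustrative helper name, not something that ships with smartmontools.

```shell
# Convert a hex LBA (written with a 0x prefix) to decimal using shell arithmetic.
hex_to_dec() {
    echo "$(( $1 ))"
}

hex_to_dec 0x00e2b848    # the LBA from the error log entry above; prints 14858312
```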

The error log only shows the last five errors. It may contain the same or different sector numbers. If it only contains one sector, yet the attribute list shows more than one, you will most likely have to repair the first one, then run a SMART read test, rinse and repeat.

Errors other than read errors get recorded too, so if no sector number is given here, examine the SMART read test results.

SMART Read Test Failure

Another method of locating the bad sector number is to run a SMART read test, short or extended, with smartctl. If it fails, it will report the sector number. You can run either of these tests while the server is up. Unless the server is under very heavy I/O, it will not noticeably affect performance. The command to run the test is:

# smartctl -t short /dev/hda

Upon completion, check the test results section of the SMART log and you will see entries giving the bad sector, such as:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%     14577         232962120
# 2  Short offline       Completed: read failure       10%     14504         232962120
# 3  Conveyance offline  Completed without error       00%      1501         -
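
If you would rather not eyeball the log, something like the following can pull the failing LBAs out of it. It is a minimal sketch that assumes the column layout shown above (LBA in the last column, "read failure" in the status); first_error_lbas is a made-up helper name.

```shell
# Read a SMART self-test log on stdin and print each distinct LBA
# reported on a line whose status mentions a read failure.
first_error_lbas() {
    awk '/read failure/ { print $NF }' | sort -u
}

# Typical use (needs root and a real drive, so it is left commented out):
#   smartctl -l selftest /dev/hda | first_error_lbas
```

With the sample log above, both failed tests point at the same sector, so only one LBA would be printed.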

Calculating the File System Sector

Because the drive reports the bad sector as a Logical Block Address (LBA) counted from the start of the whole disk, you need to convert it to a file system block counted from the start of the partition. This is pretty easy to do:

First, list the start and end sectors of each partition:

# fdisk -lu /dev/hda

Disk /dev/hda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *          63      208844      104391   83  Linux
/dev/hda2          208845     4401809     2096482+  83  Linux
/dev/hda3         4401810     8482319     2040255   82  Linux swap
/dev/hda4         8482320   234436544   112977112+   5  Extended
/dev/hda5         8482383    29447144    10482381   83  Linux
/dev/hda6        29447208    50411969    10482381   83  Linux
/dev/hda7        50412033    52516484     1052226   83  Linux
/dev/hda8        52516548   234436544    90959998+  83  Linux

Use this to determine which partition the bad sector is in. In this case 14858312 falls between the start and end values for /dev/hda5.

NOTE: This is in partition 5 - ignore partition 4 as it is the extended partition. Any block from partitions 5 through 8 will also be in partition 4, but you want the real partition, not the extended partition.
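
The lookup in the table can be scripted too. The sketch below assumes the old-style fdisk -lu layout shown above (optional "*" boot flag, then Start, End, Blocks, Id) and skips Id 5, the extended container; find_partition is a hypothetical helper name, and other extended-partition Ids are not handled.

```shell
# find_partition LBA: read `fdisk -lu` output on stdin and print the
# partition that contains LBA, followed by that partition's start sector.
find_partition() {
    awk -v lba="$1" '
        /^\/dev\// {
            i = ($2 == "*") ? 3 : 2      # step over the boot flag column if present
            start = $i + 0
            last = $(i + 1) + 0
            id = $(i + 3)
            if (id == "5") next          # extended container, not a real file system
            if (lba + 0 >= start && lba + 0 <= last) print $1, start
        }'
}

# Typical use (needs root, so it is left commented out):
#   fdisk -lu /dev/hda | find_partition 14858312
```

For the table above, LBA 14858312 comes out as /dev/hda5 with start sector 8482383.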

Next, calculate the file system block using the formula:

b = (int)((L-S)*512/B)

where:

b = File system block number
B = File system block size in bytes (almost always 4096)
L = LBA of the bad sector
S = Starting sector of the partition, as shown by fdisk -lu
(int) denotes the integer part

For example:

The sector reported in the SMART error log above is 14858312, which falls in /dev/hda5 (starting sector 8482383), thus:

((14858312 - 8482383) * 512) / 4096 = 796991.125
  ^Bad Sec.  ^Start Sec.              ^Cha Ching! This is the block!

(Prefer the sector number from the SMART self-test log over the one in the SMART error log; the two logs do not always report the same sector.)

((BadBlock - StartPartition) * 512) / 4096
 You can just paste this into Google as a template, with your own numbers filled in

Any fraction left over indicates the problem sector is in the middle or latter part of the block (which contains a number of sectors). Ignore the fraction and just use the integer.
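
The same arithmetic works in the shell, where integer division drops the fraction for you. This is only a sketch: fs_block and sector_in_block are illustrative names, 512-byte sectors are assumed, and the block size defaults to 4096 (on ext2/ext3 you can check it with tune2fs -l on the partition and look for the "Block size" line).

```shell
# fs_block LBA START [BLOCKSIZE]: file system block that contains the sector.
fs_block() {
    echo "$(( ($1 - $2) * 512 / ${3:-4096} ))"
}

# sector_in_block LBA START [BLOCKSIZE]: which sector inside that block is bad
# (0 is the first sector of the block).
sector_in_block() {
    echo "$(( ($1 - $2) % (${3:-4096} / 512) ))"
}

fs_block 14858312 8482383          # prints 796991
sector_in_block 14858312 8482383   # prints 1, i.e. the .125 fraction (1 of 8 sectors)
```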

Next, use debugfs to locate the inode, and then the file, associated with that block:

# debugfs
debugfs 1.35 (28-Feb-2004)
debugfs:  open /dev/hda5
debugfs:  icheck 796991
Block   Inode number
796991  <block not found>
debugfs:  quit

Ah! It didn't give the inode! If it did, you could have found the file with:

# debugfs
debugfs 1.35 (28-Feb-2004)
debugfs:  open /dev/hda5
debugfs:  icheck 796991
Block   Inode number
796991 41032
debugfs:  ncheck 41032
Inode   Pathname
41032   /S1/R/H/714197568-714203359/H-R-714202192-16.gwf

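So what the heck? Why no inode? Well, icheck prints <block not found> when no inode owns the block, which usually means the block is not allocated to any file. In that case there is no file to rescue, only a bad sector to deal with.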
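
The two debugfs lookups can also be run without the interactive prompt by passing a single request with -R. The helper below is a rough sketch that only parses icheck output in the two shapes shown above; inode_from_icheck is a made-up name.

```shell
# Read icheck output on stdin and print the inode number if one is listed.
# Prints nothing when the block is reported as <block not found>.
inode_from_icheck() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Typical use (needs root and a real partition, so it is left commented out):
#   inode=$(debugfs -R "icheck 796991" /dev/hda5 2>/dev/null | inode_from_icheck)
#   [ -n "$inode" ] && debugfs -R "ncheck $inode" /dev/hda5
```

An empty result corresponds to the <block not found> case, so the ncheck step is skipped.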

Banishing Evil Sectors

If you want to check it with a read test try:

dd if=/dev/hda5 of=my.block skip=796991 bs=4096 count=1

If it gives an error, then you know for sure it is bad.

As far as I know, that sector is toast at this point.

If you want to force a write to it, destroying the data in it, use:

dd if=/dev/zero of=/dev/hda5 bs=4096 count=1 seek=796991
sync

That sector should no longer be listed as pending in the SMART attributes. It should now show as reallocated, or simply drop off the pending list if the drive managed to rewrite it in place.
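
Since the write is destructive, it can be worth rehearsing the skip/seek arithmetic on a scratch file before pointing dd at a real partition. The sketch below does that; zero_block and nonzero_bytes are illustrative names, and conv=notrunc is added only because the target here is a regular file (without it dd would cut the file off right after the block it wrote, which does not apply to a block device).

```shell
# zero_block FILE BLOCK [BLOCKSIZE]: overwrite one block of FILE with zeros.
zero_block() {
    dd if=/dev/zero of="$1" bs="${3:-4096}" count=1 seek="$2" conv=notrunc 2>/dev/null
}

# nonzero_bytes FILE BLOCK [BLOCKSIZE]: count the bytes in that block that are not NUL.
nonzero_bytes() {
    dd if="$1" bs="${3:-4096}" count=1 skip="$2" 2>/dev/null | tr -d '\0' | wc -c
}

scratch=$(mktemp)
head -c 65536 /dev/zero | tr '\0' 'x' > "$scratch"   # 16 blocks of filler data
zero_block "$scratch" 5
nonzero_bytes "$scratch" 5    # the zeroed block: expect 0
nonzero_bytes "$scratch" 4    # an untouched neighbour: expect 4096
```

Note that reading uses skip= while writing uses seek=, exactly as in the two dd commands above. Remove the scratch file when you are done.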