hdtest/hdcheck

This pair of programs checks whether a hard disk performs its writes
in order; many file systems and many applications accessing raw disks
require in-order writes to guarantee data consistency.

You can get the current version from
http://www.complang.tuwien.ac.at/anton/hdtest/


HOW DOES IT WORK?

It writes the blocks in an order like this:

1000-0-1001-0-1002-0-...

This sequence seems to inspire PATA and SATA disks to write
out-of-order (in the order 1000-1001-1002-...-0).  To find out whether
that happened, you turn off the drive's power while the program is
running.  The written blocks contain certain data that another program
from the suite can check after you power the drive up again.
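
The write order can be sketched in a few lines of Python.  This is an
illustration only, not the code of hdtest; the function name
write_order and the starting block 1000 are assumptions taken from the
example sequence above.

```python
from itertools import islice


def write_order(first_block):
    """Yield block numbers in the order: data block, block 0, next data block, block 0, ..."""
    block = first_block
    while True:
        yield block   # a new data block
        yield 0       # block 0 is rewritten after every data block
        block += 1


print(list(islice(write_order(1000), 6)))
# [1000, 0, 1001, 0, 1002, 0]
```

A drive that keeps the writes in order never has a data block on the
platter that is much newer than the version of block 0 on the platter.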


CAVEATS

You can use a partition or a whole drive for checking.  Note that the
DATA on this PARTITION OR DRIVE WILL BE DESTROYED.  Also, some drives
are sensitive to power variations during accesses and can destroy the
low-level formatting of one or more sectors.  You can often fix this
by low-level formatting the hard drive, but it is hard to find the
proper tools for that; moreover, I would not want to use such a drive
for data anyway.  The ideal way to use this test is with a new or
fully backed-up drive; afterwards, use "smartctl -t long" to check for
low-level formatting errors.  Also, make absolutely sure you are using
the right device name!

If you used a whole drive, the test destroys the partition table, and
you probably want to repartition the drive afterwards.  If you used a
partition, the test destroys the file system on it, if any, and you
have to high-level format the partition (i.e., create a new file
system on it) afterwards.


INSTALLATION

Just type

make

There is no installation step; just run the programs from the
directory where they were compiled.


USE

You can check a partition or a whole drive. If all else fails, you
could swapoff a swap partition and use that (but read the caveats
above first); don't forget to mkswap it afterwards.

A convenient way to test a drive is to put it in a USB enclosure;
there you can power it off without powering off the whole system.  I
have not tested eSATA enclosures yet; there, how well the OS survives
powering off the hard disk may depend on the OS, chipset, and driver.

Alternatively, you can put the drive on the same power supply as the
rest of the system, and power it off by powering off the system.  In
that case, you want as few local file systems mounted read-write as
possible, so umount the local file systems or mount them read-only
with:

mount -o remount,ro <fs>

(I could do this to the root file system only in single-user mode).
There is a small risk of losing file systems that are mounted
read-write in this test, and on some file systems, you would have to
wait for the fsck.
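
To see which local file systems are still mounted read-write before
you cut the power, you could scan a mount table in the /proc/mounts
format.  The following Python sketch is not part of hdtest; the
function name read_write_mounts and the sample mount table are made up
for illustration, and the sketch assumes that local file systems have
a device name starting with /dev/.

```python
def read_write_mounts(mounts_text):
    """Return (device, mount point) pairs for /dev/* file systems mounted rw."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if device.startswith("/dev/") and "rw" in options.split(","):
            result.append((device, mount_point))
    return result


# Made-up sample in /proc/mounts format; on a real system you would
# pass open("/proc/mounts").read() instead.
SAMPLE = """\
/dev/sda1 / ext3 ro,errors=remount-ro 0 0
/dev/sda2 /home ext3 rw 0 0
proc /proc proc rw 0 0
"""

print(read_write_mounts(SAMPLE))
# [('/dev/sda2', '/home')]
```

Every entry in the result is a candidate for umount or for
"mount -o remount,ro".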

Now run hdtest:

./hdtest /dev/<partition> <magic> 0

where <partition> is the partition (or drive) you want to use for this
test (note that its CONTENTS WILL BE OVERWRITTEN!).  <magic> is a
number used for checking which blocks were written; you should use a
different <magic> value for each run.

While hdtest is still running (no hurry; it typically takes about one
minute on a 400MB partition), cut the power. If the drive is on an ATX
power supply, don't use the soft-power button (on the front of the
box), but either a hard power switch on the power supply (if it
exists), or pull the plug; the regular soft "power switch" is not the
simulation of a power outage that we are interested in.

If you are not powering off the whole system, you will see output like:

writing 195363030 done, writing 0 write(0): Input/output error

which means that the test wrote block 195363030, then started writing
block 0, but the drive was turned off before that write completed.

Wait a few seconds, then turn the machine or enclosure back on. When
it is up again, run

./hdcheck /dev/<partition> 0

If the state of the hard disk corresponds to a logical state reached
at some point in time up to the power off, the output looks similar to
this:

last committed: 771266; magic: 5
blockid: 771250; magic: 5
blockid: 771251; magic: 5
[...]
blockid: 771265; magic: 5
blockid: 771266; magic: 5
**** everything before should have correct magic ******
blockid: 771267; magic: 5
**** nothing below should have correct magic (except with luck)******
blockid: 9928529; magic: -1804175265
blockid: 9928531; magic: -1804175265
blockid: 0; magic: 0
blockid: 9928535; magic: -1804175265
blockid: 9928537; magic: -1804175265
blockid: 0; magic: 0
blockid: 0; magic: 0
blockid: 9928543; magic: -1804175265
blockid: 0; magic: 0
blockid: 9928547; magic: -1804175265
blockid: 9928549; magic: -1804175265
blockid: 9928551; magic: -1804175265
blockid: 0; magic: 0
blockid: 9928555; magic: -1804175265

where the number after "magic:" in the lines up to "everything before
should have correct magic" should be the <magic> parameter you gave to
hdtest (in this case 5).
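
If you want to check such output mechanically, a small script can do
it.  The following Python sketch relies only on the line format shown
in the sample output above; the function name summarize is made up,
and hdcheck itself may print other lines that the sketch simply
ignores.

```python
import re

BLOCK_LINE = re.compile(r"blockid:\s*(-?\d+);\s*magic:\s*(-?\d+)")


def summarize(hdcheck_output, expected_magic):
    """Return (wrong magic before first marker, right magic after second marker)."""
    markers_seen = 0
    wrong_before = 0
    right_after = 0
    for line in hdcheck_output.splitlines():
        if line.startswith("****"):
            markers_seen += 1
            continue
        match = BLOCK_LINE.match(line.strip())
        if match is None:
            continue  # e.g. "last committed: ..." or "[...]"
        magic = int(match.group(2))
        if markers_seen == 0 and magic != expected_magic:
            wrong_before += 1
        elif markers_seen >= 2 and magic == expected_magic:
            right_after += 1
        # the block between the two markers may have either magic
    return wrong_before, right_after


SAMPLE = """\
last committed: 771266; magic: 5
blockid: 771265; magic: 5
blockid: 771266; magic: 5
**** everything before should have correct magic ******
blockid: 771267; magic: 5
**** nothing below should have correct magic (except with luck)******
blockid: 9928529; magic: -1804175265
blockid: 0; magic: 0
"""

print(summarize(SAMPLE, 5))
# (0, 0)
```

A result of (0, 0) corresponds to the in-order case; a non-zero second
number means that blocks after the second marker carry the magic
number of the current run.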

However, in many cases the output looks as follows:

last committed: 195359787; magic: 99999
blockid: 195359771; magic: 99999
[...]
blockid: 195359787; magic: 99999
**** everything before should have correct magic ******
blockid: 195359788; magic: 99999
**** nothing below should have correct magic ******
blockid: 195359789; magic: 99999
[...]
blockid: 195359802; magic: 99999
... blockid: 195362852
wrote 3064 blocks out-of-order

This means that the last version of block 0 that found its way to the
platter was the version right after writing block 195359787; but the
drive wrote at least blocks 195359789...195362852 out-of-order, i.e.,
3064 blocks (this was found out by checking the magic number).
Moreover, since this is the check output corresponding to the test
output shown above, we see that 195363030-195362852=178 of the
consecutively written blocks were accepted by the disk drive, but did
not hit the platters.
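
The two numbers in this paragraph can be recomputed as follows.  This
Python sketch only redoes the arithmetic; the function names are made
up, and the rule that counting starts two blocks after the last
committed block is an assumption read off the sample output (the block
directly after the last committed one appears between the two marker
lines).

```python
def out_of_order_count(last_committed, last_out_of_order):
    """Number of blocks from last_committed + 2 up to last_out_of_order, inclusive."""
    first_out_of_order = last_committed + 2
    return last_out_of_order - first_out_of_order + 1


def accepted_but_lost(last_reported_written, last_out_of_order):
    """Blocks that hdtest reported as written but that are not on the platter."""
    return last_reported_written - last_out_of_order


print(out_of_order_count(195359787, 195362852))  # 3064
print(accepted_but_lost(195363030, 195362852))   # 178
```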

Another thing that sometimes happens is that the first line of the
output shows the wrong magic number, and probably also an implausible
"last committed" block number.  In that case block 0 did not find its
way to the disk platters at all during the test, and the rest of the
output is meaningless, because it refers to the wrong area of the disk
drive.


RESULTS

I performed two sets of tests, one in November 1999, and one in April
2009.  The results have not changed much.  In both tests disks write
data seriously out-of-order in their default configuration; they can
delay the writing of block 0 in this test for quite a long time.

In more detail:

In 2009 I tested three drives (and accessed the whole drive) under
Linux 2.6.18 on Debian Etch; the USB enclosure used was a Tsunami
Elegant 3.5" Enclosure that has PATA and SATA disk drive interfaces.

* Maxtor L300R0 PATA (300GB) connected through a USB enclosure: In
  two tests it wrote consecutive blocks up to 47 and 34 blocks,
  respectively, beyond the last written version of block 0.

* Seagate ST340062 Model 0A PATA (7200.10, 400GB):
   connected through a USB enclosure:
    3 times the result was as if it had written the blocks in-order
    1 time  it wrote  3064 blocks out-of-order
    2 times it wrote 18384 blocks out-of-order
   connected directly via PATA cable:
    1 time it wrote 1972 blocks out-of-order

* Seagate ST340062 Model 0AS SATA (7200.10, 400GB) connected through a
  USB enclosure:
    1 time the result was as if it had written the blocks in-order
    2 times it wrote  3064 blocks out-of-order
    1 time  it wrote  6128 blocks out-of-order
    1 time  it wrote 12256 blocks out-of-order
    1 time it did not write block 0 at all

It is interesting that the number of blocks found to be out-of-order
is often a multiple of 3064.  Maybe this is a multiple of a track
size; no other explanation comes to mind.
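
The multiples are easy to verify.  This Python sketch divides the
out-of-order counts reported above for the two Seagate drives by 3064;
the function name as_multiple is made up for illustration.

```python
def as_multiple(count, unit=3064):
    """Return (quotient, remainder) of count divided by unit."""
    return divmod(count, unit)


for count in [3064, 18384, 1972, 6128, 12256]:
    print(count, as_multiple(count))
# 3064 (1, 0)
# 18384 (6, 0)
# 1972 (0, 1972)
# 6128 (2, 0)
# 12256 (4, 0)
```

All counts seen through the USB enclosure are exact multiples (1, 2,
4, and 6 times 3064); the count of 1972 from the direct PATA
connection is not.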

In 1999 I tested two drives (and accessed one partition) under
Linux-2.2.1 on RedHat 5.1.  The two drives were a Quantum Fireball
CR8.4A (8GB) and an IBM-DHEA-36480 (6GB), both connected directly via
PATA.  I did one test with each of the disks, and they did not write
block 0 to the platters even once before I turned off the power.

I also tested the Quantum with write caching disabled (hdparm -W 0).
Hdtest was now quite noisy and produced the in-order result.


CONCLUSION

Applications and file systems requiring in-order writes (i.e.,
basically all of them) should use barriers or turn off write caching
for the disk drive(s) they use.  Unfortunately, the Linux ext3 file
system does not use barriers by default; use the mount option
barrier=1 to enable them, e.g. by putting a line like this in
/etc/fstab:

/dev/md2        /home           ext3    defaults,barrier=1      1	2
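
To find ext3 entries that lack the option, you could scan the fstab
text.  The following Python sketch is not part of hdtest; the function
name and the /dev/md3 entry in the sample are made up, and the sketch
looks only for the literal option barrier=1.

```python
def ext3_entries_without_barrier(fstab_text):
    """Return mount points of ext3 entries whose options lack barrier=1."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 4:
            continue  # empty or malformed line
        _device, mount_point, fstype, options = fields[:4]
        if fstype == "ext3" and "barrier=1" not in options.split(","):
            missing.append(mount_point)
    return missing


# The /dev/md3 line is a hypothetical entry added for contrast.
SAMPLE = """\
# <device>  <mount point>  <type>  <options>  <dump>  <pass>
/dev/md2        /home           ext3    defaults,barrier=1      1 2
/dev/md3        /scratch        ext3    defaults                1 2
"""

print(ext3_entries_without_barrier(SAMPLE))
# ['/scratch']
```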


BUG REPORTS AND COMMENTS

Report bugs and send comments to anton@mips.complang.tuwien.ac.at.


LICENSE

GPL. See COPYING