hdtest/hdcheck

This pair of programs checks whether a hard disk performs its writes
in order; many file systems and many applications accessing raw disks
require in-order writes to guarantee data consistency.

You can get the current version from
http://www.complang.tuwien.ac.at/anton/hdtest.tar.gz.


HOW DOES IT WORK?

It writes the blocks in an order like this:

1000-0-1001-0-1002-0-...

This sequence seems to inspire a number of IDE disks to write
out-of-order (in the order 1000-1001-1002-...-0).  To detect this, you
turn off the power while the program is running.  The written blocks
contain known data that is checked after the system is up again.
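
The write pattern above can be sketched in a few lines of Python.  This
is only an illustration, not the code of hdtest itself: the block size,
the record layout (block id followed by magic) and the names
write_order and run_writes are assumptions made for the example, and it
is meant to be pointed at a scratch file, not at a real partition.

```python
import os
import struct

BLOCK_SIZE = 512                 # assumed; the README does not state a block size
RECORD = struct.Struct("<qq")    # assumed layout: (block id, magic)

def write_order(first, count):
    """Yield block numbers in the order first, 0, first+1, 0, ..."""
    for blockid in range(first, first + count):
        yield blockid
        yield 0

def run_writes(fd, first, count, magic):
    """Write data blocks interleaved with block 0.

    Each data block records its own id; block 0 records the id of the
    data block written just before it (the "last committed" block).
    """
    last = None
    for blockid in write_order(first, count):
        if blockid == 0:
            payload = RECORD.pack(last, magic)
        else:
            payload = RECORD.pack(blockid, magic)
            last = blockid
        # Pad the record to a full block and write it at the block's offset.
        os.pwrite(fd, payload.ljust(BLOCK_SIZE, b"\0"), blockid * BLOCK_SIZE)
```

For example, list(write_order(1000, 3)) gives 1000, 0, 1001, 0, 1002, 0.
Note that the sketch does nothing to force the data past the operating
system's own cache; a real test has to make sure each write actually
reaches the disk before the next one is issued.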


INSTALLATION

Just type

make

There is no installation step; just run the programs from the
directory where they were compiled.


USE

You need a partition on the disk you want to check. Note that the DATA
on this PARTITION WILL BE DESTROYED. If all else fails, you could
swapoff a swap partition and use that; don't forget to mkswap it
afterwards.

Mount as many local file systems read-only as possible, with

mount -o remount,ro <fs>

(I could do this for the root file system only in single-user mode.)
There is a small risk of losing file systems that are mounted
read-write during this test, and in any case you would have to wait
for the fsck.
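
To see which file systems are still mounted read-write before you cut
the power, a small helper like the following may be useful.  It is a
sketch that assumes the usual Linux /proc/mounts format (device, mount
point, type, comma-separated options, ...); the function name
read_write_mounts is made up for this example.

```python
def read_write_mounts(mounts_text):
    """Return the mount points whose option list contains 'rw'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "rw" in options:
            result.append(mount_point)
    return result
```

Pass it the contents of /proc/mounts, e.g.
read_write_mounts(open("/proc/mounts").read()).  The result also lists
pseudo file systems such as proc, which you can ignore.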

Now run hdtest:

./hdtest /dev/<partition> <magic> 0

where <partition> is the partition you want to use for this test (note
that its CONTENTS WILL BE OVERWRITTEN!).  <magic> is a number used for
checking which blocks were written; you should use a different <magic>
value for each run.

While hdtest is still running (no hurry; it typically takes about one
hour on a 600MB partition), cut the power. If you have an ATX power
supply, do this by pulling the plug; the regular "power switch" does
not simulate the kind of power outage that we are interested in.

Wait a few seconds, then turn the machine back on and boot it. When it
is up again, run

./hdcheck /dev/<partition> 0

If everything is all right, the output looks similar to this:

last committed: 771266; magic: 5
blockid: 771250; magic: 5
blockid: 771251; magic: 5
blockid: 771252; magic: 5
blockid: 771253; magic: 5
blockid: 771254; magic: 5
blockid: 771255; magic: 5
blockid: 771256; magic: 5
blockid: 771257; magic: 5
blockid: 771258; magic: 5
blockid: 771259; magic: 5
blockid: 771260; magic: 5
blockid: 771261; magic: 5
blockid: 771262; magic: 5
blockid: 771263; magic: 5
blockid: 771264; magic: 5
blockid: 771265; magic: 5
blockid: 771266; magic: 5
**** everything before should have correct magic ******
blockid: 771267; magic: 5
**** nothing below should have correct magic (except with luck)******
blockid: 9928529; magic: -1804175265
blockid: 9928531; magic: -1804175265
blockid: 0; magic: 0
blockid: 9928535; magic: -1804175265
blockid: 9928537; magic: -1804175265
blockid: 0; magic: 0
blockid: 0; magic: 0
blockid: 9928543; magic: -1804175265
blockid: 0; magic: 0
blockid: 9928547; magic: -1804175265
blockid: 9928549; magic: -1804175265
blockid: 9928551; magic: -1804175265
blockid: 0; magic: 0
blockid: 9928555; magic: -1804175265

where the number after "magic:" in the lines up to "everything before
should have correct magic" should be the <magic> parameter you gave to
hdtest (in this case 5).

If any of these magic numbers is incorrect, the disk was written
out-of-order.  If the magic in the first line ("last committed...") is
wrong, then block 0 has never been written, and the rest of the output
is meaningless (because then hdcheck does not even display the right
area of the disk).
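
These rules can be applied mechanically to saved hdcheck output.  The
following is a minimal sketch that assumes the output looks exactly
like the sample above; the function name check_report and the three
verdict strings are made up for this example and are not part of
hdcheck.

```python
import re

# Matches lines such as "blockid: 771250; magic: 5".
LINE = re.compile(r"(last committed|blockid): (-?\d+); magic: (-?\d+)")

def check_report(text, magic):
    """Classify hdcheck output for a run that used the given magic."""
    lines = [line.strip() for line in text.strip().splitlines()]
    head = LINE.fullmatch(lines[0]) if lines else None
    if head is None or head.group(1) != "last committed":
        raise ValueError("output does not start with a 'last committed' line")
    if int(head.group(3)) != magic:
        # Block 0 was never written; the rest of the output is meaningless.
        return "block 0 never written"
    for line in lines[1:]:
        if line.startswith("****"):
            break  # only the lines before the first marker must match
        match = LINE.fullmatch(line)
        if match and int(match.group(3)) != magic:
            return "written out of order"
    return "in order"
```

For the sample output above, check_report(output, 5) returns
"in order".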


RESULTS

I have tested a Quantum Fireball CR8.4A disk in the default
configuration under Linux 2.2.1 on a Red Hat 5.1 distribution.  The
result was that block 0 was not even written once to the disk (i.e.,
the "last committed" line had the wrong magic number) before I turned
off the power, while hundreds of other blocks were written.

I got the same result on an IBM-DHEA-36480.

I then turned off write caching on the Quantum with

hdparm -W 0 /dev/<disk>

and repeated the test.  In contrast to the earlier run, hdtest was now
quite noisy and produced the correct result.  This implies that the
hard disk originally (in the default setting) used write caching and
somehow "optimized" the writes to block 0 away.
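
The difference between the two runs can be illustrated with a toy
model.  It only compares the two write orders mentioned in this README
(the issued order and the reordered one); it says nothing about how the
disk's firmware actually decides what to write, and the names and the
cut-off point of 200 writes are arbitrary choices for the example.

```python
def platter_after_cut(commit_order, cut_after):
    """Return the set of blocks committed before the power is cut."""
    return set(commit_order[:cut_after])

# Order in which the host issues the writes: 1000, 0, 1001, 0, ...
issued = [b for blockid in range(1000, 1300) for b in (blockid, 0)]

# Order attributed to the write-caching disks: data blocks first,
# block 0 once at the very end.
reordered = [b for b in issued if b != 0] + [0]

in_order = platter_after_cut(issued, 200)
cached = platter_after_cut(reordered, 200)

print(0 in in_order)   # True: block 0 was written along the way
print(0 in cached)     # False: block 0 never reached the disk
print(len(cached))     # 200 other blocks were written instead
```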

My conclusion is that applications and file systems requiring in-order
writes should turn off write caching for the disk they use.


BUG REPORTS AND COMMENTS

Report bugs and send comments to anton@mips.complang.tuwien.ac.at.


LICENSE

GPL. See COPYING