Deonet Raid


r.dirksen
03-22-2004, 09:50 AM
Hi,

I have a problem with a Deonet RAID system, aka Soho RAID. This is a box containing two brand-new Maxtor 40 GB hard disks and a RAID controller. The box behaves as one normal IDE hard disk. The RAID is set to mirroring.

The RAID box is connected as master on the second controller of a 166 MHz Pentium with 64 MB of memory. The primary master contains Red Hat 8.0.

The filesystem is ext3 on all disks. The PC acts as a file server running the latest Samba version in a Windows 2000 client environment.

The problem is that at unpredictable moments one disk of the RAID set (mostly the primary disk) fails, and a few seconds later the second disk fails too.
Linux keeps running. When I do a df, I can still see the Deonet box, but its size is 0. Trying to unmount the RAID box fails with the message Volume Busy.
Rebooting the system solves the problem for a few days to a week, and then it happens again.

I already replaced the Deonet box, with no success.
I checked the hard disks with badblocks; no indication of a problem.
I connected the box to another power line, no luck.
I tried to hook up the Deonet box as slave on the first controller, but the cable is too short :-(
It is not likely to be a thermal problem, because a reboot directly after a failure solves the problem.

What are my options:
1) use a different PC, but configuring a new PC incl. Samba is a lot of work.
2) Switch back to Windows 2000 to see if Linux has something to do with it.
3) I have two identical PCs; I could swap all disks from one PC to the other to rule out the possibility that the problem is PC related.

Any help (also with analyzing the problem) will be much appreciated.

Rob.

mdwatts
03-22-2004, 11:33 AM
Originally posted by r.dirksen

1) I have two identical PCs; I could swap all disks from one PC to the other to rule out the possibility that the problem is PC related.

2) Switch back to Windows 2000 to see if Linux has something to do with it.


First have a look through all your system logfiles in /var/log to see if you can possibly spot the problem. If you cannot find anything, I would suggest you troubleshoot in the order above.
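
As a rough sketch of that log search (not part of the original advice), the Python below keeps only lines that look disk related. The function name `disk_related` and the pattern are assumptions for illustration; the pattern is modelled on 2.4-kernel messages such as `kernel: hdc: ...` and may need adjusting for other kernels.

```python
import re

# Assumed pattern: IDE driver messages ("kernel: hdc: ", "kernel: ide1: "),
# block-layer I/O errors, and ext3 errors.
DISK_PATTERN = re.compile(r"kernel: (hd[a-z]|ide\d+): |I/O error|EXT3-fs error")

def disk_related(lines):
    """Return the log lines that mention an IDE device, an I/O error or an ext3 error."""
    return [line.rstrip("\n") for line in lines if DISK_PATTERN.search(line)]
```

It could be called as `disk_related(open("/var/log/messages"))`; reading that file normally requires root.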

r.dirksen
03-24-2004, 05:17 AM
Thanks for pointing me to /var/log.

I looked into the file messages and snipped a few lines from it (see below).
Although I am not yet familiar with reading these logs, I can see that there are problems with
- the DMA,
- ide1 (I presume that this is the second controller)
- sector errors on hdc

but what really bothers me is the message about the ext3 file system with a hole at offset 0


can anybody help me with the interpretation of these messages?

thanx

Rob.

Mar 15 11:59:08 TotPR-Server kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 15 11:59:08 TotPR-Server kernel: hdc: dma_intr: error=0x04 { DriveStatusError }
Mar 15 12:50:01 TotPR-Server kernel: hdc: dma_timer_expiry: dma status == 0x21
Mar 15 12:50:10 TotPR-Server kernel: hdc: error waiting for DMA
Mar 15 12:50:10 TotPR-Server kernel: hdc: dma timeout retry: status=0x00 { }
Mar 15 12:50:10 TotPR-Server kernel:
Mar 15 12:50:10 TotPR-Server kernel: hdc: status error: status=0x00 { }
Mar 15 12:50:10 TotPR-Server kernel:
Mar 15 12:50:10 TotPR-Server kernel: hdc: drive not ready for command
Mar 15 12:50:10 TotPR-Server kernel: ide1: reset: master: error (0x00?)
Mar 15 12:50:11 TotPR-Server kernel: /O error, dev 16:00 (hdc), sector 33560968
Mar 15 12:50:11 TotPR-Server kernel: end_request: I/O error, dev 16:00 (hdc), sector 33560976
Mar 15 12:50:11 TotPR-Server kernel: end_request: I/O error, dev 16:00 (hdc), sector 33560984
Mar 15 12:50:18 TotPR-Server kernel: EXT3-fs error (device ide1(22,0)): ext3_readdir: directory #2 contains a hole at offset 0
Mar 15 12:50:18 TotPR-Server kernel: end_request: I/O error, dev 16:00 (hdc), sector 0
Mar 15 12:50:18 TotPR-Server kernel: end_request: I/O error, dev 16:00 (hdc), sector 48
Mar 15 12:50:18 TotPR-Server kernel: EXT3-fs error (device ide1(22,0)): ext3_get_inode_loc: unable to read inode block - inode=2, block=6
Mar 15 12:50:18 TotPR-Server kernel: end_request: I/O error, dev 16:00 (hdc), sector 0
Mar 15 12:50:18 TotPR-Server kernel: EXT3-fs error (device ide1(22,0)) in ext3_reserve_inode_write: IO failure
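
For anyone reading along, the status bytes in these lines can be decoded bit by bit. This is a minimal sketch, assuming the standard ATA status register layout; the three names for 0x51 come straight from the log, while the other bit names are recalled from the 2.4 IDE driver and should be treated as assumptions. The `dev 16:00` pair appears to be printed in hex, which would match the `ide1(22,0)` in the ext3 messages.

```python
# ATA status register bits; names for bits not seen in the log are assumptions.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

def decode_status(value):
    """Return the names of all bits set in an IDE status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]

def decode_dev(dev):
    """Split a 'major:minor' pair printed in hex, e.g. '16:00'."""
    major, minor = dev.split(":")
    return int(major, 16), int(minor, 16)

print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode_status(0x00))  # []
print(decode_dev("16:00"))  # (22, 0)
```

A status of 0x00 has no bits set at all, not even DriveReady, which would be consistent with the box no longer answering on the bus rather than with an ordinary bad sector. The error value 0x04 is the abort bit, which the driver prints as DriveStatusError.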

mdwatts
03-24-2004, 11:30 AM
The following is from Configure.help in the kernel source.


hda: set_multmode: status=0x51 { DriveReady SeekComplete Error }
hda: set_multmode: error=0x04 { DriveStatusError }

CONFIG_IDEDISK_MULTI_MODE:=y
If you get this error, try to say Y here:


You should be able to enable multimode with the hdparm -m option. I use

-X69 -d1 -u1 -m16 -c3 -A1 -a24 -k1

for hda (ata100).
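
The -X number is the least obvious of these options. As a small sketch of how the hdparm man page describes the encoding (PIO modes are 8 plus the mode number, multiword DMA 32 plus, UltraDMA 64 plus; the 16-plus range for single-word DMA is recalled from the kernel transfer-mode constants), with `decode_xfer_mode` being a name made up for this example:

```python
def decode_xfer_mode(x):
    """Translate an hdparm -X number into a transfer mode name."""
    if 64 <= x <= 71:
        return "udma%d" % (x - 64)
    if 32 <= x <= 39:
        return "mdma%d" % (x - 32)
    if 16 <= x <= 23:
        return "sdma%d" % (x - 16)
    if 8 <= x <= 15:
        return "pio%d" % (x - 8)
    return "unknown"

print(decode_xfer_mode(69))  # udma5
print(decode_xfer_mode(66))  # udma2
```

So -X69 asks for UltraDMA mode 5, which is what ATA100 refers to. A drive or controller that only handles a lower mode would need a lower number.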

r.dirksen
03-26-2004, 07:47 AM
Hi,

The current settings for hdc are:
X 0
d 1
u 0
m 16 so multimode is already on
c 0 (I don't know if my 166 MHz Pentium 1 is supporting 32 bit IO)
A is not set
a 8
k 0

Having read the hdparm man page, I'm really not sure if I should change X (IDE transfer mode); the man page warns of data corruption.

What is the risk of setting c 3?

I would change the following params:
u 1
A 1
k 1


and see what effect this will have. Below is the output of hdparm -I.

It looks like udma2 is selected (asterisk in front), but I don't know if this old PC supports Ultra DMA.

ATA device, with non-removable media
    Model Number: DeoNet RAID
    Serial Number:
    Firmware Revision: Rev2.1.3
Standards:
    Used: ATA/ATAPI-6 T13 1410D revision 0
    Supported: 6 5 4 3
Configuration:
    Logical          max     current
    cylinders        16383   16383
    heads            16      16
    sectors/track    63      63
    --
    CHS current addressable sectors: 16514064
    LBA user addressable sectors: 80293248
    device size with M = 1024*1024: 39205 MBytes
    device size with M = 1000*1000: 41110 MBytes (41 GB)
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 1
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 16 Current = 16
    DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5
        Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
        Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
    Enabled Supported:
    *   Power Management feature set
        SMART feature set
    *   FLUSH CACHE EXT command
    *   Mandatory FLUSH CACHE command
    *   SMART self-test
HW reset results:
    CBLID- above Vih
    Device num = 1
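
The numbers in this output can be cross-checked with a few lines of Python. This is only a sketch: it assumes 512-byte sectors, and the helper names `size_mbytes` and `selected_mode` are made up for the example.

```python
SECTOR_BYTES = 512  # assumption: standard ATA sector size

def size_mbytes(lba_sectors, unit):
    """Device size in whole units of `unit` bytes, rounded down."""
    return lba_sectors * SECTOR_BYTES // unit

def selected_mode(dma_line):
    """Return the mode marked with '*' in an hdparm -I 'DMA:' line, or None."""
    for word in dma_line.split():
        if word.startswith("*"):
            return word[1:]
    return None

print(size_mbytes(80293248, 1024 * 1024))  # 39205
print(size_mbytes(80293248, 1000 * 1000))  # 41110
print(selected_mode("DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5"))  # udma2
```

Both sizes agree with what hdparm reports for the LBA sector count, and the starred entry confirms that udma2 is the mode currently selected.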

mdwatts
03-26-2004, 11:46 AM
A couple of boot options I use in my Grub config may be worth looking into to see if it will help fix your problem.

idebus=## (I use 66)
ide0=autotune

Search www.google.com/linux for those to find explanations of what exactly they are.

Have you checked to see if CONFIG_IDEDISK_MULTI_MODE is enabled in your kernel? Have a look in .config.
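
A minimal sketch of that check in Python, assuming the usual .config syntax where enabled options appear as `NAME=y` and disabled ones as `# NAME is not set`. The function name `config_value` and the sample text are made up for illustration, and where the .config lives depends on the system (often the kernel source tree; some distributions also ship a copy under /boot).

```python
def config_value(config_text, option):
    """Return the value of a kernel config option, or None if unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options are comment lines ("# NAME is not set") and never match.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None

# Hypothetical sample, not taken from the poster's machine.
sample = "CONFIG_BLK_DEV_IDEDISK=y\n# CONFIG_IDEDISK_MULTI_MODE is not set\n"
print(config_value(sample, "CONFIG_BLK_DEV_IDEDISK"))     # y
print(config_value(sample, "CONFIG_IDEDISK_MULTI_MODE"))  # None
```

With a real file, the text would come from something like `open(path).read()`, with `path` pointing at the .config in question.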