Hardware causing Linux to freeze?


linuxpyro
07-17-2004, 05:16 PM
Hello, I have a dual AMD MP 2800 box, with 512 MB RAM. I used Fedora Core 1, but am now trying out SuSE 9.1 pro (FTP install, on a different hard drive so I can go back to Fedora if necessary). Under Fedora my box would freeze up from time to time, usually when trying to do something like play music and browse the Web at the same time. The mouse doesn't work, and I can't even use Ctrl+Alt+F2 to shut down the X server.

I switched to SuSE thinking it would be more stable, but it has frozen a few times now too, under the same circumstances. I am thinking that this is a hardware issue. Anyone ever had any similar problems? What works to resolve a situation like this? This box is the first instance I've ever really had Linux freeze...

kevinalm
07-17-2004, 08:43 PM
From experience I can tell you that a buggy driver (in my case it was a prerelease soundcard driver for an "orphan" soundcard) is one thing that can cause a hard lockup in Linux. Hardware and/or hardware modules are a definite suspect.

Regards.

cybertron
07-17-2004, 08:47 PM
I had trouble with the local APIC on my computer causing lockups, so I had to boot with nolapic (all lowercase) appended to the kernel options. I also tried acpi=off and noapic (without the L), but nolapic seemed to be the one that did the trick.
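
Since nolapic and noapic are easy to mix up, it may help to confirm which options the running kernel was actually booted with. Below is a minimal sketch that assumes a Linux system exposing /proc/cmdline; the function names and the default list of options are only illustrative, not part of any existing tool.

```python
def active_kernel_options(cmdline, wanted=("nolapic", "noapic", "acpi=off")):
    """Return the options from `wanted` that appear as whole tokens in `cmdline`."""
    tokens = cmdline.split()
    return [opt for opt in wanted if opt in tokens]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

On the affected box, active_kernel_options(read_cmdline()) should list nolapic if the option really made it onto the boot line. Matching whole tokens keeps noapic from being mistaken for nolapic.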

linuxpyro
07-17-2004, 09:40 PM
It seems like it might have something to do with ALSA, as I have noticed it while messing with audio. But I'll try the nolapic kernel option. Thanks a lot! :)

gehidore
07-17-2004, 11:57 PM
Originally posted by linuxpyro
Hello, I have a dual AMD MP 2800 box, with 512 MB RAM. I used Fedora Core 1, but am now trying out SuSE 9.1 pro (FTP install, on a different hard drive so I can go back to Fedora if necessary). Under Fedora my box would freeze up from time to time, usually when trying to do something like play music and browse the Web at the same time. The mouse doesn't work, and I can't even use Ctrl+Alt+F2 to shut down the X server.

I switched to SuSE thinking it would be more stable, but it has frozen a few times now too, under the same circumstances. I am thinking that this is a hardware issue. Anyone ever had any similar problems? What works to resolve a situation like this? This box is the first instance I've ever really had Linux freeze...

also to kill the X server try

Ctrl+Alt+Backspace

linuxpyro
07-19-2004, 11:03 PM
I've tried that... It won't respond to anything. Also, booting with the noapic kernel option doesn't make any difference. :(

gehidore
07-20-2004, 12:36 AM
Originally posted by linuxpyro
dual AMD MP 2800 box, with 512 MB RAM.

please tell me you used thermal compound.

you did set up your kernel for smp right?

linuxpyro
07-21-2004, 11:28 PM
I didn't apply any; the instructions that came with the procs said that there was a bit of thermal compound on the processor itself, and even recommended against using the thermal compound. At any rate, I checked the temperatures of the CPUs, and they both seem normal, at about 28 degrees C each.

I am using the SMP kernel that came with SuSE, as well as my Fedora installation on my other hard drive. Just out of curiosity, what happens if I boot with a non-SMP kernel on my dually? Does only one CPU get used?

I don't know if this makes any difference, but after a recent crash I checked the X server log and found this entry:
(WW) NV(0): Bad V_BIOS checksum
I don't know if it means anything, but it seemed a bit suspicious. I also tried running Memtest86, which I left going for about 17 hours, without it turning up any errors.

I'm thinking I might try to compile my own kernel, and see if things improve with that, but I'm not 100% certain that will solve the problem.

linuxpyro
07-21-2004, 11:33 PM
One more thing... I went through the log in /var/log/warn, and I found this entry, just before the system crashed last time:

Jul 21 16:08:37 linux kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Jul 21 16:08:37 linux kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
Jul 21 16:08:37 linux kernel: hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
Jul 21 16:08:37 linux kernel: hda: task_no_data_intr: error=0x04 { DriveStatusError }
Jul 21 16:08:40 linux kernel: end_request: I/O error, dev fd0, sector 0
Jul 21 16:08:40 linux kernel: end_request: I/O error, dev fd0, sector 0
Jul 21 16:08:40 linux kernel: hdc: ATAPI 52X CD-ROM CD-R/RW CD-MRW drive, 2048kB Cache
Jul 21 16:08:53 linux kernel: mtrr: 0xf0000000,0x4000000 overlaps existing 0xf0000000,0x1000000

gehidore
07-21-2004, 11:33 PM
Originally posted by linuxpyro

(WW) NV(0): Bad V_BIOS checksum


HA!

i just had the same error about 8 days ago.

my ati rage card died :( , died for linux at any rate.


the bios went out on the card i know that much.

vbios=video bios (/me thinks)


i'd guess you're fine with your procs and all, but i guess it's the video card. try another if you have one. my bet is the freezing goes away.

kevinalm
07-22-2004, 01:04 AM
The mtrr error makes the vid card suspect. I'm no expert, but I know MTRRs have to do with video card memory I/O. Maybe you have a resource conflict? Install any new hardware lately?
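
For what it's worth, here is a rough sketch for reading that mtrr line. It assumes each pair of numbers in the message is a region's base address followed by its size in bytes; the function names are hypothetical and only meant for illustration.

```python
import re

# Matches the "overlaps existing" message format seen in the posted log.
MTRR_RE = re.compile(
    r"mtrr: (0x[0-9a-fA-F]+),(0x[0-9a-fA-F]+) overlaps existing "
    r"(0x[0-9a-fA-F]+),(0x[0-9a-fA-F]+)"
)


def parse_mtrr_overlap(line):
    """Return the requested and existing (base, size) pairs, or None if no match."""
    m = MTRR_RE.search(line)
    if m is None:
        return None
    new_base, new_size, old_base, old_size = (int(g, 16) for g in m.groups())
    return {"requested": (new_base, new_size), "existing": (old_base, old_size)}


def describe(region):
    """Format a (base, size) pair, assuming the size is in bytes."""
    base, size = region
    return f"{size // (1024 * 1024)} MiB at {base:#x}"
```

Under that assumption, the posted line reads as a request for a 64 MiB region at 0xf0000000 colliding with an existing 16 MiB region at the same base address.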

linuxpyro
07-22-2004, 09:25 AM
I installed a new sound card a couple months ago, but I was having this problem before that. I'll see if I can borrow another vid card and try it out...

XiaoKJ
07-22-2004, 10:29 AM
My own stumble was setting all sound card settings to high, overloading it.

If ALSA was making your system funny, try lowering some settings.

BTW, seems like it's really the video card

Dark Ninja
07-22-2004, 10:40 AM
Jul 21 16:08:37 linux kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
In case you were worried about this error, it *most likely* is not a problem. In fact, I know there's a kernel compilation option (at least in 2.6.*) that fixes this error.

I know it doesn't have to do directly with your problem, but I just wanted to share the knowledge.
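
For anyone curious what those hex values mean, here is a small sketch that decodes them. The bit names follow the labels the kernel prints in the log above as far as I recall them, so treat the two tables as an assumption rather than a definitive reference; the decode helper is hypothetical.

```python
# ATA status register bits, named the way the kernel log prints them.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Partial ATA error register table for disks (assumed names, not exhaustive).
ERROR_BITS = [
    (0x80, "BadCRC"),
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),  # ABRT: the drive aborted the command
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]
```

decode(0x51, STATUS_BITS) gives DriveReady, SeekComplete and Error, and decode(0x04, ERROR_BITS) gives DriveStatusError, matching the log. That bit usually means the drive rejected a command it does not support, not that the disk is failing.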

Loki3
07-22-2004, 01:24 PM
Originally posted by Dark Ninja
In case you were worried about this error, it *most likely* is not a problem. In fact, I know there's a kernel compilation option (at least in 2.6.*) that fixes this error.

I know it doesn't have to do directly with your problem, but I just wanted to share the knowledge.

Generally that error is DMA pushing your hard drive too much. You can turn DMA off by default in the block devices configuration section... I think. There are so many options in there.

linuxpyro
07-27-2004, 03:44 PM
Everyone, thanks for the input. I was messing with my box the other day, and decided I was going to swap the vid card with one that I had borrowed. Upon opening the case, I realized that the card I had borrowed did not fit any of the slots that I had open, so I put my original NVidia back in. Well, oddly enough, things seem somewhat stable. I'm in the process of trying to make it crash again, and when it does I will swap in an appropriate video card at that point. I'll keep you guys informed.

Once again, thanks for the help!

kevinalm
07-27-2004, 05:05 PM
Sometimes just the act of pulling and reinserting the card will correct the problem. Dirty contacts. Hope you got lucky. ;)