I have a large machine (3 x 37 GB SCSI disks, 2 x PIII 1000 MHz, 2 GB RAM) running SuSE 7.1 (that is, kernels 2.2.18 and 2.4.0) with an AHA-29160 SCSI adapter, which seems to be responsible for a lot of problems. Specifically, when doing database inserts,
after some time the screen freezes: Ctrl+Alt+Delete does not work, and Ctrl+Alt+Backspace does not take me back to the login screen.
Here is the problem according to /var/log/messages:
ron.hourly)
Sep 3 01:00:00 quality4 /USR/SBIN/CRON[25181]: (root) CMD ( test -x /usr/lib/secchk/security-control.sh && /usr/lib/secchk/security-control.sh weekly &)
Sep 3 01:19:14 quality4 -- MARK --
Sep 3 01:20:08 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:08 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x9 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:08 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:08 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x9 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:09 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:09 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:09 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:09 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
....................
Sep 3 01:20:39 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:39 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x9 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:39 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:39 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 ab b4 e5 00 00 20 00
Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 ab b5 35 00 00 30 00
Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 ab b5 8d 00 00 18 00
Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 12 2b d5 00 00 08 00
Sep 3 01:20:40 quality4 kernel: SCSI host 0 abort (pid 0) timed out - resetting
Sep 3 01:20:40 quality4 kernel: SCSI bus is being reset for host 0 channel 0.
Sep 3 01:20:40 quality4 kernel: SCSI host 0 abort (pid 0) timed out - resetting
Sep 3 01:20:40 quality4 kernel: SCSI bus is being reset for host 0 channel 0.
Sep 3 01:20:43 quality4 kernel: (scsi0:0:2:0) Synchronous at 160.0 Mbyte/sec, offset 63.
Sep 3 01:23:56 quality4 kernel: (scsi0:0:0:0) Synchronous at 160.0 Mbyte/sec, offset 63.
Sep 3 01:24:27 quality4 kernel: (scsi0:0:1:0) Synchronous at 160.0 Mbyte/sec, offset 63.
Sep 3 01:39:38 quality4 -- MARK --
Sep 3 01:59:00 quality4 /USR/SBIN/CRON[25626]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Sep 3 02:19:38 quality4 -- MARK --
Sep 3 02:39:38 quality4 -- MARK --
Sep 3 02:59:00 quality4 /USR/SBIN/CRON[25747]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Sep 3 03:19:38 quality4 -- MARK --
Sep 3 03:39:38 quality4 -- MARK --
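To see which device the timeouts point at, a log excerpt like the one above can be tallied with a short script. This is only a sketch in Python: the regular expressions are an assumption about the message layout, based solely on the lines quoted here, and the names `summarize` and `decode_read10` are made up for illustration. It also assumes that the nine bytes printed after "Read (10)" are CDB bytes 1 to 9 (flags, 4-byte logical block address, reserved, 2-byte transfer length, control).

```python
import re
from collections import Counter

# Assumed message layouts, taken from the log lines quoted above.
SELTO_RE = re.compile(r"Referenced SCB \d+ not valid during SELTO")
ABORT_RE = re.compile(
    r"aborting command due to timeout : pid \d+, scsi(\d+), channel (\d+), "
    r"id (\d+), lun (\d+) Read \(10\) ((?:[0-9a-f]{2} ?){9})"
)


def decode_read10(hex_bytes):
    """Decode CDB bytes 1..9 of a READ(10) into (lba, block_count)."""
    b = bytes.fromhex(hex_bytes.replace(" ", ""))
    lba = int.from_bytes(b[1:5], "big")     # CDB bytes 2..5: logical block address
    blocks = int.from_bytes(b[6:8], "big")  # CDB bytes 7..8: transfer length
    return lba, blocks


def summarize(lines):
    """Count SELTO messages and aborted reads per (host, channel, id, lun)."""
    selto = 0
    aborts = Counter()
    reads = []
    for line in lines:
        if SELTO_RE.search(line):
            selto += 1
        m = ABORT_RE.search(line)
        if m:
            target = tuple(int(x) for x in m.group(1, 2, 3, 4))
            aborts[target] += 1
            reads.append(decode_read10(m.group(5)))
    return selto, aborts, reads


# Two lines copied from the excerpt above.
SAMPLE = [
    "Sep 3 01:20:39 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.",
    "Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : "
    "pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 ab b4 e5 00 00 20 00",
]

selto_count, aborts, reads = summarize(SAMPLE)
print("SELTO messages:", selto_count)
for target, count in aborts.items():
    print("aborted reads on host/channel/id/lun", target, "->", count)
for lba, blocks in reads:
    print("read of", blocks, "blocks at LBA", lba)
```

In the excerpt above every aborted command is addressed to id 2, so a tally like this would at least show whether the timeouts keep hitting the same target.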
I also looked at /var/lib/mysql/hostname.err, which says:
23:17:33 mysqld ended
23:19:17 mysqld started
/usr/sbin/mysqld: ready for connections
13:10:14 Aborted connection 4414725 to db: 'CDR' user: 'svar' host: `localhost' (Got an error reading communication packets)
16:57:45 Aborted connection 4414743 to db: 'CDR' user: 'svar' host: `localhost' (Got an error reading communication packets)
18:49:13 Aborted connection 4414753 to db: 'CDR' user: 'svar' host: `localhost' (Got an error writing communication packets)
18:55:35 Aborted connection 4414755 to db: 'CDR' user: 'svar' host: `localhost' (Got an error writing communication packets)
19:18:39 Aborted connection 4414757 to db: 'CDR' user: 'svar' host: `localhost' (Got an error writing communication packets)
19:20:07 Aborted connection 4414759 to db: 'CDR' user: 'svar' host: `localhost' (Got an error writing communication packets)
13:18:47 mysqld started
/usr/sbin/mysqld: ready for connections
15:26:20 Aborted connection 23 to db: 'CDR' user: 'svar' host: `localhost' (Got an error reading communication packets)
I looked at the SuSE FAQs and found two things. First:
Adaptec 2940 Timeouts
Support knowledgebase (cg_seltime)
Applies to
SuSE Linux: Versions since 6.4
Kernel: Versions since 2.2.14
Symptom:
You are using an Adaptec SCSI controller like models 2940U or 2940UW and have connected some devices to it. Unfortunately not all devices were detected during Linux-side initialization of the controller. The computer stops with an error message like this one:
Freeing unused kernel memory: 64k freed
SCSI disk error: host 0 channel 0 id 1 lun 0 return code=1
scsidisk I/O error: dev 08:01, sector 147978
SCSI 0: channel 0 target 1 lun 0 request sense failed, performing reset.
SCSI bus is being reset for host 0 channel 0.
SCSI disk error: host 0 channel 0 id 1 lun 0 return code=1
scsidisk I/O error: dev 08:01, sector 81932
EXT2-fs error (device sd(8,1)): ext2_read_inode: unable to read inode block - inode=10201, block=40966
(scsi0:0:1:0) Synchronous at 40.0 Mbyte/sec, offset 8.
(scsi0:0:1:0) Performing Domain validation
(scsi0:0:1:0) Successfully completed Domain validation
Kernel panic: No init found. Try passing init= option to kernel.
SCSI disk error: host 0 channel 0 id 1 lun 0 return code=1
scsidisk I/O error: dev 08:01, sector 152532
Cause:
The selection timeout of the module aic7xxx defaults to 64ms. Nevertheless, some devices, especially older ones, need more time, for instance the 256ms that conforms to the ANSI SCSI-1 standard.
Solution:
Use the option seltime when loading the module aic7xxx:
aic7xxx=seltime:0
This option sets the controller to a selection timeout of 256ms. The README file for the aic7xxx module defines the following values for seltime:
0 - 256ms
1 - 128ms
2 - 64ms
3 - 32ms
See also:
Adaptec
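The seltime values listed in the FAQ can be written down as a small lookup table. A minimal sketch in Python; the names `SELTIME_MS` and `boot_option` are hypothetical, and the option string simply follows the `aic7xxx=seltime:N` form quoted in the FAQ.

```python
# seltime value -> selection timeout in milliseconds, as listed in the FAQ above.
SELTIME_MS = {0: 256, 1: 128, 2: 64, 3: 32}


def boot_option(timeout_ms):
    """Return the aic7xxx option string for a desired selection timeout."""
    for value, ms in SELTIME_MS.items():
        if ms == timeout_ms:
            return f"aic7xxx=seltime:{value}"
    raise ValueError(f"unsupported selection timeout: {timeout_ms} ms")


# The longest timeout, as recommended in the FAQ.
print(boot_option(256))
```

Each step up in the seltime value halves the timeout, so seltime:0 is the most forgiving setting for slow devices.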
So I changed lilo.conf:
# LILO configuration file
# Start LILO global Section
# If you want to prevent console users to boot with init=/bin/bash,
# restrict usage of boot params by setting a passwd and using the option
# restricted.
#password=bootpwd
#restricted
append="noapic aic7xxx=seltime:0"
boot=/dev/sda
#compact # faster, but won't work on all systems.
vga=normal
message=/boot/message
menu-scheme=Wg:kw:Wg:Wg
read-only
timeout=80
# End LILO global Section
#
image = /boot/vmlinuz
  root = /dev/sdc6
  label = linux
  initrd = /boot/initrd
#
image = /boot/vmlinuz_24
  root = /dev/sdc6
  label = linux_2.4
  initrd = /boot/initrd_24
  optional
#
Things are maybe a little better, but there are still hangups.
I also found this in the SuSE FAQs:
With Adaptec 19160, 29160, 39160 SCSI controller loading of module is impossible
Support knowledgebase (jsj_29160_interrupt)
Applies to
SuSE Linux: Versions since 6.4
Symptom:
You have a SCSI controller of brand Adaptec AHA-19160, AHA-29160 or AHA-39160 and cannot install your copy of SuSE Linux. During loading of the corresponding module aic7xxx, the computer seems to lock up.
On Terminal 4 messages like the following are continually displayed:
scsi 1 host
scsi : aborting command due to timeout: pid 0, scsi0, channel 0, id 0, lun 0 Test Unit Ready 00 00 00 00 00
scsi : aborting command due to timeout: pid 0, scsi0, channel 0, id 0, lun 0 Test Unit Ready 00 00 00 00 00
Cause:
Because of the design of the driver, these SCSI controllers require a unique interrupt. They cannot share interrupts with other PCI cards.
Solution:
The only possible solution is to reconfigure the assignment of interrupts in the BIOS of your computer. Please have a look at the manual of your mainboard.
You may also have to move your SCSI controller card to a different PCI slot.
Usually a table is displayed during boot where you can check the assignment of interrupts to PCI devices. There you can see whether your configuration succeeded.
See also:
SCSI-checklist
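On a system that does boot, whether the controller shares its interrupt can also be checked from /proc/interrupts instead of the BIOS table. The following Python sketch assumes the usual layout of that file (IRQ number, one counter per CPU, a controller type such as XT-PIC or IO-APIC-level, then a comma-separated list of device names). The function name `shared_irqs` and the sample text are made up for illustration and are not output from this machine.

```python
def shared_irqs(text, driver="aic7xxx"):
    """Return {irq: [devices]} for IRQ lines that `driver` shares with others."""
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # header line, or non-numeric rows such as NMI/ERR
        tokens = rest.split()
        # Skip the per-CPU interrupt counters.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        # Assumed layout: next token is the controller type, the rest are devices.
        devices = [d.strip() for d in " ".join(tokens[1:]).split(",") if d.strip()]
        if driver in devices and len(devices) > 1:
            shared[int(head)] = devices
    return shared


# Hypothetical sample, NOT taken from the machine described in this post.
SAMPLE = """\
           CPU0       CPU1
  0:    1234567    1234560    IO-APIC-edge  timer
 10:      40000      41000   IO-APIC-level  eth0
 11:      90000      91000   IO-APIC-level  aic7xxx, usb-uhci
NMI:          0
"""

print(shared_irqs(SAMPLE))
```

On the real machine the same function could be fed the contents of the file, for example `shared_irqs(open("/proc/interrupts").read())`. An empty result would suggest the controller already has an interrupt to itself.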
Well, maybe this does not really apply, since I can boot and for the most part work fine, but I am not sure.
Any ideas what to do?
I should say that I first looked for a hardware error, so
I changed the first and third disks (/dev/sda and /dev/sdc),
but there are still hangups.
I checked sda with format; no errors were found.
Where does one go from here?
Thanks, svar
---