aic7xxx Problems...

Status
Not open for further replies.

svar

Programmer
Aug 12, 2001
349
GR
I have a large machine (3 x 37G SCSI disks, 2 x PIII 1000 MHz, 2 GB RAM) running SuSE 7.1 (that is, kernels 2.2.18 and 2.4.0) with an AHA-29160 SCSI adapter, which seems to be responsible for a lot of problems. Specifically, when doing database inserts,
after some time the screen freezes: Ctrl+Alt+Delete does not work, and Ctrl+Alt+Backspace does not take me back to the login.
Here is the problem according to /var/log/messages:

ron.hourly)
Sep 3 01:00:00 quality4 /USR/SBIN/CRON[25181]: (root) CMD ( test -x /usr/lib/secchk/security-control.sh && /usr/lib/secchk/security-control.sh weekly &)
Sep 3 01:19:14 quality4 -- MARK --
Sep 3 01:20:08 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:08 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x9 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:08 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:08 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x9 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:09 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:09 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:09 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:09 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:09 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
....................
Sep 3 01:20:39 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:39 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x9 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:39 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.
Sep 3 01:20:39 quality4 kernel: SCSISEQ = 0x5a SEQADDR = 0x8 SSTAT0 = 0x10 SSTAT1 = 0x8a
Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 ab b4 e5 00 00 20 00
Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 ab b5 35 00 00 30 00
Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 ab b5 8d 00 00 18 00
Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 12 2b d5 00 00 08 00
Sep 3 01:20:40 quality4 kernel: SCSI host 0 abort (pid 0) timed out - resetting
Sep 3 01:20:40 quality4 kernel: SCSI bus is being reset for host 0 channel 0.
Sep 3 01:20:40 quality4 kernel: SCSI host 0 abort (pid 0) timed out - resetting
Sep 3 01:20:40 quality4 kernel: SCSI bus is being reset for host 0 channel 0.
Sep 3 01:20:43 quality4 kernel: (scsi0:0:2:0) Synchronous at 160.0 Mbyte/sec, offset 63.
Sep 3 01:23:56 quality4 kernel: (scsi0:0:0:0) Synchronous at 160.0 Mbyte/sec, offset 63.
Sep 3 01:24:27 quality4 kernel: (scsi0:0:1:0) Synchronous at 160.0 Mbyte/sec, offset 63.
Sep 3 01:39:38 quality4 -- MARK --
Sep 3 01:59:00 quality4 /USR/SBIN/CRON[25626]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Sep 3 02:19:38 quality4 -- MARK --
Sep 3 02:39:38 quality4 -- MARK --
Sep 3 02:59:00 quality4 /USR/SBIN/CRON[25747]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Sep 3 03:19:38 quality4 -- MARK --
Sep 3 03:39:38 quality4 -- MARK --
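
A log like this is easier to reason about once the repeated messages are tallied. The following is only a rough sketch of how one might do that: the function name `summarize` and the shape of its result are made up for illustration, and the patterns only cover the three message types quoted above.

```python
import re
from collections import Counter

# Matches the SCSI mid-layer timeout message and captures host, channel, id, lun.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout\s*:\s*pid \d+, scsi(\d+), "
    r"channel (\d+), id (\d+), lun (\d+)"
)

def summarize(lines):
    """Count SELTO warnings, bus resets, and command timeouts per device.

    Hypothetical helper: it only recognises the message texts seen in this thread.
    """
    summary = {"selto": 0, "bus_resets": 0, "timeouts_by_device": Counter()}
    for line in lines:
        if "not valid during SELTO" in line:
            summary["selto"] += 1
        elif "SCSI bus is being reset" in line:
            summary["bus_resets"] += 1
        else:
            match = TIMEOUT_RE.search(line)
            if match:
                device = tuple(int(g) for g in match.groups())
                summary["timeouts_by_device"][device] += 1
    return summary

# Three lines taken from the excerpt above.
sample = [
    "Sep 3 01:20:39 quality4 kernel: (scsi0:-1:-1:-1) Referenced SCB 0 not valid during SELTO.",
    "Sep 3 01:20:39 quality4 kernel: scsi : aborting command due to timeout : "
    "pid 0, scsi0, channel 0, id 2, lun 0 Read (10) 00 00 ab b4 e5 00 00 20 00",
    "Sep 3 01:20:40 quality4 kernel: SCSI bus is being reset for host 0 channel 0.",
]
result = summarize(sample)
```

To run it over the real file, something like `summarize(open("/var/log/messages"))` should do. In the excerpt quoted above, all four command timeouts name id 2, so a per-device tally may help show whether one disk is involved more often than the others.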



I also looked at /var/lib/mysql/hostname.err, which says:

23:17:33 mysqld ended
23:19:17 mysqld started
/usr/sbin/mysqld: ready for connections
13:10:14 Aborted connection 4414725 to db: 'CDR' user: 'svar' host: `localhost' (Got an error reading communication packets)
16:57:45 Aborted connection 4414743 to db: 'CDR' user: 'svar' host: `localhost' (Got an error reading communication packets)
18:49:13 Aborted connection 4414753 to db: 'CDR' user: 'svar' host: `localhost' (Got an error writing communication packets)
18:55:35 Aborted connection 4414755 to db: 'CDR' user: 'svar' host: `localhost' (Got an error writing communication packets)
19:18:39 Aborted connection 4414757 to db: 'CDR' user: 'svar' host: `localhost' (Got an error writing communication packets)
19:20:07 Aborted connection 4414759 to db: 'CDR' user: 'svar' host: `localhost' (Got an error writing communication packets)
13:18:47 mysqld started
/usr/sbin/mysqld: ready for connections
15:26:20 Aborted connection 23 to db: 'CDR' user: 'svar' host: `localhost' (Got an error reading communication packets)

I looked at the Suse Faqs and found two things. First,


Adaptec 2940: Adaptec 2940 Timeouts
Support knowledgebase (cg_seltime)
Applies to
SuSE Linux: Versions since 6.4
Kernel: Versions since 2.2.14
Symptom:
You are using an Adaptec SCSI controller like models 2940U or 2940UW and have connected some devices to it. Unfortunately not all devices were detected during Linux-side initialization of the controller. The computer stops with an error message like this one:
Freeing unused kernel memory: 64k freed
SCSI disk error: host 0 channel 0 id 1 lun 0 return code=1
scsidisk I/O error: dev 08:01, sector 147978
SCSI 0: channel 0 target 1 lun 0 request sense failed, performing reset.
SCSI bus is being reset for host 0 channel 0.
SCSI disk error: host 0 channel 0 id 1 lun 0 return code=1
scsidisk I/O error: dev 08:01, sector 81932
EXT2-fs error (device sd(8,1)): ext2_read_inode: unable to read inode block - inode=10201, block=40966
(scsi0:0:1:0) Synchronous at 40.0 Mbyte/sec, offset 8.
(scsi0:0:1:0) Performing Domain validation
(scsi0:0:1:0) Successfully completed Domain validation
Kernel panic: No init found. Try passing init= option to kernel.
SCSI disk error: host 0 channel 0 id 1 lun 0 return code=1
scsidisk I/O error: dev 08:01, sector 152532
Cause:
The selection timeout of the module aic7xxx defaults to 64ms. Nevertheless, some devices, especially older ones, need more time, for instance the 256ms that conforms to the ANSI SCSI-1 standard.
Solution:
Use the option seltime when loading the module aic7xxx:
aic7xxx=seltime:0
This option sets the controller to a selection timeout of 256ms. The README file for the aic7xxx module defines the following parameters for seltime:
0 - 256ms
1 - 128ms
2 - 64ms
3 - 32ms
See also:
Adaptec
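
The seltime table in the article is just a small lookup, so as a rough illustration it can be written out as below. The table values are the ones quoted from the README; the helper names `seltime_for` and `boot_option` are invented for this sketch and are not part of the driver or of LILO.

```python
# seltime value -> selection timeout in milliseconds, as quoted in the article.
SELTIME_MS = {0: 256, 1: 128, 2: 64, 3: 32}

def seltime_for(min_timeout_ms):
    """Return the seltime value with the shortest timeout >= min_timeout_ms.

    Hypothetical helper for illustration only.
    """
    candidates = [(ms, value) for value, ms in SELTIME_MS.items()
                  if ms >= min_timeout_ms]
    if not candidates:
        raise ValueError(f"no seltime setting gives {min_timeout_ms} ms or more")
    return min(candidates)[1]

def boot_option(seltime):
    """Format the kernel/module option string in the form the article shows."""
    if seltime not in SELTIME_MS:
        raise ValueError(f"seltime must be one of {sorted(SELTIME_MS)}")
    return f"aic7xxx=seltime:{seltime}"

# A device that needs the SCSI-1 style 256 ms maps to seltime 0.
option = boot_option(seltime_for(256))
```

Here `option` ends up as the same string the article recommends, `aic7xxx=seltime:0`. Note that a lower seltime number means a longer timeout, which is easy to get backwards.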


So I changed lilo.conf:


# LILO configuration file
# Start LILO global Section
# If you want to prevent console users to boot with init=/bin/bash,
# restrict usage of boot params by setting a passwd and using the option
# restricted.
#password=bootpwd
#restricted
append="noapic aic7xxx=seltime:0"
boot=/dev/sda
#compact # faster, but won't work on all systems.
vga=normal
message=/boot/message
menu-scheme=Wg:kw:Wg:Wg
read-only
timeout=80
# End LILO global Section
#
image = /boot/vmlinuz
  root = /dev/sdc6
  label = linux
  initrd = /boot/initrd
#
image = /boot/vmlinuz_24
  root = /dev/sdc6
  label = linux_2.4
  initrd = /boot/initrd_24
  optional
#

Things are maybe a little better, but I still get hangups.

I also found in Suse Faqs:

With Adaptec 19160, 29160, 39160 SCSI controller loading of module is impossible
Support knowledgebase (jsj_29160_interrupt)
Applies to
SuSE Linux: Versions since 6.4
Symptom:
You have a SCSI controller of brand Adaptec AHA-19160, AHA-29160 or AHA-39160 and cannot install your copy of SuSE Linux. During loading of the corresponding module aic7xxx the computer seems to lock up.
On Terminal 4 messages like the following are continually displayed:
scsi 1 host
scsi : aborting command due to timeout: pid 0, scsi0, channel 0, id 0, lun 0 Test Unit Ready 00 00 00 00 00
scsi : aborting command due to timeout: pid 0, scsi0, channel 0, id 0, lun 0 Test Unit Ready 00 00 00 00 00
Cause:
These SCSI controllers require the use of an unique interrupt caused by the design of the driver. They cannot share interrupts with other PCI cards.
Solution:
The only possible solution is to reconfigure the assignment of interrupts in the BIOS of your computer. Please have a look in the manual of your mainboard.
It may also be possible that you have to change the PCI slot of your SCSI controller card.

Usually there is a table displayed during boot, where you can check the assignment of interrupts to PCI devices. There you can see whether your configuration succeeded.
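
On a system that does boot, the same check can probably be made from the running kernel by looking at /proc/interrupts. The sketch below assumes the 2.2/2.4-era layout (IRQ number, one counter per CPU, a single controller-type token, then a comma-separated device list). The sample text is made up for illustration and is not from the poster's machine, and `shared_irqs` is a hypothetical helper name.

```python
def shared_irqs(proc_interrupts_text, driver="aic7xxx"):
    """Return {irq: [devices]} for IRQs that `driver` shares with other devices.

    Assumes the old-style /proc/interrupts layout described in the lead-in.
    """
    shared = {}
    for line in proc_interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        if not sep or not irq.strip().isdigit():
            continue  # header line, or non-numeric rows such as NMI/ERR
        tokens = rest.split()
        # Skip the per-CPU interrupt counters.
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1
        # tokens[i] is the controller type; everything after it names devices.
        names = " ".join(tokens[i + 1:]).split(",")
        devices = [name.strip() for name in names if name.strip()]
        if driver in devices and len(devices) > 1:
            shared[int(irq)] = devices
    return shared

# Invented sample in the assumed format: aic7xxx shares IRQ 10 with eth0.
SAMPLE = """\
           CPU0       CPU1
  0:      84211      83990          XT-PIC  timer
 10:      15023      14877          XT-PIC  aic7xxx, eth0
 14:          4          2          XT-PIC  ide0
NMI:          0          0
"""
result = shared_irqs(SAMPLE)
```

On a live system one would pass `open("/proc/interrupts").read()` instead of `SAMPLE`. An empty result would suggest the controller already has an interrupt to itself, in which case this FAQ entry is less likely to apply.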

See also:
SCSI-checklist


Well, maybe this is not really applicable, since I can boot and, for the most part, work fine. But who knows?


Any ideas what to do?
I should say I first looked for a hardware error, so
I changed the first and third disks (/dev/sda and /dev/sdc):
still hangups...
I checked sda with format; no errors were found.

Where does one go from here?


thanks, svar




---
 
i've not read all of what you've put so ignore me if this is not too helpful :)

it's difficult to read logs when there's weird word wrap goin on ;)

First, I guess you're using the 2.4.0 kernel? If so, I recommend you update. 2.4.0 has had some bugs with hard disks, and the aic controllers have been kind of messed up. There may also still have been some SMP problems that early on.
Also, if you compile your own kernel you can set some options related to the SCSI controller. The idea is to speed things up on fast disks, but some disks don't like it, and that causes your exact problem.
You can set these levels at boot, but I don't think that's what your lilo edits are doing. Sorry, I can't remember exactly what this setting is, but if you configure the kernel you'll see it immediately. I think the aic code gets better from 2.4.5 onwards, but I use 2.4.1 on my desktop with no problems.
I've never had any problems with aic on 2.2, but maybe SuSE does something funny.

hth
 
I also use an AHA adapter (with SuSE 6.4) with the aic7xxx driver, on an older single-processor machine (Athlon 550) with a single SCSI drive. I have recently encountered kernel paging oopses at various virtual memory addresses, with system freezes resulting. The causes are obscure and suspicious (a corrupted PAM_passwd.so; the system dies on inetd or mingetty, with the kernel oops called from these processes). I would love to believe that my problems are the result of a temperamental piece of hardware and a so-so driver.

Your problem definitely does not look like hardware, so I think you can stop looking there. The SCSI drive is not responding, either due to the timeout setting, as the SuSE guys say, or due to the driver/hardware combo. If you can code, you can look at the SCSI programming HOWTO and talk to the driver's writer; maybe the SuSE guys can help you out there. But that is a long-term project. You have done the admin-end stuff. The only possible "solution" I can think of is a supported RAID controller, and that may complicate things rather than solve your problem.

If you have any ideas on my problem feel free to write back.
 
Thanks to all and sorry for the bundled up printouts.

One question: how do I find out what driver version I am using?

Regarding Mr.Tom: no, the problem occurs with both 2.2.18 and 2.4.0, so it is not just a 'buggy' 2.4.0. I also had no problems for two years with 2.2.10 and aic7xxx (the older one).
According to SuSE, the problems start with their 6.4 version and later.

Regarding marsd's problems, it might be interesting to see
whether the problem persists with a downgraded kernel, like the one in SuSE 6.2.

svar
 
Hi,

Assuming you have the kernel source installed, the easiest way to find out which driver you already have is to look at the source in:

/usr/src/linux/drivers/scsi/aic7xxx

(assuming /usr/src/linux is a symlink to your actual kernel source tree).

Whether a different off-the-shelf kernel (vs kernel modules) would make a difference all depends on how Suse ship the compiled kernels. For example, they may include some scsi code within the kernel or they may just compile everything as separate loadable modules. If in doubt just compile and install your own kernel & modules.

Rgds
 
Thanks. However, all drivers in the link say version 6.something, while mine says 1.something,
i.e.
both in /usr/src/linux/drivers/scsi/aic7xxx
and in /usr/src/linux-2.4.0.Suse/drivers/scsi/aic7xxx
I looked at aic7xxx.seq and aic7xxx.reg.

They mention versions 1.4, 1.77 and 4.1;
how do I get hold of and install a better version?


* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $Id: aic7xxx.c,v 1.119 1997/06/27 19:39:18 gibbs Exp $
*---------------------------------------------------------------------------
*
* Thanks also go to (in alphabetical order) the following:
*
* Rory Bolt - Sequencer bug fixes
* Jay Estabrook - Initial DEC Alpha support
* Doug Ledford - Much needed abort/reset bug fixes
* Kai Makisara - DMAing of SCBs
*
* A Boot time option was also added for not resetting the scsi bus.
*
* Form: aic7xxx=extended
* aic7xxx=no_reset
* aic7xxx=ultra
* aic7xxx=irq_trigger:[0,1] # 0 edge, 1 level
* aic7xxx=verbose
*
* Daniel M. Eischen, deischen@iworks.InterWorks.org, 1/23/97
*
* $Id: aic7xxx.c,v 4.1 1997/06/12 08:23:42 deang Exp $
*-M*************************************************************************/

/*+M**************************************************************************
*
* Further driver modifications made by Doug Ledford <dledford@redhat.com>
*
* Copyright (c) 1997-1999 Doug Ledford
*
* These changes are released under the same licensing terms as the FreeBSD
* driver written by Justin Gibbs. Please see his Copyright notice above
* for the exact terms and conditions co


------------------in


F THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $Id: aic7xxx.reg,v 1.4 1997/06/27 19:38:39 gibbs Exp $
*/
F THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $Id: aic7xxx.reg,v 1.4 1997/06/27 19:38:39 gibbs Exp $
*/


$Id: aic7xxx.seq,v 1.77 1998/06/28 02:58:57 gibbs Exp $


$Id: aic7xxx.seq,v 1.77 1998/06/28 02:58:57 gibbs Exp $
*/

Same situation for 2.2.18:
$Id: aic7xxx.seq,v 1.77 1998/06/28 02:58:57 gibbs Exp $
*/
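
One thing that may help when comparing these numbers: the `$Id: ... $` strings are RCS/CVS keyword expansions. They record the revision of each individual file in its author's repository, which is why aic7xxx.c carries both 1.119 (gibbs) and 4.1 (deang). They are therefore not necessarily the same thing as a driver release number such as 6.something. A minimal sketch for pulling them out of a source tree follows; the function name `rcs_ids` is made up for illustration.

```python
import re

# $Id: <file>,v <revision> <yyyy/mm/dd> <hh:mm:ss> <author> ...
ID_RE = re.compile(
    r"\$Id: (\S+),v (\d+(?:\.\d+)+) (\d{4}/\d{2}/\d{2}) \d{2}:\d{2}:\d{2} (\S+)"
)

def rcs_ids(text):
    """Return (file, revision, date, author) for each RCS $Id$ keyword in text."""
    return [match.groups() for match in ID_RE.finditer(text)]

# Header lines copied from the excerpts quoted in this thread.
sample = """
 * $Id: aic7xxx.c,v 1.119 1997/06/27 19:39:18 gibbs Exp $
 * $Id: aic7xxx.c,v 4.1 1997/06/12 08:23:42 deang Exp $
 * $Id: aic7xxx.seq,v 1.77 1998/06/28 02:58:57 gibbs Exp $
"""
ids = rcs_ids(sample)
```

Running `rcs_ids(open(path).read())` over each file in drivers/scsi/aic7xxx would list every file revision in one place, but the driver's own release number, if the source defines one, would have to be looked up separately.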
 