
Sendmail keeps hanging

Status
Not open for further replies.

Stinney

IS-IT--Management
Nov 29, 2004
US

For some reason, sendmail keeps hanging. If I restart it, all of the queued mail is sent, but then, at no particular time and after no noticeable event, it hangs again. I have to kill the process and stop and start it again.

I barely figured out how to set it up, so I'm at a loss as to where to start looking.

- Stinney

Favorite all too common vendor responses: "We've never seen this issue before." AND "No one's ever wanted to use it like that before."
 
No log file by that name in that directory. The syslog shows the following. The "Connection refused by [127.0.0.1]" error is what I get, and I have to kill the process and restart sendmail to get it working again:


Apr 7 11:19:44 *name removed* sendmail[26238]: [ID 801593 mail.info] m37FJiFY026238: from=cms, size=81, class=0, nrcpts=1, msgid=<200804071519.m37FJiFY026238@*name removed*.tempdomain.org>, relay=root@localhost

Apr 7 11:19:44 *name removed* sendmail[26238]: [ID 801593 mail.info] m37FJiFY026238: to=*name removed*@*name removed*.com, ctladdr=cms (101/200), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=30081, relay=[127.0.0.1] [127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]

- Stinney

 
What OS, and what version of sendmail?

Also, when it is running, what does the ps listing (with arguments) look like? It seems the daemon isn't bound to the localhost address.
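A quick way to answer that from a script is to check whether any sendmail process in the ps listing was started with -bd (daemon mode); the -Ac process is only the client queue runner. A minimal sketch, assuming a POSIX shell and awk (on Solaris 9, /usr/xpg4/bin/sh with /usr/xpg4/bin early in PATH); the function name is made up for this example:

```shell
# Succeeds (exit 0) if `ps -ef` output on stdin shows a sendmail process
# started in daemon mode (-bd). The -Ac client queue runner does not count.
sendmail_daemon_running() {
    awk '
        # Skip the "grep send" line that a ps | grep pipeline shows.
        /grep/ { next }
        /sendmail/ && / -bd/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

For example: ps -ef | sendmail_daemon_running || echo "sendmail daemon is not running"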

 

SunOS r3fvmnh 5.9

Sendmail 8.13.6+Sun/8.13.6

ps -ef | grep send
root 4842 225 0 14:28:39 pts/11 0:00 grep send
root 4777 1 0 14:27:33 ? 0:00 /usr/lib/sendmail -bd -q15m
root 4783 4777 0 14:27:38 ? 0:00 /usr/lib/sendmail -bd -q15m
smmsp 4780 1 0 14:27:33 ? 0:00 /usr/lib/sendmail -Ac -q15m

- Stinney

 
First, you might want to reconfigure /etc/syslog.conf to send mail.debug to a file.

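Check your /var/spool space. Also check the load average with uptime: with the stock sendmail.cf values (QueueLA=8, RefuseLA=12), a load average of 8 or more makes sendmail just queue mail, and 12 or more makes it refuse connections.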
 
elgrandeperro,

I'm not a UNIX tech by trade. The system is used primarily for one application, not for email; in fact, no one uses it for email. I'm trying to set up some system alerts in scripts that email me when there are problems.

Can you tell me how to check the /var/spool space? It apparently isn't an executable file, or one that I can cat.

- Stinney

 

OK, brain fart:

/var/spool is a directory, but what should I be doing in it? The following directories are there:

drwxrwx--- 2 smmsp smmsp 1024 Apr 7 14:27 clientmqueue
drwxr-xr-x 4 root sys 512 May 19 2004 cron
drwxr-xr-x 2 uucp uucp 512 Apr 3 17:06 locks
drwxrwxr-x 7 lp lp 512 Feb 12 02:13 lp
drwxr-x--- 2 root bin 512 Apr 7 13:00 mqueue
drwxrwxrwt 2 root bin 512 Mar 3 2004 pkg
drwxr-xr-x 2 root lp 512 May 19 2004 print
drwxr-xr-x 8 uucp uucp 512 Jan 23 2007 uucp
drwxr-xr-x 3 root other 512 May 19 2005 uucppublic

- Stinney

 

When you get "connection refused", is sendmail still running?
(I mean the sendmail with -bd, which is daemon mode.)

Another way to tell is to check whether it is listening on port 25:

netstat -an | grep LIST | grep 25
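Note that grep 25 also matches any line that merely contains "25" (an address such as 10.100.25.18, or port 2500). A slightly tighter check looks for a LISTEN socket whose local address ends in port 25, in either Solaris form (*.25, 127.0.0.1.25) or Linux form (0.0.0.0:25). A sketch, assuming POSIX awk; the function name is hypothetical:

```shell
# Succeeds if `netstat -an` output on stdin has a socket in LISTEN state
# on port 25. A field ending in ".25" (Solaris) or ":25" (Linux) counts;
# bare numbers such as a Send-Q of 25 do not, and port 2500 does not.
smtp_listening() {
    awk '
        /LISTEN/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /[.:]25$/) { found = 1; break }
        }
        END { exit found ? 0 : 1 }
    '
}
```

For example: netstat -an | smtp_listening || echo "nothing is listening on port 25"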

df (given a path) gives the stats for the filesystem holding that path. So to see /var/spool (regardless of how it is mounted), run:

df -k /var/spool

(whether it is mounted as /, /var, or /var/spool, it will report the right device)
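If you want a script to do the df check for you, the capacity column can be pulled out of the df -k output. A sketch, assuming df prints the filesystem on a single line as Solaris df -k does (long device names can wrap onto two lines on some systems); the function names and the 90% threshold in the usage comment are illustrative only:

```shell
# Reads `df -k PATH` output on stdin and prints the capacity percentage
# without the % sign. NR > 1 skips the header; capacity is the 5th field.
df_capacity() {
    awk 'NR > 1 { sub(/%/, "", $5); print $5 }'
}

# Reports whether the filesystem holding a directory is at or above a limit.
# Usage: check_space /var/spool 90
check_space() {
    pct=$(df -k "$1" | df_capacity)
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is ${pct}% full"
    else
        echo "OK: $1 is ${pct}% full"
    fi
}
```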

Now run "uptime" to see the load average (the last three numbers are the 1-, 5- and 15-minute load averages).
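To use the load average in a script, the 1-minute figure can be extracted from the uptime line with sed. A sketch; the function name is hypothetical, and it assumes the "load average:" wording used by Solaris and Linux (BSD prints "load averages:", which the pattern also accepts):

```shell
# Prints the 1-minute load average from `uptime` output on stdin,
# i.e. the first number after "load average:" / "load averages:".
one_minute_load() {
    sed -n 's/.*load averages*: *\([0-9][0-9.]*\).*/\1/p'
}
```

For example: uptime | one_minute_load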

These are the sendmail.cf options that make sendmail queue mail, refuse connections, or delay connections based on the load average:

# load average at which we just queue messages
#O QueueLA=8

# load average at which we refuse connections
#O RefuseLA=12

# load average at which we delay connections; 0 means no limit
#O DelayLA=0
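Put another way, sendmail compares the current load average against these thresholds. The function below reports roughly which regime a given load falls into, using 8 and 12 as the defaults shown above. It is an approximation: sendmail's own load-average calculation may differ slightly, and newer sendmail versions may scale the defaults by the number of CPUs, so check what your sendmail.cf actually sets. The function name is hypothetical:

```shell
# Prints "deliver", "queue" or "refuse" for a load average, roughly
# mirroring QueueLA / RefuseLA (defaults 8 and 12 as in the stock file).
# Usage: la_action LOAD [QUEUE_LA] [REFUSE_LA]
la_action() {
    awk -v la="$1" -v q="${2:-8}" -v r="${3:-12}" 'BEGIN {
        if (la + 0 >= r + 0)      print "refuse"   # refuse SMTP connections
        else if (la + 0 >= q + 0) print "queue"    # accept, but only queue
        else                      print "deliver"  # normal delivery
    }'
}
```

For example, with the load reported later in this thread: la_action 2.05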


Lastly, to change syslog, I usually change this line in /etc/syslog.conf to get debug logging:

from:
mail.debug ifdef(`LOGHOST', /var/log/syslog, @loghost)
to:
mail.debug /var/log/mail.debug
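One Solaris detail that often bites here: syslogd on Solaris expects a TAB, not spaces, between the selector (mail.debug) and the destination, and it does not create a missing log file. A sketch of adding the rule safely; the function name is hypothetical, and it appends a second mail.debug line instead of editing the existing ifdef line, which also works because syslogd writes a message to every matching destination:

```shell
# Appends "mail.debug<TAB>LOGFILE" to a syslog.conf-style file unless that
# exact rule is already present, then makes sure the log file exists.
# Usage (as root): add_mail_debug_rule /etc/syslog.conf /var/log/mail.debug
add_mail_debug_rule() {
    conf=$1
    logfile=$2
    if ! awk -v f="$logfile" '$1 == "mail.debug" && $2 == f { found = 1 }
                              END { exit !found }' "$conf" 2>/dev/null; then
        # printf with \t guarantees a real TAB separator.
        printf 'mail.debug\t%s\n' "$logfile" >> "$conf"
    fi
    touch "$logfile"
}
```

After running it, syslogd still has to be told to reread the file.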

then, as root:

touch /var/log/mail.debug (Solaris syslogd will not create the file for you)
then send syslogd a SIGHUP (kill -HUP followed by syslogd's PID) so it
rereads its configuration.
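Sending a SIGHUP just means running kill -HUP on syslogd's PID; syslogd keeps running and rereads /etc/syslog.conf. A sketch that finds the PID from ps output, assuming a ps that supports the POSIX -o option (Solaris 9 does); the function names are hypothetical:

```shell
# Reads `ps -e -o pid,comm` output on stdin (header line first) and prints
# the PID of every process whose command basename equals $1.
pids_of() {
    awk -v name="$1" 'NR > 1 {
        n = split($2, parts, "/")   # strip any directory from the command
        if (parts[n] == name) print $1
    }'
}

# Asks syslogd to reread its configuration (run as root).
reload_syslogd() {
    for pid in $(ps -e -o pid,comm | pids_of syslogd); do
        kill -HUP "$pid"
    done
}
```

On Solaris, pkill -HUP syslogd does the same thing in one command.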

Once you've done that, sendmail should write logs to /var/log/mail.debug after you stop and start it.

Are you using the vanilla "mailhost" configuration (the one that forwards all mail to the host named mailhost)?




 
When sendmail stops working this is all ps -ef | grep send returns:

smmsp 6147 1 0 15:00:02 ? 0:00 /usr/lib/sendmail -Ac -q15m


df -k /var/spool returned:

Filesystem kbytes used avail capacity Mounted on
/dev/md/dsk/d1 4130982 1501793 2587880 37% /



uptime returned:

4:27pm up 55 day(s), 13:15, 68 users, load average: 2.05, 2.38, 2.43


Those lines in sendmail.cf are commented out, just as you posted.

I changed syslog.conf as you posted.

I don't know what you meant by sending a SIGHUP to syslogd. I killed syslogd's PID and started it again, but nothing was logged to mail.debug when I stopped and started sendmail.




- Stinney

 
Okay, unless you have multiple CPUs, that load average is pretty high, but it is most likely not the cause.

Can't be disk space.

What is LogLevel set to in sendmail.cf? I think the default is 9.

Stop and start syslogd with /etc/rc2.d/S74syslog stop and /etc/rc2.d/S74syslog start, and do the same for sendmail with /etc/rc2.d/S88sendmail stop and /etc/rc2.d/S88sendmail start.

Without the logging, it is going to be hard to figure out what is happening.

 

We do have a high load on the CPU due to the number of users logged in and the amount of load caused by the reports they are running.

We have plenty of disk space.

LogLevel is 9

I stopped and started S74syslog, but nothing reports in the mail.debug file.

I've been using sendmail stop/start in the /etc/init.d directory, not the S88sendmail you refer to. I made sure that all of the sendmail processes were stopped or killed and started S88sendmail and was able to send an email. I'll keep an eye on it and see if it hangs again.

I stopped and started S88sendmail, but nothing was written to mail.debug. Should it be? Or does it only write when it encounters an error?

Thanks for your continued help.

- Stinney

 

Crashed and burned again.

Nothing in the mail.debug file, completely empty.

- Stinney

 
This is not a cluster or a Solaris zone, is it? (I know you said S9.)

The only way to figure this out is to get the log to work.

You can use "logger" to send the equivalent messages like this:

logger -p mail.debug "this is a test"

This should appear in /var/log/mail.debug. If it does not, there is something wrong.
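The same logger check can be scripted: send a message with a unique marker, then wait a few seconds for it to show up in the file. A sketch, assuming a POSIX shell and grep -F (on Solaris 9, /usr/xpg4/bin/grep); the function names and the 10-second timeout are arbitrary choices:

```shell
# Waits up to $3 seconds (default 10) for a fixed string to appear in a
# file. Returns 0 as soon as it is found, 1 on timeout.
wait_for_log() {
    file=$1
    marker=$2
    tries=${3:-10}
    while [ "$tries" -gt 0 ]; do
        grep -F "$marker" "$file" >/dev/null 2>&1 && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Sends a uniquely tagged mail.debug message and checks that syslogd wrote it.
# Usage: test_mail_debug /var/log/mail.debug
test_mail_debug() {
    marker="mail.debug test $$ $(date +%H%M%S)"
    logger -p mail.debug "$marker"
    wait_for_log "$1" "$marker" 10
}
```

For example: test_mail_debug /var/log/mail.debug && echo "mail.debug logging works"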

 

I tried the logger command but got nothing. I stopped and started S74syslog again, and then the logger command worked. I'll let you know what it says if mail crashes again.

Thanks!

- Stinney

 
OK, so mail.debug worked and captured the mail failure. However, when I ran cat on the mail.debug file, all of the queued mail was sent. I didn't stop or start sendmail or kill the sendmail process; it just sent.

Here is part of the mail.debug file (it's too long to post all of it here). Let me know if you need more, but it looks the same, just repeating over and over.

Apr 8 13:14:20 *hostname removed* cms: [ID 702911 mail.debug] this is a test

Apr 8 13:15:01 *hostname removed* sendmail[6223]: [ID 801593 mail.info] m38HA1gP006108: to=*email address removed*, ctladdr=cms (101/200), delay=00:05:00, xdelay=00:00:00, mailer=relay, pri=120080, relay=[127.0.0.1] [127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]

Apr 8 13:16:26 *hostname removed* sendmail[6422]: [ID 801593 mail.info] m38HGQKL006422: from=cms, size=88, class=0, nrcpts=1, msgid=<200804081716.m38HGQKL006422@*hostname removed*.tempdomain.org>, relay=cms@localhost

Apr 8 13:16:26 *hostname removed* sendmail[6422]: [ID 801593 mail.info] m38HGQKL006422: to=*email address removed*, ctladdr=cms (101/200), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=30088, relay=[127.0.0.1] [127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]

Apr 8 13:17:15 *hostname removed* sendmail[6450]: [ID 702911 mail.info] starting daemon (8.13.6+Sun): SMTP+queueing@00:15:00

Apr 8 13:17:15 *hostname removed* sendmail[6451]: [ID 702911 mail.info] starting daemon (8.13.6+Sun): queueing@00:15:00

Apr 8 13:17:15 *hostname removed* sendmail[6453]: [ID 801593 mail.info] m38HGQKL006422: to=*email address removed*, ctladdr=cms (101/200), delay=00:00:49, xdelay=00:00:00, mailer=relay, pri=120088, relay=[127.0.0.1] [127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]
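When the log is too long to read, counting the delivery results can show whether anything gets through between restarts. A sketch that tallies to= lines that were Sent versus Deferred with "Connection refused", matching the log format shown above; the function name is hypothetical:

```shell
# Counts sendmail delivery results in the given log file(s), or stdin:
# "to=" lines with stat=Sent versus stat=Deferred: Connection refused.
delivery_summary() {
    awk '
        / to=/ && /stat=Sent/                         { sent++ }
        / to=/ && /stat=Deferred: Connection refused/ { refused++ }
        END { printf "sent=%d refused=%d\n", sent, refused }
    ' "$@"
}
```

For example: delivery_summary /var/log/mail.debug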


- Stinney

 

Interestingly, although the queued mail was sent when I ran cat on mail.debug, I can't send mail manually from the command line. I get the same "connection refused by 127.0.0.1" error. I had to stop sendmail, kill the process and start it again.

- Stinney

 
Are you running the JASS security version?

If I read the debug log correctly, sendmail starts, then it can't connect to localhost, and when you look, you don't see the "sendmail -bd" process (daemon mode). So sendmail has died suddenly without logging anything? Is there nothing in /var/adm/messages either?
 
elgrandeperro,

I have no idea if we're running the JASS security version. I'm so sorry that I have no clue here.

I set up a cron job to send an email every 10 minutes to monitor sendmail. It stopped sending, and then at 5:01 I received 6 emails all at once. While I was writing this, the same thing happened at 6:01 pm.
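A monitoring job like that can be as small as a script that builds a message and hands it to sendmail -t, which takes the recipient from the To: header. A sketch; the function names and the recipient address in the usage comment are placeholders:

```shell
# Prints a minimal message for `sendmail -t`: To:, Subject:, blank line, body.
# Arguments: recipient, host name, timestamp.
heartbeat_message() {
    printf 'To: %s\n' "$1"
    printf 'Subject: sendmail heartbeat from %s at %s\n' "$2" "$3"
    printf '\n'
    printf 'If these arrive late or in bursts, the SMTP daemon may have been down.\n'
}

# Sends one heartbeat. Usage: send_heartbeat you@example.com
send_heartbeat() {
    heartbeat_message "$1" "$(uname -n)" "$(date '+%Y-%m-%d %H:%M')" |
        /usr/lib/sendmail -t
}
```

Solaris 9 cron does not accept step syntax such as */10, so the crontab entry would list the minutes explicitly, for example 0,10,20,30,40,50 * * * * followed by the path to a script that calls send_heartbeat.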

You can see in the mail.debug where all of a sudden it starts sending again:

Apr 8 15:00:00 *hostname removed* sendmail[11614]: [ID 702911 mail.info] starting daemon (8.13.6+Sun): SMTP+queueing@00:15:00

Apr 8 15:00:01 *hostname removed* sendmail[11622]: [ID 702911 mail.info] starting daemon (8.13.6+Sun): queueing@00:15:00

Apr 8 15:00:01 *hostname removed* sendmail[11628]: [ID 801593 mail.info] m38J01Ui011628: from=<cms@*hostname removed*.tempdomain.org>, size=379, class=0, nrcpts=1, msgid=<200804081850.m38Io0Z9011312@*hostname removed*.tempdomain.org>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1]

Apr 8 15:00:01 *hostname removed* sendmail[11626]: [ID 801593 mail.info] m38Io0Z9011312: to=*username removed*@*hostname removed*.com, ctladdr=cms (101/200), delay=00:10:01, xdelay=00:00:00, mailer=relay, pri=120080, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (m38J01Ui011628 Message accepted for delivery)

Apr 8 15:00:01 *hostname removed* sendmail[11643]: [ID 801593 mail.info] m38J01Ui011628: to=<*username removed*@*hostname removed*.com>, ctladdr=<cms@*hostname removed*.tempdomain.org> (101/200), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=120379, relay=*servername removed*.*hostname removed*.com. [10.100.25.18], dsn=2.0.0, stat=Sent (<200804081850.m38Io0Z9011312@*hostname removed*.tempdomain.org> Queued mail for delivery)

Apr 8 15:00:02 *hostname removed* sendmail[11628]: [ID 801593 mail.info] m38J01Uk011628: from=<cms@*hostname removed*.tempdomain.org>, size=379, class=0, nrcpts=1, msgid=<200804081840.m38Ie17q010875@*hostname removed*.tempdomain.org>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1]


The messages log had this:

Apr 8 13:00:16 *hostname removed* sendmail[5029]: [ID 702911 mail.alert] daemon MTA-v4: problem creating SMTP socket

Then a lot of telnetd[####] messages such as:

Apr 8 13:02:03 *hostname removed* telnetd[5674]: [ID 484914 daemon.notice] gethostbyaddr: satrmt23232.*name removed*.com. != 10.X.X.X

Apr 8 14:00:01 *hostname removed* sendmail[8380]: [ID 801593 mail.crit] NOQUEUE: SYSERR(root): opendaemonsocket: daemon MTA-v4: cannot bind: Address already in use
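Both sendmail errors in the messages log land right on the hour (13:00:16 and 14:00:01), and "cannot bind: Address already in use" usually means something already held the SMTP port when a new daemon tried to start. It may be worth comparing those times against the output of crontab -l. A sketch that pulls just those lines with their timestamps; the function name is hypothetical:

```shell
# Prints the timestamp (first three fields) of each sendmail socket or
# bind failure in the given syslog file(s), or stdin.
bind_failures() {
    awk '/problem creating SMTP socket|cannot bind: Address already in use/ {
        print $1, $2, $3
    }' "$@"
}
```

For example: bind_failures /var/adm/messages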



- Stinney

 
