
ANS1029E - Communication with the TSM Server is lost

Status
Not open for further replies.

EGR

Hi,

we have a TSM 5.3 server running on Windows Server 2003.
We are backing up four Solaris 9 servers.
All of them are configured the same way; the options in dsm.sys and dsm.opt are identical.
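
Since two of four supposedly identical nodes fail, it may be worth confirming that the option files really match. Below is a minimal Python sketch, not an IBM tool, that compares the text of two option files. It assumes one option per line with `*` starting a comment line, and it does not handle abbreviated option names, repeated options (the last value wins), or multiple server stanzas. The function names and the default `ignore` list are my own choices.

```python
def parse_options(text):
    """Return {OPTION_NAME: value} for the text of one client option file."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and '*' comment lines carry no options.
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        name = parts[0].upper()  # compare option names case-insensitively
        value = parts[1].strip() if len(parts) > 1 else ""
        options[name] = value
    return options


def diff_options(text_a, text_b, ignore=("NODENAME",)):
    """Return {name: (value_a, value_b)} for every option that differs.

    Options named in `ignore` are expected to differ per node and are skipped.
    A value of None means the option is missing from that file.
    """
    a, b = parse_options(text_a), parse_options(text_b)
    skip = {name.upper() for name in ignore}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if name not in skip and a.get(name) != b.get(name)
    }
```

Reading dsm.sys from a working node and from a failing node and passing both strings to `diff_options` gives an empty dict when the files agree on everything except the ignored options.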

For several days now, two of the nodes have no longer been backed up by scheduled jobs.
The following entries can be found in dsmsched.log and dsmerror.log:

"ANS1029E - Communication with the TSM Server is lost"
In the server's actlog one can find this:
"ANR0403I Session 380 ended for node XXXX (). (SESSION: 75)"

When I restart the script /etc/init.d/tsm, the node appears to contact the server and retrieve the schedule, but after that it again says:

ANS1029E - Communication with the TSM Server is lost

Help would be really appreciated.

Thanks
/egr

 
Hi,

I'm not quite sure if this will solve your problem, but here's what I found on the internet dealing with this issue:


--------------------------------------------------------

"I am having the same problem on one of my servers. Seems the authentication is not working for the cluster node. Once I authenticate I can run a backup. I did set this up with the dsmcutil command. Not sure what else I need to look at."

"Did you install the scheduler with DSMCUTIL? If you didn't, you will need to reference the "Installing the Windows Client" manual. Chapter 12 deals with installing the scheduler service along with the Web client."

--------------------------------------------------------

"You'll need to use the 4.1.1 client; there is an APAR out for this issue on
the 4.1.2 client."
---------------------------------------------------------

Regards
Thomas
 
Just curious if you've made progress with your problem? I currently have an open support issue with IBM on this problem, but they have not yet been able to resolve it. Thank you.
 
Hi chaz,

we still have the problem.
Nothing new.

It is very annoying.
This TSM BA Client is getting on my nerves.

Our network is working fine.

Regards
/egr
 
Well, let's see. There must be more info someplace.

ANS1029E indicates to me that the TSM server is closing the connection on the baclient.

Is there any more info before/after that error?
Is there any more info in the actlog on the TSM server before/after the error?
Can you do a backup and restore manually, and is this problem exclusive to scheduled events?
 
Lastly, what version of the client and server are you running?

TSM 5.3.0?
What client?
 
Hi,

We have a similar problem.
We have a TSM 5.2.6 server running on AIX 52-04.
We need to back up an AIX 5.3-02 server running the TSM 5.2.4 client.

After the first successful run, the node is no longer backed up by the scheduled job. I manually ran "incremental", but every time, after the first filesystem (/) had been backed up, the connection with the server was lost.

The following entries can be found in the server activity log:
"ANR0480W Session 1741 for node Beta (AIX) terminated - connection with client severed. (SESSION: 1741)"

On the client, dsm shows:
"ANS1809W session is lost; initializing session reopen procedure"

Our network is working fine.

Thanks for your help.

Regards


 
Hi,

thank you very much for your help, all!

is there any more info before/after that error?

No, just this:

"Querying server for next scheduled event.
10/18/05 02:32:50 ANS1029E Communication with the TSM server is lost.
10/18/05 02:32:50 Will attempt to get schedule from server again in 5 minutes.
10/18/05 02:32:50"
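
To see whether the lost connections cluster around particular times, the log lines above can be counted per hour. This is only a rough Python sketch: it assumes the timestamp format shown in the quoted lines (MM/DD/YY HH:MM:SS, which depends on the client's date settings), and the function name is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines shaped like the ones quoted above, e.g.
# 10/18/05 02:32:50 ANS1029E Communication with the TSM server is lost.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2}) +(\d{2}:\d{2}:\d{2}) +(AN[SR]\d{4}[EWI])\b"
)


def count_messages_by_hour(lines, code="ANS1029E"):
    """Count occurrences of one message code per clock hour."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.lstrip('"'))
        if match and match.group(3) == code:
            stamp = datetime.strptime(
                match.group(1) + " " + match.group(2), "%m/%d/%y %H:%M:%S"
            )
            counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Passing an open dsmsched.log or dsmerror.log file object to `count_messages_by_hour` returns a `Counter` keyed by hour, which makes it easier to compare the error times with the scheduled backup window.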
Is there any more info in the actlog on the tsm server before/after the error?
We had an actlogretention of 1 day. I changed it and will post the result later.

Can you do a backup and restore manually and is this problem exclusively for scheduled events?

I can backup and restore without problems.

The other problem is that I get warnings when I check my scheduled events with Operational Reporting (MMC), but I do not get any details on these warnings.
I have started another thread for that, though.
Could the lost connections be the reason for the warnings?

Thanks in advance.
/egr
 
Hi,

I have something to add:
I am not that familiar with the Unix client, and I did not install it either.

I have seen that these entries even appear during the day when there is no event scheduled.

I know that on Windows BA Clients there is a Client Acceptor daemon that starts the scheduler.
It seems that the scheduler on that Solaris client runs all the time.

Is that ok?

But that cannot be the reason for the lost connections, can it?

Regards
/egr
 
We had the same problem. What I did to fix it was to have the TSM client CAD (dsmcad) run the jobs instead of the TSM client scheduler. With the CAD running the jobs, it first connects to the TSM server and gets the backup schedules for the system. Then, at the scheduled time, it connects to TSM and tells it to back up the system. The other way, the scheduler polls the server every once in a while to see whether the job should run now or not, and somewhere in that process it loses its connection to the server. With the CAD, the connection is set up only when needed. I didn't have this problem when everything was 5.2, client and server; it only started when I had a 5.3 client and a 5.2 server. IBM told me that once everything is at the same 5.3 level, everything should work like it did before and I can stop using the CAD to run the jobs.
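
As far as I know, the client is told to let dsmcad manage the scheduler through the `managedservices` option in dsm.sys (with `schedule` among its values), after which dsmcad is started instead of the scheduler process. The Python sketch below only checks whether a dsm.sys text contains such a line. It is an illustration under assumptions: it matches the fully spelled option name only, treats `*` lines as comments, and does not distinguish between server stanzas. The function name is hypothetical.

```python
def cad_manages_scheduler(dsm_sys_text):
    """Return True if a MANAGEDSERVICES line in the text lists 'schedule'."""
    for raw in dsm_sys_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '*' comment lines.
        if not line or line.startswith("*"):
            continue
        parts = line.split()
        # Option names are compared case-insensitively; abbreviations
        # of the option name are deliberately not recognized here.
        if parts[0].upper() == "MANAGEDSERVICES":
            if "schedule" in (value.lower() for value in parts[1:]):
                return True
    return False
```

Running this against the dsm.sys of each node shows quickly which nodes still rely on the polling scheduler and which have been switched to the CAD.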
 
For what it's worth, here is what I would suggest since we have multiple cases here that are similar but I don't know what you've all tried or not tried.

1. Confirm the network is working properly. While this is unlikely to be the cause, it's always a good idea to FTP a relatively large file (100MB+) between the client and the TSM server and confirm that throughput is on par with expectations and no disruption occurs. This often identifies problems with the network setup that cause slow backups or other issues, and it is a great test because it takes TSM out of the loop as a contributing factor. Think about your network setup. Are you going through a firewall that complicates matters? Does your network group configure the switches to kill idle connections after some time? Is there any chance that is a contributing factor? Usually the answer is no, because TSM simply re-establishes connections, so the FTP test is a pretty solid network test.

2. Interrupted sessions usually occur because the baclient is crashing, because the TSM server is closing the connection, or because of something else entirely (software bug, OS patch/library incompatibility, or network problem).

3. If the TSM server is closing the connection, it will usually say why in the activity log before/after the actual error message closing the connection. E.g., maybe the storage pool fills up and there is no space for the backup, or there are no drives available, etc.

4. If the baclient is closing the connection, it will usually give some indication as to why in dsmsched.log or dsmerror.log around the timestamp of the occurrence. If it doesn't, it may simply be crashing. Check the OS and baclient versions you're running and make sure they're compatible. *TRY* several different baclient versions (roll back to x.0 or forward to another version). Another good point made above is that you can try running either dsmsched or dsmcad. dsmcad tends to be the preferred way to do things these days, but that's not to say you can't try the other route. I have definitely seen certain servers not getting along with the baclient. It usually boils down to an OS patch or library difference on that specific server that your other, well-behaved servers don't have. If this is the case and you're certain you should be compatible, you have no recourse other than to push IBM to escalate your support call and get a hotfix made.

5. IBM. I hate to say it, but your ability to open a support ticket, escalate past level 1, and get someone to take action on your problem is proportional to your experience calling and getting those types of things done. Your job is to make them see this is a critical problem as soon as possible and to follow up regularly. If you are dealing with a software bug or compatibility issue, you will get a resolution eventually, but keep in mind it isn't easy for them either to reproduce the problem and provide a fix. On the other hand, they do have the ability to increase debug levels on TSM and the client to figure out what's really going on. The sooner you open the call, the sooner you will get some results, so you should've at least started a call long before doing any troubleshooting yourself. It also helps them a great deal if they're working with someone knowledgeable about TSM (as opposed to having to tell you basics like how to log into dsmadmc).

6. Subscribe to and post to the ADSM-L listserv. It doesn't hurt to post your problem/question there while you also work with IBM and troubleshoot yourself. The responses you get, and their value, will be directly proportional to how effectively you compose your question, provide log files or output, and give the folks on the list something to work with. This is by far the single greatest resource of knowledgeable TSM administrators available to you. It's not unusual to get a response quicker than from IBM. Post your question, call IBM, and *then* start troubleshooting, so you're solving the problem by pursuing multiple paths.
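
For the network check in step 1, a repeated plain TCP connect from the client to the TSM server's port is another way to take TSM out of the loop. The Python sketch below is only a rough illustration: it measures connect time, not throughput, so it complements the FTP test rather than replacing it. The function names are made up, and the host and port to probe are whatever your dsm.sys points at (1500 is the usual default TCP port, but check your own TCPPort setting).

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Return the TCP connect time in seconds, or None if connecting failed."""
    start = time.monotonic()
    try:
        # Open and immediately close a connection; no TSM protocol is spoken.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


def probe_repeatedly(host, port, attempts=5, pause=1.0):
    """Probe several times and return the list of results in order."""
    results = []
    for attempt in range(attempts):
        results.append(probe_tcp(host, port))
        if attempt < attempts - 1:
            time.sleep(pause)
    return results
```

Running `probe_repeatedly` from a failing node and from a working node over the same period, with a longer `pause`, can show whether connects fail intermittently on one of them. A `None` in the list marks a failed attempt.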

Hope this helps..
 