Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

ESS 8700 Servers Restart every night

Status
Not open for further replies.

RFWatts

MIS
May 8, 2003
I have a new ESS site with two 8700 servers. I noticed in the logs that the ESS servers restart every night at 12:04 am and again at 2:04 am. SAVE TRANS ESS is scheduled at 2:00 am on the main server, which might explain the 2:04 restart, but it doesn't explain the 12:04 restart. Has anyone seen this in an ESS/LSP environment? Is this normal?

Thank you,

RFWatts
 
I have the same thing happening at two S8500 ESS locations. I am bringing it up with Avaya on a conference call today. Will let you know what they tell me.
 
Dunstan

I've done some more digging in the syslog, and I've determined that the restarts occur after translation file syncs. I suspect this is Avaya's method of forcing those servers to have the latest translations active. Going back to the pre-server days (v6/7/8/9), this would be similar to having to do a reset sys in order to read the translations from a swapped-out translation card.

I'm still a little puzzled about why it restarts twice. I can see the correlation between one of the restarts and the maintenance-hour translation save on the main server, but I don't understand what is invoking the other.

If you look in the syslog on your ESS, it should show most of the activity. Unfortunately, mine do not indicate which host sent the translation files via rsync, so I can only assume they came from the main server. Please pass on any information you find.

Thank you,

RFWatts
 
I have the same problem and would like to know what the outcome of this is.
 
I've been doing some more testing, and this is definitely a result of translations being pushed to the ESS servers. To test this, invoke a "save trans ess" from the main server, wait about 3-4 minutes, and then check the syslog/restart logs on the ESS. I have done this a few times now, and the ESS servers always restart after they receive new translations.
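The test above amounts to looking for a restart shortly after each translation sync. A minimal Python sketch of that check follows, assuming the syslog has already been parsed into (timestamp, message) pairs. The keywords "rsync" and "restart" and the sample entries are placeholders invented for illustration, not actual ESS log text, so adapt them to what your logs really contain.

```python
from datetime import datetime, timedelta

# Placeholder keywords (assumptions): replace them with the strings your
# ESS syslog actually uses for translation syncs and CM restarts.
SYNC_KEYWORD = "rsync"
RESTART_KEYWORD = "restart"


def restarts_after_sync(entries, window=timedelta(minutes=5)):
    """Pair each sync with the first restart that follows it within `window`.

    `entries` is an iterable of (datetime, message) tuples.
    Returns a list of (sync_time, restart_time) tuples.
    """
    entries = list(entries)
    syncs = sorted(t for t, msg in entries if SYNC_KEYWORD in msg.lower())
    restarts = sorted(t for t, msg in entries if RESTART_KEYWORD in msg.lower())
    pairs = []
    for sync_time in syncs:
        for restart_time in restarts:
            if sync_time <= restart_time <= sync_time + window:
                pairs.append((sync_time, restart_time))
                break  # only the first matching restart counts
    return pairs


if __name__ == "__main__":
    # Invented sample entries mimicking a manual "save trans ess" test.
    day = datetime(2003, 5, 8)
    log = [
        (day.replace(hour=10, minute=15), "rsync: received xln1_dif.txt.gz"),
        (day.replace(hour=10, minute=18), "CM restart"),
        (day.replace(hour=11, minute=30), "CM restart"),  # no sync before it
    ]
    for sync_time, restart_time in restarts_after_sync(log):
        print(f"sync {sync_time:%H:%M} -> restart {restart_time:%H:%M}")
```

With the invented sample, only the 10:15 sync is paired (with the 10:18 restart); the 11:30 restart has no sync shortly before it. The five-minute window is chosen to cover the 3-4 minute delay observed in testing.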

Now the question is whether or not this is "working as designed", or a new bug.

RFWatts
 
Here is what I was told:

The restarts are part of normal operations in the ESS environment.

You were on the right track. According to what they said, the translations are downloaded but stored in temporary files of a sort, and they are not actually synced until some time afterwards, when the files are actually synced up.

That will cause CM to restart.

It is, and I repeat, it is "working as designed."
 
Thanks Dunstan, you get a star!

I was afraid that they might say that. With that said, I hope to see the ESS product evolve to allow uninterrupted failover/interchange and load sharing between servers (ESS and main). Give them a few years, and the product will probably evolve to that point. In the meantime, I guess we just have to live with the restarts and service-affecting failovers.

Thank you,

RFWatts
 
I agree. If you look at the issues surrounding control network C, the lack of a changeable VLAN ID on that interface, and the kludgy method by which the ESS servers sync, I can only imagine that it will evolve.

I can say that I have been impressed with how well ESS has actually worked when I have experienced an outage.

 
From my testing, I found that during a translation save to an ESS, the main server sends the following files:

group.sat (group permission data)
passwd.sat (password file for administered users)
xln1_dif.txt.gz (gzipped translation file; this probably isn't all the translations, just the ones changed since the last update)
xln1_last_time.txt (timestamp of the last translation save)

What happens next on the ESS is still a little sketchy, but it looks like the ESS server copies its existing translations from file xln1 to file xln2, merges the received differential translation data with the existing translations, and saves the result as file xln1. The ESS server then invokes a reset sys (level 4, I think) so that the new translations are activated.
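The sequence described above can be modeled as a toy Python sketch. This captures only the inferred behavior from the thread, not Avaya's implementation: translations are represented as a dictionary of hypothetical record names, the merge is a simple key-by-key overwrite (deleted records are ignored), and `reset_system` merely counts resets instead of restarting anything.

```python
from dataclasses import dataclass, field


@dataclass
class EssTranslations:
    """Toy model of the inferred ESS update sequence (not Avaya's code).

    xln1 and xln2 are named after the files seen on the ESS; representing
    them as dicts of record -> value is an assumption for illustration.
    """
    xln1: dict = field(default_factory=dict)  # active translations
    xln2: dict = field(default_factory=dict)  # copy of the previous xln1
    resets: int = 0  # number of simulated resets

    def apply_differential(self, diff: dict) -> None:
        # 1. Copy the existing translations from xln1 to xln2.
        self.xln2 = dict(self.xln1)
        # 2. Merge the differential data (xln1_dif) into the existing set.
        merged = dict(self.xln1)
        merged.update(diff)
        # 3. Save the merged result as the new xln1.
        self.xln1 = merged
        # 4. Reset so the new translations become active.
        self.reset_system()

    def reset_system(self) -> None:
        # Placeholder for the "reset sys" step; only counts invocations.
        self.resets += 1


if __name__ == "__main__":
    ess = EssTranslations(xln1={"station 1001": "old"})
    ess.apply_differential({"station 1001": "new", "station 1002": "added"})
    print(ess.xln2)    # {'station 1001': 'old'}
    print(ess.xln1)    # {'station 1001': 'new', 'station 1002': 'added'}
    print(ess.resets)  # 1
```

The point of the model is the ordering: the backup copy is taken before the merge, and the reset comes last, which is consistent with a restart appearing a few minutes after each sync.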

I'm not sure whether these tasks are invoked from a script or by an executable thread running in the background. Perhaps someone who is Linux-savvy can dig into the inner workings of that piece. These files might help us understand what to expect from an ESS both in standby and in a failed situation, and also the effects of midnight routines.

It does look like the ESS checks whether it is active before rebooting, but I'm not completely certain. There needs to be a good white paper on exactly what happens behind the scenes between a main server and an ESS server. That might clear some of this up and set a service-expectation level while in failover mode.

Just thought I'd share what I found with the forum...

RFWatts
 
Dunstan,

If I have the right information, the "S8500 as LSP" feature should be available in CM 3.1, which should come out this April. I hope it's true.
 