ESS 8700 Servers Restart every night

RFWatts · Feb 1, 2006

I have a new ESS site with two 8700 servers. I noticed in the logs that the ESS server restarts every night at 12:04am and again at 02:04am. The SAVE TRANS ESS is scheduled at 02:00am on the main server, so that might explain the 02:04 restart, but it doesn't explain the 12:04 restart. Has anyone experienced this in ESS/LSP environments? Is this normal?

Thank you,

RFWatts

Dunstan · Feb 2, 2006

I have this same thing happening with two S8500 ESS locations. I am bringing it up with Avaya in a CC today. Will let you know what they tell me.

RFWatts · Feb 2, 2006

Dunstan

I've done some more digging in the syslog, and I've determined the restarts occur after translation filesyncs. I suspect this is Avaya's method of forcing those servers to have the latest translations active. Going back to the pre-server days (v6/7/8/9), this would be similar to having to do a reset sys in order to read the translations from a swapped out translation card.

I'm still a little puzzled about why it restarts twice. I see the correlation with one of the restarts, and the maintenance-hour translation save on the main server, but I don't understand what is invoking the other.

If you look in the syslog on your ESS, it should show most of the activity. Unfortunately, mine does not indicate which host sent the translation files via rsync, so I can only assume they came from the main server. Please pass on any information you find.

Thank you,

RFWatts

mrjedi · Feb 2, 2006

I have the same problem and would like to know what the outcome of this is.

RFWatts · Feb 2, 2006

I've been doing some more testing, and this is definitely a result of translations being pushed to the ESS servers. To test this, invoke a "save trans ess" from the main server. Then wait about 3-4 minutes, and check the syslog/restart logs on the ESS. I have done this a few times now, and the ESS servers always restart after they receive new translations.
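For anyone who wants to repeat this check across several nights, the correlation can be sketched in a few lines of Python. This is only an illustration: the function name `restarts_following_syncs`, the 10-minute window, and the sample timestamps are assumptions, and the timestamps would have to be pulled by hand from the ESS syslog and restart log, whose exact formats are not shown in this thread.

```python
from datetime import datetime, timedelta


def restarts_following_syncs(sync_times, restart_times,
                             window=timedelta(minutes=10)):
    """Pair each restart with the latest sync that precedes it within `window`.

    A restart with no preceding sync inside the window is paired with None,
    which flags it as unexplained by a translation push.
    """
    pairs = []
    for restart in sorted(restart_times):
        prior = [s for s in sync_times
                 if timedelta(0) <= restart - s <= window]
        pairs.append((restart, max(prior) if prior else None))
    return pairs


# Hypothetical timestamps modeled on the first post: a scheduled save at
# 02:00, restarts at 00:04 and 02:04. Only the 02:00 sync is known here.
syncs = [datetime(2006, 2, 2, 2, 0)]
restarts = [datetime(2006, 2, 2, 0, 4), datetime(2006, 2, 2, 2, 4)]

pairs = restarts_following_syncs(syncs, restarts)
for restart, sync in pairs:
    cause = f"follows sync at {sync:%H:%M}" if sync else "no sync found"
    print(f"restart at {restart:%H:%M}: {cause}")
```

Any restart reported with no sync found is one worth chasing in the logs, since it is not accounted for by a known translation save.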

Now the question is whether or not this is "working as designed", or a new bug.

RFWatts

Dunstan · Feb 2, 2006

Here is what I was told:

The restarts are part of normal operations in the ESS environment.

You are on the right track. According to what was said, the translations are downloaded but stored in tmp files of a sort, and they are not actually synced until some time afterwards, when the files are synced up.

That will cause CM to restart.

It is, and I repeat is, "working as designed."

RFWatts · Feb 3, 2006

Thanks Dunstan, you get a star!

I was afraid that they might say that... With that said, it's my hope to see the ESS product evolve to allow uninterrupted failover/interchange and load-sharing between servers (ESS and main). Give them a few years, and the product will probably get to that point. In the meantime, I guess we just have to live with the restarts and service-affecting failovers.

Thank you,

RFWatts

Dunstan · Feb 3, 2006

I agree. If you look at the issues surrounding control network C, the lack of a changeable VLAN ID on that interface, and the kludgy method by which the ESS servers sync, I can only imagine that it will evolve.

I can say that I have been impressed with how well ESS has actually worked when I have experienced an outage.

RFWatts · Feb 3, 2006

From my testing, I found that during a translation save to an ESS, the main server sends the following files:

group.sat (group permission data)
passwd.sat (password file for administered users)
xln1_dif.txt.gz (gzip-compressed translation file...this probably isn't all the trans, just the ones changed since the last update)
xln1_last_time.txt (timestamp of the last translation save)

What happens next on the ESS is still a little sketchy, but it looks like the ESS server copies its existing translations from file xln1 to file xln2, then merges the differential translation data received with the existing translations and saves the result as file xln1. The ESS server then invokes a reset sys (level 4...I think) so that the new translations are activated.
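The backup-then-merge sequence described above can be modeled with a small Python sketch. It is purely an illustration of the observed pattern, not Avaya's actual mechanism: the real translation file format is not documented in this thread, so the `key=value` line format, the sample station entries, and the helper names `parse` and `apply_differential` are all assumptions. The reset step is not modeled.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def parse(text):
    """Read hypothetical 'key=value' lines into a dict (assumed format)."""
    return dict(line.split("=", 1) for line in text.splitlines() if line)


def apply_differential(workdir: Path) -> None:
    current = workdir / "xln1"
    backup = workdir / "xln2"
    diff = workdir / "xln1_dif.txt.gz"

    # Step 1: keep the existing translations as xln2.
    shutil.copyfile(current, backup)

    # Step 2: overlay the differential entries on the existing ones.
    merged = parse(current.read_text())
    with gzip.open(diff, "rt") as fh:
        merged.update(parse(fh.read()))

    # Step 3: save the merged result as the new xln1.
    current.write_text(
        "".join(f"{key}={value}\n" for key, value in sorted(merged.items()))
    )
    # Step 4 (the reset that activates the new translations) is omitted.


# Demonstration in a scratch directory with made-up entries.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "xln1").write_text("station-1001=old\nstation-1002=unchanged\n")
    with gzip.open(work / "xln1_dif.txt.gz", "wt") as fh:
        fh.write("station-1001=new\n")

    apply_differential(work)
    new_xln1 = (work / "xln1").read_text()
    old_xln2 = (work / "xln2").read_text()
```

After the run, `new_xln1` holds the merged entries and `old_xln2` holds the previous copy, which mirrors the xln1/xln2 roles seen on the ESS.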

I'm not sure if these tasks are invoked from a script, or if it is an executable thread running in the background. Perhaps someone who is Linux-savvy can dig into the inner workings of that piece. These files might help in understanding what to expect from an ESS both in standby and in a failed situation, and also the effects of midnight routines.

It does look like the ESS checks to see if it is active before rebooting, but I'm not completely certain. There needs to be a good white paper on exactly what happens (behind the scenes) between a main server and an ESS server. This might clear up some of this, and set a service-expectation level while in failover mode.

Just thought I'd share what I found with the forum...

RFWatts

nohuhu · Feb 4, 2006

Dunstan,

If I have the right piece of information, the "S8500 as LSP" feature should be available in CM 3.1, which should come out this April. I hope it's true.
