Tek-Tips is the largest IT community on the Internet today!


OpsMan under E-Mgr Needs Periodic Reboot

Status
Not open for further replies.

MitelInMyBlood (Technical User, US), Apr 14, 2005
I think we talked about this once before, but can't find the thread now.

Am I the only one having to periodically reboot my EMgr server because OpsMan is locking up? Seems like once every 2~3 weeks OPS will simply stop working and won't let you log in. The initial java screen pops up but the client won't start. Eventually it times out w/an error msg to the effect that OpsMgr is not running.

I've checked the server logs. There's nothing there. No errors. I've checked the services; everything's running fine. No problems in EMgr either, all sites access OK, only the OpsMan blade is hung.

The only fix seems to be a server reboot. :( :( :(

My TAM tells me there are "a couple" of tickets open on this, but does anyone know of a fix? It doesn't happen often, but it seems to bite us when we're busiest, swamped with MACs and unable to devote time to waiting in the hold queue and capturing traces. I realize that's little help to prod. supp., but we don't notice it's locked up until we need it, and at that point it's a crisis because midday moves are underway. We've got to boot it and move on.
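Since the lockup usually goes unnoticed until someone needs OpsMan midday, one option is a scheduled probe that warns you early. This Python sketch only checks whether a TCP port on the server still accepts connections; the host and port you pass in are your own (nothing in this thread says which port OpsMan listens on). Because the symptom here is a client that starts but never logs in, a plain port check may keep passing while the application itself is hung, so treat it as a partial early warning, not a health check.

```python
import socket


def port_is_answering(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure and timeout all subclass OSError
        return False
```

Run it from Task Scheduler every 15 minutes or so against your (hypothetical here) OpsMan host and port, and alert whenever it returns False. An application-level check, such as actually loading the login page, would catch more cases but depends on OpsMan details not covered in this thread.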

A related symptom is that *sometimes* when OM locks up the server will complain that the SAM is missing. This doesn't happen with every OM lockup, but maybe a third of the time. Most times a server reboot will get us back in operation. On those occasions when it can't see the SAM we have to unplug and re-plug the SAM, then reboot.

There does not seem to be any causative action, such as periods of high MAC activity versus low. It can be working fine in the morning and be locked up after lunch for no apparent reason and nothing in the server logs to indicate a cause.

Any ideas?
 
Possible clue here I neglected to mention.

MAC activity does seem to get progressively slower and more sluggish in the day or so leading up to the lockup.

Memory leak????????????
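If it is a leak, it should show up as memory use that climbs steadily rather than bouncing around. A minimal sketch, assuming you have already collected periodic memory samples for the OpsMan process (for example the Process\Private Bytes counter in Windows Performance Monitor, exported as a list of MB values); it fits a least-squares line and flags a sustained upward slope. The 1 MB/hour threshold is an arbitrary assumption, not a Mitel figure.

```python
from statistics import mean
from typing import Sequence


def memory_slope(samples_mb: Sequence[float], interval_hours: float = 1.0) -> float:
    """Least-squares slope of memory use, in MB per hour."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [i * interval_hours for i in range(n)]   # sample times in hours
    x_bar, y_bar = mean(xs), mean(samples_mb)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples_mb))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples_mb: Sequence[float], interval_hours: float = 1.0,
                    threshold_mb_per_hour: float = 1.0) -> bool:
    """Flag a sustained upward trend; the default threshold is an assumption."""
    return memory_slope(samples_mb, interval_hours) > threshold_mb_per_hour
```

For instance, hourly samples of 100, 102, 104 and 106 MB give a slope of 2.0 MB/hour and would be flagged; a flat or noisy series hovering around 100 MB would not.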
 
We had the same problem here at a client in Calgary. We ended up having to uninstall the entire product and then reinstall it. Performing an upgrade or changing the Sentinel did not help. I am not sure if we had to rebuild the database, as another tech completed the fix. It has been running fine for a month and a half now.
 

Certainly not what I wanted to hear, but thanks for the info.

 
When the thing dies, can you check the MySQL service status?
The package includes a command-line mysql client which you can use for testing purposes. The OPSMan process uses an ODBC driver to communicate with the database, and you probably know that the MySQL ODBC driver for Windows is not the most stable product. I would take a look at this area.
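A hedged sketch of that check on Windows: `sc query <service>` prints a STATE line such as `4  RUNNING`, which is easy to parse. The service name `MySQL` used as the default below is an assumption; look up the actual name of the database service installed with the Mitel package in the Services console. A running service does not prove the database answers queries, so following up with the bundled client (`mysql -u <user> -p -e "SELECT 1"`) is still worthwhile.

```python
import re
import subprocess
from typing import Optional

# Matches lines like "        STATE              : 4  RUNNING"
STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Return the state word (RUNNING, STOPPED, ...) from `sc query` output, or None."""
    match = STATE_RE.search(sc_output)
    return match.group(1) if match else None


def query_service_state(service_name: str = "MySQL") -> Optional[str]:
    """Run `sc query` (Windows only) and return the parsed state.

    The default service name is an assumption; check the Services console.
    """
    result = subprocess.run(["sc", "query", service_name],
                            capture_output=True, text=True)
    return parse_sc_state(result.stdout)
```

Anything other than RUNNING when OpsMan is hung would point at the database side rather than the Java client.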
 

I'll try to remember to check that next time it locks up. Thanks
 
I've been chasing the same problem for over a month now. Mitel is trying to get remote access to the system for analysis, which the customer has been dragging their heels on.

My system uses excessive memory and threads.

Daily reboots are keeping the problem in check.

I am sorely tempted to upgrade to 3.1 to see what happens.

There is an island of opportunity in the middle of every difficulty.
Miss that though, and you're pretty much doomed!
 
I upgraded my OPSMan a few months ago and so far no issues.
 
The problem continues in EM 3.1/OM 7.5. It was in 2.1/7.0 as well.

The cust is moving the server into a different domain this weekend. As part of that, the server Gods are putting a weekly autoboot schedule on the box. Since the lockup occurs only every 2~3 weeks or so, a weekly reboot will prolly be as good a getaround as anything. I'm all for periodic reboots of everything Microsoft anyway.
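For anyone without a server team to do it for them, a weekly restart can be set up with Windows' built-in Task Scheduler. This is a generic sketch, not Mitel's reboot scheduler and not necessarily what this customer's server team used: the Python function only builds a `schtasks /Create` argument list, and the task name, day and time are hypothetical defaults. The `schtasks` shipped with Windows XP/Server 2003 documents `/ST` as HH:MM:SS rather than HH:mm, so check `schtasks /Create /?` on the box before relying on the default.

```python
from typing import List

VALID_DAYS = {"MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"}


def weekly_reboot_task(day: str = "SUN", start_time: str = "03:00",
                       task_name: str = "WeeklyReboot") -> List[str]:
    """Build a schtasks.exe argument list that schedules a weekly restart."""
    day = day.upper()
    if day not in VALID_DAYS:
        raise ValueError(f"day must be one of {sorted(VALID_DAYS)}")
    return [
        "schtasks", "/Create",
        "/TN", task_name,            # hypothetical task name shown in Task Scheduler
        "/SC", "WEEKLY",             # run once a week
        "/D", day,                   # on this day
        "/ST", start_time,           # at this time (older schtasks wants HH:MM:SS)
        "/RU", "SYSTEM",             # as LocalSystem, so nobody needs to be logged on
        "/TR", "shutdown /r /t 0",   # the action: restart immediately
    ]
```

On the server, from an elevated prompt, `subprocess.run(weekly_reboot_task(), check=True)` would create the task. Running it as SYSTEM means the reboot fires whether or not anyone is logged on.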

As for remote access, in corporations big enough to have their own internal network security group, don't be surprised if you need dispensation from God to get a vendor remote access to a server inside their firewall.

I'm beginning to understand Cisco's decision to migrate their CM over to Linux.
 
I've run across the same problem, glad to know it's not just me. The only thing I've found so far is that running MOM (Microsoft Operations Manager) tends to make the problem even worse so that you have to restart the OpsMan JVM service every 2-3 days instead of weeks. Other than that, I'm just limping along on the problem like y'all are, rebooting when necessary.

Another issue I've come across is that E-Man/OpsMan do NOT support running on Windows Server 2003 R2, which I find interesting considering I had 2 servers running 3.0/7.4, and one still runs 3.1/7.5. The good folks in Kanata should be looking at my servers today, hopefully to find some clues as to what in particular I have configured that's allowing these apps to run on an O/S that they claim they have NEVER been able to get working in the lab.
 
Gee I dunno, is this lockup thing at all related to specific OS builds? My client requires a custom build of W2K3 which they do in house as part of something thrust upon them by the network security freaks. These guys are really anal about several things in the OS which they disable as part of their exercising complete and absolute control. Even when security patches come out the network security folks on occasion cripple a feature here and there. I understand the concern, these are servers on a specific server-only network so any breach could cause a lot of collateral damage, but they sometimes make it hard for the applications folks to make things work.
 
There is a rebooter built into the Ops server; just do a search for "reboot" and you will find a reboot scheduler from Mitel. This goes back to the days when it had to be rebooted weekly.
 
We have had several issues in the past with EM and OM on a Windows 2003 server, and after weeks of working with Mitel found that there seemed to be problems with the way the system was initially installed. When we installed EM/OM we used a second partition (D:) as the "install to" drive for EM, but OM only allowed installation on C:, which we normally reserve for the server OS and related apps only. We completely reloaded Windows 2003 with the drive partitioned as C: only and re-installed everything to the same drive, which eliminated most of our issues.

I hope this helps someone...

Dale
 
Hello,

The problem with your OPS-Manager looks like a problem that is introduced when you make a connection with Remote Desktop and then start up the OPS-Manager application.

Connecting this way causes an error with the USB dongle.

Solution without rebooting OPS-Manager:

Close the Remote Desktop connection, start up the OPS-Manager application directly on the server, and close that session again.

Now you are able to set up a webserver connection from any client.

Tip: don't use Remote Desktop to run the OPS-Manager application.
Use it only for viewing the Event Viewer on the server.

Good luck,

Fred
 
I had an issue where autodiscovery was trying to discover too many hosts (Class B), which locked up the application.

Narrowing the scope of my discovery requests has made the problem disappear.

Don't know if this applies to you, but...
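To put numbers on why a Class B scope hurts: a /16 contains 65,534 usable host addresses, every one of which discovery may try to probe, versus 254 in a /24. A minimal sketch using Python's standard `ipaddress` module, assuming a hypothetical 10.20.0.0/16 network where only a couple of /24 subnets actually hold Mitel gear; how those subnets get entered into OpsMan's discovery settings is not shown.

```python
import ipaddress
from typing import Iterable, List


def host_count(cidr: str) -> int:
    """Usable IPv4 host addresses (excludes network and broadcast); for /30 or shorter."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses - 2


def narrow_scope(class_b: str, used_third_octets: Iterable[int]) -> List[str]:
    """Keep only the /24 subnets of a /16 whose third octet is in used_third_octets."""
    net = ipaddress.ip_network(class_b)
    if net.version != 4 or net.prefixlen != 16:
        raise ValueError("expected an IPv4 /16 (Class B sized) network")
    wanted = set(used_third_octets)
    return [
        str(sub)
        for sub in net.subnets(new_prefix=24)
        # packed[2] is the third octet of the subnet's network address
        if sub.network_address.packed[2] in wanted
    ]


print(host_count("10.20.0.0/16"))              # 65534
print(narrow_scope("10.20.0.0/16", [5, 40]))   # ['10.20.5.0/24', '10.20.40.0/24']
```

Here the two /24s add up to 508 addresses to probe instead of 65,534.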

 
I was having the same issue when we first set everything up, but quickly found that if I manually entered the sites themselves and left the "Network" and "Node" discovery sections empty, then autodiscovery would find everything when it does its re-discovery process. It was a bit of a pain to start with (we have 234 sites) but made things cleaner and a bit quicker in the long run.


Dale
 
I started using the reboot utility to do weekly server reboots the week before Xmas and have had zero problems since with the OM client locking up. Only Ops Man was locking up; the EM server and seemingly all other EM functions still worked fine. It's not fixed, but it is as far as I'm concerned. Seems everything on a Microsoft platform just needs a swift kick in the backside from time to time.
 
Yup, have the same lockup issue on OPS 7.5 / Ent 3.1.

Getting dongle errors in the Event Viewer, but only at times. The locking up, though, occurs frequently and only a reboot solves the problem.

I can say that the server I was given to install on does have more than one partition, although I did install on C:\

Has anyone raised this with 3rd line? Has a DPAR been raised?

E
 
Hi, I have had this problem with 2003 Server. The fix was a complete rebuild including the server software. This time I made sure IIS was installed before the EntMan and OpsMan software was installed.
Been running perfectly now for two months. No reboots.
 
Looks like I'm the only lucky one. Not a single reboot or problem in the 6 months since I upgraded.
 