
Recurring Low VM alarms CUCM 8.5


MitelInMyBlood (Technical User)
I've raised the low VM alarm threshold in RTMT to 80%, but I'm still hitting it at approximately 14-day intervals and having to reboot. The server (7835) is configured with only 4 GB of memory. Will adding more memory help abate this issue?

Any thoughts on how much memory to add?

Any thoughts on spending the big bucks for Cisco memory vs. buying it from IBM (or Crucial, etc.)?

Original MUG/NAMU Charter Member
 
Addt'l:
Code:
login as: admin
admin@10.241.0.100's password:
Command Line Interface is starting up, please wait ...

   Welcome to the Platform Command Line Interface

admin:show process load memory
top - 14:27:07 up 14 days, 20:34,  1 user,  load average: 0.59, 0.61, 0.62
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.1% us,  1.8% sy,  0.0% ni, 88.1% id,  2.0% wa,  0.0% hi,  0.0% si
Mem:   4127984k total,  4105348k used,    22636k free,     4480k buffers
Swap:  2048248k total,  1356492k used,   691756k free,   539980k cached
  PID USER      PR  NI  RES  SHR S %CPU %MEM   TIME COMMAND
13950 ccmbase   15   0 1.4g  58m S  0.0 36.7 118:26 ccm
 5456 tomcat    21   0 669m  17m S 21.7 16.6  27:10 tomcat
26888 informix  15   0 260m 259m S 27.6  6.5 512:21 cmoninit
13781 drf       15   0 170m  12m S  0.0  4.2   3:19 CiscoDRFMaster
26983 root      15   0 152m 151m S  0.0  3.8   2:30 cmoninit
26987 root      16   0 141m 140m S  0.0  3.5   0:12 cmoninit
24657 ccmservi  16   0 140m  12m S  0.0  3.5  12:04 carschlr
13516 licmgr    15   0 125m  11m S  0.0  3.1   0:24 CiscoLicenseMgr
27037 root      16   0 118m 117m S  0.0  2.9   0:03 cmoninit
14164 ccmservi  16   0  88m 8100 S  0.0  2.2   1:24 rtmtreporter
admin:
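
Worth noting in that output: swap is already about two-thirds used (roughly 1.3 GB of the 2 GB partition) while physical memory is nearly full, which is exactly the condition the Low VM alarm keys on. If you want to trend it between reboots without RTMT, the platform CLI can dump the memory perfmon counters. A rough sketch follows; I'm going from memory on the exact command names, so verify them on your version:
Code:
admin:show status
admin:show perf query class Memory

show status gives an uptime/CPU/memory/disk summary, and the Memory perfmon class should include a used-swap counter; sampling it once a day turns the leak into a hard number instead of an alarm percentage.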


Original MUG/NAMU Charter Member
 
Is the server on the latest patches?

Buying third-party memory is OK if it is Cisco-approved. If it isn't, don't expect much help from TAC; once they see the unapproved memory, they'll tell you to replace it with approved memory and call back.
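
If you do go the memory route, it may also be worth confirming from the CLI exactly what the box is before ordering; a quick sketch (check the output headings on your version):
Code:
admin:show hardware
admin:show status

show hardware identifies the platform model (handy for matching approved DIMMs), and show status repeats the memory totals you see in top.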
 
Patches: No. Currently running straight out of the can: 8.5.1.11900-21. It's a 7835-I3 with the default memory (4 GB). Also seeing a very high incidence of page faults: 5 days after the last reboot, page faults stood at 398,007 for ccm and 166,504 for tomcat. The subs (4) are OK and not experiencing any problems; it's only the pub.

One hour after a reboot, VM usage was 18%. Eighteen hours after the reboot it was at 48%. Six days after the last forced reboot it was at 87% and climbing at approximately 11.1 MB per hour (roughly 266 MB a day). Rebooted last night. This has been required more and more frequently: it used to be every couple of weeks; now it's down to every 5~7 days.

I have a TAC case open, waiting to hear something. Thought I'd post this out here too in case anyone else has run across this.
Thanks

Original MUG/NAMU Charter Member
 
I ran into a similar issue. After a few days, the server would just lock up. Applying the latest patches fixed it.
 
Thanks.

Amazingly (!) TAC informs me this is normal, expected behavior and recommends that I set my Low VM alert threshold down to 2 or 3% as a way of reducing the incidence of alarms.

Perhaps now it is easier to understand why I come here looking for assistance rather than opening a TAC case.

As an aside, may I inquire as to how often you (and others) find it necessary to reboot your PUB as a way of recovering the resource pool of virtual memory?

Thx.


Original MUG/NAMU Charter Member
 

What would be the result of ignoring the Low VM alarm and letting it exhaust that resource?

Original MUG/NAMU Charter Member
 
I'm not sure what type of memory issue we were having, other than it being described as a memory leak. Our server would quit serving the browser pages and then stop working altogether. We would reboot it about every 2-3 weeks and then it was fine. If we didn't, you had to pull the plug on it to reboot it; even the CLI didn't work. We applied the latest patch, and it hasn't been rebooted now in several months. I have a couple of customers' Pubs out there that I don't believe have been rebooted in 6 months.
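
For what it's worth, when the admin pages die but the box is still limping, a controlled restart from the platform CLI is kinder than pulling the plug, assuming the CLI still answers (in our case it eventually didn't):
Code:
admin:utils system restart

It asks for confirmation and then does an orderly shutdown and reboot.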
 
Thanks. Ours is described as a memory leak as well. There has also been a suggestion made (more than once) that we're well beyond design criteria with over 3,600 devices registered (a 7835-I3 as a PUB is rated at 2,500 devices max). Whenever this is brought up, others are quick to point out that fewer than 1,800 devices are "active", which is also correct, so I'm torn between what I was taught and not wishing to engage management in an argument I cannot possibly win, risking my job in the process. In their eyes, we techs are now idiots and scapegoats to take the blame. I'm alone in my opinion that this is a horsepower issue and that we ultimately should have purchased a 7845. I made that same recommendation when the system was initially bid, but my VAR told me a 7835 would be fine and (most importantly) it would let them hit the budgetary target (a 7845 would have been twice the cost of the 7835).

Patching has been brought up and discussed several times, even recommended by TAC and a couple of CCIE/Voice guys, but everyone's sphincter is still in a knot remembering the *disastrous* aftermath of upgrading from 8.0 to 8.5 and the approximately 4 weeks spent chasing "UCM Down" and "Temp Fail" ghosts. Those ghosts were ultimately identified as a network config issue (and fixed), but they were nonetheless ill-timed because they coincided with the upgrade, and naturally it was the *PHONES* that were down, not the network, hence the upgrade will always be blamed. Right now it's almost as if the CD drive on the pub has been padlocked and we don't have the key. I can't even put a simple hardware patch on to stop the RAID controller from setting the drives R/O.

Original MUG/NAMU Charter Member
 
Heard from TAC today. Based on the logs they captured, they now say it's a confirmed bug: CSCtf49442.
Now we can make a case for an upgrade.

I went online and downloaded the new OS and documentation, though the docs are kind of heavy reading, like they were written in "C" :)
Scrounging around now to see if anyone's written an easier-to-follow guide to performing the upgrade.

Original MUG/NAMU Charter Member
 
I have done patches and upgrades multiple times without an issue. Since the patch/upgrade is installed on the inactive partition, it can be done during the day, and the reboot that swaps the inactive/active partitions can be done after hours in about 15 minutes. The best part of the way it's done now is that if there are any problems, you just swap back to the original version by running one command at the CLI prompt.

You download the patch, burn it to a DVD, insert the DVD in the drive, go to OS Administration, select the software upgrade section, point it at the DVD drive, and it is pretty much automatic from there. You used to have to combine 2-3 files into a single file, but I've noticed that the patches now come as a single ISO file.
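
For anyone searching later, that version juggling maps onto a few platform CLI commands; double-check against the upgrade guide for your release:
Code:
admin:show version active
admin:show version inactive
admin:utils system switch-version

The upgrade installs to the inactive partition, show version inactive confirms the new load landed there, and utils system switch-version is the one command that reboots the server onto it. Running it again swaps you back to the old version if things go sideways.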

Sounds like rather than upgrading the hardware, you could add a Subscriber and redistribute the users to it to offload the Publisher. It's still money, but less than a new Publisher on bigger hardware.
 
Today I have 3 subs, with only a handful of sets, perhaps 10 or so, still registering to the PUB, so the actual number of registrations shouldn't be an issue (or so I've been told). We have a very dynamic database with many adds and changes taking place on and off throughout the day. The memory leak seems to worsen commensurate with heavy administrator activity and multiple admin screens open. A few weeks ago, when the memory leak was really bad, we had as many as 5 admin sessions running concurrently, which Cisco thinks may have contributed to the problem. As the number of concurrent admin sessions has settled back down to only 1 or 2, the rate of the memory leak has also slowed dramatically.
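
Since the admin pages run inside Tomcat, one thing that might confirm the correlation is watching whether tomcat's resident size tracks the number of open admin sessions. If I recall correctly the CLI has a shortcut for this, though verify the exact syntax on 8.5:
Code:
admin:show process using-most memory

Comparing that snapshot with one admin session open versus five should show whether tomcat is the process actually eating the headroom.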

Original MUG/NAMU Charter Member
 