IWR Com Service failing to start after running fine for over a year.

Status
Not open for further replies.

dmottern

Programmer
Feb 10, 2004
US
We're using IWR 6.1, and it has been running fine for over a year. Several weeks ago we started having a problem that occurs after our regular Monday morning reboot: the IWR Com Service won't start. We get a Windows "Error 1053", which says that the service failed to start in a timely manner. The service shows a status of "Starting" but never actually starts. The only way to fix it is to replace the 5 .db files (the IWR database) with backup copies. We do this and can then start the service, and everything is fine until the following Monday morning reboot.

Apparently the IWR database is repeatedly getting corrupted somehow. The server is rebooted early EVERY day, but the problem only occurs on Monday mornings. We do run some maintenance on Sundays, which includes a batch update of the Access Manager datastore, so I'm guessing that this may somehow be damaging the IWR database, but I'm not sure what the relationship is between the two. We've been running the batch update all along and it never caused a problem in the past. We recently added some new classes, and I used the Report Administrator to assign access for the new classes. It seems like the problem started shortly after that.

Cognos support hasn't been able to help us so far. They suggested dumping the IWR database to XML, then reloading the XML into a new clean database. That did not fix the problem, so I'm guessing that whatever is causing the difficulty is being carried along in the XML dump. Has anyone else had a similar problem and found a solution?
 
What OS are you running this on (NT, 2000)? What version of IWR 6.1 are you running (like 6.1.26.0)?
 
We had a similar problem about a year ago (we are now using Series 7).
In order to stop the corruption we started manually backing up the databases. To do this, restore the last known good databases and perform the following manual backup on a daily basis. (We perform all of these steps manually, not scripted, Mon-Fri, with no backups on weekends. It takes about 2-3 minutes a day and sure beats IWR crashing.) We skip these database files during our regular tape backups.

1. Stop the IWR service.
2. Stop the ObjectStore ServerRX.0 (X=4 or 6).
3. Start the ObjectStore ServerRX.0 (X=4 or 6).
4. Stop the ObjectStore ServerRX.0 (X=4 or 6).
5. Stop the ObjectStore Cache ManagerRX.0
6. Copy all of the databases (.db & .adb) into a backup folder.
7. Start the ObjectStore Cache ManagerRX.0.
8. Start the ObjectStore ServerRX.0 (X=4 or 6).
9. Start the IWR service.
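
For anyone who would rather script the nine steps above than run them by hand, here is a minimal sketch in Python. It is not something from the original posters, who ran the steps manually. The service names are assumptions (the thread only gives "ObjectStore ServerRX.0" with X = 4 or 6, so check the exact names in the Services panel), and the function names `backup_plan`, `copy_databases`, and `run_plan` are made up for illustration. It defaults to a dry run that only lists the commands.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed service names; adjust to match the Services panel on your server.
IWR = "IWR Com Service"
OS_SERVER = "ObjectStore ServerR4.0"
OS_CACHE = "ObjectStore Cache ManagerR4.0"


def backup_plan():
    """Return the nine steps from the post as (action, target) pairs, in order."""
    return [
        ("stop", IWR),
        ("stop", OS_SERVER),
        ("start", OS_SERVER),   # stop/start/stop is what flushes the cache
        ("stop", OS_SERVER),
        ("stop", OS_CACHE),
        ("copy", "*.db, *.adb"),
        ("start", OS_CACHE),
        ("start", OS_SERVER),
        ("start", IWR),
    ]


def copy_databases(db_dir, backup_dir):
    """Copy every .db and .adb file from db_dir into backup_dir."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(db_dir).iterdir()):
        if path.suffix.lower() in (".db", ".adb"):
            shutil.copy2(path, backup_dir / path.name)
            copied.append(path.name)
    return copied


def run_plan(db_dir, backup_dir, dry_run=True):
    """Walk the plan; with dry_run=True, only report what would be done."""
    log = []
    for action, target in backup_plan():
        if action == "copy":
            log.append(f"copy {db_dir} -> {backup_dir}")
            if not dry_run:
                copy_databases(db_dir, backup_dir)
        else:
            cmd = ["net", action, target]  # Windows "net stop" / "net start"
            log.append(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)
    return log
```

Calling `run_plan(db_dir, backup_dir)` returns the list of steps without touching anything; pass `dry_run=False` only after confirming the service names. With `check=True` the script stops at the first service that refuses to stop or start, so it will not copy files while a service is in an unknown state.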

This worked for us because we now flush the Cache Manager on a daily basis. Our databases have not been corrupted since we started this type of backup.
 
I'm not sure that I completely understand what you're saying. Are you saying that the act of backing it up somehow prevents corruption? If so, I'm very skeptical. Or are you saying that you restore from those backups after the corruption occurs? We've been able to limp along by restoring from backups that were made immediately before the failures started. Restoring from later backups doesn't work; the server won't start with those later copies. Restoring from those old backups is not a good long-term solution, because at some point we're going to want to publish new reports or republish one or more of the old ones.
 
Backing up the databases doesn't prevent the corruption, but flushing the cache on a daily basis has fixed this for us. We keep about a week's worth of backups (on the server). The action of stopping, restarting, and stopping the ObjectStore Server flushes the cache. As far as we know, flushing the cache is what prevents corruption of the databases. Cognos was unable to give us any better explanation.
 
Sysgrunt,
The IWR Com Service failed to start again this morning and I tried your suggestions. I stopped the ObjectStore Server Service as you suggested. But when I then tried to start it, it refused to start. I backed up the .db files and replaced them with newly created clean .db files. It still would not start. The only way to get it to start was to reboot the machine. I copied the backup .db files (the ones we've been using to recover every week) into the database directory, then rebooted. When it came back up all of the services were started with no problems and the users could access their Impromptu reports. I'm wondering now if the real problem is with the ObjectStore Server Service instead of the IWR Com Service which depends on it. In the past, when the IWR Com Service wouldn't start the ObjectStore Server Service was running, but I'm wondering if it was in some abnormal state which prevents IWR Com Service from starting.
 
There does seem to be a correlation. If (when) this happens again, you might want to observe iwschman.exe in Task Manager. When we had this problem, that process would hang, using 25% of CPU, and not terminate. After we implemented the backups that clear the cache, we didn't have the problem.

By the way, what size is your iwrig.db (infoguide database)? We encountered our problem when it reached between 200MB and 600MB. We compressed the database by running dsdump (on iwr.db) and then reloading the database (dsload) into clean databases.

When was the last time you performed the inbox cleanup?
 
iwrig.db is a little over 16 MB and I've never seen it much larger than that. We already went through the procedure of dumping to XML and then reloading into a clean database. That didn't fix the problem. What is iwschman.exe? Should it be running? If it is running and using up a lot of CPU time, should I use task manager to kill it?
 
iwschman is one of the 5 IWR processes which are supposed to start when the IWR service is started (iwsvcman, iwschman, iwreqman, iwirsdisp, iwiwrdisp). iwschman is the schedule manager, which runs scheduled reports when they are supposed to be run. We encountered iwschman using 25% CPU continuously (it should average about 5 to 13% intermittently). If you are not running scheduled reports, then this may not be the source of your particular problem.
However, the problem does seem to be somewhere in the ObjectStore databases.
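
As a rough way to check whether all five of the processes named above actually came up, the sketch below parses the CSV output of the Windows `tasklist` command and reports which ones are missing. This is only an illustration: it assumes the image names are the process names plus `.exe` (the thread only confirms `iwschman.exe`), and it assumes `tasklist` is available, which may not be the case on older Windows versions. The function names are hypothetical.

```python
import csv
import io

# The five process names listed in the post.
EXPECTED = ["iwsvcman", "iwschman", "iwreqman", "iwirsdisp", "iwiwrdisp"]


def running_images(tasklist_csv):
    """Return the lower-cased image names from 'tasklist /FO CSV /NH' output."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return {row[0].lower() for row in rows if row}


def missing_iwr_processes(tasklist_csv):
    """Return the expected IWR processes that are not in the task list."""
    running = running_images(tasklist_csv)
    return [name for name in EXPECTED if f"{name}.exe" not in running]


# Usage on the server (not run here):
#   import subprocess
#   out = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
#                        capture_output=True, text=True).stdout
#   print(missing_iwr_processes(out))
```

This only says whether a process exists, not whether it is hung; the sustained 25% CPU symptom still has to be watched in Task Manager.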

May I suggest that you dump the databases to an XML the next time it fails and 1) ask Cognos to diagnose the XML file, or 2) send me a copy (if allowed) and I would take a look at it.

 
Sysgrunt,
I actually already have an XML dump that was made from a problem database. Cognos has a copy but they haven't gotten back to me about it and they don't seem to be too anxious to spend much time on it since they no longer support 6.1. I'm not sure about giving you a copy. I don't think my superiors would be thrilled about my giving it to some stranger on the net. It does contain some info that the customer may feel is sensitive, such as class names. Do you have any suggestions as to what I could look for in it?
 
The only suggestion I could make at this juncture would be to check that there are closing tags for each of the opening tags in the XML file (i.e. <CogIWRRMAppDBName> </CogIWRRMAppDBName>). That is about all I could do with regards to the XML file.
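
Checking for matching opening and closing tags is the same as checking that the file is well-formed XML, which any XML parser does for you. Here is a minimal sketch using Python's standard `xml.parsers.expat` module. It reports the line of the first problem it finds. The function names are made up for illustration, and since the dsdump format is not documented in this thread, the sketch checks only well-formedness, not whether the content is valid for IWR.

```python
import xml.parsers.expat


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, description)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return False, f"line {err.lineno}, column {err.offset}: {reason}"
    return True, None


def check_file(path):
    """Same check for a dump file on disk, read as a stream."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return False, f"line {err.lineno}, column {err.offset}: {reason}"
    return True, None
```

A parser stops at the first error, so a clean result means every tag in the dump is closed and properly nested. If the dump passes, the damage (if any) is in the content rather than in the XML structure.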
 
I can probably find some utility that does XML validity checks. I've been trying to find a schema or some other kind of docs on the XML created by dsdump, but I haven't had any luck. It may be proprietary. I may post another thread here asking for that.
 