SQL Agent failing

Status
Not open for further replies.

Beekle

Technical User
Aug 14, 2002
15
GB
We have NT4 with SQL 7 in an NT clustered environment. On one node the cluster starts up fine; on the other it fails when it tries to start the SQLAgent.
I have checked the account the service starts under, which is fine, and also checked that it has the correct rights in SQL, but to no avail.
When the service fails it gets the following errors:
Event ID 1024 Source:ClusSrv Cat: (64)
The registry checkpoint for cluster resource [sql instance] SQL server 7.0 could not be restored to the registry
hkey_local_machine\system\CCS\services\MSSql[dbname] The resource may not function correctly. Make sure that no other processes have open handles to registry keys in this registry subtree
Followed by:
Event ID: 1069 Source: ClusSrv Cat: (4)
Cluster resource [DBNAME] sql server 7.0 failed

Any help would be appreciated, thank you.
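For anyone collecting these entries from several failovers, a minimal Python sketch for pulling the event ID, source, and category out of text pasted in the format above might look like this. The `parse_events` helper and the regular expression are illustrative assumptions based only on the two header lines quoted in this post, not on any official event log format.

```python
import re

# Event text as pasted in the post above; bracketed placeholders are left as written.
LOG = """\
Event ID 1024 Source:ClusSrv Cat: (64)
The registry checkpoint for cluster resource [sql instance] SQL server 7.0 could not be restored to the registry
Followed by:
Event ID: 1069 Source: ClusSrv Cat: (4)
Cluster resource [DBNAME] sql server 7.0 failed
"""

# Assumed header shape: "Event ID[:] <n> Source:<name> Cat: (<n>)"
HEADER = re.compile(r"Event ID:?\s*(\d+)\s+Source:\s*(\S+)\s+Cat:\s*\((\d+)\)")


def parse_events(text):
    """Split pasted event text into dicts, one per header line found."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            events.append({
                "id": int(match.group(1)),
                "source": match.group(2),
                "category": int(match.group(3)),
                "message": [],
            })
        elif events and line and line != "Followed by:":
            # Any other non-empty line belongs to the most recent event.
            events[-1]["message"].append(line)
    return events


if __name__ == "__main__":
    for event in parse_events(LOG):
        print(event["id"], event["source"], event["category"])
```

Keeping the events in order makes it easier to confirm that the 1024 checkpoint error always comes before the 1069 resource failure.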
 
Is this an Active/Active Cluster or Active/Passive? On an active/passive cluster only 1 node can own the clustered services at a time.
 
Sorry, forgot to tell you that!!!

It's an active/passive cluster, with the error occurring during a failover.
 
When you failed over, how did you go about doing it? Was it an actual failure, a move group, turning off the cluster service on the active node, etc.? If one node is off, will the services start on the second node? There are several reasons this can happen: SQL had the registry keys locked, or SQL Enterprise Manager was running on the active node.
MSCS creates snapshots of good configurations in order to be able to perform a failover. If a specific registry checkpoint (snapshot) cannot be accessed, the 1024 event is recorded. This condition may occur when the resource was offline due to a failure at the time the failover occurred.
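To make the checkpoint idea concrete, here is a toy Python model of the sequence described above. It is only a sketch: `RegistryKey`, `restore_checkpoint`, and `bring_online` are hypothetical names, not MSCS or Windows APIs, and the rule that a failed restore is followed by a resource failure simply mirrors the 1024-then-1069 sequence reported in this thread.

```python
class RegistryKey:
    """Toy stand-in for a registry subtree on one node (not a real Windows API)."""

    def __init__(self, path):
        self.path = path
        self.values = {}
        self.open_handles = 0  # handles held by other processes


def restore_checkpoint(key, snapshot):
    """Copy the snapshot into the key; return 1024 if the key is locked."""
    if key.open_handles > 0:
        return 1024  # checkpoint could not be restored to the registry
    key.values = dict(snapshot)
    return None


def bring_online(key, snapshot):
    """Return the event IDs logged while bringing the resource online."""
    events = []
    event_id = restore_checkpoint(key, snapshot)
    if event_id is not None:
        events.append(event_id)
        # Assumption: the resource then fails, as in the reported sequence.
        events.append(1069)
    return events


if __name__ == "__main__":
    snapshot = {"ExampleValue": "from-active-node"}  # made-up checkpoint data
    key = RegistryKey(r"hkey_local_machine\system\CCS\services\MSSql[dbname]")
    key.open_handles = 1  # e.g. another process holding the subtree open
    print(bring_online(key, snapshot))
    key.open_handles = 0
    print(bring_online(key, snapshot))
```

In this model the restore only succeeds once nothing else holds the subtree open, which is why it is worth checking for open handles on the failing node before the failover.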


 
We get this every time a failover occurs, whether it is triggered manually or because the working node fails for some reason. So whatever the circumstances, SQL does not start correctly, and we get the same errors every time. The main reason the second node fails to start does seem to be that SQLAgent fails to start every single time, but only on the one node. My knowledge of SQL isn't the best, and unfortunately I only have Oracle and DB2 DBAs to rely on! So it is possible I may be missing something simple. Any ideas would be appreciated :)
 
Your errors seem to be cluster related. Under the properties for SQL Agent resource in cluster manager, general tab, possible owners, are both nodes in the box? Is the quorum on a shared drive?
 
Yep, all the details are fine: it is set to fail over to either node and the quorum is on a shared drive, so there should be no problems! It used to work and nothing has been changed!
 