Replication stopped suddenly, no error

Status
Not open for further replies.

Phaethar

Technical User
Nov 24, 2003
27
US
Hey all,

We have a production MySQL box running 4.0.17 on Red Hat Enterprise Linux 3. This system replicates data to two slave systems. Yesterday, replication to both machines stopped at the same time and at the same place. Both slaves still show that they're reading the same binlog file from the master and are waiting for an update. No update ever comes, and replication is now falling behind. I'm looking for a possible cause and an easy-to-implement solution. Rebooting the production box cannot be done during the day, so I'd like to avoid that. I'm hoping there is a way to get things running again without losing any data. Both slave machines have been rebooted, just in case, and they both pick up and hang right where they left off.

If there is anything else needed, please let me know.

Thanks.
 
Have there been any modifications to any of the tables? If the tables on the slaves don't match the master, you will get this.

Also, bad or corrupted queries can halt replication, and you may need to skip a record to get replication moving again.
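Skipping a record generally means stopping the slave, setting the skip counter, and starting the slave again. Here is a minimal Python sketch that only builds those statements, assuming a 4.0-series slave that accepts SET GLOBAL SQL_SLAVE_SKIP_COUNTER. skip_events_statements is a hypothetical helper; you would run its output through the mysql client or whatever driver you already use:

```python
def skip_events_statements(n: int = 1) -> list:
    """Build the statements that make a slave skip the next n events
    from the master (hypothetical helper, for illustration only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [
        "STOP SLAVE",  # the counter can only be set while the slave threads are stopped
        f"SET GLOBAL SQL_SLAVE_SKIP_COUNTER = {n}",
        "START SLAVE",  # the SQL thread skips n events, then resumes normally
    ]


if __name__ == "__main__":
    for statement in skip_events_statements(1):
        print(statement + ";")
```

Start with 1 and check SHOW SLAVE STATUS before skipping more; every skipped event is a change the slave never applies.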

______________________________________________________________________
There's no present like the time, they say. - Henry's Cat.
 
There hasn't been any modification to the slave tables. They have been replicating just fine for months now without any problems, so I don't think that is the issue.

I'm leaning more towards a corrupt binlog file. Looking at one of the slave's error logs, I see this:

Error reading packet from server: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log

OK, sounds easy. I increased max_allowed_packet to about 1.5G and still got this error. Replication always fails at exactly the same point (which the error log shows). So, if I know the position of the error, is there any way to skip it and tell the slave to go on to the next record?
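If the slave always dies at the same offset, one way to tell an oversized event from a damaged one is to read the event header at that offset in the master's binary log. The Python sketch below assumes the 19-byte, little-endian event header used by 4.0-series binlogs; check_event is a hypothetical diagnostic helper, not a MySQL tool. If the header's length field runs past the end of the file, the max_allowed_packet message is more likely a symptom of corruption, and raising the variable will not help.

```python
import struct

# Every MySQL binary log file begins with this 4-byte magic number.
BINLOG_MAGIC = b"\xfebin"
# Assumed 19-byte event header of 4.0-series binlogs, little-endian:
# timestamp(4) type_code(1) server_id(4) event_length(4) log_pos(4) flags(2)
EVENT_HEADER = struct.Struct("<IBIIIH")


def check_event(log: bytes, pos: int, max_allowed_packet: int) -> str:
    """Classify the event that starts at byte offset pos of a binlog image
    (hypothetical diagnostic helper, not a MySQL tool)."""
    if log[:4] != BINLOG_MAGIC:
        return "not a binary log"
    if pos < len(BINLOG_MAGIC) or pos + EVENT_HEADER.size > len(log):
        return "position outside the log"
    fields = EVENT_HEADER.unpack_from(log, pos)
    length = fields[3]  # event_length: header plus body, in bytes
    if length < EVENT_HEADER.size or pos + length > len(log):
        # The header claims a size the file cannot contain, so the header
        # itself is damaged; no max_allowed_packet value will fix that.
        return "corrupt length"
    if length > max_allowed_packet:
        return "exceeds max_allowed_packet"
    return "ok"


if __name__ == "__main__":
    # Synthetic log: magic number plus one 40-byte query event (type 2).
    body = b"x" * (40 - EVENT_HEADER.size)
    good = BINLOG_MAGIC + EVENT_HEADER.pack(0, 2, 1, 40, 0, 0) + body
    print(check_event(good, 4, 16 * 1024 * 1024))  # ok
    # Same event, but with a garbage length field.
    bad = BINLOG_MAGIC + EVENT_HEADER.pack(0, 2, 1, 0xFFFFFFF0, 0, 0) + body
    print(check_event(bad, 4, 1 << 30))  # corrupt length
```

Against the real thing, log would be the bytes of the master's binlog file and pos the offset from the slave's error log. A "corrupt length" result would fit the symptoms described here: a 1.5G max_allowed_packet changing nothing, and the failure always landing at the same offset.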
 
There is, although I haven't had the pleasure of doing this in months... many months.
This should be the manual page you need:
Skip Counter (SQL_SLAVE_SKIP_COUNTER)

 
Hmm... it needs to know how many events from the master to skip. With the info I have, I know which binlog file it's loading and at what position it's failing. I don't know what number to set it to. I've tried values from 1 to about 10, but I'm worried that skipping too many will leave a hole in the data. Even at 10, though, it still fails at the same place and the slave process shuts down.
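A likely reason the counter makes no difference: SQL_SLAVE_SKIP_COUNTER only skips events in the slave's SQL thread, whereas "Error reading packet from server" is reported by the I/O thread, which never manages to fetch the event at all, so there is nothing for the counter to skip. One common way past an unreadable spot is to point the slave at the next good event with CHANGE MASTER TO. The Python sketch below is a heuristic under stated assumptions, not a MySQL tool: it assumes the same 19-byte 4.0-series header layout, and plausible_chain, find_next_event and change_master_sql are hypothetical helpers.

```python
import struct
from typing import Optional

# Assumed 19-byte event header of 4.0-series binlogs, little-endian:
# timestamp(4) type_code(1) server_id(4) event_length(4) log_pos(4) flags(2)
EVENT_HEADER = struct.Struct("<IBIIIH")


def plausible_chain(log: bytes, pos: int, depth: int) -> bool:
    """True if `depth` consecutive headers from pos look sane, or the
    chain runs cleanly into end-of-file first (heuristic only)."""
    for _ in range(depth):
        if pos == len(log):
            return True  # ended exactly at end of file
        if pos + EVENT_HEADER.size > len(log):
            return False  # not even room for a header
        _ts, type_code, _srv, length, _lp, _fl = EVENT_HEADER.unpack_from(log, pos)
        if type_code == 0 or length < EVENT_HEADER.size or pos + length > len(log):
            return False
        pos += length  # the next event starts right after this one
    return True


def find_next_event(log: bytes, bad_pos: int, depth: int = 3) -> Optional[int]:
    """First offset after bad_pos that starts a plausible chain of events."""
    for pos in range(bad_pos + 1, len(log) - EVENT_HEADER.size + 1):
        if plausible_chain(log, pos, depth):
            return pos
    return None


def change_master_sql(log_file: str, pos: int) -> str:
    """Statement to run on the slave between STOP SLAVE and START SLAVE."""
    return f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={pos}"
```

In practice, log would be the master's binlog file read in binary mode and bad_pos the offset from the slave's error log; after STOP SLAVE, run the generated CHANGE MASTER TO statement, then START SLAVE. Requiring a short chain of sane headers rather than a single one makes it less likely that stray bytes inside the damaged region are mistaken for an event boundary, but it does not rule that out, so inspect the candidate position with mysqlbinlog first. Whatever lay between the two offsets never reaches the slaves, so the affected tables need to be compared with the master afterwards.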
 