Since Windows 2008 lacks the ability to do proper failover between a primary and backup DHCP server (I don't consider split-scope a "real" solution), I hacked together this watchdog script to run on a backup DHCP server.
It checks the primary DHCP server at a user-specified interval and grabs a fresh copy of the database (leases, reservations, scopes, etc.). If it can't contact the primary server, it fires up its own DHCP server service using the most recent database copy it fetched. It then keeps checking for the primary server to come back up, and when it does, it shuts its own service back down.
I'd like to leave the script running as a loop (instead of as a scheduled task) but don't have time to write it that way.
It's ugly but it seems to work. Any recommendations?
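If anyone wants to try the loop approach anyway, a rough wrapper could be as simple as the sketch below. It assumes the script in this post is saved as dhcp_watchdog.bat in the same folder as the wrapper; that file name and the WATCHDOG_SCRIPT and CHECK_INTERVAL variables are placeholders of mine, not part of the script. I haven't tested it, so treat it as a starting point only.

```bat
@echo off
:: Hypothetical wrapper. WATCHDOG_SCRIPT and CHECK_INTERVAL are placeholder names.
setlocal
set WATCHDOG_SCRIPT=%~dp0dhcp_watchdog.bat
:: Rough pause between runs. "ping -n N" waits about N-1 seconds.
set CHECK_INTERVAL=600

:loop
:: "call" returns here when the watchdog exits, including after a failover has ended
call "%WATCHDOG_SCRIPT%"
:: Same ping-based delay the watchdog itself uses in its failover loop
ping localhost -n %CHECK_INTERVAL% >NUL 2>&1
goto loop
```

Because only one copy ever runs, the Task Scheduler pile-up warned about in the script header would not apply; the wrapper would just need to be started once at boot.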
Code:
:: Purpose: DHCP server Watchdog & Failover script. Read notes below
:: Requirements: 1. Domain administrator credentials
:: 2. Proper firewall configuration to allow connection
:: 3. Proper permissions on the DHCP backup directory
:: Version: 1.0 Initial write
:: Notes: I wrote this script after failing to find a satisfactory method of performing
:: watchdog/failover of two Windows Server 2008 R2 DHCP servers.
::
:: Use: This script has two modes: "Watchdog" and "Failover."
:: Watchdog checks the status of the remote DHCP service, logs it, and then grabs the remote DHCP db backup file and imports it.
:: Failover mode is activated when the script cannot determine the status of the remote DHCP server. The script then activates
:: the local DHCP server with the latest backup copy it successfully retrieved from the primary server.
::
:: Instructions:
:: 1. Tune the variables in this script to your desired backup location and frequency
:: 2. On the primary server: set the DHCP backup interval to your desired backup frequency. I recommend 5 minutes
:: 3. On the backup server: set this script to run as a scheduled task. I recommend every 10 minutes.
:: Notice:
:: ! Make sure to set it only to run if it isn't already running! If there is a failover you could have
:: Task Scheduler spawn a new instance of the script every n minutes and end up with hundreds of copies
:: of this script running.
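:: Example for step 3 (a sketch only; the task name, script path and account are placeholders, adjust to your setup):
:: schtasks /create /tn "DHCP Watchdog" /tr "C:\Scripts\dhcp_watchdog.bat" /sc minute /mo 10 /ru MYDOMAIN\dhcpadmin /rp *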
:: Prep
@echo off
SETLOCAL
cls
set VERSION=1.0
title [DHCP Watchdog v%VERSION%]
:::::::::::::::
:: Variables :: - Set these. Do not use trailing slashes (\) in directory names (this is important!).
:::::::::::::::
:: Remote server is the PRIMARY DHCP server we're watching. Use a hostname or IP address.
set REMOTE_SERVER=192.168.1.5
:: Location of the remote backup
:: Best practice is to leave this alone, unless you have a custom backup location.
:: The script builds the backup line like this: \\%REMOTE_SERVER%\c$\%REMOTE_BACKUP_PATH%
set REMOTE_BACKUP_PATH=Windows\system32\dhcp\backup
:: Location of your backup/standby file. I normally copy directly to my backup server's DHCP directory.
:: The script builds the local backup line like this: c:\windows\system32\dhcp\[backup folders]
set LOCAL_BACKUP_PATH=%SystemRoot%\system32\dhcp
:: When a failover is triggered, how many seconds should we wait in between each attempt to contact the primary server again?
set FAILOVER_DELAY=15
:: Log options. Don't put an extension on the log file name (important!). The script adds it later on.
set LOGPATH=%SystemDrive%\Logs
set LOGFILENAME=%COMPUTERNAME%_DHCP_watchdog
:: Max log file size allowed (in bytes) before rotation and archive. I recommend setting this to 2 MB (2097152).
:: Example: 524288 is half a megabyte (512 KB)
set LOG_MAX_SIZE=2097152
:::::::::::::::::::::::
:: LOG FILE HANDLING :: - This section handles the log file
:::::::::::::::::::::::
:: Make the logfile if it doesn't exist
if not exist %LOGPATH% mkdir %LOGPATH%
if not exist %LOGPATH%\%LOGFILENAME%.log goto new_log
:: Check log size. If it hasn't exceeded our size limit, jump straight to Watchdog mode
for %%R in (%LOGPATH%\%LOGFILENAME%.log) do if %%~zR LSS %LOG_MAX_SIZE% goto newrun
:: However, if the log was too big, go ahead and rotate it.
pushd %LOGPATH%
del %LOGFILENAME%.ancient 2>NUL
rename %LOGFILENAME%.oldest %LOGFILENAME%.ancient 2>NUL
rename %LOGFILENAME%.older %LOGFILENAME%.oldest 2>NUL
rename %LOGFILENAME%.old %LOGFILENAME%.older 2>NUL
rename %LOGFILENAME%.log %LOGFILENAME%.old 2>NUL
popd
:: And then create the header for the new log file
:new_log
echo ----------------------------------------------------------------------------------->> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo Initializing new DHCP Watchdog log file on %DATE% at %TIME%>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo ----------------------------------------------------------------------------------->> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
:: New run section - if we just launched the script, write a header for this run
:newrun
echo ----------------------------------------------------------------------------------->> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo DHCP Watchdog v%VERSION%>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo Running as %USERDOMAIN%\%USERNAME% on %COMPUTERNAME%>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo Job Options>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo Log location: %LOGPATH%\%LOGFILENAME%.log>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo Log max size: %LOG_MAX_SIZE% bytes>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo Watching primary server: %REMOTE_SERVER%>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo Primary backup location: %REMOTE_BACKUP_PATH%>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo Local backup location: %LOCAL_BACKUP_PATH%>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo ----------------------------------------------------------------------------------->> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Starting Watchdog mode.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo.
echo DHCP Watchdog v%VERSION%
echo Running as %USERDOMAIN%\%USERNAME% on %COMPUTERNAME%
:::::::::::::::::::
:: WATCHDOG MODE ::
:::::::::::::::::::
:watchdog
:: Ping the server to see if it's up
echo %TIME% Pinging %REMOTE_SERVER%...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Pinging %REMOTE_SERVER%...
ping %REMOTE_SERVER% -n 2 >NUL 2>&1
if %ERRORLEVEL%==1 echo %TIME% WARNING: %REMOTE_SERVER% failed to respond to ping. && echo %TIME% WARNING: %REMOTE_SERVER% failed to respond to ping.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
if not %ERRORLEVEL%==1 echo %TIME% SUCCESS: %REMOTE_SERVER% responded to ping. && echo %TIME% SUCCESS: %REMOTE_SERVER% responded to ping.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
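:: Optional stricter check (a sketch, not part of the original logic): Windows ping can
:: return ERRORLEVEL 0 when a router answers "Destination host unreachable", so looking
:: for "TTL=" in the output is a common way to confirm a real echo reply came back.
ping %REMOTE_SERVER% -n 1 | find "TTL=" >NUL 2>&1
if errorlevel 1 echo %TIME% [NOTICE ] No "TTL=" reply from %REMOTE_SERVER%. The ping result above may be misleading.>> %LOGPATH%\%LOGFILENAME%.log 2>&1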
:: Check & Log
echo %TIME% Checking DHCP server status on %REMOTE_SERVER%...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Checking DHCP server status on %REMOTE_SERVER%...
:: Reset ERRORLEVEL back to 0
ver > NUL
:: Use "SC" to check the status of "Dhcpserver" service, find the "RUNNING" state, and act accordingly based on the return code
sc \\%REMOTE_SERVER% query Dhcpserver | find "RUNNING" >NUL 2>&1
if %ERRORLEVEL%==0 echo %TIME% [SUCCESS] The DHCP service is running on %REMOTE_SERVER%.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
if %ERRORLEVEL%==0 echo %TIME% [SUCCESS] The DHCP service is running on %REMOTE_SERVER%.
:: This section only executes if the test failed.
if not %ERRORLEVEL%==0 (
echo %TIME% [FAILURE] The DHCP service is not running on %REMOTE_SERVER%.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Activating failover procedure. Local DHCP server will be initialized using most recent successful backup.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [FAILURE] The DHCP service is not running on %REMOTE_SERVER%.
echo %TIME% Activating failover procedure. Local DHCP server will be initialized using most recent successful backup.
goto failover
)
:: Reset ERRORLEVEL back to 0
ver > NUL
:: Fetch
echo %TIME% Fetching DHCP database backup from %REMOTE_SERVER%...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Fetching DHCP database backup from %REMOTE_SERVER%...
xcopy \\%REMOTE_SERVER%\c$\%REMOTE_BACKUP_PATH%\* %LOCAL_BACKUP_PATH%\backup_new_pending\ /E /Y
:: If the copy FAILED, this executes. It has to be tested first, because the move commands in the
:: rotation further down overwrite the ERRORLEVEL that xcopy returned.
if not %ERRORLEVEL%==0 (
echo %TIME% [WARNING] There was an error copying the backup from %REMOTE_SERVER%.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% You may want to look into this since we were able to check the DHCPserver service status but the file copy failed.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Skipping new database import due to copy failure.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Job complete with errors.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [WARNING] There was an error copying the backup from %REMOTE_SERVER%.
echo %TIME% You may want to look into this since we were able to check the DHCPserver service status but the file copy failed.
echo %TIME% Skipping new database import due to copy failure.
echo %TIME% Job complete with errors.
goto EOF
)
:: If the copy SUCCEEDED, we end up here
echo %TIME% [SUCCESS] Backup fetched from %REMOTE_SERVER%.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [SUCCESS] Backup fetched from %REMOTE_SERVER%.
echo %TIME% Rotating database backups...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Rotating database backups...
:: Rotate backups and use newest copy. Delete the oldest copy first: moving a folder onto an
:: existing folder nests it inside that folder instead of replacing it.
if exist %LOCAL_BACKUP_PATH%\backup4 rd /S /Q %LOCAL_BACKUP_PATH%\backup4
if exist %LOCAL_BACKUP_PATH%\backup3 move /Y %LOCAL_BACKUP_PATH%\backup3 %LOCAL_BACKUP_PATH%\backup4 2>&1
if exist %LOCAL_BACKUP_PATH%\backup2 move /Y %LOCAL_BACKUP_PATH%\backup2 %LOCAL_BACKUP_PATH%\backup3 2>&1
if exist %LOCAL_BACKUP_PATH%\backup move /Y %LOCAL_BACKUP_PATH%\backup %LOCAL_BACKUP_PATH%\backup2 2>&1
move /Y %LOCAL_BACKUP_PATH%\backup_new_pending %LOCAL_BACKUP_PATH%\backup 2>&1
echo %TIME% [SUCCESS] Database backups rotated.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [SUCCESS] Database backups rotated.
:: Import database
ver > NUL
echo %TIME% Temporarily starting local DHCP server to import new database...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Temporarily starting local DHCP server to import new database...
net start Dhcpserver 2>&1
echo %TIME% Local DHCP server running. Performing import...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Local DHCP server running. Performing import...
netsh dhcp server restore %LOCAL_BACKUP_PATH%\backup
echo %TIME% Import complete.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Import complete.
echo %TIME% Stopping local DHCP server...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Stopping local DHCP server...
net stop Dhcpserver 2>&1
echo %TIME% Local DHCP server stopped.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Local DHCP server stopped.
echo %TIME% [SUCCESS] Job complete, DHCP database backed up and ready for use. Exiting.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [SUCCESS] Job complete, DHCP database backed up and ready for use. Exiting.
goto EOF
:::::::::::::::::::
:: FAILOVER MODE ::
:::::::::::::::::::
:failover
:: Log this AND display to console
echo %TIME% [WARNING] Failover activated.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Starting local DHCP server using most recent successful backup...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo.
echo %TIME% [WARNING] Could not contact primary DHCP server "%REMOTE_SERVER%". Failover activated.
echo %TIME% Starting local DHCP server using most recent successful backup...
echo.
net start Dhcpserver
echo %TIME% Local DHCP server started.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Entering monitoring loop. Checking if %REMOTE_SERVER% is back up every %FAILOVER_DELAY% seconds...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Local DHCP server started.
echo %TIME% Entering monitoring loop. Checking if %REMOTE_SERVER% is back up every %FAILOVER_DELAY% seconds...
:failover_loop
:: First we ping the server
ver >NUL
ping %REMOTE_SERVER% -n 2 >NUL 2>&1
:: If no ping response, this section executes
if %ERRORLEVEL%==1 (
echo %TIME% [FAILURE] No ping response from %REMOTE_SERVER%. Waiting %FAILOVER_DELAY% seconds to check again.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [FAILURE] No ping response from %REMOTE_SERVER%. Waiting %FAILOVER_DELAY% seconds to check again.
ping localhost -n %FAILOVER_DELAY% >NUL 2>&1
goto failover_loop
)
:: If yes ping response, this section executes
:: This declaration is required to get the nested IF ERRORLEVEL test to function correctly
SETLOCAL ENABLEDELAYEDEXPANSION
if not %ERRORLEVEL%==1 (
echo %TIME% [NOTICE ] %REMOTE_SERVER% is responding to pings.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [NOTICE ] %REMOTE_SERVER% is responding to pings.
echo %TIME% Checking DHCP server status on %REMOTE_SERVER%...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Checking DHCP server status on %REMOTE_SERVER%...
rem If Dhcpserver IS running, then we stop our server and exit.
rem Plain rem is used here because "::" comments can break inside parenthesized blocks.
sc \\%REMOTE_SERVER% query Dhcpserver | find "RUNNING" >NUL 2>&1
rem The exclamation points around ERRORLEVEL make it expand when this line runs, not when the enclosing block was parsed
if !ERRORLEVEL!==0 (
echo %TIME% [SUCCESS] The DHCP service is running on %REMOTE_SERVER%.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [SUCCESS] The DHCP service is running on %REMOTE_SERVER%.
echo %TIME% The primary DHCP server %REMOTE_SERVER% is back up. Stopping local DHCP service...>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% The primary DHCP server %REMOTE_SERVER% is back up. Stopping local DHCP service...
net stop Dhcpserver
echo %TIME% Local DHCP service stopped. Exiting.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% Local DHCP service stopped. Exiting.
goto EOF
)
)
ENDLOCAL
:: If the host responds to pings but the DHCP service isn't running, this executes
echo %TIME% [FAILURE] %REMOTE_SERVER% is responding to pings, but isn't running DHCP (yet?). Will try again in %FAILOVER_DELAY% seconds.>> %LOGPATH%\%LOGFILENAME%.log 2>&1
echo %TIME% [FAILURE] %REMOTE_SERVER% is responding to pings, but isn't running DHCP (yet?). Will try again in %FAILOVER_DELAY% seconds.
:: Actually wait before retrying, as the message above says
ping localhost -n %FAILOVER_DELAY% >NUL 2>&1
ver >NUL
goto failover_loop
:EOF