Home
Home

PHP, Nagios and MySQL Replication

Nov 5 12

PHP, Nagios and MySQL Replication

Paul Weinstein

Overview

MySQL replication is a handy way to distribue database processes across several servers. For example, a simple “master-slave” step up allows for a continuous backup of data from a primary database server, the master to a secondary backup server, the slave. But what if the slave server stops replicating for some reason? Not much of a good backup, if it fails to copy data for some undermined length of time.

The good news is that MySQL provides a simple, detailed query for checking if replication is taking place and will report errors, should they occur. The trick of course is getting notified when an issue does occur quickly. Given an existing Nagios setup for service monitoring at a PHP shop the only missing piece is some code.

The Details
First off, Nagios has the ability to supply arguments to a script as a script being invoked at the command-line. One common set of arguments for Nagios scripts are warning and critical thresholds. For example, a disk allocation script might take arguments to send a warning notification if the amount of free disk space reaches 20% and a critical notification if free space is 10% or less.

With MySQL replication one area of concern is the network. Any latency between the two servers can induce lag in synchronizing the slave server with the master server. Given this, why not pass along a threshold to our script setting checking how many seconds the secondary server is behind the primary.

For processing command line short form and long form options in PHP there is the getopt function:

$shortopts = “”;
$shortopts .= “w:”; // Required value for warning
$shortopts .= “c:”; // Required value for critical
$longopts = array(
// No long form options
);
// Parse our options with getopt
$options = getopt( $shortopts, $longopts );
// If slave is x second behind for warning state
$delayWarn = $options[‘w’];
// If slave is x second behind for a critical state
$delayCritical = $options[‘c’];

Besides being in a critical or warning state, Nagios also has conditions for normal and unknown. Each state is associated with a status code that will be set upon completion of the script, hence the following associative array:

// Nagios conditions we can be in
$statuses = array( ‘UNKNOWN’ => ‘-1’, ‘OK’ => ‘0’, ‘WARNING’ => ‘1’, ‘CRITICAL’ => ‘2’ );

For the moment, we don’t know what condition our replication setup is in. Nor do we have any additional information about the current state, so let’s go ahead and define that as such:

$state = ‘UNKNOWN’;
$info = ”;

The next step is to go ahead and connect to our slave MySQL instance and query its status using “SHOW SLAVE STATUS;”

$db = new mysqli( $dbHost, $dbUser, $dbPasswd );
// Prepare query statement & execute
$query = $db->prepare( “SHOW SLAVE STATUS” )) {
$query->execute();

The MySQL query is going to return a number of columns in a single result row. Of immediate concern is if the slave is in error state or not. For that we take a look at the columns labeled Slave_IO_Running, Slave_SQL_Running and Last_Errno.

// If Slave_IO_Running OR Slave_SQL_Running are not Yes
// OR Last_Errno is not 0 we have a problem
if (( $SlaveIORunning != ‘Yes’ ) OR ( $SlaveSQLRunning != ‘Yes’ )
OR ( $Last_Errno != ‘0’ )) {
$state = ‘CRITICAL’;

If the slave server is not in error, then we’ll go ahead and check how far behind it is, and set a warning or critical state given the earlier parameters from the beginning of the script:

} else if (( $row[‘Slave_IO_Running’] == ‘Yes’ ) OR ( $row[‘Slave_SQL_Running’] == ‘Yes’ ) OR ( $row[‘Last_Errno’] == ‘0’ )) {
// So far so, good, what about time delay, how behind is the slave database?
if ( $row[‘Seconds_Behind_Master’] >= $delayCritical ) {
$state = ‘CRITICAL’;
} else if ( $row[‘Seconds_Behind_Master’] >= $delayWarn ) {
$state = ‘WARN’;
} else {
$state = ‘OK’;
}
}

Now that we have determined the state of the secondary database server, we can pass along some information for Nagios to process.

// What to output?
switch ( $state ) {
case “UNKNOWN”:
$info = ‘Replication State: UNKNOWN’;
break;
case “OK”:
$info = ‘Replication State: OK Master Log File: ‘ .$MasterLogFile. ‘ Read Master Log Position: ‘ .$ReadMasterLogPos. ‘ Replication Delay (Seconds Behind Master): ‘ .$SecondsBehindMaster;
break;
case “WARNING”:
$info = ‘Replication State: WARNING Master Log File: ‘ .$MasterLogFile. ‘ Read Master Log Position: ‘ .$ReadMasterLogPos. ‘ Replication Delay (Seconds Behind Master): ‘ .$SecondsBehindMaster;
break;
case “CRITICAL”:
$info = ‘Replication State: CRITICAL Error: ‘ .$LastErrno. ‘: ‘ .$Last_Error. ‘ Replication Delay (Seconds Behind Master): ‘ .$SecondsBehindMaster;
break;
}

All that is left is to transfer our information to Nagios via standard out and an exit code:

// Need to set type to integer for exit() to handle the code properly
$status = $statuses[$state];
settype( $status, “integer” );
fwrite( STDOUT, $info );
exit( $status );

Putting it all together we get something like this:

#!/usr/bin/php
<?php
$shortopts = “”;
$shortopts .= “w:”; // Required value for warning
$shortopts .= “c:”; // Required value for critical
$longopts = array(
// No long form options
);
$options = getopt( $shortopts, $longopts );
// If slave is x second behind, set state as warn
$delayWarn = $options[‘w’];
// If slave is x second behind, set state as critical
$delayCritical = $options[‘c’];
// Nagios conditions we can be in
$statuses = array( ‘UNKNOWN’ =—-> ‘-1’, ‘OK’ => ‘0’, ‘WARNING’ => ‘1’, ‘CRITICAL’ => ‘2’ );
$state = ‘UNKNOWN’;
$info = ”;
$dbUser = ‘user’;
$dbPasswd = ‘password’;
$dbHost = ‘localhost’;
$db = new mysqli( $dbHost, $dbUser, $dbPasswd );
if ( mysqli_connect_errno() ) {
// Well this isn’t good
$state = ‘CRITICAL’;
$info = ‘Cannot connect to db server’;
} else {
// Prepare query statement & execute
if ( $query = $db->prepare( “SHOW SLAVE STATUS” )) {
$query->execute();
// Bind our result columns to variables
$query->bind_result( $SlaveIOState, $MasterHost, $MasterUser, $MasterPort, $ConnectRetry, $MasterLogFile, $ReadMasterLogPos, $RelayLogFile, $RelayLogPos, $RelayMasterLogFile, $SlaveIORunning, $SlaveSQLRunning, $ReplicateDoDB, $ReplicateIgnoreDB, $ReplicateDoTable, $ReplicateIgnoreTable, $ReplicateWildDoTable, $ReplicateWildIgnoreTable, $LastErrno, $Last_Error, $SkipCounter, $ExecMasterLogPos, $RelayLogSpace, $UntilCondition, $UntilLogFile, $UntilLogPos, $MasterSSLAllowed, $MasterSSLCAFile, $MasterSSLCAPath, $MasterSSLCert, $MasterSSLCipher, $MasterSSLKey, $SecondsBehindMaster, $MasterSSLVerifyServerCert, $LastIOErrno, $LastIOError, $LastSQLErrno, $LastSQLError );
// Go fetch
$query->fetch();
// Done
$query->close();
// and done
$db->close();
// If Slave_IO_Running OR Slave_SQL_Running are not Yes OR Last_Errno is not 0 we have a problem
if (( $SlaveIORunning != ‘Yes’ ) OR ( $SlaveSQLRunning != ‘Yes’ ) OR ( $LastErrno != ‘0’ )) {
$state = ‘CRITICAL’;
} else if (( $SlaveIORunning == ‘Yes’ ) OR ( $SlaveSQLRunning == ‘Yes’ ) OR ( $LastErrno == ‘0’ )) {
// So far so, good, what about time delay, how behind is the slave database?
if ( $SecondsBehindMaster >= $delayCritical ) {
$state = ‘CRITICAL’;
} else if ( $SecondsBehindMaster >= $delayWarn ) {
$state = ‘WARN’;
} else {
$state = ‘OK’;
}
}
} else {
// Well this isn’t good
$state = ‘CRITICAL’;
$info = ‘Cannot query db server’;
}
// What to output?
switch ( $state ) {
case “UNKNOWN”:
$info = ‘Replication State: UNKNOWN’;
break;
case “OK”:
$info = ‘Replication State: OK Master Log File: ‘ .$MasterLogFile. ‘ Read Master Log Position: ‘ .$ReadMasterLogPos. ‘ Replication Delay (Seconds Behind Master): ‘ .$SecondsBehindMaster;
break;
case “WARNING”:
$info = ‘Replication State: WARNING Master Log File: ‘ .$MasterLogFile. ‘ Read Master Log Position: ‘ .$ReadMasterLogPos. ‘ Replication Delay (Seconds Behind Master): ‘ .$SecondsBehindMaster;
break;
case “CRITICAL”:
if ( $info == ” ) {
$info = ‘Replication State: CRITICAL Error: ‘ .$LastErrno. ‘: ‘ .$LastError. ‘ Replication Delay (Seconds Behind Master): ‘ .$SecondsBehindMaster;
}
break;
}
}
// Need to set type to integer for exit to handle the exit code properly
$status = $statuses[$state];
settype( $status, “integer” );
fwrite( STDOUT, $info );
exit( $status );
?>