Passive fence agent used to inject a delay at the end of a list of fence
methods. Can take '--wait X' or 'wait=X' via the command line or STDIN
respectively. Otherwise, the default of '60' seconds is used.
This method is meant to ensure that devices that require time to boot get that
time before the cluster starts working through the list again.
The genesis of this was a case where fencing was set as IPMI -> 2x PDUs, and a
firmware bug in the PDUs caused them to properly power cycle the node but fail
to return success, causing the fence agent to consider it a failed fence. The
cluster would then try the IPMI interface again, but the BMC had not yet
booted, so the IPMI fence failed and the PDUs were again cycled. This left the
cluster hung in a loop.
Adding this agent as a third method introduces enough of a delay that the IPMI
BMC has a chance to boot before fence_ipmilan (or the like) is reinvoked.
\n";
do_exit($conf, $log, 0);
}
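# Example invocations, as a sketch only. These assume the agent is installed
# as 'fence_delay' (the name given in the metadata below) and that, like other
# fence agents, it reads 'key=value' pairs from STDIN when called without
# arguments; neither is confirmed by this section of the code.
#
#   fence_delay --wait 120           # Command line; wait 120 seconds.
#   echo "wait=120" | fence_delay    # STDIN; the same wait, as a cluster would pass it.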
# This simply prints the 'metadata' XML data to STDOUT.
sub metadata
{
my ($conf, $log) = @_;
print q`<?xml version="1.0" ?>
<resource-agent name="fence_delay" shortdesc="Agent designed to pause at the end of a list of methods, before trying the first method again. Always returns 'failed' to the cluster.">
<longdesc>This is a passive agent that simply injects a delay. It is designed to be used at the end of a list of fence methods to give time for previous attempts to recover. Specifically, it covers the case where PDU fencing cut power to the node but was reported as failed. The cluster would then return to the top of the list and try to fence via the IPMI BMC which, not having had time to boot, would fail, leaving the system stuck in a loop.</longdesc>
<vendor-url>http://www.alteeve.com</vendor-url>
<parameters>
<parameter name="action" unique="0">
<getopt mixed="-o, --action=[action]" />
<content type="string" default="off"/>
<shortdesc lang="en">Fencing action. The 'reboot' and 'off' actions trigger the wait.</shortdesc>
</parameter>
<parameter name="quiet" unique="0">
<getopt mixed="-q" />
<content type="boolean" />
<shortdesc lang="en">Suppress all output to STDOUT, including critical messages. Check logfile if used. Default 1.</shortdesc>
</parameter>
<parameter name="debug" unique="0">
<getopt mixed="-d" />
<content type="boolean" />
<shortdesc lang="en">Print extensive debug information to STDOUT and to the log file.</shortdesc>
</parameter>
<parameter name="version" unique="0">
<getopt mixed="--version" />
<content type="boolean" />
<shortdesc lang="en">Prints the fence agent version and exits.</shortdesc>
</parameter>
<parameter name="wait" unique="0">
<getopt mixed="-w, --wait=[seconds]" />
<content type="string" />
<shortdesc lang="en">Set the time the agent waits before exiting. The default is 60 seconds.</shortdesc>
</parameter>
</parameters>
<actions>
<action name="on" />
<action name="off" />
<action name="reboot" />
<action name="status" />
<action name="list" />
<action name="monitor" />
<action name="metadata" />
</actions>
</resource-agent>
`;
# Done, exit.
do_exit($conf, $log, 0);
}
# This handles the actual actions.
sub do_wait
{
my ($conf, $log) = @_;
record($conf, $log, "In the 'do_wait' function.\n", 2);
# Use the requested wait if it is a whole number of seconds, otherwise fall back to 60.
my $wait = $conf->{'system'}{wait} =~ /^\d+$/ ? $conf->{'system'}{wait} : 60;