@ -23,10 +23,10 @@ When logging, record sensitive data, like passwords.
Set the log level to 1, 2 or 3 respectively. Be aware that level 3 generates a significant amount of log data.
.SS "Commands:"
.TP
\fB\-\-job-uuid\fR <uuid>
\fB\-\-job\-uuid\fR <uuid>
This is set to the job UUID when the request to boot is coming from a database job. When set, the referenced job will be updated and marked as complete / failed when the run completes.
.TP
\fB\-\-no-wait\fR
\fB\-\-no\-wait\fR
This controls whether the request to boot the server waits for the server to actually boot up before returning. Normally, the program will check every couple of seconds to see if the server has actually booted before returning. Setting this tells the program to return as soon as the request to boot the server has been passed on to the resource manager.
.TP
\fB\-\-server\fR <all|name|uuid>
@ -34,7 +34,7 @@ This is either 'all', the name, or server UUID (as set in the definition XML) of
.TP
When set to 'all', all servers assigned to the local sub-cluster are booted. Servers on other Anvil! nodes are not started.
.TP
\fB\-\-server-uuid\fR <uuid>
\fB\-\-server\-uuid\fR <uuid>
This is the server UUID of the server to boot. Generally this isn't needed, except when two servers somehow share the same name. This should not be possible, but this option exists in case it happens anyway.
anvil-report-usage \- This program reports the current resource usage of servers and the available resources remaining on Anvil! nodes
.SH SYNOPSIS
.B anvil-report-usage
\fI\,<command> \/\fR[\fI\,options\/\fR]
.SH DESCRIPTION
This program displays the resource utilization of servers and the resources available (used and free) on Anvil! nodes.
.TP
.TP
.SH OPTIONS
.TP
\-?, \-h, \fB\-\-help\fR
Show this man page.
.TP
\fB\-\-log-secure\fR
When logging, record sensitive data, like passwords.
.TP
\-v, \-vv, \-vvv
Set the log level to 1, 2 or 3 respectively. Be aware that level 3 generates a significant amount of log data.
.SS "Commands:"
.TP
\fB\-\-detailed\fR
.TP
This displays additional information about the resources used by servers on the node. This only matters for the human-readable display; when using '\fB\-\-machine\fR', all data is reported.
.TP
\fB\-\-machine\fR
.TP
Outputs the data in a machine-parsable format.
.IP
.SH AUTHOR
Written by Madison Kelly, Alteeve staff and the Anvil! project contributors.
anvil-safe-start \- This program safely joins an Anvil! subnode to a node.
.SH SYNOPSIS
.B anvil-safe-start
\fI\,<command> \/\fR[\fI\,options\/\fR]
.SH DESCRIPTION
This program will safely join an Anvil! subnode to an Anvil! node. If both nodes are starting, it will communicate with the peer, once available. This includes booting hosted servers.
.TP
NOTE: This tool runs at boot (or not) via the 'anvil-safe-start.service' systemd daemon.
.TP
\-?, \-h, \fB\-\-help\fR
Show this man page.
.TP
\fB\-\-log-secure\fR
When logging, record sensitive data, like passwords.
.TP
\-v, \-vv, \-vvv
Set the log level to 1, 2 or 3 respectively. Be aware that level 3 generates a significant amount of log data.
.SS "Commands:"
.TP
NOTE: This tool takes no specific commands.
.IP
.SH AUTHOR
Written by Madison Kelly, Alteeve staff and the Anvil! project contributors.
anvil-safe-stop \- This program safely stops a subnode in an Anvil! node, and DR hosts
.SH SYNOPSIS
.B anvil-safe-stop
\fI\,<command> \/\fR[\fI\,options\/\fR]
.SH DESCRIPTION
This program will safely withdraw a subnode from an Anvil! node, and safely stop DR hosts. Optionally, it can also power off the machine.
.TP
\-?, \-h, \fB\-\-help\fR
Show this man page.
.TP
\fB\-\-log-secure\fR
When logging, record sensitive data, like passwords.
.TP
\-v, \-vv, \-vvv
Set the log level to 1, 2 or 3 respectively. Be aware that level 3 generates a significant amount of log data.
.SS "Commands:"
.TP
\fB\-\-no\-db\fR
.TP
This tells this program to run without connecting to the Striker databases. This should only be used if the Strikers are not available (either they're off, or they've been updated and this host hasn't been, and can't use them until this host is also updated).
.TP
NOTE: This is generally only used by 'striker-update-cluster'.
.TP
\fB\-\-poweroff\fR, \fB\-\-power\-off\fR
.TP
By default, the host will remain powered on when this program exits. Using this switch will have the host power off once the host is safely stopped.
.TP
\fB\-\-stop\-reason\fR <user, power, thermal>
.TP
Optionally used to set the 'system::stop_reason' for this host. Valid values are 'user' (default), 'power' and 'thermal'. If set to 'user', ScanCore will not turn this host back on. If 'power', then ScanCore will reboot the host once the power under the host looks safe again. If 'thermal', then ScanCore will reboot the host once temperatures are back within safe levels.
.TP
\fB\-\-stop\-servers\fR
.TP
By default, on Anvil! subnodes, any servers running on this host will be migrated to the peer subnode. If the peer isn't available, this will refuse to stop. Using this switch will instead tell the system to stop all servers running on this host.
.TP
NOTE: On DR hosts, any running servers are always stopped.
.IP
.SH AUTHOR
Written by Madison Kelly, Alteeve staff and the Anvil! project contributors.
anvil-shutdown-server \- This program shuts down servers hosted on the Anvil! cluster.
.SH SYNOPSIS
.B anvil-shutdown-server
\fI\,<command> \/\fR[\fI\,options\/\fR]
.SH DESCRIPTION
This program shuts down a server that is running on an Anvil! node or DR host. It can optionally stop all servers.
.TP
\-?, \-h, \fB\-\-help\fR
Show this man page.
.TP
\fB\-\-log-secure\fR
When logging, record sensitive data, like passwords.
.TP
\-v, \-vv, \-vvv
Set the log level to 1, 2 or 3 respectively. Be aware that level 3 generates a significant amount of log data.
.SS "Commands:"
.TP
\fB\-\-no\-db\fR
.TP
This tells the program to run without connecting to any databases. This is used mainly when the host is being taken down as part of a cluster-wide upgrade.
.TP
\fB\-\-no\-wait\fR
.TP
This tells the program to request the shutdown, but not wait for the server to actually stop. By default, when shutting down one specific server, this program will wait for the server to be off before it returns.
.TP
\fB\-\-server\fR {<name>,all}
.TP
This is the name of the server to shut down. Optionally, this can be 'all' to shut down all servers on this host.
.TP
\fB\-\-server\-uuid\fR <uuid>
.TP
This is the server UUID of the server to shut down. NOTE: This cannot be used with \fB\-\-no\-db\fR.
.TP
\fB\-\-wait\fR
.TP
This tells the program to wait for the server(s) to stop before returning. By default, when '\fB\-\-server all\fR' is used, the shutdown will NOT wait. This makes the shutdowns sequential.
.IP
.SH AUTHOR
Written by Madison Kelly, Alteeve staff and the Anvil! project contributors.
@ -29,6 +29,12 @@ Set the log level to 1, 2 or 3 respectively. Be aware that level 3 generates a s
.TP
This will force the dnf cache to be cleared before the OS update is started. This slows the update down a bit, but ensures the latest updates are installed.
.TP
\fB\-\-no\-db\fR
.TP
This tells the update tool to run without a database connection. This is needed if the Striker dashboards are already updated, and the local system may no longer be able to talk to them.
.TP
NOTE: After the OS update is complete, an attempt will be made to connect to the database(s). This allows for registering a request to reboot if needed.
.TP
\fB\-\-no\-reboot\fR
.TP
If the kernel is updated, the system will normally be rebooted. This switch prevents the reboot from occurring.
striker-collect-data \- This program collects data needed to help diagnose problems with an Anvil! system.
striker-collect-debug \- This program collects data needed to help diagnose problems with an Anvil! system.
.SH SYNOPSIS
.B striker-collect-data
.B striker-collect-debug
\fI\,<command> \/\fR[\fI\,options\/\fR]
.SH DESCRIPTION
This program collects database data, logs, config files and other information needed to help diagnose problems with the Anvil! platform. By default, this collects all data from all accessible machines.
@ -54,6 +54,12 @@ See \fB\-\-reboot\fR for rebooting if anything is updated.
Normally, the system will only reboot if the kernel is updated. If this is used, and if any packages are updated, then a reboot will be performed. This is recommended in most cases.
.TP
Must be used with \fB\-\-reboot\-self\fR to reboot the local system. Otherwise, it is passed along to target machines via their anvil-update-system calls.
.TP
\fB\-\-timeout\fR <seconds, Nm, Nh>
.TP
When given, if a system update doesn't complete in this amount of time, error out and abort the update. By default, updates will wait for 24 hours.
.TP
If this is set to an integer, it is treated as a number of seconds. If this ends in 'm' or 'h', then the preceding number is treated as a number of minutes or hours, respectively.
.IP
.SH AUTHOR
Written by Madison Kelly, Alteeve staff and the Anvil! project contributors.
@ -53,6 +53,7 @@ In Maintenance Mode: ..... [#!variable!maintenance_mode!#]
<key name="scan_cluster_log_0009">The server was found to be running, but not here (or this node is not fully in the cluster). NOT attempting recovery yet.</key>
@ -366,12 +366,12 @@ The attempt to start the cluster appears to have failed. The return code '0' was
<key name="error_0257"><![CDATA[No server specified to boot. Please use '--server <name|all>' or '--server-uuid <UUID>'.]]></key>
<key name="error_0258">This host is not a node or DR, unable to boot servers.</key>
<key name="error_0259">The definition file: [#!variable!definition_file!#] doesn't exist, unable to boot the server.</key>
<key name="error_0260">This host is not in an Anvil! system, aborting.</key>
<key name="error_0260">This subnode is not in an Anvil! node yet, aborting.</key>
<key name="error_0261">The definition file: [#!variable!definition_file!#] exists, but the server: [#!variable!server!#] does not appear to be in the cluster. Unable to boot it.</key>
<key name="error_0262">The server: [#!variable!server!#] status is: [#!variable!status!#]. We can only boot servers that are off, not booting it.</key>
<key name="error_0263"><![CDATA[No server specified to shut down. Please use '--server <name|all>' or '--server-uuid <UUID>'.]]></key>
<key name="error_0264">This host is not a node or DR, unable to shut down servers.</key>
<key name="error_0265">This feature isn't enabled on DR hosts yet.</key>
<key name="error_0265">Specifying a server to shut down using a UUID is not available when there are no DB connections.</key>
<key name="error_0266">The server: [#!variable!server!#] does not appear to be in the cluster. Unable to shut it down.</key>
<key name="error_0267">The server: [#!variable!server!#] failed to boot. The reason why should be in the logs.</key>
<key name="error_0268">The server: [#!variable!server!#] failed to shut down. The reason why should be in the logs.</key>
@ -1562,7 +1562,7 @@ Note: This is a permanent action! If you protect this server again later, a full
<key name="job_0467">Update the base operating system.</key>
<key name="job_0468">This uses 'dnf' to do an OS update on the host. If this is run on a node, 'anvil-safe-stop' will be called to withdraw the subnode from the node's cluster. If the peer subnode is also offline, hosted servers will be shut down.</key>
<key name="job_0469">Update beginning. Verifying all known machines are accessible...</key>
<key name="job_0470"></key>
<key name="job_0470">This is a DR host, no migration possible.</key>
@ -2254,7 +2254,7 @@ The file: [#!variable!file!#] needs to be updated. The difference is:
<key name="log_0595">Updated the lvm.conf file to add the filter: [#!variable!filter!#] to prevent LVM from seeing the DRBD devices as LVM devices.</key>
<key name="log_0596">The host: [#!variable!host_name!#] last updated the database: [#!variable!difference!#] seconds ago, skipping power checks.</key>
<key name="log_0597">The host: [#!variable!host_name!#] has no entries in the 'updated' table, so ScanCore has likely never run. Skipping this host for now.</key>
<key name="log_0598">This host is not a node, this program isn't designed to run here.</key>
<key name="log_0598">This host is not an Anvil! subnode, this program isn't designed to run here.</key>
<key name="log_0599">Enabled 'anvil-safe-start' locally on this node.</key>
<key name="log_0600">Enabled 'anvil-safe-start' on both nodes in this Anvil! system.</key>
<key name="log_0601">Disabled 'anvil-safe-start' locally on this node.</key>
@ -2407,6 +2407,8 @@ The file: [#!variable!file!#] needs to be updated. The difference is:
<key name="log_0740">Running the scan-agent: [#!variable!agent!#] now to ensure that the database has an updated view of resources.</key>
<key name="log_0741">I was about to start: [#!variable!command!#] with the job UUID: [#!variable!this_job_uuid!#]. However, another job using the same command with the job UUID: [#!variable!other_job_uuid!#] is already running. To avoid race conditions, only one process with a given command is run at a time.</key>
<key name="log_0742">The job with the command: [#!variable!command!#] and job UUID: [#!variable!job_uuid!#] is restarting.</key>
<key name="log_0743">Will run without connecting to the databases. Some features will be unavailable.</key>
<key name="log_0744">A cached request to reboot this host was found (likely from a --no-db update). Registering a job to reboot now!</key>
<!-- Messages for users (less technical than log entries), though sometimes used for logs, too. -->
<key name="message_0001">The host name: [#!variable!target!#] does not resolve to an IP address.</key>
@ -2741,7 +2743,7 @@ Are you sure that you want to delete the server: [#!variable!server_name!#]? [Ty
<key name="message_0230">The 'anvil-safe-start' tool is disabled on this node and enabled on the peer.</key>
<key name="message_0231">The 'anvil-safe-start' tool is disabled, exiting. Use '--force' to run anyway.</key>
<key name="message_0232">The 'anvil-safe-start' tool is disabled, but '--force' was used, so proceeding.</key>
<key name="message_0233">It appears that another instance of 'anvil-safe-start' is already running. Please wait for it to complete (or kill it manually if needed).</key>
<key name="message_0233">It appears that another instance of: [#!variable!program!#] is already running. Please wait for it to complete (or kill it manually if needed).</key>
<key name="message_0234">Preparing to rename a server.</key>
<key name="message_0235">Preparing to stop this node.</key>
<key name="message_0236">This records how long it took to migrate a given server. The average of the last five migrations is used to guess how long future migrations will take.</key>
@ -2920,6 +2922,12 @@ Proceed? [y/N]</key>
<key name="message_0321">Removing the old drbd-kmod RPMs now.</key>
<key name="message_0322">Installing the latest DRBD kmod RPM now.</key>
<key name="message_0323">Retrying the OS update now.</key>
<key name="message_0324">Update almost complete. Picked this job up after a '--no-db' run, and now we have database access again.</key>
<key name="message_0325">[ Note ] - It looks like 'dnf' (pid(s): [#!variable!pids!#]) is running, holding our start up until it's done (in case the system is being updated now).</key>
<key name="message_0326">This daemon just started. Holding off starting jobs for another: [#!variable!will_start_in!#] second(s).</key>
<key name="message_0327">[ Note ] - It looks like 'anvil-version-changes' (pid(s): [#!variable!pids!#]) is running, holding off on power action until it's done (in case the system is being updated now or kernel modules are being built).</key>
<key name="message_0328">[ Note ] - The DRBD (replicated storage) kernel module appears to not exist. This is normal after an OS update, will try building the kernel module now. Please be patient.</key>
<key name="message_0329">[ Note ] - Deleting the old drbd fenced attribute: [#!variable!attribute!#] for the node: [#!variable!node_name!#] (ID: [#!variable!node_id!#]) from the CIB.</key>
<!-- Translate names (protocols, etc) -->
<key name="name_0001">Normal Password</key><!-- none in mail-server -->
@ -3266,6 +3274,11 @@ If you are comfortable that the target has changed for a known reason, you can s
<key name="striker_0299">Migration Network link #!variable!number!#</key>
<key name="striker_0300">This is where you configure the optional network dedicated to RAM-copy during live migrations.</key>
<key name="striker_0301">This puts a temporary hold on a DRBD minor number or TCP port so that it isn't used again in the time between when it was queried as the next free number, and before it can be used.</key>
<key name="striker_0302">This indicates when, in unix time, the database was last aged-out.</key>
<key name="striker_0303">This indicates when, in unix time, the database was last archived.</key>
<key name="striker_0304">This indicates when, in unix time, the local install target data was updated.</key>
<key name="striker_0305">This indicates when, in unix time, the OUI data was last updated. The OUI data is a list of MAC address prefixes and which companies they've been assigned to.</key>
<key name="striker_0306">This indicates when, in unix time, the network was last scanned. This is done to determine what IPs are used by servers on the Anvil! cluster, and to try to identify foundation pack devices on the network. These scans are simple ping sweeps used to get the MAC addresses of devices with IPs.</key>
<!-- These are generally units and appended to numbers -->
foreach my $drbd_node (sort {$a cmp $b} keys %{$anvil->data->{server_data}{$server_name}{server_uuid}{$server_uuid}{disk}{$resource}{$volume}{node}})
{
my $drbd_path = $anvil->data->{server_data}{$server_name}{server_uuid}{$server_uuid}{disk}{$resource}{$volume}{node}{$drbd_node}{drbd_path};
my $drbd_path_by_res = $anvil->data->{server_data}{$server_name}{server_uuid}{$server_uuid}{disk}{$resource}{$volume}{node}{$drbd_node}{drbd_path_by_res};
my $drbd_minor = $anvil->data->{server_data}{$server_name}{server_uuid}{$server_uuid}{disk}{$resource}{$volume}{node}{$drbd_node}{drbd_minor};
my $meta_disk = $anvil->data->{server_data}{$server_name}{server_uuid}{$server_uuid}{disk}{$resource}{$volume}{node}{$drbd_node}{'meta-disk'};
my $backing_lv = $anvil->data->{server_data}{$server_name}{server_uuid}{$server_uuid}{disk}{$resource}{$volume}{node}{$drbd_node}{backing_lv};
my $node_host_uuid = $anvil->data->{server_data}{$server_name}{server_uuid}{$server_uuid}{disk}{$resource}{$volume}{node}{$drbd_node}{host_uuid};
@ -29,19 +29,16 @@ if (($running_directory =~ /^\./) && ($ENV{PWD}))
$| = 1;
my $anvil = Anvil::Tools->new();
$anvil->data->{switches}{'job-uuid'} = "";
$anvil->data->{switches}{'poweroff'} = "";
$anvil->data->{switches}{'power-off'} = ""; # By default, the node is withdrawn. With this switch, the node will power off as well.
$anvil->data->{switches}{'stop-reason'} = ""; # Optionally used to set 'system::stop_reason' reason for this host. Valid values are 'user', 'power' and 'thermal'.
$anvil->data->{switches}{'stop-servers'} = ""; # Default behaviour is to migrate servers to the peer, if the peer is up. This overrides that and forces hosted servers to shut down.
$anvil->Get->switches;
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => {
my $say_time = $anvil->Get->date_and_time({time_only => 1});
if ($pacemaker_up)
{
print "[ Warning ] - The job has not been picked up yet. Is 'anvil-daemon' running on: [".$short_host_name."]?\n";
print "[ Note ] - [".$say_time."] - The subnode is still in the cluster.\n";
}
else
{
print "[ Note ] - [".$anvil->Get->date_and_time({time_only => 1})."] - The job progress is: [".$anvil->data->{jobs}{job_progress}."], continuing to wait.\n";
print "[ Note ] - [".$say_time."] - The subnode is no longer in the cluster, good.\n";
}
foreach my $resource (sort {$a cmp $b} keys %{$anvil->data->{drbd}{status}{$short_host_name}{resource}})
{
print "[ Note ] - [".$say_time."] - The resource: [".$resource."] is still up.\n";
}
$next_log = time + 60;
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => { next_log => $next_log }});
$next_log = time + 60;
my $time_left = $wait_until - time;
my $say_time_left = $anvil->Convert->time({
'time' => $time_left,
translate => 1,
long => 0,
});
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => {
next_log => $next_log,
time_left => $time_left,
say_time_left => $say_time_left,
}});
print "- Waiting for another: [".$say_time_left."], will check again shortly.\n";
}
sleep 5;
if (time > $wait_until)
{
# Timeout.
print "[ Error ] - Timed out while waiting for the subnode: [".$short_host_name."] to stop all DRBD resources and leave the cluster. Aborting the update.\n";
$anvil->nice_exit({exit_code => 1});
}
sleep 10;
}
}
# Record the start time so that we can be sure the subnode has rebooted (uptime is
# less than the current time minus this start time), if the host reboots as part of
# the update.
my $reboot_time = time;
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => {
reboot_time => $reboot_time,
short_host_name => $short_host_name,
}});
# Do the OS update.
print "- Beginning OS update of: [".$short_host_name."]\n";