anvil/tools/scancore
Digimer 41cd1e0319 * Several bugs fixed and enhancements;
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 00:09:32 -04:00

388 lines
15 KiB
Perl
Executable File

#!/usr/bin/perl
#
# This is the main ScanCore program. It is managed by systemd
#
# Examples;
#
# Exit codes;
# 0 = Normal exit.
# 1 = Not running as root.
# 2 =
#
# TODO:
# - Decide if it's worth having a separate ScanCore.log file or just feed into anvil.log.
# - Examine limits in: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#LimitCPU=
# - Use 'nvme-cli' to write a scan-nvme scan agent, can get thermal and wear data
# - Record how long a server's migration took in the past, and use that to determine which node to evacuate
# during load shed. Also, track how long it takes for servers to stop to determine when to initiate a total
# shutdown.
# - Add a '--silence-alerts --anvil <name>' and '--restore-alerts --anvil <name>' to temporarily
# disable/re-enable alerts. This is to allow for quiet maintenance without stopping scancore itself.
#
use strict;
use warnings;
use Anvil::Tools;
use Data::Dumper;
# Disable buffering
$| = 1;
# Prevent a discrepency between UID/GID and EUID/EGID from throwing an error.
$< = $>;
$( = $);
my $THIS_FILE = ($0 =~ /^.*\/(.*)$/)[0];
my $running_directory = ($0 =~ /^(.*?)\/$THIS_FILE$/)[0];
if (($running_directory =~ /^\./) && ($ENV{PWD}))
{
$running_directory =~ s/^\./$ENV{PWD}/;
}
my $anvil = Anvil::Tools->new();
# Make sure we're running as 'root'
# $< == real UID, $> == effective UID
if (($< != 0) && ($> != 0))
{
# Not root
print $anvil->Words->string({key => "error_0005"})."\n";
$anvil->nice_exit({exit_code => 1});
}
$anvil->data->{scancore} = {
threshold => {
warning_temperature => 5,
warning_critical => 5,
},
power => {
safe_boot_percentage => 35,
},
};
$anvil->Storage->read_config();
# Read switches
$anvil->data->{switches}{'run-once'} = "";
$anvil->Get->switches;
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 1, key => "log_0115", variables => { program => $THIS_FILE }});
# Calculate my sum so that we can exit if it changes later.
$anvil->Storage->record_md5sums();
# Connect to DBs.
wait_for_database($anvil);
# If we're not configured, sleep.
wait_until_configured($anvil);
# Startup tasks.
startup_tasks($anvil);
# Load the strings from all the agents we know about before we process alerts so that we can include their
# messages in any emails we're going to send.
$anvil->Words->load_agent_strings();
# Send a startup message immediately
$anvil->Email->check_config();
$anvil->Alert->register({alert_level => "notice", message => "message_0179", set_by => $THIS_FILE, sort_position => 0});
$anvil->Email->send_alerts();
# Disconnect. We'll reconnect inside the loop
$anvil->Database->disconnect();
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 3, key => "log_0203"});
# The main loop
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 2, key => "log_0248"});
while(1)
{
# Do the various pre-run tasks.
my $start_time = time;
prepare_for_run($anvil);
# Set our sleep time
my $run_interval = 30;
if ((exists $anvil->data->{scancore}{timing}{run_interval}) && ($anvil->data->{scancore}{timing}{run_interval} =~ /^\d+$/))
{
$run_interval = $anvil->data->{scancore}{timing}{run_interval};
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => { run_interval => $run_interval }});
}
# If we're in maintenance mode, do nothing.
my $maintenance_mode = $anvil->System->maintenance_mode();
if ($maintenance_mode)
{
# Sleep and skip.
sleep($run_interval);
next;
}
# Do we have at least one database?
my $agent_runtime = 0;
if ($anvil->data->{sys}{database}{connections})
{
# Run the normal tasks
$anvil->ScanCore->call_scan_agents({debug => 2});
# Do post-scan analysis.
$anvil->ScanCore->post_scan_analysis({debug => 2});
}
else
{
# No databases available, we can't do anything this run. Sleep for a couple of seconds and
# then try again.
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 0, key => "log_0202"});
my $db_retry_interval = 2;
if ((exists $anvil->data->{scancore}{timing}{db_retry_interval}) && ($anvil->data->{scancore}{timing}{db_retry_interval} =~ /^\d+$/))
{
$db_retry_interval = $anvil->data->{scancore}{timing}{db_retry_interval};
}
sleep($db_retry_interval);
next;
}
### TODO: Left off here
### - Agents are timing out?
# Send alerts.
$anvil->Email->send_alerts({debug => 2});
# Exit if 'run-once' selected.
if ($anvil->data->{switches}{'run-once'})
{
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 0, key => "message_0055"});
$anvil->Alert->register({set_by => $THIS_FILE, alert_level => "notice", message => "message_0055"});
$anvil->Email->send_alerts();
$anvil->nice_exit({exit_code => 0});
}
# Clean up
cleanup_after_run($anvil);
# Sleep until it's time to run again.
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 1, key => "log_0249", variables => {
run_interval => $run_interval,
runtime => (time - $start_time),
}});
sleep($run_interval);
# In case something has changed, exit.
exit_if_sums_changed($anvil);
}
$anvil->nice_exit({exit_code => 0});
#############################################################################################################
# Functions #
#############################################################################################################
# This cleans things up after a scan run has completed.
sub cleanup_after_run
{
my ($anvil) = @_;
# Delete what we know about existing scan agents so that the next scan freshens the data.
foreach my $agent_name (sort {$a cmp $b} keys %{$anvil->data->{scancore}{agent}})
{
# Remove the agent's words file data.
my $agent_words = $anvil->data->{scancore}{agent}{$agent_name}.".xml";
delete $anvil->data->{words}{$agent_words};
}
# Disconnect from the database(s) and sleep now.
$anvil->Database->disconnect();
exit_if_sums_changed($anvil);
# Now delete all the remaining agent data.
delete $anvil->data->{scancore}{agent};
}
# This checks to see if any files on disk have changed and, if so, exits.
sub exit_if_sums_changed
{
my ($anvil) = @_;
my $changed = $anvil->Storage->check_md5sums();
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => { changed => $changed }});
if ($changed)
{
# NOTE: We exit with '0' to prevent systemctl from showing a scary red message.
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 0, priority => "alert", key => "message_0014"});
$anvil->nice_exit({exit_code => 0});
}
return(0);
}
# Handle pre-run tasks.
sub prepare_for_run
{
my ($anvil) = @_;
# Reload defaults, re-read the config and then connect to the database(s)
$anvil->_set_paths();
$anvil->_set_defaults();
$anvil->Storage->read_config();
$anvil->Get->switches();
$anvil->Words->read();
$anvil->Database->connect();
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 2, key => "log_0132"});
# See if the mail server needs to be updated.
$anvil->Email->check_config;
return(0);
}
# This loops until it can connect to at least one database.
sub wait_for_database
{
my ($anvil) = @_;
$anvil->Database->connect;
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, level => 2, key => "log_0132"});
if (not $anvil->data->{sys}{database}{connections})
{
# No databases, sleep until one comes online.
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 1, key => "log_0244"});
until($anvil->data->{sys}{database}{connections})
{
# Disconnect, Sleep, then check again.
$anvil->Database->disconnect();
sleep 60;
$anvil->_set_paths();
$anvil->_set_defaults();
$anvil->Storage->read_config();
$anvil->Database->connect;
if ($anvil->data->{sys}{database}{connections})
{
# We're good
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 1, key => "log_0245"});
}
else
{
# Not yet...
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 2, key => "log_0244"});
}
# In case something has changed, exit.
exit_if_sums_changed($anvil);
}
}
return(0);
}
# Wait until the local system has been configured.
sub wait_until_configured
{
my ($anvil) = @_;
my $configured = $anvil->System->check_if_configured;
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 3, list => { configured => $configured }});
if (not $configured)
{
# Sleep for a minute and check again.
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, level => 1, key => "log_0246"});
until ($configured)
{
# Sleep, then check.
sleep 60;
$configured = $anvil->System->check_if_configured;
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => { configured => $configured }});
if ($configured)
{
# We're good
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 1, key => "log_0247"});
}
else
{
# Not yet...
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 2, key => "log_0246"});
}
# In case something has changed, exit.
exit_if_sums_changed($anvil);
}
}
return(0);
}
# Things we need to do at startup.
sub startup_tasks
{
my ($anvil) = @_;
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 2, key => "log_0572"});
# Update our status
$anvil->Database->get_hosts({debug => 3});
my $host_uuid = $anvil->Get->host_uuid();
$anvil->Database->insert_or_update_hosts({
host_ipmi => $anvil->data->{hosts}{host_uuid}{$host_uuid}{host_ipmi},
host_key => $anvil->data->{hosts}{host_uuid}{$host_uuid}{host_key},
host_name => $anvil->data->{hosts}{host_uuid}{$host_uuid}{host_name},
host_type => $anvil->data->{hosts}{host_uuid}{$host_uuid}{host_type},
host_uuid => $host_uuid,
host_status => "online",
});
# Make sure our stop reason is cleared.
my $variable_uuid = $anvil->Database->insert_or_update_variables({
variable_name => 'system::stop_reason',
variable_value => '',
variable_default => '',
variable_description => 'striker_0279',
variable_section => 'system',
variable_source_uuid => $anvil->Get->host_uuid,
variable_source_table => 'hosts',
});
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 3, list => { variable_uuid => $variable_uuid }});
# If this is a node and we've been up for less than ten minutes, call anvil-safe-start as a
# background process. It will exit if it is disabled.
my $host_type = $anvil->Get->host_type();
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 3, list => { host_type => $host_type }});
if ($host_type eq "node")
{
my $uptime = $anvil->Get->uptime;
$anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => { uptime => $uptime }});
if ($uptime < 600)
{
# Run it as a background task
my $shell_call = $anvil->data->{path}{exe}{'anvil-safe-start'}.$anvil->Log->switches;
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 1, key => "log_0210", variables => { command => $shell_call }});
$anvil->System->call({shell_call => $shell_call, background => 1});
}
else
{
# Log that we've been up too long to auto-start the cluster.
my $say_uptime = $anvil->Convert->time({'time' => $uptime});
$anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 1, key => "log_0620", variables => { uptime => $say_uptime }});
}
}
return(0);
}
=pod
"I'm sorry, but I don't want to be an emperor. That's not my business. I don't want to rule or conquer anyone. I should like to help everyone if possible - Jew, Gentile - black man - white.
We all want to help one another. Human beings are like that. We want to live by each other's happiness - not by each other's misery. We don't want to hate and despise one another. In this world there's room for everyone and the good earth is rich and can provide for everyone.
The way of life can be free and beautiful, but we have lost the way. Greed has poisoned men's souls - has barricaded the world with hate - has goose-stepped us into misery and bloodshed. We have developed speed, but we have shut ourselves in. Machinery that gives abundance has left us in want. Our knowledge has made us cynical; our cleverness, hard and unkind. We think too much and feel too little. More than machinery we need humanity. More than cleverness, we need kindness and gentleness. Without these qualities, life will be violent and all will be lost.
The aeroplane and the radio have brought us closer together. The very nature of these inventions cries out for the goodness in man - cries for universal brotherhood - for the unity of us all. Even now my voice is reaching millions throughout the world - millions of despairing men, women, and little children - victims of a system that makes men torture and imprison innocent people. To those who can hear me, I say: 'Do not despair.' The misery that is now upon us is but the passing of greed - the bitterness of men who fear the way of human progress. The hate of men will pass, and dictators die, and the power they took from the people will return to the people. And so long as men die, liberty will never perish.
Soldiers! Don't give yourselves to brutes - men who despise you and enslave you - who regiment your lives - tell you what to do - what to think and what to feel! Who drill you - diet you - treat you like cattle, use you as cannon fodder. Don't give yourselves to these unnatural men - machine men with machine minds and machine hearts! You are not machines! You are not cattle! You are men! You have the love of humanity in your hearts. You don't hate, only the unloved hate - the unloved and the unnatural!
Soldiers! Don't fight for slavery! Fight for liberty! In the seventeenth chapter of St Luke, it is written the kingdom of God is within man not one man nor a group of men, but in all men! In you! You, the people, have the power - the power to create machines. The power to create happiness! You, the people, have the power to make this life free and beautiful - to make this life a wonderful adventure. Then in the name of democracy - let us use that power - let us all unite. Let us fight for a new world - a decent world that will give men a chance to work - that will give youth a future and old age a security.
By the promise of these things, brutes have risen to power. But they lie! They do not fulfil that promise. They never will! Dictators free themselves but they enslave the people. Now let us fight to fulfil that promise! Let us fight to free the world - to do away with national barriers - to do away with greed, with hate and intolerance. Let us fight for a world of reason - a world where science and progress will lead to all men's happiness. Soldiers, in the name of democracy, let us unite!"
=pod