* Updated anvil-join-anvil to actively call a cluster start once per minute while waiting for initial startup.

* Added a check to striker-initialize-host to see if an anvil-X RPM is already installed. If so, it will not install the Alteeve repo, even if the repo is not found.

Signed-off-by: Digimer <digimer@alteeve.ca>
main
Digimer 4 years ago
parent 3733220b50
commit 5db09f565d
  1. share/words.xml (4 lines changed)
  2. tools/anvil-join-anvil (18 lines changed)
  3. tools/striker-initialize-host (49 lines changed)

--- a/share/words.xml
+++ b/share/words.xml
@@ -549,7 +549,7 @@ Failure! The return code: [#!variable!return_code!#] was received ('0' was expec
 <key name="job_0102">Starting the cluster (on both nodes) now.</key>
 <key name="job_0103">We're node 2, so we will wait until the peer starts the cluster.</key>
 <key name="job_0104">Both nodes are up!</key>
-<key name="job_0105">Still waiting. Node 1: [#!variable!node1_name!#] ready: [#!variable!node1_ready!#] (in_ccm/crmd/join: [#!variable!node1_in_ccm!#/#!variable!node1_crmd!#/#!variable!node1_join!#]), Node 2: [#!variable!node2_name!#] ready: [#!variable!node1_ready!#] (in_ccm/crmd/join: [#!variable!node2_in_ccm!#/#!variable!node2_crmd!#/#!variable!node2_join!#])</key>
+<key name="job_0105">Still waiting. Node 1: [#!variable!node1_name!#] ready: [#!variable!node1_ready!#] (in_ccm/crmd/join: [#!variable!node1_in_ccm!#/#!variable!node1_crmd!#/#!variable!node1_join!#]), Node 2: [#!variable!node2_name!#] ready: [#!variable!node2_ready!#] (in_ccm/crmd/join: [#!variable!node2_in_ccm!#/#!variable!node2_crmd!#/#!variable!node2_join!#])</key>
 <key name="job_0106">Cluster hasn't started, calling local start.</key>
 <key name="job_0107">Corosync is not yet configured, waiting. It will be created when node 1 initializes the cluster.</key>
 <key name="job_0108">Corosync is configured. Will wait for the cluster to start. If it hasn't started in two minutes, we'll try to join it.</key>
@@ -730,6 +730,7 @@ It should be provisioned in the next minute or two.</key>
 <key name="job_0269">One or more machines are not yet accessible on the first BCN. Will check again in a moment.</key>
 <key name="job_0270">All machines are now available on the first BCN!</key>
 <key name="job_0271">One of the Striker dashboards has not yet updated network information in the database. We need this to know which IP to tell the peer to use to connect to us. We'll wait a moment and check again.</key>
+<key name="job_0272">The cluster still hasn't started. Calling startup again (will try once per minute).</key>
 <!-- Log entries -->
 <key name="log_0001">Starting: [#!variable!program!#].</key>
@@ -2316,6 +2317,7 @@ Read UUID: .... [#!variable!read_uuid!#]
 <key name="warning_0075">[ Warning ] - We were asked to insert or update a host with the name: [#!variable!host_name!#]. Another host: [#!variable!host_uuid!#] has the same name, which could be a failed node that is being replaced. We're going to set its 'host_key' to 'DELETED'. If this warning is logged only once, and after a machine is replaced, it's safe to ignore. If this warning is repeatedly being logged, then there are two active machines with the same host name, and that needs to be fixed.</key>
 <key name="warning_0076">[ Warning ] - It looks like the postfix daemon is not running. Enabling and starting it now.</key>
 <key name="warning_0077">[ Warning ] - Checking the mail queue after attempting to start postgres appears to have still failed. Output received was: [#!variable!output!#].</key>
+<key name="warning_0078">[ Warning ] - Not installing the Alteeve repo! The package: [#!variable!anvil_role_rpm!#] is already installed. This is OK, but be aware that updates from Alteeve will not be available. To change this, please install: [#!variable!alteeve_repo!#].</key>
 <!-- The entries below here are not sequential, but use a key to find the entry. -->
 <!-- Run 'striker-parse-os-list' to find new entries. -->

--- a/tools/anvil-join-anvil
+++ b/tools/anvil-join-anvil
@@ -394,6 +394,7 @@ sub configure_pacemaker
     # Now wait for both nodes to come online.
     update_progress($anvil, ($anvil->data->{job}{progress} += 2), "job_0109");
     my $both_online = 0;
+    my $start_again = time + 60;
     until ($both_online)
     {
         ### TODO: If we're waiting more than five minutes, call 'pcs cluster start --all' again.
@@ -433,6 +434,23 @@ sub configure_pacemaker
                 }});
             }
         }
+        if (time > $start_again)
+        {
+            # Call cluster start again.
+            $anvil->Log->entry({source => $THIS_FILE, line => __LINE__, 'print' => 1, level => 1, key => "job_0272"});
+            $start_again = time + 60;
+            my $shell_call = $anvil->data->{path}{exe}{pcs}." cluster start --all";
+            $anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => {
+                start_again => $start_again,
+                shell_call => $shell_call,
+            }});
+            my ($output, $return_code) = $anvil->System->call({debug => 3, shell_call => $shell_call});
+            $anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => {
+                output => $output,
+                return_code => $return_code,
+            }});
+        }
         sleep 5 if not $both_online;
     }

--- a/tools/striker-initialize-host
+++ b/tools/striker-initialize-host
@@ -667,8 +667,56 @@ EOF
         return_code => $return_code,
     }});
+    # In the CI, we'll have custom repos installed. So here we're looking to see if 'anvil-X' is already
+    # installed. If so, we won't add our repo.
+    my $anvil_role_rpm = "";
+    undef $output;
+    undef $error;
+    undef $return_code;
+    undef $shell_call;
+    $shell_call = $anvil->data->{path}{exe}{'dnf'}." list installed";
+    $anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => { shell_call => $shell_call }});
+    ($output, $error, $return_code) = $anvil->Remote->call({
+        debug => 3,
+        shell_call => $shell_call,
+        password => $anvil->data->{data}{password},
+        port => $anvil->data->{data}{ssh_port},
+        target => $anvil->data->{data}{host_ip_address},
+        remote_user => "root",
+        timeout => 300,
+    });
+    $anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => {
+        output => $output,
+        error => $error,
+        return_code => $return_code,
+    }});
+    foreach my $line (split/\n/, $output)
+    {
+        $line =~ s/\s.*$//;
+        $anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => { line => $line }});
+        next if $line =~ /anvil-core/;
+        if ($line =~ /anvil-(.*)\.noarch/)
+        {
+            $anvil_role_rpm = $1;
+            $anvil->Log->variables({source => $THIS_FILE, line => __LINE__, level => 2, list => { anvil_role_rpm => $anvil_role_rpm }});
+            last;
+        }
+    }
     # Install the Alteeve repo, if possible. There may be no Internet access, so it's OK if this fails.
     if (not -e $anvil->data->{path}{config}{'alteeve-el8.repo'})
+    {
+        if ($anvil_role_rpm)
+        {
+            # There's already an anvil RPM installed, so we're going to skip installing the repo.
+            # Warn the user though.
+            $anvil->Log->entry({source => $THIS_FILE, line => __LINE__, level => 0, 'print' => 1, key => "warning_0078", variables => {
+                anvil_role_rpm => $anvil_role_rpm,
+                alteeve_repo => $anvil->data->{path}{urls}{alteeve_repo},
+            }});
+        }
+        else
         {
             my ($alteeve_access) = $anvil->Network->check_internet({
                 debug => 2,
@@ -698,6 +746,7 @@ EOF
                 }});
             }
         }
+    }
     # Install the anvil package now.
     my $package = $anvil->data->{data}{type} eq "dr" ? "anvil-dr" : "anvil-node";
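
The package check added to striker-initialize-host can be tried in isolation. The following is a minimal sketch under stated assumptions, not the tool's own code: `find_anvil_role` is a hypothetical helper name, the package listing is made-up sample input rather than real `dnf list installed` output, and the anchored regex is a slight tightening of the one in the diff above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the Anvil! code base): returns the role suffix of
# the first installed 'anvil-<role>.noarch' package, ignoring 'anvil-core', or an
# empty string if no such package is listed.
sub find_anvil_role
{
    my ($output) = @_;
    foreach my $line (split /\n/, $output)
    {
        # Keep only the first column (the 'name.arch' field).
        $line =~ s/\s.*$//;
        # 'anvil-core' is shared by all roles, so it says nothing about the role.
        next if $line =~ /^anvil-core\./;
        if ($line =~ /^anvil-(.+)\.noarch$/)
        {
            return $1;
        }
    }
    return "";
}

# Made-up sample input, for illustration only.
my $sample = join("\n",
    'Installed Packages',
    'acl.x86_64              2.2.53-1.el8    @anaconda',
    'anvil-core.noarch       3.0-1.el8       @local',
    'anvil-striker.noarch    3.0-1.el8       @local',
);

my $role = find_anvil_role($sample);
print "role: [$role]\n";
```

With the sample input this prints `role: [striker]`. A non-empty result corresponds to the case where the tool skips installing the Alteeve repo and logs a warning instead.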
