Commit Graph

1043 Commits

Author SHA1 Message Date
digimer
b74900c2fc Beginning to repurpose anvil-manage-server for server resync
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-25 22:41:34 -04:00
Fabio M. Di Nitto
50ad874909 striker-collect-debug: fix collection of cib.xml
Closes: #534

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2024-03-24 14:39:38 +01:00
Tsu-ba-me
f506ec4ac8 fix(tools): allow operations (currently set) on hash children in execute mode of access module
2024-03-21 17:08:44 -04:00
Tsu-ba-me
b5264131c4 fix(tools): allow reference to children of in execute operation of access module
2024-03-21 17:08:44 -04:00
digimer
2d92f339c2 Fixed a bug related to changing the hostname during a manifest run
* The original hostname would be used to form the cluster, even though
  the hostname was updated.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-18 23:28:42 -04:00
digimer
870c990632 Added support for multiple IPs per interface
* Created Database->get_mac_to_ip()
* Updated Database->insert_or_update_mac_to_ip() to find an entry using
  both the IP and MAC address.
* Updated Network->get_ips() to store only the first IP it finds on an
  interface as the main IP (for use in /etc/hosts, etc) and to store it
  and any other IPs in a new hash.
* Updated scan-network to use the new hash above to record them in the
  'mac_to_ip' table. Similarly, before marking an IP as removed, it
  checks to see if it's an alternate IP.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-06 19:06:05 -05:00
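The IP handling described in 870c990632 can be sketched roughly as below. This is an illustrative Python sketch, not the project's own code: the function names split_ips and is_known_ip, the dictionary layout, and the sample addresses are all assumptions made for the example.

```python
def split_ips(interface_ips):
    """Pick a main IP per interface and keep every IP in a lookup table.

    interface_ips maps an interface name to the list of IPs found on it,
    in discovery order. The first IP is treated as the main one (the one
    that would go into /etc/hosts); all IPs, including the first, are kept
    in a per-interface dictionary. (Hypothetical layout.)
    """
    result = {}
    for name, ips in interface_ips.items():
        if not ips:
            continue  # nothing to record for an interface without IPs
        result[name] = {
            "main_ip": ips[0],
            "ips": {ip: {"alternate": index > 0} for index, ip in enumerate(ips)},
        }
    return result


def is_known_ip(result, ip):
    """Return True if ip is a main or alternate IP on any interface.

    A caller would check this before marking an IP as removed.
    """
    return any(ip in entry["ips"] for entry in result.values())
```

A scanner built this way would only mark an address as removed when is_known_ip() returns False for it, so alternate IPs on an interface are not dropped by mistake.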
digimer
ab0b1a262b Reworked Network->wait_for_bonds() to be ->wait_for_networks()
* Renamed the old ->wait_for_networks() to be ->wait_for_nm_online().
* The new ->wait_for_networks() waits for all interfaces we manage to be
  'activated' before returning.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-29 01:32:32 -05:00
digimer
0f1ff02e78 Added alarms around remote calls to better handle dropped networks.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-28 20:35:00 -05:00
digimer
2f5fb32769 Quieted logging
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-28 16:37:20 -05:00
digimer
c31880c2ec Fixed the ordering holding on hosts and network config.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-27 18:44:58 -05:00
digimer
b8c73fd3f2 Replaced hosts management in anvil-join-anvil with System->update_hosts.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-26 18:29:55 -05:00
digimer
495cb90ca6 Created Network->wait_for_network to hold startup for NM to be up.
Added the call to Network->wait_for_network to pause scancore and
anvil-daemon startups until NetworkManager says it's up and running.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-24 17:16:46 -05:00
digimer
5cf0bbc6be Added Wants=NetworkManager to anvil-daemon and scancore unit files.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-24 16:51:33 -05:00
digimer
25841145b5 Reduced logging in anvil-configure-host.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 19:01:47 -05:00
digimer
a65bf5090e Updated anvil-monitor-performance to reduce logging volume.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 18:49:58 -05:00
digimer
c79b76e2ee Reduced the frequency of monitoring to 1/min and reduced log size
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 18:37:44 -05:00
digimer
247cf0a238 Fixed using the wrong words key.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 17:48:37 -05:00
digimer
f4a314e4b5 Removed compression during log collection
The finishing compression provides all the space saving, and compression
slows down collection.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 14:13:22 -05:00
digimer
0795bbb2de Added compression to anvil.log files collected by debug.
Also fixed a bug on validating the collection of remote previous boot
journalctl logs.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 11:14:51 -05:00
digimer
6a193bf710 Added extra checks to Network->wait_for_bonds()
* Added a default timeout of 180 seconds, and updated
  anvil-configure-host to reduce this to 60 seconds while configuring
  the host.
* Added a check for interfaces configured under a bond. If none are
  found, the bond is ignored.
* Updated Storage->update_config() to take the new 'append' attribute to
  allow adding a variable if it wasn't found already in the config.
* Added the new 'network::wait_for_bonds::timeout' variable to enable
  changing the default timeout for Network->wait_for_bonds().

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 23:46:49 -05:00
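The behaviour described in 6a193bf710 (a timeout, plus ignoring bonds that have no member interfaces) can be sketched as below. This is a minimal Python illustration under stated assumptions, not the project's implementation: the signature, the is_up callback, and the polling interval are hypothetical; only the 180 second default comes from the commit message.

```python
import time

DEFAULT_TIMEOUT = 180  # seconds; the default named in the commit message


def wait_for_bonds(bonds, is_up, timeout=DEFAULT_TIMEOUT, interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Wait until every bond that has member interfaces reports up.

    bonds  maps a bond name to the list of its member interfaces.
    is_up  is a callable taking a bond name and returning True when up.
    Returns True if all relevant bonds came up, False on timeout.
    """
    # Bonds with no configured member interfaces are ignored.
    relevant = [name for name, members in bonds.items() if members]
    deadline = clock() + timeout
    while True:
        if all(is_up(name) for name in relevant):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller that wants a shorter wait during host configuration would pass timeout=60, which matches the reduction the commit message describes for anvil-configure-host.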
digimer
05de34c7bc Scancore and anvil-daemon now hold for bonds to be up.
Created Network->wait_for_bonds(), and added it to the startup for
scancore and anvil-daemon.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 02:01:33 -05:00
digimer
741bcfa908 Added default logging level 2 and secure logging in CI tests.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-21 21:46:27 -05:00
digimer
f40d25f2dd Fixed a bug with /etc/hosts generation
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-20 21:56:07 -05:00
digimer
a9850bef4e Added a global variable to force fresh SSH connections.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-16 23:28:58 -05:00
digimer
5517e43a81 Forcing anvil-daemon to run with log level 2 and secure logging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-15 23:04:11 -05:00
digimer
091ded803c Added an attempt to assemble storage groups if they do not yet exist.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-15 17:41:26 -05:00
digimer
4b5894625e Updated anvil-configure-host to enable connection.autoconnect.
This closes issue #576

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-14 15:36:40 -05:00
digimer
e0c4ed6de5 Added log-only option to anvil-manage-daemons and enabled
anvil-monitor-daemons.service to only monitor daemons.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-14 13:55:03 -05:00
digimer
c399053ace Finished new anvil-manage-daemons tool.
This tool (and its parent 'anvil-monitor-daemons' daemon) simplifies
starting, stopping, enabling, and disabling all Anvil! daemons.

More importantly, the daemon will monitor for failed daemons and attempt
to restart them.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-14 01:39:56 -05:00
digimer
835d9e79cb Updated Scancore->post_scan_analysis_striker() to check the RC when
booting an unexpectedly off host and only update its power state if the
boot actually succeeded.

* Started work on a new anvil-manage-daemons tool and
  anvil-monitor-daemons systemd unit.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-13 21:11:22 -05:00
digimer
27152845fd Attempts to create an existing fence method no longer fail.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-13 16:25:48 -05:00
digimer
4e367acd11 Created anvil-monitor-performance tool and daemon.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-12 22:58:36 -05:00
digimer
43f4201861 Created Get->load_average().
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-12 18:31:00 -05:00
digimer
f180f1adfc Added anvil-monitor-network to Makefile
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-02 22:30:17 -05:00
digimer
a3c3077963 Called pcs directly for CIB data collection.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-30 20:07:29 -05:00
digimer
f100bc94cd Bumped logging to debug striker-collect-debug
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-29 21:18:32 -05:00
digimer
050891d751 Fixed inverted file copy check logic.
This should resolve issue #534

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-28 00:12:13 -05:00
digimer
bb78d65c77 Bumped debug logging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 21:36:54 -05:00
digimer
b85e38d20d Added the short and full host names to hosts
* Updated anvil_join_anvil->wait_for_etc_hosts() to add the short host
  name and the FQDN to the /etc/hosts file using the first BCN network's
  IP address.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
7018703c24 Fixed a bug where the peer subnode would add a server to pacemaker
* Updated anvil-provision-server to only call add_server_to_cluster() if
  it's NOT the peer.
* Added the new 'ok_if_exists' parameter to Cluster->add_server() to
  return 0 if the server already existed in pacemaker as a resource.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
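The 'ok_if_exists' idea from 7018703c24 can be sketched as below. This is an illustrative Python sketch, not the project's code: the function shape, the use of a set for the cluster's resources, and the non-zero return value for the failure case are assumptions; the commit message only states that 0 is returned when the server already exists and the parameter is set.

```python
def add_server(resources, server_name, ok_if_exists=False):
    """Add server_name to the set of cluster resources.

    Returns 0 on success. If the server is already a resource, returns 0
    when ok_if_exists is set, otherwise 1 (assumed failure code).
    """
    if server_name in resources:
        return 0 if ok_if_exists else 1
    resources.add(server_name)
    return 0
```

With this shape, a second attempt to add the same server becomes a harmless no-op for callers that opt in, while other callers still see the duplicate as an error.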
digimer
14022896aa Added a call for non-striker machines to call check_sshd if no DBs.
Also added a check for sshd_config.d so that it doesn't error on EL8
machines.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
bf693ed212 Updated anvil-daemon to enable root SSH access during startup
This is required as we need to be able to ssh into peer strikers and
into nodes and DR hosts during initialization.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
943bf2e8d3 Removed the no-longer-needed Network->check_network() method
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
d8ceb7fbf4 Updated to add all subnode nets to /etc/hosts before forming cluster
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
760d9f53d8 Bumped logging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
de4bb0d001 Bumped logging for debugging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
476b285607 Added a wait_for_access() function to anvil-join-anvil
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
023bcf46a4 Fixed a bug with hung cluster startup in some cases
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
dd0175e05c Now check for/backup/remove ifcfg-X files on EL8 hosts.
* Added caching to System->check_network_type()
* Changed anvil-configure-host job progress steps to 1.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
b0cede49e3 Removed calls to check apache config.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00