Commit Graph

603 Commits

Author SHA1 Message Date
digimer
2d92f339c2 Fixed a bug related to changing the hostname during a manifest run
* The original hostname would be used to form the cluster, even though
  the hostname was updated.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-18 23:28:42 -04:00
digimer
b3d1e53623 Added a log message if waiting for bonds times out.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 19:15:50 -05:00
digimer
a65bf5090e Updated anvil-monitor-performance to reduce logging volume.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 18:49:58 -05:00
digimer
c79b76e2ee Reduced the frequency of monitoring to 1/min and reduced log size
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 18:37:44 -05:00
digimer
6a193bf710 Added extra checks to Network->wait_for_bonds()
* Added a default timeout of 180 seconds, and updated
  anvil-configure-host to reduce this to 60 seconds while configuring
  the host.
* Added a check for interfaces configured under a bond. If none are
  found, the bond is ignored.
* Updated Storage->update_config() to take the new 'append' attribute to
  allow adding a variable if it wasn't found already in the config.
* Added the new 'network::wait_for_bonds::timeout' variable to enable
  changing the default timeout for Network->wait_for_bonds().

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 23:46:49 -05:00
digimer
d5ceca3dc6 Added a message when holding on a bond to activate.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 02:06:44 -05:00
digimer
741bcfa908 Added default logging level 2 and secure logging in CI tests.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-21 21:46:27 -05:00
digimer
f40d25f2dd Fixed a bug with /etc/hosts generation
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-20 21:56:07 -05:00
digimer
4b5894625e Updated anvil-configure-host to enable connection.autoconnect.
This closes issue #576

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-14 15:36:40 -05:00
digimer
e0c4ed6de5 Added log-only option to anvil-manage-daemons and enabled
anvil-monitor-daemons.service to only monitor daemons.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-14 13:55:03 -05:00
digimer
c399053ace Finished new anvil-manage-daemons tool.
This tool (and it's parent 'anvil-monitor-daemons' daemon) simplies
starting, stopping, enabling, and disabling all Anvil! daemons.

More importantly, the daemon will monitor for failed daemons and attempt
to restart them.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-14 01:39:56 -05:00
digimer
835d9e79cb Updated Scancore->post_scan_analysis_striker() to check the RC when
booting an unexpectedly off host and only update it's power state if the
boot actually succeeded.

* Started work on a new anvil-manage-daemons tool and
  anvil-monitor-daemons systemd unit.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-13 21:11:22 -05:00
digimer
27152845fd Attempts to create an existing fence method no longer fails.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-13 16:25:48 -05:00
digimer
4e367acd11 Created anvil-monitor-performance tool and daemon.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-12 22:58:36 -05:00
digimer
43f4201861 Created Get->load_average().
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-12 18:31:00 -05:00
digimer
f25323ba9b Fixed type on words variable insertion
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-02 19:43:36 -05:00
digimer
bf693ed212 Updated anvil-daemon to enable root SSH access during startup
This is required as we need to be able to ssh into peer strikers and
into nodes and DR hosts during initialization.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
d8ceb7fbf4 Updated to add all subnode nets to /etc/hosts before forming cluster
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
dd0175e05c Now check for/backup/remove ifcfg-X files on EL8 hosts.
* Added caching to System->check_network_type()
* Changed anvil-configure-host job progress steps to 1.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
e03219d1d8 Fixed a bug where non-strikers hung configuring their network.
* Updated Job->update_progress() to log and return if there are not DB
  connections.
* Bumped some logging in Database->connect().
* Deleted ifcfg code from anvil-configure-host.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
d7aa7966dc Fixed a couple bugs
* Network->collect_data() wasn't deleting old data before rescans.
* anvil-configure-host wasn't checking links that should be in a bond if
  the bond already existed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
518fddfa82 More progress on the new NM version of anvil-configure-host
* It's technically done, but I know bugs remain.
* Updated Jobs->update_progress() to take 'file' and 'line' to make it
  easier in the logs to see the origin of the message, when logging the
  update.
* Created Network->modify_connection() to update network manager
  variables. Created ->reset_connection() to take an interface down and
  bring it back up again.
* Fixed a bug in scan-network where the device_to_uuid hash wasn't being
  stored.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
71735947dc Created Job->bump_progress() to make advancing job progress easier
* Updated Network->collect_data() to find the GENERAL.DEVICES and
  GENERAL.IP-IFACE from match.interface-name when the link is down.
* More work done on anvil-configure-host.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
ef89a79162 More progress on anvil-configure-host
* Now working on the reconfiguring of interfaces.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
a27773a69d scan-network now records interfaces, bonds and bridges!
* Much testing still needed, but this is a significant milestone.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
9c67b97fdd Fixed a bug in initializing DROP'ed DBs.
* Got more work done on adding network_interfaces to the database in
  scan-server.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
ec11335197 Fixed DB initialization bugs.
* More work done on the new network stack also.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
1f88abda04 * Further work done on anvil-monitor-network
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
fd880e2fdf Finished anvil-watch-servers
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-29 11:37:00 -05:00
digimer
26f4446bf9 Continued work on anvil-watch-servers; Parsed server data now.
* Updated Cluster->parse_cib() to store DRBD fence node restrictions by
  server/node. Also updated to make it easier to get the server's
  preferred node.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-29 01:27:21 -05:00
digimer
207a014ae0 Got anvil-watch-servers showing the status of subnodes.
* Updated System->maintenance_mode() to take 'host_uuid' so that the
  maintenance mode of remote machines can be checked/set.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-27 23:47:29 -05:00
digimer
8ce1f04335 Finished CPU support in anvil-manage-server-system!
* Updated Get->available_resources() to record the maximum cores that
  can be allocated to a server. This is N-1 for hosts with 4 or less
  cores, or N-2 cores otherwise.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-24 16:51:15 -05:00
digimer
b4037fade5 Added RAM change support to anvil-manage-server-system
* Updated Database->insert_or_update_servers() to error if the RAM being
  recorded is less than 640 KiB. This is because, somewhere yet
  undiscovered, the RAM is being recorded in KiB which breaks things.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-23 22:53:36 -05:00
digimer
6bc2601d34 Updated anvil-manage-server-system to change boot device ordering.
* Updated Server->parse_definition() to store a hash to translate a
  device target to a device type.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-23 19:22:46 -05:00
digimer
1e376fc06b Updated anvil-manage-dr to change --list to --show for consistency
* Updated anvil-manage-dr to handle DR hosts without a VG in a given SG
* Fixed up minor display issues in anvil-manage-storage-groups

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-17 23:52:58 -05:00
digimer
ef0f04117f Added '--list' to anvil-manage-dr
* Updated Database->get_hosts() to store hosts in a host_type hash.
* Updated Database->get_servers() to store servers by name, regardless
  of host Anvil! node.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-14 17:50:48 -05:00
digimer
d9aa8aee74 Updated anvil-manage-keys to work from the command line
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-13 18:59:24 -05:00
digimer
9bd98951b5 Added backup/restore of partition table in Storage->auto_grow_pv
* Auto-growing PVs (and the backing partition) is now supported by
  anvil-manage-host '--auto-grow-pv'.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-08 20:59:40 -05:00
digimer
edc544255e Rebased with main and resolved conflicts.
This branch resolves issue #462; Auto growing PVs. Specifically, it looks at the LVM PVs on the host and checks to see if there is unused free space after the backing partition. If there is, it auto-grows the partition and then resizes the PV. This featu
re is designed to make life easier for users who deleted the auto-created '/home' partition during the anaconda disk partitioning tool.
* Created Storage->auto_grow_pv() that does the above.
* Added the missing hidden method name _create_rsync_wrapper in the Storage module POD.
* Added a call to Storage->auto_grow_pv() in anvil-configure-host and anvil-version-changes for nodes and DR.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-08 12:00:48 -05:00
digimer
e5e958a03b Updated anvil-configure-host to clear "config::map_network"
Also updated anvil-manage-host to properly display the network mapping
status.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-03 17:35:01 -04:00
digimer
8c97f478a8 Updated Server->update_definition() to undefine a server when needed.
* Boosted logging to debug anvil-delete-server hang in jenkins.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-01 16:06:59 -04:00
digimer
9b55504872 Updated anvil-manage-server-system to update defined servers.
* Updated Server->locate() to take the new 'anvil' parameter to speed up
  searches.
* Updated Server->update_definition() to use Server->locate() to find
  where updates are needed. It now also defines the server with the new
  config.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-01 00:15:13 -04:00
digimer
85f86cda8a Updated Storage->read_file() to test if the target file exists.
This allows for better logging if the file to be copied doesn't exist.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-25 21:21:52 -04:00
digimer
ed269aa450 Fixed a bug where duplicate variables were being created in some cases.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-20 22:05:33 -04:00
digimer
a35e790a4d Fixed a minor striker-boot-machine bug.
Divide by zero error when no hosts with IPMI found.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-20 11:38:15 -04:00
digimer
b1f89c2723 Finished initial version of striker-show-jobs
* Updated Database->get_jobs() to take 'job_host_uuid = all' to allow
  loading jobs from all cluster machines. Also updated it to record the
  'job_host_uuid' and the unix timestamp version of 'modified_date'.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-18 20:51:20 -04:00
digimer
4398ffe70c Updated striker-boot-machine to support booting all machines.
* Wrote the man page for striker-boot-machine, changing --host-name to
  --host, and adding the '--host all' support.
* Updated anvil-manage-host to support checking/enabling/disabling
  network mapping mode.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-12 22:18:07 -04:00
digimer
55b1380031 Finished (but need more testing) of Server->locate().
This includes the changes in PR#492.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00
digimer
f12e001ac2 Finished Server->connect_to_virsh().
* Now, connecting to virsh can detect when still-open connections
  already exist.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00
digimer
245f75de9b Added Server->update_definition()
* This takes a server and new definition XML and updated the database and any available hosts. Does not yet update defined or running servers.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00