Commit Graph

622 Commits

Author SHA1 Message Date
digimer
8ce1f04335 Finished CPU support in anvil-manage-server-system!
* Updated Get->available_resources() to record the maximum cores that
  can be allocated to a server. This is N-1 for hosts with 4 or less
  cores, or N-2 cores otherwise.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-24 16:51:15 -05:00
digimer
b4037fade5 Added RAM change support to anvil-manage-server-system
* Updated Database->insert_or_update_servers() to error if the RAM being
  recorded is less than 640 KiB. This is because, somewhere yet
  undiscovered, the RAM is being recorded in KiB which breaks things.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-23 22:53:36 -05:00
digimer
6bc2601d34 Updated anvil-manage-server-system to change boot device ordering.
* Updated Server->parse_definition() to store a hash to translate a
  device target to a device type.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-23 19:22:46 -05:00
digimer
1e376fc06b Updated anvil-manage-dr to change --list to --show for consistency
* Updated anvil-manage-dr to handle DR hosts without a VG in a given SG
* Fixed up minor display issues in anvil-manage-storage-groups

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-17 23:52:58 -05:00
digimer
ef0f04117f Added '--list' to anvil-manage-dr
* Updated Database->get_hosts() to store hosts in a host_type hash.
* Updated Database->get_servers() to store servers by name, regardless
  of host Anvil! node.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-14 17:50:48 -05:00
digimer
d9aa8aee74 Updated anvil-manage-keys to work from the command line
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-13 18:59:24 -05:00
digimer
9bd98951b5 Added backup/restore of partition table in Storage->auto_grow_pv
* Auto-growing PVs (and the backing partition) is now supported by
  anvil-manage-host '--auto-grow-pv'.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-08 20:59:40 -05:00
digimer
edc544255e Rebased with main and resolved conflicts.
This branch resolves issue #462; Auto growing PVs. Specifically, it looks at the LVM PVs on the host and checks to see if there is unused free space after the backing partition. If there is, it auto-grows the partition and then resizes the PV. This featu
re is designed to make life easier for users who deleted the auto-created '/home' partition during the anaconda disk partitioning tool.
* Created Storage->auto_grow_pv() that does the above.
* Added the missing hidden method name _create_rsync_wrapper in the Storage module POD.
* Added a call to Storage->auto_grow_pv() in anvil-configure-host and anvil-version-changes for nodes and DR.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-08 12:00:48 -05:00
digimer
e5e958a03b Updated anvil-configure-host to clear "config::map_network"
Also updated anvil-manage-host to properly display the network mapping
status.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-03 17:35:01 -04:00
digimer
8c97f478a8 Updated Server->update_definition() to undefine a server when needed.
* Boosted logging to debug anvil-delete-server hang in jenkins.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-01 16:06:59 -04:00
digimer
9b55504872 Updated anvil-manage-server-system to update defined servers.
* Updated Server->locate() to take the new 'anvil' parameter to speed up
  searches.
* Updated Server->update_definition() to use Server->locate() to find
  where updates are needed. It now also defines the server with the new
  config.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-01 00:15:13 -04:00
digimer
85f86cda8a Updated Storage->read_file() to test if the target file exists.
This allows for better logging if the file to be copied doesn't exist.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-25 21:21:52 -04:00
digimer
ed269aa450 Fixed a bug where duplicate variables were being created in some cases.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-20 22:05:33 -04:00
digimer
a35e790a4d Fixed a minor striker-boot-machine bug.
Divide by zero error when no hosts with IPMI found.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-20 11:38:15 -04:00
digimer
b1f89c2723 Finished initial version of striker-show-jobs
* Updated Database->get_jobs() to take 'job_host_uuid = all' to allow
  loading jobs from all cluster machines. Also updated it to record the
  'job_host_uuid' and the unix timestamp version of 'modified_date'.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-18 20:51:20 -04:00
digimer
4398ffe70c Updated striker-boot-machine to support booting all machines.
* Wrote the man page for striker-boot-machine, changing --host-name to
  --host, and adding the '--host all' support.
* Updated anvil-manage-host to support checking/enabling/disabling
  network mapping mode.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-12 22:18:07 -04:00
digimer
55b1380031 Finished (but need more testing) of Server->locate().
This includes the changes in PR#492.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00
digimer
f12e001ac2 Finished Server->connect_to_virsh().
* Now, connecting to virsh can detect when still-open connections
  already exist.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00
digimer
245f75de9b Added Server->update_definition()
* This takes a server and new definition XML and updated the database and any available hosts. Does not yet update defined or running servers.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00
digimer
62fe62a44b * Continued work on anvil-manage-server-system. It now displays the boot devices, CPU and RAM info.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00
digimer
74ddb7f3a9 Updated Database-get_files() to detect/remove duplicate file entries.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-29 21:28:49 -04:00
digimer
fcbace6713 Updated anvil-join-anvil to hold if either node is still running anvil-configure-host
* Fixed a minor bug and added logging of maintenance_mode calls in anvil-configure-host.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-28 16:01:32 -04:00
digimer
582a8b292c Added more job updates to anvil-manage-power.
* This is a test to see if the job waiting for the uptime to be 300s,
  leaving the job_progress as 0, was causing the job to be repeatedly
  called.
* This is related to issue #479

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-28 01:19:52 -04:00
digimer
ef042eef25 Cleaned up logging while waiting for subnodes.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-28 00:15:14 -04:00
digimer
5d5270486e Added a wait loop when forming node clusters.
* This adds a check where anvil-join-anvil waits until both subnodes are
  marked as configured and not in maintenance mode.
* Should address issue #479 (maybe, this shouldn't trigger reboots, but
  it was certainly a race condition found while investigating).

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-27 22:38:07 -04:00
Digimer
745081b649
Merge branch 'main' into patch-screenshot 2023-09-22 23:28:12 -04:00
digimer
c039c58128 * This commit moves taking screenshots of hosted servers onto the strikers using the Sys::Virt module. This was needed because the screenshots were being taken by scan-server, and that was causing it to take a long time to run. It should never have been handled by the scan agent anyway. This update requires a WebUI fix to use the new screenshot tool. This tool also adds holding multiple screenshots to allow users to "scrub" through screenshots up to 10 hours in the past.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-22 17:15:09 -04:00
digimer
8925dabb9d * Updated anvil-shutdown-server to take the new '--immediate' switch which forces a server to shut down immediately (akin to pulling the power on a traditional machine). This is needed to allow a user to recover a crash or hung server.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-21 18:56:12 -04:00
digimer
580980717d This commit covers the convertion of 'virsh' shell calls to using 'Sys::Virt' module, and fixes several small bugs related to scan-server;
* Switched all calls to virsh to use Sys::Virt to deal with contention of simultaneous virsh calls.
* Removed collecting screenshots from scan-server.
* Fixed a bad variable substitution in an alert.
* Fixed a bug where a server's boot time wasn't being recorded properly.
* Reworked how we determine which server definition was most recently updated and propogated.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-21 15:59:43 -04:00
digimer
3c9086d1f3 Fixed bugs related to running jobs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-07 20:06:00 -04:00
digimer
e8a84e1c97 Added job handling to anvil-manage-server-storage (needs more testing though).
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-07 15:37:31 -04:00
digimer
2f429d2bc7 Fixed bugs related to adding drives and extending drives to servers.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-05 22:53:52 -04:00
digimer
e895e1f264 * Finished writting the anvil-manage-server-storage.
* Fixed handling --eject and --insert to work without a device target specified when only one exists, or to find the file path when only the file name is given.
* Updated anvil-manage-server-storage to show files when processing an optical devices without a file being passed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-05 16:53:08 -04:00
digimer
17078347ee Reworked anvil-manage-server-storage to use the translation system.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-03 16:02:47 -04:00
digimer
02de75a6ab * Improved log messaging to not log of a potential boot failure when the local DRBD volume(s) are all UpToDate and the peer is offline.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-14 12:58:00 -04:00
digimer
3ee30e6e24 * Updated DRBD->allow_two_primaries() to gracefully fail if the peer isn't connected.
* Updated DRBD->manage_resource() to check if the host is StandAlone when asked to 'up' a resource and, if so, connect first. Also updated this to error out gracefully if the call to allow_two_primaries() returns non-zero.
* Update Server->migrate_virsh() to error out gracefully if the DRBD->allow_two_primaries() returns non-zero.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-08 22:52:16 -04:00
digimer
88af919142 * Fixed bugs in ocf:alteeve:server
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-08 11:52:36 -04:00
digimer
6ee2ad75db * Updated anvil-delete-server to actively check for and delete any drbd-fenced attributes left over in the CIB after a server is deleted. This addresses issue #374.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-25 21:45:34 -04:00
digimer
be290bf561 This commit fixes a bug where the drbd kernel module build was being killed mid-compile, leaving DBRD unusable.
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 22:32:41 -04:00
digimer
d68adb5b4e * Updated anvil-manage-power to not reboot if anvil-version-changes is running (which, if it's taking time, is generating new kmods).
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 20:44:40 -04:00
digimer
66c82e5e22 * Fixed a bug in anvil-update-system where updating a single package with --reboot wouldn't request a reboot. Finished reworking it so that a check is made to see if the kernel or DRBD kmod will be updated and, if so, removes the kmod-drbd RPMs prior to doing the update (as opposed to the sloppier check-on-error method).
* Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key.
* Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 21:43:26 -04:00
digimer
e278de4b5a The main change in this commit deals with anvil-daemon startup. During OS updates, it would pick up the queued update job and run it while the other --no-db one was still running. This could become an issue for other tasks in the future, so updated anvil-daemon to not run any jobs for the first minute after startup. Also updated it to see if an OS update is underway (given how it can start mid-RPM update, before packages like kmod-drbd are ready to build). While doing this, implemented caching of daily tasks (like agine out data, archiving data, network scans, etc) to only run once per day, period. As it was before, they would always run on anvil-daemon startup, then wait 24 hours.
Note that work has started it reworking anvil-update-system, but it is incomplete (and broken) in this commit.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 21:43:26 -04:00
digimer
b0c54b6dae * Updated anvil-update-system to check if another instance of anvil-update-system is running and, if so, exit.
* Removed the new tasks from anvil-special-operations.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 20:03:39 -04:00
digimer
7bd76c10dc Major thing in this commit is reworking striker-update-cluster to work without expecting anvil-daemon to be running on target machines. Similarly, they had to be able to work when the Striker DBs were not available. This is to account for cases where the Striker dashboards have updated, and the schema has changed, preventing the not-yet-updated DR hosts and subnodes from being able to use the DB. To do this, anvil-safe-stop, anvil-update-system, and anvil-shutdown-server had to be updated to use the new --no-db switch, which tells then to run without the database being available.
* Updated Server->shutdown_virsh() to work without a database connection.
* Updated System->reboot_needed() to store/read from a cache file when the database is not available.
* Updated anvil-safe-start to remove the old --enable/disable/status switches, now that we use anvil-safe-start.service systemd unit.
* Reworked anvil-safe-stop to work without a database connection, and to work on DR hosts.
* Updated anvil-special-operations to add new tasks, but it's likely these new tasks aren't needed and will be removed very shortly.
* Added/updated multiple man pages.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 18:09:01 -04:00
digimer
9bc78860a6 * Updated anvil-update-system to detect kmod-drbd upgrade problems and fix them.
* Updated striker-update-cluster and anvil-update-system to take '--reboot' to request a reboot if any packages are updated.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-16 20:45:47 -04:00
digimer
42b44ac864 * Updated the log showing why anvil-daemon isn't exiting when a job is running with the job's current progress.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-16 00:08:53 -04:00
digimer
d741f4aa6f * Updated anvil-daemon to not exit on high RAM use is any job is running.
* Updated anvil-update-system to reboot a target whose kernel updated using an anvil-manage-power job,
* Started making striker-update-cluster run as a job (not at all complete). Fixed a bug where the wrong IP was being used when finding access to a target.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-15 22:23:30 -04:00
digimer
751687129a * Updated anvil-daemon to not exit on RAM use if anvil-update-system is running.
* Fixed a bug in anvil-safe-stop where it wouldn't trigger a migration when the peer is online.
* Updated anvil-update-system to set job_data to 'failed' and exit with rc 4 if the os update failed.
* Got striker-update-cluster to error out and exit if a called 'anvil-update-system' job failed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-15 16:23:38 -04:00
digimer
3016fb875b * Reworded striker-update-cluster to use anvil-update-system for on-system OS updates.
* Updated DRBD->get_status() to take the new 'host' paramter to allow the caller to define the hash key string used in the stored data.
* Updated Get->anvil_version() (and a few other places) to use the new 'striker-ui-api' shell user, replacing the 'apache' user.
* Updated Remote->test_access() to take the new 'close' parameter to close the SSH session used when testing access to the target.
* Fixed a logging bug in anvil-manage-power.
* Updated anvil-update-system to take the '--no-reboot' and 'clear-cache' command line switches.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-14 22:29:07 -04:00
digimer
1b8b0bc493 * Created the new 'anvil-manage-server-storage' with the first role of reload a DRBD resource.
* Updated Remote->call() to remove the 'background' parameter as it wasn't working.
* Updated anvil-manage-server-storage to use 'anvil-manage-server-storage' to adjust resources in a way that doesn't block.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-30 21:02:30 -04:00
digimer
ea95d26cc5 * Fixed a bug in DRBD->get_next_resource() where reserved minor numbers were not being released. Also added a new parameter, "minor_only", that returns the next minor number but doesn't bother processing TCP ports.
* Did more work on adding support for adding new disk drives to servers in anvil-manage-server-storage.
* Updated anvil-manage-storage-groups To check for / delete duplicate storage groups with the same name.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-26 23:55:19 -04:00
digimer
88cc76914d This is an attempt to fix issue #341. It replaces the search for SN IPs from Network->find_matches() to Network->find_access(). The later of which doesn't care about the interface the IP was found on.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-24 21:24:37 -04:00
digimer
c9e11fbbfc * Added checks to anvil-provision-server to fail out if either of the SN IPs are not found when generating a DRBD resource config.
* Added logging to anvil-provision-server and anvil-daemon to try to find the cause of jobs being re-run after completing. May have fixed with a fix to job_progress updates going to 100 too early in some cases.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-19 21:44:45 -04:00
digimer
156a0ca201 Updated anvil-daemon's new job launching logic to allow the restart of a running job that failed out early.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-16 11:43:49 -04:00
digimer
47f7a35df3 The main purpose of this commit is to add serial execution of similar jobs to help reduce race conditions for scripted jobs, like multiple server creation.
* Fixed a small logging bug in DRBD->allow_two_primaries().
* Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded.
* Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-15 21:13:53 -04:00
digimer
b6a249d5e7 * Updated Cluster->add_server() to set the preferred host based first on if the server is running on a node, and if not, on the primary node (where before it defaulted to node 1).
* Updated DRBD->delete_resource() to call scan-drbd and scan-lvm to ensure that the database is updated with the newly freed resources.
* Updated anvil-delete-server and anvil-provision-server to call select scan agents to ensure freed resources are immediately recorded.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-11 23:46:21 -04:00
digimer
b7abc481e6 Updated scan-cluster to check to see that migrate_to and migrate_from are given a timeout of 600s and an on-fail of "block". Updated Cluster->add_server() to set migrate_from to timeout=600s and on-fail=block as well.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-08 20:30:25 -04:00
digimer
c82bd9d73a * Created the new anvil-watch-power tool that shows the status of UPSes known on the system, including their "on battery" state, charge percentage, estimated hold up time, etc.
* Updated Database->get_power() and ->get_upses() to store both the time stamp and unix time stamps.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-06 23:40:15 -04:00
digimer
0e57836c8f This commit addresses (hopefully) issue #329.
* Updated DRBD->get_status() to attempt to recompile the drbd kernel module if the drbdsetup status fails. If it continues to fail, it exits gracefully now.
* Updated ocf:alteeve:server to test access over a given IP before calling Server->find to avoid timeouts when the peer is down. Also updated it to set the constraints to keep the server on the new host when the old host returns to the cluster.
* Fixed a bug in scan-cluster where a server that is FAILED but not running is now properly recovered.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-05 22:53:34 -04:00
digimer
110dceb55e * Added a check to make sure files were ready before provisioning a server.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 01:15:08 -04:00
digimer
c50a1936c0 * This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch or a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon.
* Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 00:05:56 -04:00
digimer
895f1ec262 This fixes a race condition when multiple servers are provisioned at (nearly) the same time.
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute. So subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race confitions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-28 00:19:53 -04:00
digimer
0874ad571a Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 14:33:58 -04:00
digimer
83a527f4fa * Removed enabling anvil-safe-start out of the RPM and into anvil-join-anvil.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 11:18:42 -04:00
digimer
89eae7098e NOTE: This updates the reserved RAM to 8 GiB from 4 GiB!
* Adds support for 'anvil_resources:🐏:reserved' that can be set to a number of MiB to override the default 8192.
* Adds support for 'anvil::<anvil_uuid>::resources:🐏:reserved' to allow for per-Anvil! node override on the reserved RAM default, and over the 'anvil_resources:🐏:reserved' option.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-17 20:43:28 -04:00
digimer
f9689a7106 Updated ocf:alteeve:server to look for /tmp/<resource>.fail' and, if that file exists, exits with rc:1. This is done to allow for testing.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-10 17:40:46 -04:00
digimer
cf73d8ed36 * Updated System->configure_ipmi() to auto-configure DR hosts once they've been assigned a BCN IP address.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-05 15:04:39 -04:00
digimer
efebd135eb * Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes.
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 12:50:44 -04:00
Fabio M. Di Nitto
a6f2c2271e Fix typo in log message
Resolves: https://github.com/ClusterLabs/anvil/issues/294

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2023-03-22 07:38:10 +01:00
digimer
b144976853 This resolves Issue #310.
* Updated Database->get_file_locations() to record files available on Anvil! nodes by tracking hosts in Anvil! systems (needed after reworking how DR hosts are linked).
* Updated Get->available_resources() to call Database->get_files() and ->get_file_locations() to restore tracking files available on Anvil! nodes.
* Fixed a couple display bugs in anvil-provision-server when called with --ci-test --options.
* Continued work on anvil-manage-server-storage.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-20 23:43:40 -04:00
digimer
645f54ab89 This commit has more changes than I would normally like, but it's all linked to changing file uploads to rsync serially.
* To update file handling for the new DR host linking mechanism, file_locations -> file_location_anvil_uuid was changed to file_location_host_uuid.
  This required a fair number of changes elsewhere to handle this, with a particular noted change to Database->get_anvils() to look at host_uuid's for the subnodes in an Anvil! and, if either is marked as needing a file, make sure the peer is as well. Similarly, any linked DRs are set to have the file as well.
* Created a new Network->find_access that simply takes a target host name or UUID, and it returns a list of networks and IPs that the target can be accessed by.
* Updated Network->load_ips() to find the network interface being used for traffic so that things like the interface speed can be recorded, even when an IP is on a bridge or bond.

Unrelated, but in this commit, is a restoration of calling scan agents with a timeout now that the virsh hang issue has been resolved.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-14 02:29:40 -05:00
digimer
7773e5f9b8 * Updated logging in DRBD->get_devices().
* Added a check and exit if anvil-manage-dr is asked to protect a server on a machine that doesn't know about that server.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-30 11:30:36 -05:00
digimer
e012d6016c Tha major point of this commit is to add the new 'anvil-manage-storage-groups' program that, well, manages storage groups.
* Updated the storage_group_members table to add the 'storage_group_member_note' that can be set to 'DELETED' to track when a member is deleted. Updated anvil-version-changes to check for and add this column as needed. Updated the anvil.sql schema for the same.
* Updated Cluster->insert_or_update_storage_group_members to add the new column.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:10:15 -05:00
digimer
f8743a7435 * Further work on anvil-manage-dr. Now properly sanity checks that a valid server is passed.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-19 22:14:17 -05:00
digimer
1a217d21cf * Updated anvil-manage-dr to provide the ability to link anvil nodes to dr hosts. Also began work on making it work with the new DR links system.
* Created Database->get_anvil_uuid_from_string(), Database->get_host_uuid_from_string() and Database->get_server_uuid_from_string() to simplify the process of converting --anvil <string>, --host <string> and --server <string> respectively.
* Fixed bugs in Database->get_dr_links() and Database->insert_or_update_dr_links().
* Updated Database->insert_or_update_states() to make direct calls to hosts instead of using get_hosts to drop out if a host_uuid doesn't yet exist in a DB.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-19 19:41:02 -05:00
digimer
17863404e3 * Updated Database->_age_out_data() to only run once per day, unless explicitely called with --age-out-database.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 20:33:28 -05:00
digimer
ff69916a85 * Applied typo fixed from PR #286 (thanks, Deezzir!). Also moved all the raw prints into words.xml.
* Updated Convert->human_readable_to_bytes() to return an empty string if passed an empty string.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-16 20:23:29 -05:00
digimer
9d2f9c4d88 * Fixed a string key name typo.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-15 19:53:57 -05:00
digimer
b8b4352117 * Added support for Migration Network configs in old striker and anvil-configure-host
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-15 01:24:26 -05:00
digimer
b27a43eaf7 * Updated striker to only require 6 interfaces when configuring a node.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-14 16:59:41 -05:00
digimer
0fa6ddebc5 Updated scan-network to see an interface state of 'activated' as up (used to check specifically for 'active').
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-14 16:22:51 -05:00
digimer
a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately.
* Added call to System->configure_logind() to anvil-join-anvil and anvil-version-changes.
* Updated fence_pacemaker to add '--reboot' to the 'stonith_admin' call to ensure DRBD-triggered fence requests reboot instead of just turning nodes off.
This commit address issue #279.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-13 21:42:10 -05:00
digimer
dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See:
** https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
* Added explicity setting of $ENV{PATH} when it's null (as it is when pacemaker calls our tools).
* Updated the copyright to 2023.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-12 21:52:26 -05:00
digimer
b666caec64 * Updated anvil-provision-server to handle startup when the peer doesn't create/connect it's DRBD resource (ie: node is offline).
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-06 03:00:38 -05:00
digimer
a5cee52153 * Fixed a bug in DRBD->get_devices() where old test host UUIDs were left hard-coded.
* Fixed a duplicate header in words.xml
* Fixed display bugs in anvil-report-usage and removed the old DR host display info.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-04 22:58:28 -05:00
Digimer
6d59399c73 * Updated the short OS list.
* Created Get->virsh_list_net() and Get->virsh_list_os() that call and parse osinfo-query directly to create lists of supported network interfaces and OS optimization options used when provisioning VMs. The later of which is used to replace the old language list of OSes, which was clunky and prone to missing valid options.
* Updated Get->available_resources() to remove the old anvil_dr1_host_uuid mechanism of finding and referencing DR resources.
* Started adding --network support to anvil-provision-server to allow users to specify a specific network bridge, MAC address and model to use for a new VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-24 10:08:06 -05:00
Digimer
f9ca6fb170 * This adds the new anvil-version-change tool which anvil-daemon will call on startup to handle checks for changes made over releases/updates.
* Added the new 'dr_link_note" column to the dr_links tables so that links can be marked as DELETED.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-13 17:36:43 -05:00
Digimer
02e371ac56 Updated virsh OS list.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-06 18:08:52 -05:00
Digimer
f6cbe7d1d2 * Fixed a bug in System->collect_ipmi_data() where double-quoted passwords were preventing reading of the sensor data.
* Added a new table to the main SQL schema to allow for more dynamic tracking of which Anvil! node pairs can use which DR hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-06 15:07:05 -05:00
Digimer
4ba1982183 This is the start of a set of changes needed to rework how we handle DRBD fence requests, so that they create location constraints instead of triggering a full stonith fence.
* In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing.
* Updated scan-server to check for / add DRBD fence rules as needed.

Scancore APC agent bugs;
* For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents.
* Fixed a bug to set/clear alerts related to phases disappearing to deal with concurrent logins from different hosts triggering false phase loss alerts.
* Fixed missing variables not being passed to alerts/log entries.

Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-29 22:17:12 -05:00
Digimer
6eb99a2168 * FInished the anvil-manage-alerts tool. It can now send test alerts at a user-requested alert level.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-22 01:10:53 -05:00
Digimer
8b7a44cf75 * Finished cleaning up the output of Machines.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-22 00:19:00 -05:00
Digimer
3e53c87a6b Formatted the output of anvil-manage-alerts data (not yet machines) to be more presentable.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 23:28:50 -05:00
Digimer
622fb84652 * Renamed the 'notifications' table to 'alert-override', better reflecting what it does.
* Got anvil-manage-alerts managing alert overrides.
* Created, but for now commented out, the new 'audit' table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 00:34:52 -05:00
Digimer
586ce6e5b9 * Got recipints working in anvil-manage-alerts().
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-15 22:17:12 -05:00
Digimer
35cf0c37fb * Updated System->check_ram_use() to set the maximum RAM based on the host type, and set those values in _set_default() so that the user can override if they want.
* Got anvil-manage-alerts to the point where you can add, edit and delete mail servers.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-14 17:17:30 -05:00
Digimer
a6cd5c6604 * Starting work in the new anvil-manage-alerts, which will (when done), allow for management of mail servers, alert recipients, notification over-rides and to trigger test alerts.
* Updated Database->get_recipients() to record recipients by name for better sorting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-28 20:00:53 -04:00
Digimer
bde0b2e7ec * Fixed a bug where deleting ports from a fence device in an Install Manifest would not cause the fence methods to be removed from the associated cluster.
* Created Get->anvil_from_switch and Get->server_from_switch() (both need testing) that takes a string that could be either a name or UUID, figures out which it is, finds the entry in the DB and started the X_uuid and X_name switch variables.
* Started work on a second attempt at anvil-manage-server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-20 22:33:41 -04:00
Digimer
93427a7a38 * Updated Get->switches() to always support job-uuid.
* Updated striker-initialize-host to support calls from command line switches, and wrote the man page for it.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-18 19:16:32 -04:00
Digimer
c23c79cdf0 Added 'system::all::configured' to anvil-join-anvil to mark an explicit end of config.
Started updating striker-initialize-host to handle the new anvil repo config.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-18 10:56:58 -04:00