Commit Graph

255 Commits

Author SHA1 Message Date
Madison Kelly
cb6346f468 Fixed errors that broke compile.
Signed-off-by: digimer <mkelly@alteeve.ca>
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-06 15:34:44 -04:00
digimer
cfa3432e78 Added a catch for SIGALARM
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-06-06 15:21:02 -04:00
digimer
25a0454dce Better handling of lost DB connections.
* Added a sync call to Tools->nice_exit() to ensure logs are flushed.
* Updated Database->quote() to be in an eval block to better handle
  cases where the DB handle is lost.
* Added an hourly check to anvil-daemon and moved the memory in use
  check to run only once per hour.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-29 20:41:12 -04:00
digimer
e00dec7cba Added loading existing corosync/authkey from peer during rebuild.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 17:46:19 -04:00
Tsu-ba-me
5d086f5e79 fix(tools): log websockify output 2024-04-09 15:13:54 -04:00
Fabio M. Di Nitto
635f38b489 anvil-safe-stop: don´t use locked version of pcs
Add pcs_direct tool path and use it for anvil-safe-stop

Closes: #623

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2024-03-30 07:06:10 +01:00
digimer
21c8084b2f Updated to support Sys::Virt::Domain generating PNG screenshots
* This should work with older versions still generating PPM screenshots.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-26 18:56:07 -04:00
digimer
495cb90ca6 Created Network->wait_for_network to hold startup for NM to be up.
Added the call to Network->wait_for_network to pause scancore and
anvil-daemon startups until NetworkManager says it's up and running.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-24 17:16:46 -05:00
digimer
05de34c7bc Scancore and anvil-daemon now holds for bonds to be up.
Created Network->wait_for_bonds(), and added it to the startup for
scancore and anvil-daemon.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 02:01:33 -05:00
digimer
741bcfa908 Added default logging level 2 and secure logging in CI tests.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-21 21:46:27 -05:00
digimer
a9850bef4e Added a global variable to force fresh SSH connections.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-16 23:28:58 -05:00
digimer
43f4201861 Created Get->load_average().
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-12 18:31:00 -05:00
digimer
14022896aa Added a call for non-striker machines to call check_sshd if no DBs.
Also added a check for sshd_config.d so that it doesn't error on EL8
machines.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
bf693ed212 Updated anvil-daemon to enable root SSH access during startup
This is required as we need to be able to ssh into peer strikers and
into nodes and DR hosts during initialization.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
cad524db9d Removed anvil-update-states
* Created new anvil-monitor-network daemon to trigger scan-server via
  anvil-monitor-network on network events.
* Moved functionality into scan-network

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
a038a1c553 Got anvil-monitor-network successfully renaming interfaces.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
f575507c1e This begins adding support for EL9.
* Added the 'hostname' and 'hostnamectl --transient' to
  Get->host_name().
* Updated Database->insert_or_update_hosts() to log when no host_name,
  host_type or host_uuid is not passed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
9bd98951b5 Added backup/restore of partition table in Storage->auto_grow_pv
* Auto-growing PVs (and the backing partition) is now supported by
  anvil-manage-host '--auto-grow-pv'.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-08 20:59:40 -05:00
digimer
edc544255e Rebased with main and resolved conflicts.
This branch resolves issue #462; Auto growing PVs. Specifically, it looks at the LVM PVs on the host and checks to see if there is unused free space after the backing partition. If there is, it auto-grows the partition and then resizes the PV. This featu
re is designed to make life easier for users who deleted the auto-created '/home' partition during the anaconda disk partitioning tool.
* Created Storage->auto_grow_pv() that does the above.
* Added the missing hidden method name _create_rsync_wrapper in the Storage module POD.
* Added a call to Storage->auto_grow_pv() in anvil-configure-host and anvil-version-changes for nodes and DR.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-08 12:00:48 -05:00
digimer
68521cdab7 Updated striker-get-screenshots to set permissions properly.
This updates the /opt/alteeve/screenshot directories and the screenshots
in them to be readible by the WebUI.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00
digimer
c039c58128 * This commit moves taking screenshots of hosted servers onto the strikers using the Sys::Virt module. This was needed because the screenshots were being taken by scan-server, and that was causing it to take a long time to run. It should never have been handled by the scan agent anyway. This update requires a WebUI fix to use the new screenshot tool. This tool also adds holding multiple screenshots to allow users to "scrub" through screenshots up to 10 hours in the past.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-22 17:15:09 -04:00
digimer
3c9086d1f3 Fixed bugs related to running jobs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-07 20:06:00 -04:00
digimer
cc71df686b Added a pcs wrapper to serialize pcs constraint calls.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-08 14:32:33 -04:00
digimer
59ade94124 * Added PID logging as an option, and enabled it in ocf:alteeve:server
* Updated DRBD->manage_resource() to take the task 'adjust'.
* Updated ocf:alteeve:server's start_drbd_resource() to call adjust if startup of a resource isn't needd.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-07 22:28:13 -04:00
Fabio M. Di Nitto
824e3e07e3 virsh: add wrapper to serialize calls to virsh list
avoid storm of virsh list that overloads libvirtd API causing
unnecessary timeouts during pcmk monitoring operations.

Resolves: https://github.com/ClusterLabs/anvil/issues/395

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2023-08-07 08:35:08 +02:00
Tsu-ba-me
9163cf4513 chore: remove anvil-manage-tunnel 2023-07-23 21:43:26 -04:00
Tsu-ba-me
25f5c38ade chore: rename striker-manage-vnc-pipes->anvil-manage-vnc-pipe 2023-07-23 21:43:26 -04:00
Tsu-ba-me
ea345a0476 fix: add ss, websockify paths to Tools.pm 2023-07-23 21:43:25 -04:00
Tsu-ba-me
711cb5b696 refactor: rename striker-open-ssh-tunnel->anvil-manage-tunnel 2023-07-23 21:43:25 -04:00
Tsu-ba-me
7f9b418de1 fix: add signal INT, TERM hooks to Tools.pm 2023-07-23 21:43:25 -04:00
digimer
7bd76c10dc Major thing in this commit is reworking striker-update-cluster to work without expecting anvil-daemon to be running on target machines. Similarly, they had to be able to work when the Striker DBs were not available. This is to account for cases where the Striker dashboards have updated, and the schema has changed, preventing the not-yet-updated DR hosts and subnodes from being able to use the DB. To do this, anvil-safe-stop, anvil-update-system, and anvil-shutdown-server had to be updated to use the new --no-db switch, which tells then to run without the database being available.
* Updated Server->shutdown_virsh() to work without a database connection.
* Updated System->reboot_needed() to store/read from a cache file when the database is not available.
* Updated anvil-safe-start to remove the old --enable/disable/status switches, now that we use anvil-safe-start.service systemd unit.
* Reworked anvil-safe-stop to work without a database connection, and to work on DR hosts.
* Updated anvil-special-operations to add new tasks, but it's likely these new tasks aren't needed and will be removed very shortly.
* Added/updated multiple man pages.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 18:09:01 -04:00
Digimer
c1e4380a64
Merge branch 'main' into anvil-tools-dev 2023-07-15 00:06:49 -04:00
digimer
3016fb875b * Reworded striker-update-cluster to use anvil-update-system for on-system OS updates.
* Updated DRBD->get_status() to take the new 'host' paramter to allow the caller to define the hash key string used in the stored data.
* Updated Get->anvil_version() (and a few other places) to use the new 'striker-ui-api' shell user, replacing the 'apache' user.
* Updated Remote->test_access() to take the new 'close' parameter to close the SSH session used when testing access to the target.
* Fixed a logging bug in anvil-manage-power.
* Updated anvil-update-system to take the '--no-reboot' and 'clear-cache' command line switches.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-14 22:29:07 -04:00
Tsu-ba-me
3cce3c39b8 fix: add Server subroutine to extract server VM info from qemu-kvm process(es) 2023-07-12 02:27:26 -04:00
digimer
d56b7f9a84 * Created (but not finished!) the new striker-update-cluster tool.
* Updated Cluster->get_primary_host_uuid() to only load anvils if not already loaded.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-07 17:54:57 -04:00
digimer
a7ebe45f76 This adds the new 'striker-collect-debug' tool that collects all potentially useful debug info into a single tarball.
* Fixed a bug in Get->anvil_from_switch() to work when the Anvil! name is passed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-05 21:04:05 -04:00
digimer
1b8b0bc493 * Created the new 'anvil-manage-server-storage' with the first role of reload a DRBD resource.
* Updated Remote->call() to remove the 'background' parameter as it wasn't working.
* Updated anvil-manage-server-storage to use 'anvil-manage-server-storage' to adjust resources in a way that doesn't block.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-30 21:02:30 -04:00
digimer
e0316da88b * Got anvil-manage-server-storage working enough to grow existing disk's hard drive sizes, and to insert/eject optical disks.
* Hit a bug where a server's definition file was written to disk while not being valid. Added logging in case it happens again, and additional safe-guards to help avoid it from recurring.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-23 23:09:55 -04:00
digimer
1d12fb32b4 * Completed the new anvil-watch-drbd which replaces watch_drbd.
* Updated Email->get_current_server() to always load mail server data from the database.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 20:43:46 -04:00
digimer
156a0ca201 Updated anvil-daemon's new job launching logic to allow the restart of a running job that failed out early.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-16 11:43:49 -04:00
digimer
fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell.
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-03 14:42:28 -05:00
digimer
98c3868870 * Updated fence_pacemaker to no longer use stonith_admin and instead use pcs. This should resolve the main part of issue #279
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 00:22:03 -05:00
digimer
a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately.
* Added call to System->configure_logind() to anvil-join-anvil and anvil-version-changes.
* Updated fence_pacemaker to add '--reboot' to the 'stonith_admin' call to ensure DRBD-triggered fence requests reboot instead of just turning nodes off.
This commit address issue #279.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-13 21:42:10 -05:00
digimer
dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See:
** https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
* Added explicity setting of $ENV{PATH} when it's null (as it is when pacemaker calls our tools).
* Updated the copyright to 2023.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-12 21:52:26 -05:00
Digimer
6d59399c73 * Updated the short OS list.
* Created Get->virsh_list_net() and Get->virsh_list_os() that call and parse osinfo-query directly to create lists of supported network interfaces and OS optimization options used when provisioning VMs. The later of which is used to replace the old language list of OSes, which was clunky and prone to missing valid options.
* Updated Get->available_resources() to remove the old anvil_dr1_host_uuid mechanism of finding and referencing DR resources.
* Started adding --network support to anvil-provision-server to allow users to specify a specific network bridge, MAC address and model to use for a new VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-24 10:08:06 -05:00
Digimer
9194eb3d09 * Updated System->check_if_configured() to record that a host is configured in /etc/anvil to make the system auto-mark as configured if the host is removed from the DB (or, more specifically, variables -> system::configured is lost).
* Updated Database->get_anvils() to record dr_links to reference DR hosts to Anvil! systems.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-15 19:28:00 -05:00
Digimer
f9ca6fb170 * This adds the new anvil-version-change tool which anvil-daemon will call on startup to handle checks for changes made over releases/updates.
* Added the new 'dr_link_note" column to the dr_links tables so that links can be marked as DELETED.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-13 17:36:43 -05:00
Digimer
561fa1a9ec
Merge branch 'main' into anvil-tools-dev 2022-12-13 01:21:40 -05:00
Digimer
4fa8d7a446 * This completes the rework of DRBD triggered fencing to use / clear location constraints instead of triggering a power fence.
* Added the new unfence_pacemaker DRBD unfence handler.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-30 16:13:38 -05:00
Digimer
4ba1982183 This is the start of a set of changes needed to rework how we handle DRBD fence requests, so that they create location constraints instead of triggering a full stonith fence.
* In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing.
* Updated scan-server to check for / add DRBD fence rules as needed.

Scancore APC agent bugs;
* For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents.
* Fixed a bug to set/clear alerts related to phases disappearing to deal with concurrent logins from different hosts triggering false phase loss alerts.
* Fixed missing variables not being passed to alerts/log entries.

Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-29 22:17:12 -05:00