Commit Graph

760 Commits

Author SHA1 Message Date
digimer
284a2957d6 * Fixes issue #329; When multiple attributes exist when checking if we're in maintenance mode in fence_pacemaker, the expected hash reference was actually an array reference.
* Fixed a bug in anvil-version-changes where update_file_location_ready() needed to be called before update_file_locations().
* Added a bit more logging for future debugging.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-08 15:03:29 -04:00
digimer
8f375c58a9 * Fixed a typo in anvil-daemon that prevented compiling.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 11:14:23 -04:00
digimer
110dceb55e * Added a check to make sure files were ready before provisioning a server.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 01:15:08 -04:00
digimer
c50a1936c0 * This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch or a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon.
* Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 00:05:56 -04:00
digimer
1bba56a5b1 Hard coded anvil-provision-server to log level 2 while chasing a race condition is storage groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-01 10:54:51 -04:00
digimer
9a58f4d1ff * This is a small commit to increase logging while chasing down a race condition issue with assembling storage groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-30 19:47:58 -04:00
digimer
895f1ec262 This fixes a race condition when multiple servers are provisioned at (nearly) the same time.
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute. So subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race confitions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-28 00:19:53 -04:00
digimer
c11be1ad1a Added a skip to ignore dot files when looking at new files.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-19 12:36:05 -04:00
digimer
dc7b909bfc More logging to debug storage group race condition
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 19:14:59 -04:00
digimer
bd575c6a7d Bumped logging for storage group management.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 19:02:51 -04:00
digimer
0874ad571a Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 14:33:58 -04:00
digimer
8ba613952c Typo fix.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 12:32:52 -04:00
digimer
83a527f4fa * Removed enabling anvil-safe-start out of the RPM and into anvil-join-anvil.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 11:18:42 -04:00
digimer
f086c1be39 Fixed a bug where the total RAM was shown instead of the free RAM.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 13:02:50 -04:00
digimer
fdf49c696f Updated anvil-report-usage to ignore deleted servers. Also added a check to ensure hosts are loaded if not.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 12:23:21 -04:00
digimer
fb70836126 This moves the call of anvil-safe-start out of scancore and into a new, dedicated systemd unit that runs on boot only.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-12 22:26:15 -04:00
digimer
9bf0f50084 Added a check to see if the server's UUID exists and looping if not to prevent unitialized variable warnings.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-09 23:38:39 -04:00
digimer
1c274ba58d * Fixed a bug in anvil-delete-server that was preventing the complete deletion of a server if the DRBD resource had already been removed.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-03 12:40:58 -04:00
digimer
ddc6965b60 * Fixed a bug where references to files on Anvil! nodes was broken in anvil-provision-server and anvil-manage-files.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 17:33:49 -04:00
digimer
efebd135eb * Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes.
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 12:50:44 -04:00
digimer
8ff40ec42c * Fixed a SQL query bug in Database->get_drbd_data().
* Got more work done on anvil-manage-server-storage; Now shows DRBD resource size, backing LV and size, and calculates/displayes metadata size.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-26 02:09:52 -04:00
digimer
040bc02e26 * This adds the new Database->get_drbd_data() that, like ->get_lvm_data, collates the DRBD data collected by scan-drbd into more readibly parsable data structure.
* Updated DRBD->parse_resource() to add references to a resource name and volume for a given backing disk.
* Comtinued work on anvil-manage-server-storage.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-24 19:45:47 -04:00
Fabio M. Di Nitto
b75512e540 virt-install should not --wait on VM to be provisioned
Resolves: https://github.com/ClusterLabs/anvil/issues/277

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2023-03-24 01:27:15 -04:00
digimer
8e0e51544c * Continued work on anvil-manage-server-storage.
* Created the new Database->get_lvm_data to compile LVM data from scan-lvm
* Updated DRBD->parse_resource to call Database->get_lvm_data if needed, and to track backing devices to Storage Groups.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-22 22:57:26 -04:00
digimer
b144976853 This resolves Issue #310.
* Updated Database->get_file_locations() to record files available on Anvil! nodes by tracking hosts in Anvil! systems (needed after reworking how DR hosts are linked).
* Updated Get->available_resources() to call Database->get_files() and ->get_file_locations() to restore tracking files available on Anvil! nodes.
* Fixed a couple display bugs in anvil-provision-server when called with --ci-test --options.
* Continued work on anvil-manage-server-storage.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-20 23:43:40 -04:00
digimer
fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell.
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-03 14:42:28 -05:00
digimer
147f31aeeb * Added a loop when calling 'anvil-change-password' in a loop as there appears to be an unknown condition where during setup, this is called but never actually runs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 18:37:13 -05:00
digimer
ab3e8afe6e Fixed a bug in Storage->push_file() where file path wasn't updated from incoming to files, preventing the push to other hosts from working. Also fixed a minor issue where the file size was sometimes 0, making transfer calculations useless.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 13:21:29 -05:00
Digimer
d59034a488
Merge branch 'main' into anvil-tools-dev 2023-02-22 02:21:50 -05:00
digimer
254f7ef4e2 This should fix the tracking of what files belong where, using the new DR links system. It also should finish (though testing is still needed) the serial rsync issue.
* Created Database->track_files() as a dedicated method as trying to verify the existence of file_locations during Database->load_anvils() was fragile and prone to recursive loops.
* Updated Database->insert_or_update_file_locations() to take an anvil_uuid and recursively call for each host, to maintain compatibility with the old ways, and make it simpler to add an entry for both sub-nodes in an Anvil!.
* Created Storage->push_file() that takes a file and rsync's it to all other machines, or creates a job for the file to be pulled if the target can't be accessed.
* Updated anvil-manage-files and anvil-sync-shared to use the new Storage->push_files and Database->track_files methods.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 02:13:19 -05:00
digimer
645f54ab89 This commit has more changes than I would normally like, but it's all linked to changing file uploads to rsync serially.
* To update file handling for the new DR host linking mechanism, file_locations -> file_location_anvil_uuid was changed to file_location_host_uuid.
  This required a fair number of changes elsewhere to handle this, with a particular noted change to Database->get_anvils() to look at host_uuid's for the subnodes in an Anvil! and, if either is marked as needing a file, make sure the peer is as well. Similarly, any linked DRs are set to have the file as well.
* Created a new Network->find_access that simply takes a target host name or UUID, and it returns a list of networks and IPs that the target can be accessed by.
* Updated Network->load_ips() to find the network interface being used for traffic so that things like the interface speed can be recorded, even when an IP is on a bridge or bond.

Unrelated, but in this commit, is a restoration of calling scan agents with a timeout now that the virsh hang issue has been resolved.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-14 02:29:40 -05:00
digimer
7710d9d109 * Created the new anvil-manage-server-storage tool which will specifically handle managing a server's disks.
* Created DRBD->parse_resource() to pass a specific DRBD resource's XML data.
* Fixed a bug in Get->available_resources() so that if the threads is lower than CPU cores, the cores are used as the total available to VMs.
* Fixed bugs in Get->server_from_switch() where it just wasn't working properly.
* Updated scan_drbd to not reset a resource's size to 0-bytes when a resource goes offline.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-03 22:05:34 -05:00
Tsu-ba-me
58d09cb08c fix(tools): make server screenshot write to named pipe non-blocking 2023-02-02 23:18:26 -05:00
Tsu-ba-me
eb561d6d39 fix(tools): always send to pipe when given request host 2023-02-02 19:21:56 -05:00
Tsu-ba-me
3802c72912 fix(tools): check server state before getting screenshot 2023-02-02 18:25:33 -05:00
Tsu-ba-me
a9cc123300 fix(tools): exit at end of anvil-get-server-screenshot 2023-02-02 17:11:56 -05:00
digimer
9751c883cb * Updated Cluster->assemble_storage_groups() to remove refrences to anvil_dr1_host_uuid. Also added the logic for auto-adding DR host's VGs to a storage group. Commented it out though as, for now, this might be a bad idea. Needs more thought.
* Fixed a bug in Database->get_storage_group_data() to load hosts data when needed. Also fixed a bug where new members didn't return the new storage_group_member_uuid.
* Updated anvil-manage-host to use the new switch handler.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-01 23:19:38 -05:00
digimer
7773e5f9b8 * Updated logging in DRBD->get_devices().
* Added a check and exit if anvil-manage-dr is asked to protect a server on a machine that doesn't know about that server.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-30 11:30:36 -05:00
digimer
56cf100b09 * Added a check to ensure a storage group actually exists before trying to present it to the user. This should resolve issue #299.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-22 11:08:55 -05:00
digimer
695b274d78 * Fixed a bug in anvil-provision-server wasn't loading the available OS list when provisioning servers. The should resolve issue #296.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 23:47:04 -05:00
digimer
053e5312e1 * Fixed a bug in anvil-manage-dr where protect jobs with multiple potential targets wouldn't know which to use during job runs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:51:21 -05:00
digimer
e9f390b65b * Udated RPM spec to add new core requires and add calling 'anvil-version-changes' to core's %post.
* Added missing man pages and the new anvil-manage-storage-groups to the Makefile.am's.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:24:24 -05:00
digimer
e012d6016c Tha major point of this commit is to add the new 'anvil-manage-storage-groups' program that, well, manages storage groups.
* Updated the storage_group_members table to add the 'storage_group_member_note' that can be set to 'DELETED' to track when a member is deleted. Updated anvil-version-changes to check for and add this column as needed. Updated the anvil.sql schema for the same.
* Updated Cluster->insert_or_update_storage_group_members to add the new column.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:10:15 -05:00
digimer
355e5c2c0a * More work done on anvil-manage-dr. It now properly validated a dr host.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 00:11:35 -05:00
digimer
f8743a7435 * Further work on anvil-manage-dr. Now properly sanity checks that a valid server is passed.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-19 22:14:17 -05:00
digimer
1a217d21cf * Updated anvil-manage-dr to provide the ability to link anvil nodes to dr hosts. Also began work on making it work with the new DR links system.
* Created Database->get_anvil_uuid_from_string(), Database->get_host_uuid_from_string() and Database->get_server_uuid_from_string() to simplify the process of converting --anvil <string>, --host <string> and --server <string> respectively.
* Fixed bugs in Database->get_dr_links() and Database->insert_or_update_dr_links().
* Updated Database->insert_or_update_states() to make direct calls to hosts instead of using get_hosts to drop out if a host_uuid doesn't yet exist in a DB.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-19 19:41:02 -05:00
digimer
16fc4e131c * Fixed a bug where, if a specific request to do a DB resync was made but the active_uuid wasn't matching the host, it wouldn't resync. This broke peering Strikers when the peer source was not the active_uuid.
* Updated anvil-manage-dr to check and delete duplicate dr_link entries.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 22:53:15 -05:00
digimer
985338a064 Fixed typo that broke compilation.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 21:15:09 -05:00
digimer
98c3868870 * Updated fence_pacemaker to no longer use stonith_admin and instead use pcs. This should resolve the main part of issue #279
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 00:22:03 -05:00
digimer
0318b4bbe9 * Fixed (the very incomplete) anvil-manage-firewall so that it would clear a job, if a job was assigned to it.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-17 22:52:57 -05:00