digimer
b7abc481e6
Updated scan-cluster to check to see that migrate_to and migrate_from are given a timeout of 600s and an on-fail of "block". Updated Cluster->add_server() to set migrate_from to timeout=600s and on-fail=block as well.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-08 20:30:25 -04:00
digimer-bot
ef84e63a7a
Merge pull request #333 from ClusterLabs/anvil-tools-dev
...
Anvil tools dev
2023-06-07 09:38:38 -04:00
digimer
c82bd9d73a
* Created the new anvil-watch-power tool that shows the status of UPSes known on the system, including their "on battery" state, charge percentage, estimated hold up time, etc.
...
* Updated Database->get_power() and ->get_upses() to store both the time stamp and the unix time stamp.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-06 23:40:15 -04:00
digimer
5bb1c631cf
* Updated anvil-delete-server to accept '--server' and '--force' to allow direct deletion of a server without interacting with the menu system.
...
This partially addresses issue #321.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-06 16:23:28 -04:00
digimer
bc3d04ad2e
* Updated Cluster->add_server() to wait up to 15 seconds for a server to appear, to ensure that the pcs call adds the server with the right requested running state.
...
* Updated Cluster->recover_server() to set the desired recovery state before calling the crm_resource refresh.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-06 14:34:02 -04:00
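The "wait up to 15 seconds for a server to appear" behaviour described in commit bc3d04ad2e is a bounded polling loop. The sketch below is only an illustration in Python, not the project's code; the function name `wait_for`, the one-second interval, and the injectable `clock` and `sleep` parameters are all assumptions made for the example.

```python
import time


def wait_for(check, timeout=15.0, interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have passed.

    Hypothetical helper: `check` stands in for "has the server appeared yet?".
    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would only proceed to add the server once `wait_for(...)` returns True, and would otherwise log the timeout and fall back to whatever default it considers safe.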
Digimer
b3f3c9b24e
Merge pull request #332 from ClusterLabs/anvil-tools-dev
...
This commit addresses (hopefully) issue #329.
2023-06-06 10:55:36 -04:00
digimer
0e57836c8f
This commit addresses (hopefully) issue #329.
...
* Updated DRBD->get_status() to attempt to recompile the drbd kernel module if the drbdsetup status fails. If it continues to fail, it exits gracefully now.
* Updated ocf:alteeve:server to test access over a given IP before calling Server->find to avoid timeouts when the peer is down. Also updated it to set the constraints to keep the server on the new host when the old host returns to the cluster.
* Fixed a bug in scan-cluster so that a server that is FAILED but not running is now properly recovered.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-05 22:53:34 -04:00
Fabio M. Di Nitto
7cee742b67
Merge pull request #331 from ClusterLabs/digimer-patch-1
...
Delete notes
2023-06-02 15:59:39 +02:00
Digimer
711322f273
Delete notes
2023-06-02 13:32:15 +02:00
Digimer
91de6bb30e
Merge pull request #330 from ClusterLabs/anvil-tools-dev
...
* Fixes issue #329 ; When multiple attributes exist when checking if w…
2023-05-09 15:30:44 -04:00
digimer
284a2957d6
* Fixes issue #329; when multiple attributes exist while checking whether we're in maintenance mode in fence_pacemaker, the expected hash reference was actually an array reference.
...
* Fixed a bug in anvil-version-changes where update_file_location_ready() needed to be called before update_file_locations().
* Added a bit more logging for future debugging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-08 15:03:29 -04:00
Digimer
f3a65fc04d
Merge pull request #328 from ClusterLabs/anvil-tools-dev
...
This should resolve issue #271.
2023-05-04 17:15:55 -04:00
digimer
8f375c58a9
* Fixed a typo in anvil-daemon that prevented compiling.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 11:14:23 -04:00
digimer
110dceb55e
* Added a check to make sure files were ready before provisioning a server.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 01:15:08 -04:00
digimer
c50a1936c0
* This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch for a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon.
...
* Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 00:05:56 -04:00
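The readiness rule described in commit c50a1936c0 (file found on disk, expected size, expected md5sum) can be sketched as follows. This is an illustrative Python sketch, not the project's implementation; the function name `file_is_ready` and its parameters are hypothetical, and how the expected size and md5sum are looked up is left out.

```python
import hashlib
import os


def file_is_ready(path, expected_size, expected_md5):
    """Return True only if `path` exists with the expected size and md5sum.

    Hypothetical helper: the expected values would come from wherever the
    file's metadata is recorded.
    """
    if not os.path.isfile(path):
        return False
    # Cheap check first, so a partially copied file is rejected before hashing.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5
```

Checking the size before the checksum keeps the periodic check cheap while a large file is still being copied.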
Digimer
dfc0c2c492
Merge pull request #326 from ClusterLabs/anvil-tools-dev
...
* Fixed a bug where, when DRBD->gather_data() calls 'drbdadm dump-xml…
2023-05-03 00:40:14 -04:00
digimer
26fa3c7e32
Fixed a bug where Get->available_resources() was missing LVM/storage group data in some cases.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-02 16:28:05 -04:00
digimer
510db70253
Another attempt to resolve the storage group race condition. This moves the check for auto-assembly to scan-lvm. It only works for the first assembly; after that, the user can/should use anvil-manage-storage-groups.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-02 00:07:40 -04:00
digimer
e483840ceb
Second attempt to fix the storage group race condition. This time, we only let node 1 assemble storage groups.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-01 20:29:20 -04:00
digimer
d64044c7d1
Test fix for storage group race condition.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-01 13:48:27 -04:00
digimer
1bba56a5b1
Hard-coded anvil-provision-server to log level 2 while chasing a race condition in storage groups.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-01 10:54:51 -04:00
digimer
9a58f4d1ff
* This is a small commit to increase logging while chasing down a race condition issue with assembling storage groups.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-30 19:47:58 -04:00
digimer
895f1ec262
This fixes a race condition when multiple servers are provisioned at (nearly) the same time.
...
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute, so subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race conditions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-28 00:19:53 -04:00
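The "hold" system described in commit 895f1ec262 amounts to a short-lived reservation: a number that has just been handed out is skipped by later callers until the hold expires. The sketch below illustrates the idea in Python under stated assumptions; the class name `ResourceHolds`, the in-memory dictionary, and the `kind` labels are hypothetical, and the commit does not say where the real holds are stored.

```python
import time

HOLD_SECONDS = 60  # the commit message says numbers are held for one minute


class ResourceHolds:
    """Hypothetical in-memory tracker of held DRBD minors and TCP ports."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._held = {}  # (kind, number) -> time at which the hold expires

    def _is_held(self, kind, number):
        expiry = self._held.get((kind, number))
        return expiry is not None and expiry > self._clock()

    def next_free(self, kind, start, in_use):
        """Return the lowest number >= start that is neither in use nor held.

        The returned number is immediately placed on hold, so a second
        caller arriving before the first has finished gets the next one.
        """
        number = start
        while number in in_use or self._is_held(kind, number):
            number += 1
        self._held[(kind, number)] = self._clock() + HOLD_SECONDS
        return number
```

With minors 0 and 1 already in use, two back-to-back calls would return 2 and then 3; once the one-minute hold lapses without the number having been put into use, 2 becomes available again.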
digimer
e7537b0ca3
* Fixed a bug where, when DRBD->gather_data() calls 'drbdadm dump-xml' and the output includes usage data, it breaks XML parsing.
...
* Fixed a bug in Get->available_resources() where DELETED servers were being counted in the used resources math.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-25 13:12:13 -04:00
digimer-bot
cc32d5b606
Merge pull request #320 from ClusterLabs/anvil-tools-dev
...
Anvil tools dev
2023-04-19 17:48:51 -04:00
digimer
c11be1ad1a
Added a skip to ignore dot files when looking at new files.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-19 12:36:05 -04:00
digimer
dc7b909bfc
More logging to debug storage group race condition
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 19:14:59 -04:00
digimer
bd575c6a7d
Bumped logging for storage group management.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 19:02:51 -04:00
digimer
0874ad571a
Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 14:33:58 -04:00
digimer
8ba613952c
Typo fix.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 12:32:52 -04:00
digimer
83a527f4fa
* Moved enabling anvil-safe-start out of the RPM and into anvil-join-anvil.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 11:18:42 -04:00
digimer
89eae7098e
NOTE: This updates the reserved RAM to 8 GiB from 4 GiB!
...
* Adds support for 'anvil_resources::ram::reserved' that can be set to a number of MiB to override the default 8192.
* Adds support for 'anvil::<anvil_uuid>::resources::ram::reserved' to allow for a per-Anvil! node override of the reserved RAM default, which also takes precedence over the 'anvil_resources::ram::reserved' option.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-17 20:43:28 -04:00
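The precedence described in commit 89eae7098e (per-Anvil! override first, then the global option, then the 8192 MiB default) can be sketched as below. This is a Python illustration only: the function name `reserved_ram_mib` and the flat dictionary of settings are assumptions, and the key names are written with '::' separators on the assumption that this is how they are spelled in the configuration.

```python
DEFAULT_RESERVED_MIB = 8192  # default named in the commit message


def reserved_ram_mib(config, anvil_uuid):
    """Resolve the reserved RAM in MiB for one Anvil! node.

    `config` is a hypothetical flat mapping of setting names to values.
    Order of precedence: per-Anvil! key, then global key, then the default.
    """
    per_anvil_key = f"anvil::{anvil_uuid}::resources::ram::reserved"
    global_key = "anvil_resources::ram::reserved"
    for key in (per_anvil_key, global_key):
        value = config.get(key)
        # Treat missing or blank values as "not set" and fall through.
        if value is not None and str(value).strip() != "":
            return int(value)
    return DEFAULT_RESERVED_MIB
```

Checking the most specific key first means a single node can be tuned without changing the value used everywhere else.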
digimer
f086c1be39
Fixed a bug where the total RAM was shown instead of the free RAM.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 13:02:50 -04:00
digimer
fdf49c696f
Updated anvil-report-usage to ignore deleted servers. Also added a check to load hosts if they are not already loaded.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 12:23:21 -04:00
digimer
c956f75406
Enabled anvil-safe-start in '%post node'.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 10:18:12 -04:00
digimer
025c2a6f54
* Updated Email->get_next_server() to ignore DELETED mail servers, and it now loads mail servers if not yet in memory.
...
This resolves issue #306.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-13 00:26:32 -04:00
digimer
fb70836126
This moves the call of anvil-safe-start out of scancore and into a new, dedicated systemd unit that runs on boot only.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-12 22:26:15 -04:00
Digimer
6bce292969
Merge pull request #319 from ClusterLabs/anvil-tools-dev
...
Anvil tools dev
2023-04-11 23:31:29 -04:00
digimer
83aa4e6a5f
Updated scan-cluster to check for FAILED resources (servers) and, if any are found, attempt to recover them.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-11 16:32:31 -04:00
digimer
1afa7ce09e
* Created Cluster->recover_server() that uses crm_resource to try to recover a server that has entered a FAILED state.
...
* Updated (but not yet completed) scan-cluster's check_resources() function to check if a FAILED server is ready to try to recover.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-10 23:04:15 -04:00
digimer
f9689a7106
Updated ocf:alteeve:server to look for '/tmp/<resource>.fail' and, if that file exists, exit with rc:1. This is done to allow for testing.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-10 17:40:46 -04:00
digimer
9bf0f50084
Added a check to see if the server's UUID exists, looping if not, to prevent uninitialized variable warnings.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-09 23:38:39 -04:00
Digimer
660f38ac16
Merge branch 'main' into anvil-tools-dev
2023-04-05 16:11:01 -04:00
digimer
cf73d8ed36
* Updated System->configure_ipmi() to auto-configure DR hosts once they've been assigned a BCN IP address.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-05 15:04:39 -04:00
Tsu-ba-me
567abff9de
fix(striker-ui): add manage UPS tab
2023-04-05 15:04:39 -04:00
Tsu-ba-me
759cd6f58a
fix(striker-ui): add form validation and message in ManageUpsPanel
2023-04-05 15:04:39 -04:00
Tsu-ba-me
2f84f52090
fix(striker-ui): passthrough input validation in EditUpsInputGroup
2023-04-05 15:04:39 -04:00
Tsu-ba-me
aa5aad4689
fix(striker-ui): add input validation to AddUpsInputGroup
2023-04-05 15:04:39 -04:00
Tsu-ba-me
d3894081f6
fix(striker-ui): add input tests to CommonUpsInputGroup
2023-04-05 15:04:39 -04:00
Tsu-ba-me
afdd376759
fix(striker-ui): correct validity test on first render in InputWithRef
2023-04-05 15:04:39 -04:00