digimer
b6a249d5e7
* Updated Cluster->add_server() to set the preferred host to the node the server is already running on, if any, and otherwise to the primary node (previously it defaulted to node 1).
...
* Updated DRBD->delete_resource() to call scan-drbd and scan-lvm to ensure that the database is updated with the newly freed resources.
* Updated anvil-delete-server and anvil-provision-server to call select scan agents to ensure freed resources are immediately recorded.
Signed-off-by: digimer <mkelly@alteeve.ca>
1 year ago
digimer
929806cef7
Fixed variable substitution names in scan-server.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
1 year ago
digimer
b03587967b
* Updated Cluster->add_server() to batch the creation of the server and the location constraints in one commit to the CIB.
...
* Updated scan-lvm to look for and delete duplicate entries.
Signed-off-by: digimer <mkelly@alteeve.ca>
1 year ago
Digimer
133dbb121b
Merge pull request #334 from ClusterLabs/anvil-tools-dev
...
Updated scan-cluster to check to see that migrate_to and migrate_from…
1 year ago
digimer
b7abc481e6
Updated scan-cluster to check to see that migrate_to and migrate_from are given a timeout of 600s and an on-fail of "block". Updated Cluster->add_server() to set migrate_from to timeout=600s and on-fail=block as well.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
1 year ago
digimer-bot
ef84e63a7a
Merge pull request #333 from ClusterLabs/anvil-tools-dev
...
Anvil tools dev
1 year ago
digimer
c82bd9d73a
* Created the new anvil-watch-power tool that shows the status of UPSes known on the system, including their "on battery" state, charge percentage, estimated hold up time, etc.
...
* Updated Database->get_power() and ->get_upses() to store both the timestamp and the Unix timestamp.
Signed-off-by: digimer <mkelly@alteeve.ca>
1 year ago
digimer
5bb1c631cf
* Updated anvil-delete-server to accept '--server' and '--force' to allow direct deletion of a server without interacting with the menu system.
...
This partially addresses issue #321.
Signed-off-by: digimer <mkelly@alteeve.ca>
1 year ago
digimer
bc3d04ad2e
* Updated Cluster->add_server() to wait up to 15 seconds for a server to appear, so that the pcs call adds the server with the requested running state.
...
* Updated Cluster->recover_server() to set the desired recovery state before calling the crm_resource refresh.
Signed-off-by: digimer <mkelly@alteeve.ca>
1 year ago
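The "wait up to 15 seconds for a server to appear" behaviour described in the entry above can be sketched as a bounded polling loop. This is a minimal illustration only, not the project's code (the Anvil! tools are not written in Python); the function name wait_for, its parameters, and the one-second poll interval are all assumptions made for the example.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float = 15.0,
             interval: float = 1.0, clock=time.monotonic,
             sleep=time.sleep) -> bool:
    """Poll check() until it returns True or `timeout` seconds elapse.

    Hypothetical helper: `check` stands in for "is the server visible yet?".
    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            # Give up; the caller decides how to proceed without the server.
            return False
        sleep(interval)
```

A caller would pass a function that looks for the server and only make the follow-up call once wait_for() returns, choosing the requested state based on the result.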
Digimer
b3f3c9b24e
Merge pull request #332 from ClusterLabs/anvil-tools-dev
...
This commit addresses (hopefully) issue #329.
1 year ago
digimer
0e57836c8f
This commit addresses (hopefully) issue #329.
...
* Updated DRBD->get_status() to attempt to recompile the drbd kernel module if the drbdsetup status fails. If it continues to fail, it exits gracefully now.
* Updated ocf:alteeve:server to test access over a given IP before calling Server->find to avoid timeouts when the peer is down. Also updated it to set the constraints to keep the server on the new host when the old host returns to the cluster.
* Fixed a bug in scan-cluster so that a server that is FAILED but not running is now properly recovered.
Signed-off-by: digimer <mkelly@alteeve.ca>
1 year ago
Fabio M. Di Nitto
7cee742b67
Merge pull request #331 from ClusterLabs/digimer-patch-1
...
Delete notes
1 year ago
Digimer
711322f273
Delete notes
1 year ago
Digimer
91de6bb30e
Merge pull request #330 from ClusterLabs/anvil-tools-dev
...
* Fixes issue #329 ; When multiple attributes exist when checking if w…
2 years ago
digimer
284a2957d6
* Fixes issue #329; when multiple attributes exist when checking if we're in maintenance mode in fence_pacemaker, the expected hash reference was actually an array reference.
...
* Fixed a bug in anvil-version-changes where update_file_location_ready() needed to be called before update_file_locations().
* Added a bit more logging for future debugging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
Digimer
f3a65fc04d
Merge pull request #328 from ClusterLabs/anvil-tools-dev
...
This should resolve issue #271.
2 years ago
digimer
8f375c58a9
* Fixed a typo in anvil-daemon that prevented compiling.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
110dceb55e
* Added a check to make sure files were ready before provisioning a server.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
c50a1936c0
* This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch for a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon.
...
* Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
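The readiness rule described in the entry above (file found on disk, expected size, expected md5sum) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's implementation; the function name file_is_ready and its arguments are hypothetical, and the cheap size check is done first so a partially copied file is rejected without hashing it.

```python
import hashlib
import os


def file_is_ready(path: str, expected_size: int, expected_md5: str) -> bool:
    """Return True only if the file exists, has the expected size in bytes,
    and its MD5 hex digest matches (hypothetical helper for illustration)."""
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) != expected_size:
        # Wrong size usually means the copy is still in progress.
        return False
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large images do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

A periodic checker could call this for each known file and record the result, letting other programs wait on the recorded flag rather than re-hashing the file themselves.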
Digimer
dfc0c2c492
Merge pull request #326 from ClusterLabs/anvil-tools-dev
...
* Fixed a bug where, when DRBD->gather_data() calls 'drbdadm dump-xml…
2 years ago
digimer
26fa3c7e32
Fixed a bug where Get->available_resources() was missing LVM/storage group data in some cases.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
510db70253
Another attempt to resolve the storage group race condition. This moves the check for auto-assembly to scan-lvm. It only works for the first assembly; after that the user can/should use anvil-manage-storage-groups.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
e483840ceb
Second attempt to fix the storage group race condition. This time, we only let node 1 assemble storage groups.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
d64044c7d1
Test fix for storage group race condition.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
1bba56a5b1
Hard coded anvil-provision-server to log level 2 while chasing a race condition in storage groups.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
9a58f4d1ff
* This is a small commit to increase logging while chasing down a race condition issue with assembling storage groups.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
895f1ec262
This fixes a race condition when multiple servers are provisioned at (nearly) the same time.
...
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute. So subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of race conditions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
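The "hold" idea from the entry above, where a handed-out DRBD minor or TCP port is reserved for one minute so a concurrent caller does not receive the same number, can be sketched like this. It is an in-memory illustration only; the class name ResourceHolds and its methods are hypothetical, and holds that must be shared between separate processes or hosts would need shared storage rather than a Python dictionary.

```python
import time


class ResourceHolds:
    """Hand out the lowest free number and hold it for `hold_seconds`.

    Hypothetical sketch: `in_use` is whatever set of numbers the caller
    already knows to be taken (for example, minors found in existing
    resource definitions).
    """

    def __init__(self, hold_seconds: float = 60.0, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock
        self._held = {}  # number -> expiry time

    def next_free(self, in_use, start: int = 0) -> int:
        now = self.clock()
        # Forget holds that have expired.
        self._held = {n: exp for n, exp in self._held.items() if exp > now}
        candidate = start
        while candidate in in_use or candidate in self._held:
            candidate += 1
        # Mark the number as held so the next caller skips it.
        self._held[candidate] = now + self.hold_seconds
        return candidate
```

Two near-simultaneous calls with the same in_use set then receive different numbers, and a number that was held but never actually used becomes available again once the hold expires.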
digimer
e7537b0ca3
* Fixed a bug where, when DRBD->gather_data() calls 'drbdadm dump-xml' and the output includes usage data, it breaks XML parsing.
...
* Fixed a bug in Get->available_resources() where DELETED servers were being counted in the used resources math.
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer-bot
cc32d5b606
Merge pull request #320 from ClusterLabs/anvil-tools-dev
...
Anvil tools dev
2 years ago
digimer
c11be1ad1a
Added a skip to ignore dot files when looking at new files.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
dc7b909bfc
More logging to debug storage group race condition
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
bd575c6a7d
Bumped logging for storage group management.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
0874ad571a
Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
8ba613952c
Typo fix.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
83a527f4fa
* Moved enabling anvil-safe-start out of the RPM and into anvil-join-anvil.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
89eae7098e
NOTE: This updates the reserved RAM to 8 GiB from 4 GiB!
...
* Adds support for 'anvil_resources::ram::reserved' that can be set to a number of MiB to override the default 8192.
* Adds support for 'anvil::<anvil_uuid>::resources::ram::reserved' to allow for per-Anvil! node override on the reserved RAM default, and over the 'anvil_resources::ram::reserved' option.
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
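The override order described in the entry above (per-Anvil! setting first, then the global setting, then the 8192 MiB default) can be sketched as a simple lookup. This is an illustration under assumptions: the configuration is modelled as a flat dictionary, the function name reserved_ram_mib is hypothetical, and the key names are written with '::ram::' on the reading that the emoji in the scraped log is a rendered ':ram:' shortcode.

```python
DEFAULT_RESERVED_MIB = 8192  # default stated in the commit message


def reserved_ram_mib(config: dict, anvil_uuid: str) -> int:
    """Resolve reserved RAM in MiB: per-Anvil! key, then global key, then default.

    Hypothetical helper; `config` is an assumed flat mapping of key to value.
    """
    per_anvil_key = f"anvil::{anvil_uuid}::resources::ram::reserved"
    global_key = "anvil_resources::ram::reserved"
    for key in (per_anvil_key, global_key):
        value = config.get(key)
        # Accept only plain non-negative integers; skip unset or malformed values.
        if value is not None and str(value).strip().isdigit():
            return int(value)
    return DEFAULT_RESERVED_MIB
```

Checking the most specific key first means a single node pair can reserve more (or less) RAM without changing the site-wide value.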
digimer
f086c1be39
Fixed a bug where the total RAM was shown instead of the free RAM.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
fdf49c696f
Updated anvil-report-usage to ignore deleted servers. Also added a check to load hosts if they are not already loaded.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
c956f75406
Enabled anvil-safe-start in '%post node'.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
025c2a6f54
* Updated Email->get_next_server() to ignore DELETED mail servers, and it now loads mail servers if not yet in memory.
...
This resolves issue #306.
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
fb70836126
This moves the call of anvil-safe-start out of scancore and into a new, dedicated systemd unit that runs on boot only.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
Digimer
6bce292969
Merge pull request #319 from ClusterLabs/anvil-tools-dev
...
Anvil tools dev
2 years ago
digimer
83aa4e6a5f
Updated scan-cluster to check for FAILED resources (servers) and, if any are found, attempt to recover them.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
1afa7ce09e
* Created Cluster->recover_server() that uses crm_resource to try to recover a server that has entered a FAILED state.
...
* Updated (but not yet completed) scan-cluster's check_resources() function to check if a FAILED server is ready to try to recover.
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
f9689a7106
Updated ocf:alteeve:server to look for '/tmp/<resource>.fail' and, if that file exists, exit with rc:1. This is done to allow for testing.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
digimer
9bf0f50084
Added a check that the server's UUID exists, looping if not, to prevent uninitialized variable warnings.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
Digimer
660f38ac16
Merge branch 'main' into anvil-tools-dev
2 years ago
digimer
cf73d8ed36
* Updated System->configure_ipmi() to auto-configure DR hosts once they've been assigned a BCN IP address.
...
Signed-off-by: digimer <mkelly@alteeve.ca>
2 years ago
Tsu-ba-me
567abff9de
fix(striker-ui): add manage UPS tab
2 years ago
Tsu-ba-me
759cd6f58a
fix(striker-ui): add form validation and message in ManageUpsPanel
2 years ago