79 Commits (805a42b691c7b78d5a4df155e230eea71c6ba8e0)

Author SHA1 Message Date
digimer 26f4446bf9 Continued work on anvil-watch-servers; Parsed server data now. 11 months ago
Fabio M. Di Nitto fc75bda6ef ocf:alteeve:server: add support for log levels and bump timeouts 1 year ago
digimer a0ff080741 * Deleted some old unused code from Cluster->assemble_storage_groups(). 1 year ago
digimer ed480cf1cb * Fixed a double-$ bug in Remote->_check_known_hosts_for_target() 1 year ago
digimer d68adb5b4e * Updated anvil-manage-power to not reboot if anvil-version-changes is running (which, if it's taking time, is generating new kmods). 1 year ago
digimer 458cb267da * Fixed a bug in Cluster->get_primary_host_uuid() where servers were not loaded before trying to calculate RAM use. 1 year ago
digimer d56b7f9a84 * Created (but not finished!) the new striker-update-cluster tool. 1 year ago
digimer dda0fbd7d5 * Updated DRBD->allow_two_primaries() to be more careful at evaluating peer-node-id. 1 year ago
digimer b6a249d5e7 * Updated Cluster->add_server() to set the preferred host based first on if the server is running on a node, and if not, on the primary node (where before it defaulted to node 1). 1 year ago
digimer b03587967b * Updated Cluster->add_server() to batch the creation of the server and the location constraints in one commit to the CIB. 1 year ago
digimer b7abc481e6 Updated scan-cluster to check to see that migrate_to and migrate_from are given a timeout of 600s and an on-fail of "block". Updated Cluster->add_server() to set migrate_from to timeout=600s and on-fail=block as well. 1 year ago
digimer bc3d04ad2e * Updated Cluster->add_server() to wait up to 15 seconds for a server to appear to ensure that the pcs call to add the server with the right requested running state. 1 year ago
digimer e483840ceb Second attempt to fix the storage group race condition. This time, we only let node 1 assemble storage groups. 2 years ago
digimer d64044c7d1 Test fix for storage group race condition. 2 years ago
digimer 895f1ec262 This fixes a race condition when multiple servers are provisioned at (nearly) the same time. 2 years ago
digimer bd575c6a7d Bumped logging for storage group management. 2 years ago
digimer 1afa7ce09e * Created Cluster->recover_server() that uses crm_resource to try to recover a server that has entered a FAILED state. 2 years ago
digimer f9689a7106 Updated ocf:alteeve:server to look for /tmp/<resource>.fail' and, if that file exists, exits with rc:1. This is done to allow for testing. 2 years ago
digimer efebd135eb * Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes. 2 years ago
digimer 9751c883cb * Updated Cluster->assemble_storage_groups() to remove refrences to anvil_dr1_host_uuid. Also added the logic for auto-adding DR host's VGs to a storage group. Commented it out though as, for now, this might be a bad idea. Needs more thought. 2 years ago
digimer e012d6016c Tha major point of this commit is to add the new 'anvil-manage-storage-groups' program that, well, manages storage groups. 2 years ago
digimer 9d2f9c4d88 * Fixed a string key name typo. 2 years ago
digimer a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately. 2 years ago
Digimer 4ba1982183 This is the start of a set of changes needed to rework how we handle DRBD fence requests, so that they create location constraints instead of triggering a full stonith fence. 2 years ago
Tsu-ba-me c413e62798 fix(striker-ui-api): pass Remote->test_access() user to Cluster->get_primary_host_uuid() 2 years ago
Digimer bde0b2e7ec * Fixed a bug where deleting ports from a fence device in an Install Manifest would not cause the fence methods to be removed from the associated cluster. 2 years ago
Digimer d271ffec26 * Updated Cluster->parse_crm_mon() to record the role of stonith resources. 2 years ago
Digimer d8f31d9d84 * Added the anvil-boot-server man page. 2 years ago
Digimer 1e159f548e Added a couple notes for later dev. 3 years ago
Digimer 0c77736dc8 * Fixed a bug in Cluster->manage_fence_delay() where removing the 'delay="15"' attribute was failing, now set it to 0 instead. 3 years ago
Digimer 7e7b91b286 * Updates anvil-join-anvil to update corosync.conf to use the BCN1 link as the main knet network with the SN1 link as the backup link. 3 years ago
Digimer 607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration. 3 years ago
Digimer d3052c0229 * Finished Cluster->check_server_constraints() and added it to scan-cluster. This now makes sure servers don't roll back to their old host after it has been fenced and recovers. 3 years ago
Digimer 5a343d6d75 * WIP; Started work on Cluster->check_server_constraints() that will track when a server's location constraint needs to be updated when the old preferred node is lost. 3 years ago
Digimer b71ed28f64 * Added Cluster->manage_fence_delay() that reports back and, optionally, sets a preferred node in a fence race. 3 years ago
Digimer 80bdac8e34 * Updated the pacemaker server config to drop the stop timeout to 5 minutes and the migration timeout to 10 minutes. This will avoid blocking the entire cluster when a stop or migrate operation times out. Will update scan-server to clean these up when they happen. 3 years ago
Digimer 16c20ae69c * Updated Tools->catch_sig() to use return code 0 instead of 255 so that systemd doesn't think our daemons failed on stop. 3 years ago
Digimer fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources. 4 years ago
Digimer 4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches. 4 years ago
Digimer 416f51323a * Created tools/striker-boot-machine to, well, boot machines. It uses host_ipmi or, failing that, other fence methods when available to boot a node. 4 years ago
Digimer ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :) 4 years ago
Digimer 3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed). 4 years ago
Digimer 711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there. 4 years ago
Digimer eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server. 4 years ago
Digimer a480357049 * Fixed a bug in Cluster->assemble_storage_groups() where, if a group is created during an anvil-provision-server run, the group would get created multiple times. 4 years ago
Digimer b36093671b * Updated Database queries that were passing 'debug => $debug' to not do that, as it was causing far too much (useless) noise in the logs. 4 years ago
Digimer e036515df3 * Got anvil-safe-start to the point where is starts the cluster stack. Need to create the 'anvil-boot-server' and 'anvil-shutdown-server' before it can be completed, so those files have been added. 4 years ago
Digimer fb0836f912 * THe get_cpu endpoint was completed. 4 years ago
Digimer 5536e8ff47 * Updated Cluster->assemble_storage_groups() and Cluster->anvil_name_from_uuid() and ->available_resources() to try to detect the anvil_uuid if not passed in. 4 years ago
Digimer 0ec1bf6b6a * Updated DRBD->delete_resource() to return a success if asked to delete a non-existent resource (as can happen when partial anvil-delete-server runs are re-run). 4 years ago