86 Commits (76219f135d0f295a1076df45db8ee0ac0f6586b1)

Author SHA1 Message Date
digimer 3d50f45984 Added a 1 second delay to nmcli calls 7 months ago
digimer 863a7b1b07 Added missing data being recorded in crm_mon parser 9 months ago
digimer 014136ddd0 Added manual parsing of crm_mon XML when parsing resource states. 9 months ago
digimer f38b47f1e2 Reworked stonight levels; This should fix issue #522 and #613 9 months ago
digimer 2d92f339c2 Fixed a bug related to changing the hostname during a manifest run 9 months ago
digimer 7018703c24 Fixed a bug where the peer subnode would add a server to pacemaker 11 months ago
digimer 13a6c44aa4 Updated Cluster->parse_cib() to support new in_ccm and crmd values 11 months ago
digimer 26f4446bf9 Continued work on anvil-watch-servers; Parsed server data now. 1 year ago
Fabio M. Di Nitto fc75bda6ef ocf:alteeve:server: add support for log levels and bump timeouts 1 year ago
digimer a0ff080741 * Deleted some old unused code from Cluster->assemble_storage_groups(). 1 year ago
digimer ed480cf1cb * Fixed a double-$ bug in Remote->_check_known_hosts_for_target() 1 year ago
digimer d68adb5b4e * Updated anvil-manage-power to not reboot if anvil-version-changes is running (which, if it's taking time, is generating new kmods). 1 year ago
digimer 458cb267da * Fixed a bug in Cluster->get_primary_host_uuid() where servers were not loaded before trying to calculate RAM use. 1 year ago
digimer d56b7f9a84 * Created (but not finished!) the new striker-update-cluster tool. 1 year ago
digimer dda0fbd7d5 * Updated DRBD->allow_two_primaries() to be more careful at evaluating peer-node-id. 2 years ago
digimer b6a249d5e7 * Updated Cluster->add_server() to set the preferred host based first on if the server is running on a node, and if not, on the primary node (where before it defaulted to node 1). 2 years ago
digimer b03587967b * Updated Cluster->add_server() to batch the creation of the server and the location constraints in one commit to the CIB. 2 years ago
digimer b7abc481e6 Updated scan-cluster to check to see that migrate_to and migrate_from are given a timeout of 600s and an on-fail of "block". Updated Cluster->add_server() to set migrate_from to timeout=600s and on-fail=block as well. 2 years ago
digimer bc3d04ad2e * Updated Cluster->add_server() to wait up to 15 seconds for a server to appear to ensure that the pcs call to add the server with the right requested running state. 2 years ago
digimer e483840ceb Second attempt to fix the storage group race condition. This time, we only let node 1 assemble storage groups. 2 years ago
digimer d64044c7d1 Test fix for storage group race condition. 2 years ago
digimer 895f1ec262 This fixes a race condition when multiple servers are provisioned at (nearly) the same time. 2 years ago
digimer bd575c6a7d Bumped logging for storage group management. 2 years ago
digimer 1afa7ce09e * Created Cluster->recover_server() that uses crm_resource to try to recover a server that has entered a FAILED state. 2 years ago
digimer f9689a7106 Updated ocf:alteeve:server to look for /tmp/<resource>.fail' and, if that file exists, exits with rc:1. This is done to allow for testing. 2 years ago
digimer efebd135eb * Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes. 2 years ago
digimer 9751c883cb * Updated Cluster->assemble_storage_groups() to remove refrences to anvil_dr1_host_uuid. Also added the logic for auto-adding DR host's VGs to a storage group. Commented it out though as, for now, this might be a bad idea. Needs more thought. 2 years ago
digimer e012d6016c Tha major point of this commit is to add the new 'anvil-manage-storage-groups' program that, well, manages storage groups. 2 years ago
digimer 9d2f9c4d88 * Fixed a string key name typo. 2 years ago
digimer a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately. 2 years ago
Digimer 4ba1982183 This is the start of a set of changes needed to rework how we handle DRBD fence requests, so that they create location constraints instead of triggering a full stonith fence. 2 years ago
Tsu-ba-me c413e62798 fix(striker-ui-api): pass Remote->test_access() user to Cluster->get_primary_host_uuid() 2 years ago
Digimer bde0b2e7ec * Fixed a bug where deleting ports from a fence device in an Install Manifest would not cause the fence methods to be removed from the associated cluster. 2 years ago
Digimer d271ffec26 * Updated Cluster->parse_crm_mon() to record the role of stonith resources. 2 years ago
Digimer d8f31d9d84 * Added the anvil-boot-server man page. 2 years ago
Digimer 1e159f548e Added a couple notes for later dev. 3 years ago
Digimer 0c77736dc8 * Fixed a bug in Cluster->manage_fence_delay() where removing the 'delay="15"' attribute was failing, now set it to 0 instead. 3 years ago
Digimer 7e7b91b286 * Updates anvil-join-anvil to update corosync.conf to use the BCN1 link as the main knet network with the SN1 link as the backup link. 3 years ago
Digimer 607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration. 4 years ago
Digimer d3052c0229 * Finished Cluster->check_server_constraints() and added it to scan-cluster. This now makes sure servers don't roll back to their old host after it has been fenced and recovers. 4 years ago
Digimer 5a343d6d75 * WIP; Started work on Cluster->check_server_constraints() that will track when a server's location constraint needs to be updated when the old preferred node is lost. 4 years ago
Digimer b71ed28f64 * Added Cluster->manage_fence_delay() that reports back and, optionally, sets a preferred node in a fence race. 4 years ago
Digimer 80bdac8e34 * Updated the pacemaker server config to drop the stop timeout to 5 minutes and the migration timeout to 10 minutes. This will avoid blocking the entire cluster when a stop or migrate operation times out. Will update scan-server to clean these up when they happen. 4 years ago
Digimer 16c20ae69c * Updated Tools->catch_sig() to use return code 0 instead of 255 so that systemd doesn't think our daemons failed on stop. 4 years ago
Digimer fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources. 4 years ago
Digimer 4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches. 4 years ago
Digimer 416f51323a * Created tools/striker-boot-machine to, well, boot machines. It uses host_ipmi or, failing that, other fence methods when available to boot a node. 4 years ago
Digimer ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :) 4 years ago
Digimer 3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed). 4 years ago
Digimer 711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there. 4 years ago