94 Commits (77abfdc933f6d2884308179e0269e786fbe789ce)

Author SHA1 Message Date
digimer dda0fbd7d5 * Updated DRBD->allow_two_primaries() to be more careful at evaluating peer-node-id. 2 years ago
digimer bc3d04ad2e * Updated Cluster->add_server() to wait up to 15 seconds for a server to appear to ensure that the pcs call to add the server with the right requested running state. 2 years ago
digimer 0e57836c8f This commit addresses (hopefully) issue #329. 2 years ago
digimer 284a2957d6 * Fixes issue #329; When multiple attributes exist when checking if we're in maintenance mode in fence_pacemaker, the expected hash reference was actually an array reference. 2 years ago
digimer f9689a7106 Updated ocf:alteeve:server to look for /tmp/<resource>.fail' and, if that file exists, exits with rc:1. This is done to allow for testing. 2 years ago
digimer fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell. 2 years ago
digimer c5fbf20615 * This inverts the --live logic on migrations in Server->migrate_virsh() to default to live. 2 years ago
digimer dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See: 2 years ago
digimer b666caec64 * Updated anvil-provision-server to handle startup when the peer doesn't create/connect it's DRBD resource (ie: node is offline). 2 years ago
Digimer bce9e2caaf This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work. 3 years ago
Digimer 1b70b49cf8 * Updated Network->find_matches() to try to populate the first and second parameters if they're not passed in. 3 years ago
Digimer 1dbca79dde * Created Network->get_ip_from_mac() which takes a MAC address and returns an IP address. 3 years ago
Digimer 7023ffb56b Further improved startup DRBD logic in ocf:alteeve:server. Specifically, it will startup if a local resource/volume is sync'ing. 3 years ago
Digimer bc39c3fe5c Updated ocf:alteeve:server to better handle multi-peer DRBD configurations. 3 years ago
Digimer e62e5d7b0c * Updated ocf:alteeve:server to better handle starting up DRBD resources before trying to boot a VM. 3 years ago
Digimer 3c0435a455 * Updated ocf:alteeve:server to better handle starting up DRBD resources before trying to boot a VM. 3 years ago
Digimer e6d7ac7038 Fixed a bug in ocf:alteeve:server's new migration network support. 3 years ago
Digimer 0fc394b294 Updated ocf:akteeve:server to see in the target for a migration has a '<shortname>.mn1' host name, and if so, and if the target can be reached on that address, it will be used for the live migration. This is to allow for inexpensive 10 Gbps live migration speeds. 3 years ago
Digimer 607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration. 4 years ago
Digimer 5b4bfa747c * Reworked the anvil-join-anvil job parsing to help diagnose occassional faults. Also changed a fatal parse error to one that allows the run to be retried. 4 years ago
Digimer 96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged. 4 years ago
Digimer 16c20ae69c * Updated Tools->catch_sig() to use return code 0 instead of 255 so that systemd doesn't think our daemons failed on stop. 4 years ago
Digimer 24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancilliary checks are done, minimizing connect time. 4 years ago
Digimer 7abbc938af * Renamed tools/striker-purge-host to tools/striker-purge-target and moved the code from test.pl over to it. No longer provides interactive selection, but now does work with Anvil! systems as well as hosts. 4 years ago
Digimer 4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches. 4 years ago
Digimer 3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed). 4 years ago
Digimer 2e37691116 * Updated DRBD->gather_data() to store data on peers so that the peer's LV path and backing disk is recorded. Also fixed a bug in ->get_status() where the return code for local calls was stored as a host name. 4 years ago
Digimer 711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there. 4 years ago
Digimer eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server. 4 years ago
Fabio M. Di Nitto 8f9892650b [build] first pass at adding a build system to integrate with CI 4 years ago
Digimer 413a4f73c2 * Updated Tools->_anvil_version() and Get->anvil_version() to now pick up a SchemaVersion from anvil.sql. This will change only when the schema changes and is used when Database->connect() is checking compatibility with other anvil database hosts. This will make it only break connection when there is a reason to do so. The anvil_version still remains as an informational version that will help when supporting users later. 4 years ago
Digimer 549dbad635 * Created Cluster->delete_server(), which deletes a server resource from pacemaker (stopping it first, if needed). 4 years ago
Digimer 05b1fccdb3 * Created Cluster->add_server() which, well, adds a server to a pacemaker cluster, including sorting out location constraints to favour the node the server is running on, if it's running. 4 years ago
Digimer 1d03a386d3 * Created Database->get_bridges() that, surprise, loads data from the 'bridges' table. 4 years ago
Digimer d677d19ca0 * Moved Database->check_condition_age to Alert. 4 years ago
Digimer 33101f969a * Fixed several bugs related to tracking server boots, migrations and shut downs in the anvil database. The 'ocf:alteeve:server' now has (mostly?) safe integration with the Anvil! database. This was mostly done by updating Servers->boot_virsh(), ->shutdown_virsh() and ->migrate_server(). 4 years ago
Digimer be88be6d30 * Did a bunch of testing / bugfixes for scan-server. 4 years ago
Digimer 46f1a05789 * Got the code in scan-server to the point where it _should_ now gracefully and automatically detect changes to a server's definition originatin from the database (via Striker), directly editing the on-disk definition file, or editing via libvirt tools (like virt-manager). Still needs to be tested though. 4 years ago
Digimer 4dfe0cb5a0 * Created Cluster->boot_server, ->shutdown_server and ->migrate_server methods that handle booting, migrating and shutting down servers. Also created the private method ->_set_server_constraint which is used by migrate and boot to set resource constraints to control where a server boots or migrates to. 4 years ago
Digimer 0f7267eae1 * Moved the '_host_name', '_short_host_name', and '_domain_name' private methods in Tools.pm over to Get.pm (removing the leading '_' in the method names). 4 years ago
Digimer b2c7fd95fb * Renamed the ScanCore unit file to scancore. 4 years ago
Digimer 14bf323627 * Fixed an issue with ocf:alteeve:server where, after a migration, the target host would invoke the RA as if it was trying to migrate, instead of verifying the server (resource) was OK post migration. 4 years ago
Digimer 1498e1b53c * Got server migration working using ocf:alteeve:server in a test environment! 4 years ago
Madison Kelly 30f2b3fa8e * Switched all hash 'local' keys to be the host's short user name. Untested, likely bugs to be fixed in the next commit. 4 years ago
Digimer 47203490a9 * Working on getting live migration to work with ocf:anvil:striker using the environment variables that pacemaker sets. Incomplete, but getting close. 4 years ago
Digimer cc1e0e2f77 * Updated ocf:alteeve:server to properly report a server's status when a monitor action is called. 4 years ago
Digimer 6eeb4e48c7 * Added 'timeout' and 'wfc-timeout' to drbd'd global-common.conf in DRBD->update_global_common(). 4 years ago
Digimer e35800c413 * Fixed up (though more testing/work needed) to ocf:alteeve:server to get it working with DRBD resources referenced using '/dev/drbd/by-res/...'. 4 years ago
Digimer 01974d7efe * Finished (though testing is needed) the updated ocf:alteeve:server resource agent. It now handles starting and stopping libvirtd and drbd daemons on-demand. 5 years ago
Digimer dcd1fd1492 * Created Cluster->check_node_status() that checks the status of a node (in pacemaker). 5 years ago