71 Commits (2f429d2bc7bd41e3860552b72bfdae7801789f10)

Author SHA1 Message Date
digimer 3ee30e6e24 * Updated DRBD->allow_two_primaries() to gracefully fail if the peer isn't connected. 1 year ago
Tsu-ba-me c46ff969f3 fix: add UUID to server process during find in Server.pm 1 year ago
Tsu-ba-me 4bdd206e0c fix: replace ps|grep with pgrep to reduce run time 1 year ago
digimer 7bd76c10dc Major thing in this commit is reworking striker-update-cluster to work without expecting anvil-daemon to be running on target machines. Similarly, they had to be able to work when the Striker DBs were not available. This is to account for cases where the Striker dashboards have updated, and the schema has changed, preventing the not-yet-updated DR hosts and subnodes from being able to use the DB. To do this, anvil-safe-stop, anvil-update-system, and anvil-shutdown-server had to be updated to use the new --no-db switch, which tells them to run without the database being available. 1 year ago
Tsu-ba-me 92a4027f9f fix: add UUID to server process during find in Server.pm 1 year ago
Tsu-ba-me 9aa2937929 fix: replace ps|grep with pgrep to reduce run time 1 year ago
Tsu-ba-me a7751da153 fix: rename, relocate function to find qemu-kvm processes 1 year ago
Tsu-ba-me c3c69733d9 fix: correct base port check, server info extract, vnc alive assign in Server.pm 1 year ago
Tsu-ba-me 3cce3c39b8 fix: add Server subroutine to extract server VM info from qemu-kvm process(es) 1 year ago
digimer bc3d04ad2e * Updated Cluster->add_server() to wait up to 15 seconds for a server to appear, to ensure that the pcs call adds the server with the right requested running state. 2 years ago
digimer fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell. 2 years ago
digimer 7710d9d109 * Created the new anvil-manage-server-storage tool which will specifically handle managing a server's disks. 2 years ago
digimer 7773e5f9b8 * Updated logging in DRBD->get_devices(). 2 years ago
digimer a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately. 2 years ago
digimer c5fbf20615 * This inverts the --live logic on migrations in Server->migrate_virsh() to default to live. 2 years ago
digimer dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See: 2 years ago
Digimer e90dae96f7 * In Server->shutdown_virsh(), disabled trying to resume a paused VM. Also updated the logging around not waiting for a VM to stop. 2 years ago
Digimer 29a28ee97a * Fixed a bug with anvil-provision-server where running the command line menu from a Striker would not assign the job to the target Anvil!. 2 years ago
Digimer bce9e2caaf This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work. 3 years ago
Digimer 4751c6e747 Updated DRBD->get_devices() and Server->parse_definition() to take 'anvil_uuid' so that server data can be parsed from anywhere. 3 years ago
Digimer 72038e8358 * Fixed a bug where ethtool's Media type contained tab characters that broke JSON when configuring the network interfaces. 3 years ago
Digimer 0fc394b294 Updated ocf:alteeve:server to see if the target for a migration has a '<shortname>.mn1' host name, and if so, and if the target can be reached on that address, it will be used for the live migration. This is to allow for inexpensive 10 Gbps live migration speeds. 3 years ago
Digimer e40d0e2444 Fixed a bug where if a database is pingable but the pgsql database is down, and it's the first database tested (or local), then the DB handle used to read / quote fails. 3 years ago
Digimer 8abb5b46e0 * Added support for setting per-agent log-level and log secure values in anvil.conf. 3 years ago
Digimer 28865780f8 * Updated Database->get_server_definitions() to take a specific server UUID, allowing just the one definition to be loaded. Also had it clear previous loads. 3 years ago
Digimer 607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration. 4 years ago
Digimer b71ed28f64 * Added Cluster->manage_fence_delay() that reports back and, optionally, sets a preferred node in a fence race. 4 years ago
Digimer daca6c887b * This contains a fairly major change to how time stamps are handled. All INSERT and UPDATE calls now generate a new timestamp via Database->refresh_timestamp, instead of using 'sys::database::timestamp'. This was done in response to finding a bug where tables in a database differed in both counts of public and private schemas (ip_addresses table, specifically) that failed to resync because the timestamps were re-used too often. 4 years ago
Digimer 96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged. 4 years ago
Digimer fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources. 4 years ago
Digimer ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :) 4 years ago
Digimer 3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed). 4 years ago
Digimer fb0836f912 * The get_cpu endpoint was completed. 4 years ago
Fabio M. Di Nitto 8f9892650b [build] first pass at adding a build system to integrate with CI 4 years ago
Digimer 549dbad635 * Created Cluster->delete_server(), which deletes a server resource from pacemaker (stopping it first, if needed). 4 years ago
Digimer 713f77bc78 * Finally finished scan-apc-ups! Proved way harder than anticipated... (over a solid week of work!) In M3, this agent is no longer host-bound, and the UPSes to scan are based on entries in 'upses' using this scan agent. 4 years ago
Digimer d677d19ca0 * Moved Database->check_condition_age to Alert. 4 years ago
Digimer 33101f969a * Fixed several bugs related to tracking server boots, migrations and shut downs in the anvil database. The 'ocf:alteeve:server' now has (mostly?) safe integration with the Anvil! database. This was mostly done by updating Servers->boot_virsh(), ->shutdown_virsh() and ->migrate_server(). 4 years ago
Digimer be88be6d30 * Did a bunch of testing / bugfixes for scan-server. 4 years ago
Digimer 262cbccb35 * Finished scan-server, though lots of testing needed. 4 years ago
Digimer 46f1a05789 * Got the code in scan-server to the point where it _should_ now gracefully and automatically detect changes to a server's definition originating from the database (via Striker), directly editing the on-disk definition file, or editing via libvirt tools (like virt-manager). Still needs to be tested though. 4 years ago
Digimer e6e4c7d530 * Moved Server->_parse_definition() to ->parse_definition() to make it a public method. 4 years ago
Digimer 4dfe0cb5a0 * Created Cluster->boot_server, ->shutdown_server and ->migrate_server methods that handle booting, migrating and shutting down servers. Also created the private method ->_set_server_constraint which is used by migrate and boot to set resource constraints to control where a server boots or migrates to. 4 years ago
Digimer 0f7267eae1 * Moved the '_host_name', '_short_host_name', and '_domain_name' private methods in Tools.pm over to Get.pm (removing the leading '_' in the method names). 4 years ago
Digimer fe7cdb18fb * Updated all methods to add (or fix) logging the method entry. 4 years ago
Digimer 767148b538 * Updated Database->get_mail_servers() to clear old stored data, and to pull out the list of when a mail server was last used. 4 years ago
Digimer 14bf323627 * Fixed an issue with ocf:alteeve:server where, after a migration, the target host would invoke the RA as if it was trying to migrate, instead of verifying the server (resource) was OK post migration. 4 years ago
Digimer 1498e1b53c * Got server migration working using ocf:alteeve:server in a test environment! 4 years ago
Madison Kelly 30f2b3fa8e * Switched all hash 'local' keys to be the host's short user name. Untested, likely bugs to be fixed in the next commit. 4 years ago
Digimer 47203490a9 * Working on getting live migration to work with ocf:anvil:striker using the environment variables that pacemaker sets. Incomplete, but getting close. 4 years ago