85 Commits (805a42b691c7b78d5a4df155e230eea71c6ba8e0)

Author SHA1 Message Date
digimer 6bc2601d34 Updated anvil-manage-server-system to change boot device ordering. 11 months ago
digimer 081c5ea90e Possibly fixed the anvil-delete-server hang bug. 1 year ago
digimer 8c97f478a8 Updated Server->update_definition() to undefine a server when needed. 1 year ago
digimer 9b55504872 Updated anvil-manage-server-system to update defined servers. 1 year ago
digimer 0014cc591d Re-enabled DB connections in ocf:alteeve:server. 1 year ago
digimer 7545df1e55 Fixed a bug in which host runs an anvil-delete-server job. 1 year ago
digimer 55b1380031 Finished Server->locate() (more testing needed). 1 year ago
digimer 829ae546a2 Beginning work on new Server->locate() method to find servers across an 1 year ago
digimer f12e001ac2 Finished Server->connect_to_virsh(). 1 year ago
digimer 245f75de9b Added Server->update_definition() 1 year ago
digimer e361d0b424 More progress on anvil-manage-server-system 1 year ago
digimer 8925dabb9d * Updated anvil-shutdown-server to take the new '--immediate' switch, which forces a server to shut down immediately (akin to pulling the power on a traditional machine). This is needed to allow a user to recover a crashed or hung server. 1 year ago
digimer 4646f4a030 * Quieted logging. 1 year ago
digimer 580980717d This commit covers the conversion of 'virsh' shell calls to using the 'Sys::Virt' module, and fixes several small bugs related to scan-server; 1 year ago
digimer 3ee30e6e24 * Updated DRBD->allow_two_primaries() to gracefully fail if the peer isn't connected. 1 year ago
Tsu-ba-me c46ff969f3 fix: add UUID to server process during find in Server.pm 1 year ago
Tsu-ba-me 4bdd206e0c fix: replace ps|grep with pgrep to reduce run time 1 year ago
digimer 7bd76c10dc The major change in this commit is reworking striker-update-cluster to work without expecting anvil-daemon to be running on target machines. Similarly, the targets had to be able to work when the Striker DBs were not available. This is to account for cases where the Striker dashboards have updated, and the schema has changed, preventing the not-yet-updated DR hosts and subnodes from being able to use the DB. To do this, anvil-safe-stop, anvil-update-system, and anvil-shutdown-server had to be updated to use the new --no-db switch, which tells them to run without the database being available. 1 year ago
Tsu-ba-me 92a4027f9f fix: add UUID to server process during find in Server.pm 1 year ago
Tsu-ba-me 9aa2937929 fix: replace ps|grep with pgrep to reduce run time 1 year ago
Tsu-ba-me a7751da153 fix: rename, relocate function to find qemu-kvm processes 1 year ago
Tsu-ba-me c3c69733d9 fix: correct base port check, server info extract, vnc alive assign in Server.pm 1 year ago
Tsu-ba-me 3cce3c39b8 fix: add Server subroutine to extract server VM info from qemu-kvm process(es) 1 year ago
digimer bc3d04ad2e * Updated Cluster->add_server() to wait up to 15 seconds for a server to appear, to ensure that the pcs call adds the server with the requested running state. 1 year ago
digimer fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell. 2 years ago
digimer 7710d9d109 * Created the new anvil-manage-server-storage tool which will specifically handle managing a server's disks. 2 years ago
digimer 7773e5f9b8 * Updated logging in DRBD->get_devices(). 2 years ago
digimer a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately. 2 years ago
digimer c5fbf20615 * This inverts the --live logic on migrations in Server->migrate_virsh() to default to live. 2 years ago
digimer dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See: 2 years ago
Digimer e90dae96f7 * In Server->shutdown_virsh(), disabled trying to resume a paused VM. Also updated the logging around not waiting for a VM to stop. 2 years ago
Digimer 29a28ee97a * Fixed a bug with anvil-provision-server where running the command line menu from a Striker would not assign the job to the target Anvil!. 2 years ago
Digimer bce9e2caaf This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work. 2 years ago
Digimer 4751c6e747 Updated DRBD->get_devices() and Server->parse_definition() to take 'anvil_uuid' so that server data can be parsed from anywhere. 3 years ago
Digimer 72038e8358 * Fixed a bug where ethtool's Media type contained tab characters that broke JSON when configuring the network interfaces. 3 years ago
Digimer 0fc394b294 Updated ocf:alteeve:server to check whether the target for a migration has a '<shortname>.mn1' host name. If so, and if the target can be reached on that address, it will be used for the live migration. This is to allow for inexpensive 10 Gbps live migration speeds. 3 years ago
Digimer e40d0e2444 Fixed a bug where if a database is pingable but the pgsql database is down, and it's the first database tested (or local), then the DB handle used to read / quote fails. 3 years ago
Digimer 8abb5b46e0 * Added support for setting per-agent log-level and log secure values in anvil.conf. 3 years ago
Digimer 28865780f8 * Updated Database->get_server_definitions() to take a specific server UUID, allowing just the one definition to be loaded. Also had it clear previous loads. 3 years ago
Digimer 607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration. 3 years ago
Digimer b71ed28f64 * Added Cluster->manage_fence_delay() that reports back and, optionally, sets a preferred node in a fence race. 3 years ago
Digimer daca6c887b * This contains a fairly major change to how time stamps are handled. All INSERT and UPDATE calls now generate a new timestamp via Database->refresh_timestamp, instead of using 'sys::database::timestamp'. This was done in response to finding a bug where tables in a database differed in both counts of public and private schemas (ip_addresses table, specifically) that failed to resync because the timestamps were re-used too often. 3 years ago
Digimer 96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged. 3 years ago
Digimer fc0954d0c8 * Started work on (but have not at all finished) anvil-manage-server, which will allow manipulation of a server's resources. 4 years ago
Digimer ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :) 4 years ago
Digimer 3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed). 4 years ago
Digimer fb0836f912 * The get_cpu endpoint was completed. 4 years ago
Fabio M. Di Nitto 8f9892650b [build] first pass at adding a build system to integrate with CI 4 years ago
Digimer 549dbad635 * Created Cluster->delete_server(), which deletes a server resource from pacemaker (stopping it first, if needed). 4 years ago
Digimer 713f77bc78 * Finally finished scan-apc-ups! Proved way harder than anticipated... (over a solid week of work!) In M3, this agent is no longer host-bound, and the UPSes to scan are chosen based on entries in 'upses' using this scan agent. 4 years ago