530 Commits (462ec6d903843c0f457affde867b50b656bdb89d)

Author SHA1 Message Date
Digimer 213babaaf2 Trying to fix a bug where vnet devices keep reporting as having returned. 3 years ago
Digimer e40d0e2444 Fixed a bug where if a database is pingable but the pgsql database is down, and it's the first database tested (or local), then the DB handle used to read / quote fails. 3 years ago
Digimer 4c7bb45ab9 Fixed a race condition where configuring the IPMI BMC would appear to fail because the BMC wouldn't report the user list after a cold reset. 3 years ago
Digimer 6cbdc388d4 Fixed a bug where corosync's configuration of a backup ring was broken. 3 years ago
Digimer 04cb116c1b Updated anvil-parse-fence-agents to validate each fence agent's metadata is valid before adding it to the unified XML. 3 years ago
Digimer 8abb5b46e0 * Added support for setting per-agent log-level and log secure values in anvil.conf. 3 years ago
Digimer c449e2edf0 Resetting scan agent timeout to 30 seconds, 60 didn't help with a random 3 years ago
Digimer 15d8309095 This commit adds scan agent DB connection info caching to help minimize the number of unnecessary DB resync checks that happen. 3 years ago
Digimer 4800f7181f * Updated ScanCore to boot a node that is off without a stop reason. 3 years ago
Digimer acaacd9a86 * Created Storage->get_size_of_block_device() that takes a block device path and returns the size of the path, if it's found in the database. 3 years ago
Digimer 606bd8f1f0 Continuing work on anvil-manage-server. 3 years ago
Tsu-ba-me 92335b29cc fix(Anvil): add clean up logic when failed to validate Apache conf after modify 3 years ago
Tsu-ba-me d2d7a5380c fix(Anvil): search all args for Access-Control value 3 years ago
Tsu-ba-me 18ec7b1c1a fix(Anvil): abort when no new Apache conf created 3 years ago
Tsu-ba-me 3de9912f51 fix(Anvil): use augeas to modify Apache conf 3 years ago
Digimer 28865780f8 * Updated Database->get_server_definitions() to take a specific server UUID, allowing just the one definition to be loaded. Also had it clear previous loads. 3 years ago
Digimer ccd89f923b Fixed two small bugs that were preventing proactive live migration from working. 3 years ago
Digimer 548c52701a Updates Jobs->update_progress() to take a 'variables' hash reference, and to support logging as well. 3 years ago
Digimer 1e159f548e Added a couple notes for later dev. 3 years ago
Digimer 6db16ca313 * Fixed a bug in Database->insert_or_update_network_interfaces() where the passed-in network_interface_uuid parameter was not being set properly. 3 years ago
Digimer 0c77736dc8 * Fixed a bug in Cluster->manage_fence_delay() where removing the 'delay="15"' attribute was failing, now set it to 0 instead. 3 years ago
Digimer 7e7b91b286 * Updates anvil-join-anvil to update corosync.conf to use the BCN1 link as the main knet network with the SN1 link as the backup link. 3 years ago
Digimer fd5d3c0434 * Finished (though testing still needed) scan-network. 3 years ago
Digimer d7d418ee1b * Fixed a bug in DRBD->gather_data() where the peer node's data was being recorded where the local node's data should have been saved. 3 years ago
Digimer 6777104398 * Fixed a bug in anvil-daemon where, when an anvil-manage-power reboot run had triggered a reboot, anvil-daemon didn't set the job_progress to '100', causing constant reboots. Also fixed a bug where the log level was hard-set to '1' instead of '2' needed during debugging. 3 years ago
Digimer 607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration. 3 years ago
Digimer 0c475d2a2e * Fixed a couple logging bugs. 3 years ago
Digimer d3052c0229 * Finished Cluster->check_server_constraints() and added it to scan-cluster. This now makes sure servers don't roll back to their old host after it has been fenced and recovers. 3 years ago
Digimer 87b31a16bb * Clear out the bond health in Network->check_network(). 3 years ago
Digimer 30f478267a * Forced anvil-daemon to log-level 2 and to enable secure logging to continue debugging setup issues. 3 years ago
Digimer 023f43eda9 * In the never-ending attempt to resolve the build consistency issues, this commit enables extra debugging logging and, hopefully, implements a fix in anvil-daemon where a job could be started repeatedly. 3 years ago
Digimer 5a343d6d75 * WIP; Started work on Cluster->check_server_constraints() that will track when a server's location constraint needs to be updated when the old preferred node is lost. 3 years ago
Digimer b71ed28f64 * Added Cluster->manage_fence_delay() that reports back and, optionally, sets a preferred node in a fence race. 3 years ago
Digimer 08a958ec60 * Finished updating Network->check_network() to check/heal bridges. 3 years ago
Digimer bd24c1c5bb * I _might_ have fixed the network configuration issue in anvil-configure-host... Updated it so that if 'nmcli' doesn't report a valid device name, it looks for it in the ifcfg-X file, and uses 'X' if not found there. 3 years ago
Digimer 11b1900e1b Note: Continuing to resolve the build issues with network startup. Expect breakage. 3 years ago
Digimer a1b06e4355 * Continuing to try to get the network to reliably start during configuration... 3 years ago
Digimer 3f32a56d0c * Created Network->check_bonds() that checks to see if any bonds are down, or if any interfaces configured to be in a bond are not actually in it. It accepts a 'heal' parameter that, by default, will bring up a bond with no active links, but leaves degraded bonds alone. It can also take 'all' and will try to bring up any missing interfaces. This distinction exists so that if a link is flaky and someone takes it down manually until it can be repaired, it doesn't get turned back on. 3 years ago
Digimer 1a8215a783 * Fixed a bug in Network->get_ips() bridge detection bashlet. 3 years ago
Digimer 80bdac8e34 * Updated the pacemaker server config to drop the stop timeout to 5 minutes and the migration timeout to 10 minutes. This will avoid blocking the entire cluster when a stop or migrate operation times out. Will update scan-server to clean these up when they happen. 3 years ago
Digimer daca6c887b * This contains a fairly major change to how time stamps are handled. All INSERT and UPDATE calls now generate a new timestamp via Database->refresh_timestamp, instead of using 'sys::database::timestamp'. This was done in response to finding a bug where tables in a database differed in both counts of public and private schemas (ip_addresses table, specifically) that failed to resync because the timestamps were re-used too often. 3 years ago
Digimer 96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged. 3 years ago
Digimer 16c20ae69c * Updated Tools->catch_sig() to use return code 0 instead of 255 so that systemd doesn't think our daemons failed on stop. 3 years ago
Digimer 24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancillary checks are done, minimizing connect time. 3 years ago
Digimer 73267a8ea9 * WIP - Slowly working on anvil-manage-server 3 years ago
Digimer 78f3fb7b10 * Updated System->configure_ipmi to pull the machine from the anvils table instead of looking for the original job, which isn't useful now that we purge old jobs. 3 years ago
Digimer 4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the response time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machine would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped. 3 years ago
Digimer 8807915bb7 The theme of this commit is database cleanup and fixes. 3 years ago
Digimer 6abe06f125 The theme of these commits is improving DB responsiveness. 3 years ago
Digimer bbad058b33 * Created a new tool, anvil-watch-bonds, which is a live monitor of bonds and interfaces designed to be run from the command line on a given host. 3 years ago