827 Commits (7f9b418de1f2315bd054843c0eb85f34ce788c9f)

Author SHA1 Message Date
Digimer 80bdac8e34 * Updated the pacemaker server config to drop the stop timeout to 5 minutes and the migration timeout to 10 minutes. This will avoid blocking the entire cluster when a stop or migrate operation times out. Will update scan-server to clean these up when they happen. 4 years ago
Digimer 19c41c9171 * Added more logging while chasing a function test bug. 4 years ago
Digimer 0f43961568 * This commit lowers the logging levels of some debug log entries. It's to help diagnose occassional function test failures with an unknown source. 4 years ago
Digimer daca6c887b * This contains a fairly major change to how time stamps are handled. All INSERT and UPDATE calls now generate a new timestamp via Database->refresh_timestamp, instead of using 'sys::database::timestamp'. This was done in responce to finding a bug where tables in a database differed in both counts of public and private schemas (ip_addresses table, specifically) that failed to resync because the timestamps were re-used too often. 4 years ago
Digimer 5b4bfa747c * Reworked the anvil-join-anvil job parsing to help diagnose occassional faults. Also changed a fatal parse error to one that allows the run to be retried. 4 years ago
Digimer 96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged. 4 years ago
Digimer e15c1651ed * Fixed a bug with deleting bad keys where jobs to delete keys on non-dashboard machine wasn't being assigned to the proper target machine. 4 years ago
Digimer 16c20ae69c * Updated Tools->catch_sig() to use return code 0 instead of 255 so that systemd doesn't think our daemons failed on stop. 4 years ago
Digimer 24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancilliary checks are done, minimizing connect time. 4 years ago
Digimer 73267a8ea9 * WIP - Slowly working on anvil-manage-server 4 years ago
Digimer 4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the reponce time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machne would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped. 4 years ago
Digimer 8807915bb7 The theme of this commit is database cleanup and fixes. 4 years ago
Digimer 6abe06f125 The theme of these commits is improving DB responsiveness. 4 years ago
Digimer 49a700d68f * Fixed a bug in anvil-join-anvil where the desired DNS servers were not matching existing list of used DNS servers, even when they are the same already. 4 years ago
Digimer bbad058b33 * Created a new tool, anvil-watch-bonds, which is a live monitor of bonds and interfaces designed to be run from the command line on a given host. 4 years ago
Digimer ff65712fd9 * Created the function check_daemons() in anvil-daemon to check that needed daemons are running when it starts. This was specifically added to address a periodic issue with machines booting without NetworkManager running. 4 years ago
Digimer 42ffc200bc * Updated remainder pointers to the old repos to the new repos. Added support for the new alteeve-repo-setup. 4 years ago
Digimer 41cd1e0319 * Several bugs fixed and enhancements; 4 years ago
Digimer a846f9ecbc * Fix to the database resync logic. The previous change to only resync if 10+ lines differed broke striker-manage-peers as the difference in host counts is what triggered the pairing of strikers. 4 years ago
Digimer 41d528418d * Increased the trigger point for database archiving. The current values were too low and cuasing frequent archive -> resync cycles. 4 years ago
Digimer 48956d94fb * Fixed anvil-manage-system file name in Makefile. 4 years ago
Digimer fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources. 4 years ago
Digimer ad4a1ecc78 * Increaded the scancore agent run timeout to 60 seconds. 4 years ago
Digimer 44864ce321 * Updated Database->resync_databases() to set a default schema of 'public'. Also fixed a bug where, when the difference in record numbers between two line was > 999, it would not trigger a resync. 4 years ago
Digimer 309aa13684 * Updated the name of striker-purge-target in the makefile. 4 years ago
Digimer 7abbc938af * Renamed tools/striker-purge-host to tools/striker-purge-target and moved the code from test.pl over to it. No longer provides interactive selection, but now does work with Anvil! systems as well as hosts. 4 years ago
Digimer f833c311ba * To address issues with scancore debugging, we needed a tool to purge old anvils and hosts from the database. The 'test.pl' in this commit contains the new logic that will be merged into tools/striker-purge-host shortly. 4 years ago
Digimer 3fb81c1a0a * Updated Convert->time() to silently return if the given time was '--'. 4 years ago
Digimer 4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches. 4 years ago
Digimer 416f51323a * Created tools/striker-boot-machine to, well, boot machines. It uses host_ipmi or, failing that, other fence methods when available to boot a node. 4 years ago
Digimer ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :) 4 years ago
Fabio M. Di Nitto 2214866156 Update to kmod-drbd91 4 years ago
Digimer f202187c34 * anvil-safe-stop is complete! Testing still needed, of course. 4 years ago
Digimer 3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed). 4 years ago
Digimer 27259d1d53 * Finished anvil-rename-server! 4 years ago
Digimer 2e37691116 * Updated DRBD->gather_data() to store data on peers so that the peer's LV path and backing disk is recorded. Also fixed a bug in ->get_status() where the return code for local calls was stored as a host name. 4 years ago
Digimer 711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there. 4 years ago
Digimer eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server. 4 years ago
Digimer b36093671b * Updated Database queries that were passing 'debug => $debug' to not do that, as it was causing far too much (useless) noise in the logs. 4 years ago
Digimer 798518ba5e * While working on the boot/shutdown server tools, ran into and fixed a bug where files uploaded before an Anvil! was added could not have those files sync'ed. This was fixed though the new Database->check_file_locations() method. 4 years ago
Digimer e036515df3 * Got anvil-safe-start to the point where is starts the cluster stack. Need to create the 'anvil-boot-server' and 'anvil-shutdown-server' before it can be completed, so those files have been added. 4 years ago
Digimer faf1399440 * Continued work on anvil-safe-start. Got it to the point where it detects shared networks with its peer node and waits for all networks to be up. 4 years ago
Digimer 15e71768a1 * Started work on anvil-safe-start. The enable/disable logic and how it runs automatically is controlled by the database and the tool can be used to control anvil-safe-start on both the local and peer node. It will be started by ScanCore, if scancore starts within 10 minutes of the node booting. It will always be able to run manually. 4 years ago
Digimer 5f0b7740e2 * Fixed a typo that broke compiling anvil-daemon in the last commit. Yay for CI/CD! 4 years ago
Digimer fb0836f912 * THe get_cpu endpoint was completed. 4 years ago
Digimer cd87c0f521 * Fixed a bug that caused striker-initialize-host to not compile / run. 4 years ago
Digimer 70dc0598f2 * Created Storage->manage_lvm_conf() that checks / updates lvm.conf to add a filter to avoid seeing DRBD devices as LVM components. This is now called from striker-initialize-host and scan-drbd. 4 years ago
Digimer 59b867cc25 * Updated DRBD->gather_data() to check if drbdadm exists before trying to call it to avoid scary errors in the logs. Also moved some strings that pulled from the scan-drbd agent into the main words file. 4 years ago
Digimer ec192d9041 * Fixed a bug where backing up a file on a remote machine returned a failure, if the target backup directory had to be created (even if it was created successfully). 4 years ago
Digimer 3ed857bacd * Bumped logging for striker-auto-initialize-all debugging. 4 years ago