209 Commits (ff3d1983e38f11a3d83246e93f6b522c9b1a2cd3)

Author SHA1 Message Date
Digimer 0b41029db2 Reworked Database->_find_behind_databases to loop through tables, then databases when evaluating for resync. This is still racy but should be less racy as the time between counts of columns for a given table should be a lot shorter. Also re-enabled triggering resyncs based on the age of the most recent record. 3 years ago
Digimer 7212ea1c2f Fixed a bug where reaping db_in_use states wasn't restricted to the caller's host_uuid. 3 years ago
Digimer 74b7719cf5 * Created the new anvil-manage-host that can check/set if a host is configured. On Strikers, it can age out data, resync data, and check/set if the local database is active. 3 years ago
Digimer edf51adaec * Changed 'anvil-manage-power' to no longer set the job progress to 50 prior to calling a reboot. It now sets to 100 immediately. Also reduced the uptime timer to five minutes from ten. 3 years ago
Digimer 7b090e1623 * Updated Database->shutdown() to disconnect, stop the postgresql daemon, then reconnect. 3 years ago
Digimer 3fd0db15bf * This rather heavily reworks how database shutdowns work. It adds much more intelligent shutdown, tracking who is using the database, being able to mark a database as "offline" and waiting for users of the database to disconnect before it shuts down. 3 years ago
Digimer b234b79544 Updated anvil-daemon to check if anvil-sync-shared is running if the reported RAM use is too high. If so, it doesn't exit. This fixes an issue where anvil-sync-shared would loop forever as it would constantly be killed when downloading large files. 3 years ago
Digimer 68b1d12545 Updated anvil-daemon to not shutdown a striker DB until the striker host has been running for at least an hour. 3 years ago
Digimer f77f486775 Fixed a typo in scan-network 3 years ago
Digimer d70b9a4956 Updated scancore and anvil-daemon to check their RAM use at the end of each loop and, if either is using more than 1 GiB of RAM, to send an alert and exit. 3 years ago
Digimer a633ab7f63 Added a periodic check to ensure all users can ping. This fixes a bug where a local striker dashboard whose DB was stopped wouldn't work. 3 years ago
Digimer e37f487704 Fixed a bug in System->check_ssh_keys where the 'admin' user's RSA keys were owned by root. 3 years ago
Digimer 892a475881 * Fixed a bug in Convert->format_mmddyy_to_yymmdd() where being passed '--' didn't return the same. 3 years ago
Digimer 652f87ec74 * Updated scan-network to also clean up the media type. 3 years ago
Digimer 72038e8358 * Fixed a bug where ethtool's Media type contained tab characters that broke JSON when configuring the network interfaces. 3 years ago
Digimer 3346d31194 * Created Get->kernel_release() that returns the current kernel release (version) in use on the host or on a remote system. 3 years ago
Digimer 65dfc22a38 Added an eval{} call around Database->query()'s ->prepare() DBI call to better handle lost database handle. 3 years ago
Digimer 034c38fdeb Disabled calling striker-prep-database from the spec file, and enabled scancore. 3 years ago
Digimer 8e41814ca2 * Updated anvil-daemon->prep_database() to start the postgresql daemon if it's not running and no databases are available. 3 years ago
Digimer b517117bc1 * Did more work on trying to figure out why initial setup of the database was failing. I believe it was because, in anvil-daemon, after calling 'prep_database' we called ->connect() _without_ 'check_if_configured' set. Next round of function testing should help confirm if this was the case. 3 years ago
Digimer 3445d008d2 Removed a stray debug die. 3 years ago
Digimer 63c45430bb * Updated scan-network to clear duplicate IP addresses. 3 years ago
Digimer e60a1b46b3 Fixed bugs related to automatic database startup and conditional backup loading. 3 years ago
Digimer 4e9882812d * Fixed a bug where the periodic database dumps on the primary database Striker were not sync'ing to peers. Also fixed a bug where these periodic dumps weren't running at all. 3 years ago
Digimer 72b17ff1f9 * Reworked how databases are stopped, now being handled in anvil-daemon. This way, initial starts will still do traditional resyncs, then shut down. This should allow the best of both worlds, where data is not lost on striker start/stop loss/recovery, but operate normally otherwise without delays. 3 years ago
Madison Kelly 922899ea78 * WIP: Working on a new method of failing over between which Striker is the active database, instead of running N-number of databases all the time. 3 years ago
Digimer a697011b08 * Disabled debug logging in anvil-daemon. 4 years ago
Digimer 6777104398 * Fixed a bug in anvil-daemon where, when an anvil-manage-power reboot run had triggered a reboot, anvil-daemon didn't set the job_progress to '100', causing constant reboots. Also fixed a bug where the log level was hard-set to '1' instead of '2' needed during debugging. 4 years ago
Digimer 0c475d2a2e * Fixed a couple logging bugs. 4 years ago
Digimer d3052c0229 * Finished Cluster->check_server_constraints() and added it to scan-cluster. This now makes sure servers don't roll back to their old host after it has been fenced and recovers. 4 years ago
Digimer e7a06fce72 * Disabling the periodic network health check in anvil-daemon. 4 years ago
Digimer 30f478267a * Forced anvil-daemon to log-level 2 and to enable secure logging to continue debugging setup issues. 4 years ago
Digimer 47fa126a3c * Fixed a typo that blocked anvil-daemon from starting. 4 years ago
Digimer 023f43eda9 * In the never-ending attempt to resolve the build consistency issues, this commit enables extra debugging logging and, hopefully, implements a fix in anvil-daemon where a job could be started repeatedly. 4 years ago
Digimer bd24c1c5bb * I _might_ have fixed the network configuration issue in anvil-configure-host... Updated it so that if 'nmcli' doesn't report a valid device name, it looks for it in the ifcfg-X file, and uses 'X' if not found there. 4 years ago
Digimer c7c6c8dee5 * Reworked the attempt to repair the network in anvil-daemon to not touch the network until the machine has been running for at least two minutes. 4 years ago
Digimer 1e7847d4dd * Added a call to Network->check_bonds() to be called while non-Striker machines wait to connect to a database. 4 years ago
Digimer 3f32a56d0c * Created Network->check_bonds() that checks to see if any bonds are down, or if any interfaces configured to be in a bond are not actually in it. It accepts a 'heal' parameter that, by default, will bring up a bond with no active links, but leaves degraded bonds alone. It can also take 'all' and will try to bring up any missing interfaces. This distinction exists so that if a link is flaky and someone takes it down manually until it can be repaired, it doesn't get turned back on. 4 years ago
Digimer 19c41c9171 * Added more logging while chasing a function test bug. 4 years ago
Digimer daca6c887b * This contains a fairly major change to how time stamps are handled. All INSERT and UPDATE calls now generate a new timestamp via Database->refresh_timestamp, instead of using 'sys::database::timestamp'. This was done in response to finding a bug where tables in a database differed in both counts of public and private schemas (ip_addresses table, specifically) that failed to resync because the timestamps were re-used too often. 4 years ago
Digimer 96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged. 4 years ago
Digimer 24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancillary checks are done, minimizing connect time. 4 years ago
Digimer 4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the response time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machine would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped. 4 years ago
Digimer 8807915bb7 The theme of this commit is database cleanup and fixes. 4 years ago
Digimer 6abe06f125 The theme of these commits is improving DB responsiveness. 4 years ago
Digimer ff65712fd9 * Created the function check_daemons() in anvil-daemon to check that needed daemons are running when it starts. This was specifically added to address a periodic issue with machines booting without NetworkManager running. 4 years ago
Digimer 41cd1e0319 * Several bugs fixed and enhancements; 4 years ago
Digimer a846f9ecbc * Fix to the database resync logic. The previous change to only resync if 10+ lines differed broke striker-manage-peers as the difference in host counts is what triggered the pairing of strikers. 4 years ago
Digimer fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources. 4 years ago
Digimer 3fb81c1a0a * Updated Convert->time() to silently return if the given time was '--'. 4 years ago