510 Commits (f681f6f47a06c6f7d14615844ccfdd885c5590d9)

Author SHA1 Message Date
Digimer 623dbb0863 WIP; Restarted work on anvil-manage-server. 3 years ago
Digimer cebae28716 * WIP - Fixing a bug in scan-network where vnet devices aren't being recorded against their bridge. 3 years ago
Digimer 7e7b91b286 * Updates anvil-join-anvil to update corosync.conf to use the BCN1 link as the main knet network with the SN1 link as the backup link. 3 years ago
Digimer fd5d3c0434 * Finished (though testing still needed) scan-network. 3 years ago
Digimer a697011b08 * Disabled debug logging in anvil-daemon. 3 years ago
Tsu-ba-me e8f18d7aa1 fix(cgi-bin): enable sending power on/off server VM jobs 3 years ago
Tsu-ba-me 2d1987f91a fix(cgi-bin): enable sending power on/off server VM jobs 3 years ago
Digimer 607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration. 3 years ago
Digimer d3052c0229 * Finished Cluster->check_server_constraints() and added it to scan-cluster. This now makes sure servers don't roll back to their old host after it has been fenced and recovers. 3 years ago
Digimer 30f478267a * Forced anvil-daemon to log-level 2 and to enable secure logging to continue debugging setup issues. 3 years ago
Digimer 023f43eda9 * In the never-ending attempt to resolve the build consistency issues, this commit enables extra debugging logging and, hopefully, implements a fix in anvil-daemon where a job could be started repeatedly. 3 years ago
Digimer b71ed28f64 * Added Cluster->manage_fence_delay() that reports back and, optionally, sets a preferred node in a fence race. 3 years ago
Digimer 08a958ec60 * Finished updating Network->check_network() to check/heal bridges. 3 years ago
Digimer bd24c1c5bb * I _might_ have fixed the network configuration issue in anvil-configure-host... Updated it so that if 'nmcli' doesn't report a valid device name, it looks for it in the ifcfg-X file, and uses 'X' if not found there. 3 years ago
Digimer 11b1900e1b Note: Continuing to resolve the build issues with network startup. Expect breakage. 3 years ago
Digimer 3f32a56d0c * Created Network->check_bonds() that checks to see if any bonds are down, or if any interfaces configured to be in a bond are not actually in it. It accepts a 'heal' parameter that, by default, will bring up a bond with no active links, but leaves degraded bonds alone. It can also take 'all' and will try to bring up any missing interfaces. This distinction exists so that if a link is flaky and someone takes it down manually until it can be repaired, it doesn't get turned back on. 3 years ago
Digimer 0b6a9e37fa * Added scan_lvm_pv_sector_size to the scan_lvm_pvs table in the scan-lvm. This will be used later for growing a requested disk size for the DRBD metadata. 3 years ago
Digimer 5b4bfa747c * Reworked the anvil-join-anvil job parsing to help diagnose occasional faults. Also changed a fatal parse error to one that allows the run to be retried. 3 years ago
Tsu-ba-me 9fda3af2ce fix(cgi-bin): move error string key to 0307 3 years ago
Digimer 24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connecting, before any ancillary checks are done, minimizing connect time. 3 years ago
Tsu-ba-me 419ec52d2b fix(cgi-bin): add job title and description to set_membership 3 years ago
Digimer 73267a8ea9 * WIP - Slowly working on anvil-manage-server 3 years ago
Tsu-ba-me 1ec32bfbaf fix(cgi-bin): filled in logic to power on/off target host(s) 3 years ago
Tsu-ba-me 869a5ec807 fix(share): add error template 0304 for request body parse failure 3 years ago
Digimer 4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the response time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machine would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped. 3 years ago
Digimer 8807915bb7 The theme of this commit is database cleanup and fixes. 3 years ago
Digimer 6abe06f125 The theme of these commits is improving DB responsiveness. 3 years ago
Digimer bbad058b33 * Created a new tool, anvil-watch-bonds, which is a live monitor of bonds and interfaces designed to be run from the command line on a given host. 3 years ago
Digimer ff65712fd9 * Created the function check_daemons() in anvil-daemon to check that needed daemons are running when it starts. This was specifically added to address a periodic issue with machines booting without NetworkManager running. 3 years ago
Digimer 41cd1e0319 * Several bugs fixed and enhancements; 3 years ago
Digimer fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources. 4 years ago
Digimer 7abbc938af * Renamed tools/striker-purge-host to tools/striker-purge-target and moved the code from test.pl over to it. No longer provides interactive selection, but now does work with Anvil! systems as well as hosts. 4 years ago
Digimer f833c311ba * To address issues with scancore debugging, we needed a tool to purge old anvils and hosts from the database. The 'test.pl' in this commit contains the new logic that will be merged into tools/striker-purge-host shortly. 4 years ago
Digimer 4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches. 4 years ago
Digimer 416f51323a * Created tools/striker-boot-machine to, well, boot machines. It uses host_ipmi or, failing that, other fence methods when available to boot a node. 4 years ago
Digimer ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :) 4 years ago
Digimer 15dab8aab7 * Started working on the node post-scan logic in ScanCore. Created ScanCore->check_temperature() to get a thermal score against a node. 4 years ago
Digimer f202187c34 * anvil-safe-stop is complete! Testing still needed, of course. 4 years ago
Digimer 3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed). 4 years ago
Digimer 27259d1d53 * Finished anvil-rename-server! 4 years ago
Digimer 2e37691116 * Updated DRBD->gather_data() to store data on peers so that the peer's LV path and backing disk is recorded. Also fixed a bug in ->get_status() where the return code for local calls was stored as a host name. 4 years ago
Digimer 711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does not run as a job yet, but the logic is all there. 4 years ago
Digimer eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server. 4 years ago
Digimer a480357049 * Fixed a bug in Cluster->assemble_storage_groups() where, if a group is created during an anvil-provision-server run, the group would get created multiple times. 4 years ago
Digimer b36093671b * Updated Database queries that were passing 'debug => $debug' to not do that, as it was causing far too much (useless) noise in the logs. 4 years ago
Digimer 798518ba5e * While working on the boot/shutdown server tools, ran into and fixed a bug where files uploaded before an Anvil! was added could not be sync'ed. This was fixed through the new Database->check_file_locations() method. 4 years ago
Digimer e036515df3 * Got anvil-safe-start to the point where it starts the cluster stack. Need to create the 'anvil-boot-server' and 'anvil-shutdown-server' before it can be completed, so those files have been added. 4 years ago
Digimer faf1399440 * Continued work on anvil-safe-start. Got it to the point where it detects shared networks with its peer node and waits for all networks to be up. 4 years ago
Digimer 15e71768a1 * Started work on anvil-safe-start. The enable/disable logic and how it runs automatically is controlled by the database and the tool can be used to control anvil-safe-start on both the local and peer node. It will be started by ScanCore, if scancore starts within 10 minutes of the node booting. It will always be able to run manually. 4 years ago
Digimer 942e0f66bf * Finished the 'get_X' endpoints so far defined. Added get_servers and completed get_status 4 years ago