533 Commits (b1f89c272389ddd770829dcacef428cca28b2d15)

Author SHA1 Message Date
digimer b1f89c2723 Finished initial version of striker-show-jobs 1 year ago
digimer 4398ffe70c Updated striker-boot-machine to support booting all machines. 1 year ago
digimer 55b1380031 Finished Server->locate() (but it needs more testing). 1 year ago
digimer f12e001ac2 Finished Server->connect_to_virsh(). 1 year ago
digimer 245f75de9b Added Server->update_definition() 1 year ago
digimer 62fe62a44b * Continued work on anvil-manage-server-system. It now displays the boot devices, CPU and RAM info. 1 year ago
digimer 74ddb7f3a9 Updated Database->get_files() to detect/remove duplicate file entries. 1 year ago
digimer fcbace6713 Updated anvil-join-anvil to hold if either node is still running anvil-configure-host 1 year ago
digimer 582a8b292c Added more job updates to anvil-manage-power. 1 year ago
digimer ef042eef25 Cleaned up logging while waiting for subnodes. 1 year ago
digimer 5d5270486e Added a wait loop when forming node clusters. 1 year ago
digimer c039c58128 * This commit moves taking screenshots of hosted servers onto the strikers using the Sys::Virt module. This was needed because the screenshots were being taken by scan-server, and that was causing it to take a long time to run. It should never have been handled by the scan agent anyway. This update requires a WebUI fix to use the new screenshot tool. This tool also adds holding multiple screenshots to allow users to "scrub" through screenshots up to 10 hours in the past. 1 year ago
digimer 8925dabb9d * Updated anvil-shutdown-server to take the new '--immediate' switch which forces a server to shut down immediately (akin to pulling the power on a traditional machine). This is needed to allow a user to recover a crashed or hung server. 1 year ago
digimer 580980717d This commit covers the conversion of 'virsh' shell calls to using the 'Sys::Virt' module, and fixes several small bugs related to scan-server; 1 year ago
digimer 3c9086d1f3 Fixed bugs related to running jobs. 1 year ago
digimer e8a84e1c97 Added job handling to anvil-manage-server-storage (needs more testing though). 1 year ago
digimer 2f429d2bc7 Fixed bugs related to adding drives and extending drives to servers. 1 year ago
digimer e895e1f264 * Finished writing anvil-manage-server-storage. 1 year ago
digimer 17078347ee Reworked anvil-manage-server-storage to use the translation system. 1 year ago
digimer 02de75a6ab * Improved log messaging to not log a potential boot failure when the local DRBD volume(s) are all UpToDate and the peer is offline. 1 year ago
digimer 3ee30e6e24 * Updated DRBD->allow_two_primaries() to gracefully fail if the peer isn't connected. 1 year ago
digimer 88af919142 * Fixed bugs in ocf:alteeve:server 1 year ago
digimer 6ee2ad75db * Updated anvil-delete-server to actively check for and delete any drbd-fenced attributes left over in the CIB after a server is deleted. This addresses issue #374. 1 year ago
digimer be290bf561 This commit fixes a bug where the drbd kernel module build was being killed mid-compile, leaving DRBD unusable. 1 year ago
digimer d68adb5b4e * Updated anvil-manage-power to not reboot if anvil-version-changes is running (which, if it's taking time, is generating new kmods). 1 year ago
digimer 66c82e5e22 * Fixed a bug in anvil-update-system where updating a single package with --reboot wouldn't request a reboot. Finished reworking it so that a check is made to see if the kernel or DRBD kmod will be updated and, if so, removes the kmod-drbd RPMs prior to doing the update (as opposed to the sloppier check-on-error method). 1 year ago
digimer e278de4b5a The main change in this commit deals with anvil-daemon startup. During OS updates, it would pick up the queued update job and run it while the other --no-db one was still running. This could become an issue for other tasks in the future, so updated anvil-daemon to not run any jobs for the first minute after startup. Also updated it to see if an OS update is underway (given how it can start mid-RPM update, before packages like kmod-drbd are ready to build). While doing this, implemented caching of daily tasks (like aging out data, archiving data, network scans, etc) to only run once per day, period. As it was before, they would always run on anvil-daemon startup, then wait 24 hours. 1 year ago
digimer b0c54b6dae * Updated anvil-update-system to check if another instance of anvil-update-system is running and, if so, exit. 1 year ago
digimer 7bd76c10dc Major thing in this commit is reworking striker-update-cluster to work without expecting anvil-daemon to be running on target machines. Similarly, they had to be able to work when the Striker DBs were not available. This is to account for cases where the Striker dashboards have updated, and the schema has changed, preventing the not-yet-updated DR hosts and subnodes from being able to use the DB. To do this, anvil-safe-stop, anvil-update-system, and anvil-shutdown-server had to be updated to use the new --no-db switch, which tells them to run without the database being available. 1 year ago
digimer 9bc78860a6 * Updated anvil-update-system to detect kmod-drbd upgrade problems and fix them. 1 year ago
digimer 42b44ac864 * Updated the log showing why anvil-daemon isn't exiting when a job is running to include the job's current progress. 1 year ago
digimer d741f4aa6f * Updated anvil-daemon to not exit on high RAM use if any job is running. 1 year ago
digimer 751687129a * Updated anvil-daemon to not exit on RAM use if anvil-update-system is running. 1 year ago
digimer 3016fb875b * Reworked striker-update-cluster to use anvil-update-system for on-system OS updates. 1 year ago
digimer 1b8b0bc493 * Created the new 'anvil-manage-server-storage' with the first role of reloading a DRBD resource. 2 years ago
digimer ea95d26cc5 * Fixed a bug in DRBD->get_next_resource() where reserved minor numbers were not being released. Also added a new parameter, "minor_only", that returns the next minor number but doesn't bother processing TCP ports. 2 years ago
digimer 88cc76914d This is an attempt to fix issue #341. It switches the search for SN IPs from Network->find_matches() to Network->find_access(), the latter of which doesn't care about the interface the IP was found on. 2 years ago
digimer c9e11fbbfc * Added checks to anvil-provision-server to fail out if either of the SN IPs are not found when generating a DRBD resource config. 2 years ago
digimer 156a0ca201 Updated anvil-daemon's new job launching logic to allow the restart of a running job that failed out early. 2 years ago
digimer 47f7a35df3 The main purpose of this commit is to add serial execution of similar jobs to help reduce race conditions for scripted jobs, like multiple server creation. 2 years ago
digimer b6a249d5e7 * Updated Cluster->add_server() to set the preferred host based first on if the server is running on a node, and if not, on the primary node (where before it defaulted to node 1). 2 years ago
digimer b7abc481e6 Updated scan-cluster to check to see that migrate_to and migrate_from are given a timeout of 600s and an on-fail of "block". Updated Cluster->add_server() to set migrate_from to timeout=600s and on-fail=block as well. 2 years ago
digimer c82bd9d73a * Created the new anvil-watch-power tool that shows the status of UPSes known on the system, including their "on battery" state, charge percentage, estimated hold up time, etc. 2 years ago
digimer 0e57836c8f This commit addresses (hopefully) issue #329. 2 years ago
digimer 110dceb55e * Added a check to make sure files were ready before provisioning a server. 2 years ago
digimer 895f1ec262 This fixes a race condition when multiple servers are provisioned at (nearly) the same time. 2 years ago
digimer 0874ad571a Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try. 2 years ago
digimer 83a527f4fa * Moved enabling anvil-safe-start out of the RPM and into anvil-join-anvil. 2 years ago
digimer 89eae7098e NOTE: This updates the reserved RAM to 8 GiB from 4 GiB! 2 years ago
digimer f9689a7106 Updated ocf:alteeve:server to look for '/tmp/<resource>.fail' and, if that file exists, exit with rc:1. This is done to allow for testing. 2 years ago