anvil

Commit Graph

Author	SHA1	Message	Date
digimer	3251154366	Updated anvil-daemon to run anvil-configure-host jobs when mapping net Also fixed a bug in anvil-manage-host that prevented showing if the network mapping flag was set. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	4f6fa4b6ed	Working on a bug where broken manifests are saved. * Updated Striker->generate_manifest() to add pod and make the prefix, sequence and domain parameters required. * Created the check_for_broken_manifests() function for anvil-daemon to detect/remove broken manifests. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	9ee8f782ee	Continuing to try to resolve duplicate variables bug. * Added a call to Database->_check_for_duplicates to Database->resync_databases * Added 'check_for_resync => 1' to anvil-configure-host. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	2a3f0bab24	Reworked how and when duplicate variables are checked/cleared. Moved the logic to a new private method, and call it now from the active Striker in the once per minute loop. The duplicate variable issue seems to be not entirely uncommon. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	5ec395c53a	Reworked DB resync logic. With this new system, a 'primary_db' is chosen (first connected DB UUID when sorted) and only it does resyncs. Further, resyncs have been pulled from all tools except anvil-daemon. So with this new system, the chances of duplicate, simultaneous resyncs should be removed (hopefully for real this time). * Database->check_agent_data() no longer calls a resync after loading a schema. * Removed the Database->connect() 'all' parameter. * The database used to read from is now always the same as the primary, even if there is a local DB. * Database->connect() 'check_for_resync' parameter can now be set to '2', which means "check for resync _if_ I am primary", where '1' still checks for resync no matter what. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	3d4d7abfe3	Increased logging to debug server install failure. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	663a1e0527	Quieted screenshot logging in anvil-daemon. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	fcbace6713	Updated anvil-join-anvil to hold if either node is still running anvil-configure-host * Fixed a minor bug and added logging of maintenance_mode calls in anvil-configure-host. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	c039c58128	* This commit moves taking screenshots of hosted servers onto the strikers using the Sys::Virt module. This was needed because the screenshots were being taken by scan-server, and that was causing it to take a long time to run. It should never have been handled by the scan agent anyway. This update requires a WebUI fix to use the new screenshot tool. This tool also adds holding multiple screenshots to allow users to "scrub" through screenshots up to 10 hours in the past. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	d255adc7b4	* Updated anvil-daemon to set the mode of /mnt/shared/* to 0777 during creation and to check that that mode is set for existing sub-directories. This resolves issue #443. * Cleaned up anvil-manage-dr.8 hyphen escapes. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	be290bf561	This commit fixes a bug where the drbd kernel module build was being killed mid-compile, leaving DRBD unusable. * Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start. * Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not). Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	f57ab1a78c	* Updated anvil-daemon to not hold jobs at startup if the host isn't configured yet. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	66c82e5e22	* Fixed a bug in anvil-update-system where updating a single package with --reboot wouldn't request a reboot. Finished reworking it so that a check is made to see if the kernel or DRBD kmod will be updated and, if so, removes the kmod-drbd RPMs prior to doing the update (as opposed to the sloppier check-on-error method). * Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key. * Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	e278de4b5a	The main change in this commit deals with anvil-daemon startup. During OS updates, it would pick up the queued update job and run it while the other --no-db one was still running. This could become an issue for other tasks in the future, so updated anvil-daemon to not run any jobs for the first minute after startup. Also updated it to see if an OS update is underway (given how it can start mid-RPM update, before packages like kmod-drbd are ready to build). While doing this, implemented caching of daily tasks (like aging out data, archiving data, network scans, etc) to only run once per day, period. As it was before, they would always run on anvil-daemon startup, then wait 24 hours. Note that work has started on reworking anvil-update-system, but it is incomplete (and broken) in this commit. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	d741f4aa6f	* Updated anvil-daemon to not exit on high RAM use if any job is running. * Updated anvil-update-system to reboot a target whose kernel updated using an anvil-manage-power job. * Started making striker-update-cluster run as a job (not at all complete). Fixed a bug where the wrong IP was being used when finding access to a target. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	751687129a	* Updated anvil-daemon to not exit on RAM use if anvil-update-system is running. * Fixed a bug in anvil-safe-stop where it wouldn't trigger a migration when the peer is online. * Updated anvil-update-system to set job_data to 'failed' and exit with rc 4 if the os update failed. * Got striker-update-cluster to error out and exit if a called 'anvil-update-system' job failed. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
Tsu-ba-me	4f46bb43eb	fix(tools): remove server screenshot fetching in anvil-daemon	1 year ago
Tsu-ba-me	d95eb699f9	chore: disable web VNC, screenshot pieces to avoid libvirt deadlock	1 year ago
Tsu-ba-me	d98df4b2a4	fix(tools): isolate non-striker tasks in anvil-daemon	1 year ago
Tsu-ba-me	560d60c7e8	fix(tools): get server screenshots every minute and punt to strikers WIP	1 year ago
digimer	1d12fb32b4	* Completed the new anvil-watch-drbd which replaces watch_drbd. * Updated Email->get_current_server() to always load mail server data from the database. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	c9e11fbbfc	* Added checks to anvil-provision-server to fail out if either of the SN IPs are not found when generating a DRBD resource config. * Added logging to anvil-provision-server and anvil-daemon to try to find the cause of jobs being re-run after completing. May have fixed with a fix to job_progress updates going to 100 too early in some cases. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	156a0ca201	Updated anvil-daemon's new job launching logic to allow the restart of a running job that failed out early. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	47f7a35df3	The main purpose of this commit is to add serial execution of similar jobs to help reduce race conditions for scripted jobs, like multiple server creation. * Fixed a small logging bug in DRBD->allow_two_primaries(). * Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded. * Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	8f375c58a9	* Fixed a typo in anvil-daemon that prevented compiling. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	c50a1936c0	* This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch for a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon. * Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	895f1ec262	This fixes a race condition when multiple servers are provisioned at (nearly) the same time. * In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute. So subsequent calls won't use the same numbers. * In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race conditions in general. * Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	7710d9d109	* Created the new anvil-manage-server-storage tool which will specifically handle managing a server's disks. * Created DRBD->parse_resource() to pass a specific DRBD resource's XML data. * Fixed a bug in Get->available_resources() so that if the threads is lower than CPU cores, the cores are used as the total available to VMs. * Fixed bugs in Get->server_from_switch() where it just wasn't working properly. * Updated scan_drbd to not reset a resource's size to 0-bytes when a resource goes offline. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	a3988cc3e5	* Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately. * Added call to System->configure_logind() to anvil-join-anvil and anvil-version-changes. * Updated fence_pacemaker to add '--reboot' to the 'stonith_admin' call to ensure DRBD-triggered fence requests reboot instead of just turning nodes off. This commit addresses issue #279. Signed-off-by: digimer <digimer@gravitar.alteeve.com>	2 years ago
Digimer	6d59399c73	* Updated the short OS list. * Created Get->virsh_list_net() and Get->virsh_list_os() that call and parse osinfo-query directly to create lists of supported network interfaces and OS optimization options used when provisioning VMs. The latter of which is used to replace the old language list of OSes, which was clunky and prone to missing valid options. * Updated Get->available_resources() to remove the old anvil_dr1_host_uuid mechanism of finding and referencing DR resources. * Started adding --network support to anvil-provision-server to allow users to specify a specific network bridge, MAC address and model to use for a new VM. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	9194eb3d09	* Updated System->check_if_configured() to record that a host is configured in /etc/anvil to make the system auto-mark as configured if the host is removed from the DB (or, more specifically, variables -> system::configured is lost). * Updated Database->get_anvils() to record dr_links to reference DR hosts to Anvil! systems. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	f9ca6fb170	* This adds the new anvil-version-change tool which anvil-daemon will call on startup to handle checks for changes made over releases/updates. * Added the new 'dr_link_note' column to the dr_links tables so that links can be marked as DELETED. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	622fb84652	* Renamed the 'notifications' table to 'alert-override', better reflecting what it does. * Got anvil-manage-alerts managing alert overrides. * Created, but for now commented out, the new 'audit' table. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	a4ef93404c	* Fixed a bug in DRBD->gather_data() to remove trailing commas for existing TCP ports. * Added the missing 'clear-mapping' switch to Get->switches in anvil-daemon. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	ef3ac86162	* Fixed a bug where setting the db_in_use flag without a valid $ENV{_}. * Added a nice_exit call to tools/striker-access-database Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	21738ab0d4	Added a bit more logging to the Database->mark_active method. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	a81478f2bc	* Updated 'db_in_use' state to add the caller's name to the state name. This is pulled out when logging stale locks that are being reaped, to help debug where stale locks are coming from. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	e7cf8ac789	* Got more work done on anvil-manage-files. It now picks up new files on nodes/dr hosts in an Anvil! and downloads them if needed. * Updated anvil-daemon to call anvil-manage-files on a per-minute basis to handle files added outside of the WebUI. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	5fea8ff46a	* Adds the anvil-boot-server man page. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	b3b185a43c	* Added the alteeve-repo-setup man page and updated it to show that when called with '-h'. * Updated scancore to use the new Get->switches() list parameter. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	d9910fc951	Finished the man page for anvil-daemon. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	be612ff878	* Updated Get->switches() to take 'list' and 'man' parameters. With list, the passed in switches can be checked to ensure they're valid. With 'man', if set to the name of a man page (usually $THIS_FILE) will be displayed if --help, -h or -? are used. * Disabled striker-parse-oui until it can be reworked to store the OUI data in a flat file instead of in the database. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	cd220e97dc	Disabled striker-prep-database and set Database->configure_pgsql() calls to use debug => 2. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	7fd6185445	* Disabled firewalling for now. There appears to be an issue starting up with DRBD. * Updated Convert->time() to return whatever was passed in instead of '#!error!#'. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	bce9e2caaf	This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work. * Added multiple new private methods to Network that help in managing the firewall. * Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update its firewall as soon after the server appears as possible. * Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly. * Updated scan-server to check that the firewall is managed when a server state has changed. * Updated anvil-daemon to run Network->manage_firewall on startup. * Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host.) * Removed firewall management from striker-prep-database. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	f2d06fa9b1	* Updated striker-parse-oui to only run if/when the system has been running for at least one hour. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	ab9b00a2f7	* Updated anvil-daemon, in its daily checks, to disable ksm and ksmtuned daemons. * Updated scan-drbd to purge peer records that no longer have corresponding LVM data. * Updated System->{en,dis}able-service to take the 'now' parameter which, when passed, causes the action to take immediate effect. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	911f7cfb6a	This is another big commit with a lot of DB work. Getting closer to sorting out the frequent resyncs. * Changes Database->connect to always use the first DB connected to, not the local one if that applies. This treats the first DB (sorted by UUID) as "primary" and the second (or third...) as more of a backup. * Moved db_in_use and lock_request to use the 'states' table instead of the variables table. These are set and removed so often that it was messing up things with resync's when the data is transient anyway. Fixed multiple bugs with both to better set and clear properly. * Created Database->read_state() to assist with the above changes. * Updated Database->refresh_timestamp() to specifically check that the returned time stamp differs from the previously used one, looping until they differ if needed. * Disabled striker-manage-install-target when called to update the repos, as the Install Target function doesn't work at this point. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	e6dcff1cf1	* Added a missing modified_date to ip_addresses in Database->get_ip_addresses(). * Updated scan-network to purge old historical ip_addresses when clearing duplicates now. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	1b70b49cf8	* Updated Network->find_matches() to try to populate the first and second parameters if they're not passed in. * Updated Network->load_ips() to load extra information about the interfaces. * Updated ocf:alteeve:server to not check libvirtd daemon state on server start. * Updated scan-hardware to check for duplicate entries and purge if found. * Updated scan-network to check for the 'default' virbr0 interface by checking if the config file exists instead of calling virsh. * Updated scan-server to have better logging. * Created the new (and incomplete) anvil-test-alerts tool * Updated scancore to support --purge to pass to all agents and then exit. * Updated ScanCore->call_scan_agents() to no longer use 'timeout' as it was causing issues with virsh calls. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago

210 Commits (997c501d6a88a3780812149d988addc58b622e04)