Commit Graph

236 Commits

Author SHA1 Message Date
digimer
8c1c0597da Updated anvil-daemon to run anvil-configure-host in the foreground.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-30 14:49:02 -04:00
digimer
25a0454dce Better handling of lost DB connections.
* Added a sync call to Tools->nice_exit() to ensure logs are flushed.
* Updated Database->quote() to be in an eval block to better handle
  cases where the DB handle is lost.
* Added an hourly check to anvil-daemon and moved the memory in use
  check to run only once per hour.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-29 20:41:12 -04:00
digimer
b86493fff4 More logging to debug apparent hang
* Added an explicit 'sync' call when writing to logs. TO BE REMOVED!
* Disabled anvil-monitor-daemons and anvil-monitor-performance in case
  this is somehow triggering program exits.
* Converted prints to Log->entry calls in anvil-change-password
* Added PID state info logging for running jobs.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-29 13:40:57 -04:00
digimer
ab33c716cb Created a specific check that there's a hosts entry for each DB
* This is meant to deal with a case where, when a DB is added to
  anvil.conf but that new entry is not yet in hosts, the program crashes
  because of a duplicate key when calling insert_or_update_hosts for all
  DBs.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-25 20:19:26 -04:00
digimer
8e53993f67 Shortened the anvil-daemon job start up delay.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-15 23:00:31 -04:00
digimer
3e63b726d3 Added node 2 joining an Anvil! node if not started by node 1.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-14 01:36:28 -04:00
digimer
e00dec7cba Added loading existing corosync/authkey from peer during rebuild.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 17:46:19 -04:00
digimer
ab0b1a262b Reworked Network->wait_for_bonds() to be ->wait_for_networks()
* Renamed the old ->wait_for_networks() to be ->wait_for_nm_online().
* The new ->wait_for_networks() waits for all interfaces we manage to be
  'activated' before returning.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-29 01:32:32 -05:00
digimer
2f5fb32769 Quieted logging
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-28 16:37:20 -05:00
digimer
b8c73fd3f2 Replaced hosts management in anvil-join-anvil with System->update_hosts.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-26 18:29:55 -05:00
digimer
495cb90ca6 Created Network->wait_for_network to hold startup for NM to be up.
Added the call to Network->wait_for_network to pause scancore and
anvil-daemon startups until NetworkManager says it's up and running.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-24 17:16:46 -05:00
digimer
5cf0bbc6be Added Want=NetworkManager to anvil-daemon and scancore unit files.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-24 16:51:33 -05:00
digimer
05de34c7bc Scancore and anvil-daemon now hold for bonds to be up.
Created Network->wait_for_bonds(), and added it to the startup for
scancore and anvil-daemon.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 02:01:33 -05:00
digimer
741bcfa908 Added default logging level 2 and secure logging in CI tests.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-21 21:46:27 -05:00
digimer
5517e43a81 Forcing anvil-daemon to run with log level 2 and secure logging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-15 23:04:11 -05:00
digimer
14022896aa Added a call for non-striker machines to call check_sshd if no DBs.
Also added a check for sshd_config.d so that it doesn't error on EL8
machines.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
bf693ed212 Updated anvil-daemon to enable root SSH access during startup
This is required as we need to be able to ssh into peer strikers and
into nodes and DR hosts during initialization.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
943bf2e8d3 Removed the no-longer-needed Network->check_network() method
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
b0cede49e3 Removed calls to check apache config.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
827cf1f331 Fixed a bug that was crashing anvil-daemon
* Network->find_matches() was trying to compare two IPs when the second
  IP wasn't actually defined.
* Disabled scancore's blocking of running before the host is configured.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
282fdbe7e0 Fixed a bug where IPs were being marked repeatedly as DELETEd.
* Database->get_ip_addresses() was marking IPs that weren't on a network
  we managed as DELETEd, which caused problems with initializing targets
  and generated a lot of repeat alerts.
* Updated logging in Network.pm to help with debugging.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
92ed77e05b Fixed a bug blocking most jobs from running.
* Also updated a bunch of 'apache' ownership calls to now use
  'striker-ui-api'.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
ff0e6c3575 Updated anvil-daemon to call scan-network if no interfaces exist.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
cad524db9d Removed anvil-update-states
* Created new anvil-monitor-network daemon to trigger scan-server via
  anvil-monitor-network on network events.
* Moved functionality into scan-network

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
ec11335197 Fixed DB initialization bugs.
* More work done on the new network stack also.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
52e7875252 Bumped logging to find '!!error!!' related parsing errors.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
3251154366 Updated anvil-daemon to run anvil-configure-host jobs when mapping net
Also fixed a bug in anvil-manage-host that prevented showing if the
network mapping flag was set.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-03 15:25:45 -04:00
digimer
4f6fa4b6ed Working on a bug where broken manifests are saved.
* Updated Striker->generate_manifest() to add pod and make the prefix,
  sequence and domain parameters required.
* Created the check_for_broken_manifests() function for anvil-daemon to
  detect/remove broken manifests.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-24 13:36:30 -04:00
digimer
9ee8f782ee Continuing to try to resolve duplicate variables bug.
* Added a call to Database->_check_for_duplicates to Database->resync_databases
* Added 'check_for_resync => 1' to anvil-configure-host.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-21 14:31:14 -04:00
digimer
2a3f0bab24 Reworked how and when duplicate variables are checked/cleared.
Moved the logic to a new private method, and call it now from the active
Striker in the once per minute loop. The duplicate variable issue seems
to be not entirely uncommon.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-21 13:33:14 -04:00
digimer
5ec395c53a Reworked DB resync logic.
With this new system, a 'primary_db' is chosen (first connected DB UUID when sorted) and only it does resyncs. Further, resyncs have been pulled from all tools except anvil-daemon. So with this new system, the chances of duplicate, simultaneous resyncs should be removed (hopefully for real this time).

* Database->check_agent_data() no longer calls a resync after loading a
  schema.
* Removed the Database->connect() 'all' parameter
* The database used to read from is now always the same as the primary,
  even if there is a local DB.
* Database->connect() 'check_for_resync' parameter can now be set to
  '2', which means "check for resync _if_ I am primary", where '1' still
  checks for resync no matter what.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-19 20:41:57 -04:00
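
Note on 5ec395c53a: the selection rule described above can be sketched as follows. This is a minimal Python illustration, not the project's Perl code. The function names and the idea of comparing the local host's UUID against the chosen primary are assumptions made for the example; only the "first connected DB UUID when sorted" rule and the meanings of the values '1' and '2' come from the commit message.

```python
def pick_primary_db(connected_uuids):
    """Return the first connected DB UUID in sorted order, or None if none are connected."""
    ordered = sorted(connected_uuids)
    return ordered[0] if ordered else None


def should_resync(check_for_resync, my_uuid, connected_uuids):
    """Decide whether this caller may check for (and run) a resync.

    check_for_resync follows the commit message:
      1 -> check no matter what
      2 -> check only if this host's DB is the primary
    Any other value is treated here as "do not check" (an assumption).
    """
    if check_for_resync == 1:
        return True
    if check_for_resync == 2:
        return my_uuid == pick_primary_db(connected_uuids)
    return False
```

Because every host sorts the same set of connected UUIDs, they all agree on the primary without any extra coordination, which is what keeps simultaneous resyncs from starting on more than one machine.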
digimer
3d4d7abfe3 Increased logging to debug server install failure.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-29 13:01:58 -04:00
digimer
663a1e0527 Quieted screenshot logging in anvil-daemon.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-28 16:27:40 -04:00
digimer
fcbace6713 Updated anvil-join-anvil to hold if either node is still running anvil-configure-host
* Fixed a minor bug and added logging of maintenance_mode calls in anvil-configure-host.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-28 16:01:32 -04:00
digimer
c039c58128 * This commit moves taking screenshots of hosted servers onto the strikers using the Sys::Virt module. This was needed because the screenshots were being taken by scan-server, and that was causing it to take a long time to run. It should never have been handled by the scan agent anyway. This update requires a WebUI fix to use the new screenshot tool. This tool also adds holding multiple screenshots to allow users to "scrub" through screenshots up to 10 hours in the past.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-22 17:15:09 -04:00
digimer
d255adc7b4 * Updated anvil-daemon to set the mode of /mnt/shared/* to 0777 during creation and to check that that mode is set for existing sub-directories. This resolves issue #443.
* Cleaned up anvil-manage-dr.8 hyphen escapes.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-17 22:14:40 -04:00
digimer
be290bf561 This commit fixes a bug where the drbd kernel module build was being killed mid-compile, leaving DRBD unusable.
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 22:32:41 -04:00
digimer
f57ab1a78c * Updated anvil-daemon to not hold jobs at startup if the host isn't configured yet.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 23:34:39 -04:00
digimer
66c82e5e22 * Fixed a bug in anvil-update-system where updating a single package with --reboot wouldn't request a reboot. Finished reworking it so that a check is made to see if the kernel or DRBD kmod will be updated and, if so, removes the kmod-drbd RPMs prior to doing the update (as opposed to the sloppier check-on-error method).
* Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key.
* Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 21:43:26 -04:00
digimer
e278de4b5a The main change in this commit deals with anvil-daemon startup. During OS updates, it would pick up the queued update job and run it while the other --no-db one was still running. This could become an issue for other tasks in the future, so updated anvil-daemon to not run any jobs for the first minute after startup. Also updated it to see if an OS update is underway (given how it can start mid-RPM update, before packages like kmod-drbd are ready to build). While doing this, implemented caching of daily tasks (like aging out data, archiving data, network scans, etc) to only run once per day, period. As it was before, they would always run on anvil-daemon startup, then wait 24 hours.
Note that work has started on reworking anvil-update-system, but it is incomplete (and broken) in this commit.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 21:43:26 -04:00
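
Note on e278de4b5a: the "once per day, period" behaviour for daily tasks can be sketched as below. This is a hedged Python illustration only. The JSON cache file, the `run_if_due` name and its parameters are hypothetical and are not taken from anvil-daemon; the point is that the last-run time is persisted, so a daemon restart does not trigger the task again.

```python
import json
import time
from pathlib import Path

DAY_SECONDS = 24 * 60 * 60


def run_if_due(task_name, task, cache_path, now=None):
    """Run task() only if at least a day has passed since its last recorded run.

    The last-run timestamps live in a small JSON file (hypothetical layout:
    {"task_name": unix_time}), so they survive restarts of the caller.
    Returns True if the task ran, False if it was skipped.
    """
    now = time.time() if now is None else now
    path = Path(cache_path)
    try:
        last_runs = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # No usable cache yet: treat every task as never having run.
        last_runs = {}

    if now - last_runs.get(task_name, 0) < DAY_SECONDS:
        return False

    task()
    last_runs[task_name] = now
    path.write_text(json.dumps(last_runs))
    return True
```

Without the persisted timestamp, an in-memory timer would behave as the commit message says the old code did: run on every startup, then wait 24 hours.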
digimer
d741f4aa6f * Updated anvil-daemon to not exit on high RAM use if any job is running.
* Updated anvil-update-system to reboot a target whose kernel updated using an anvil-manage-power job.
* Started making striker-update-cluster run as a job (not at all complete). Fixed a bug where the wrong IP was being used when finding access to a target.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-15 22:23:30 -04:00
digimer
751687129a * Updated anvil-daemon to not exit on RAM use if anvil-update-system is running.
* Fixed a bug in anvil-safe-stop where it wouldn't trigger a migration when the peer is online.
* Updated anvil-update-system to set job_data to 'failed' and exit with rc 4 if the os update failed.
* Got striker-update-cluster to error out and exit if a called 'anvil-update-system' job failed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-15 16:23:38 -04:00
Tsu-ba-me
4f46bb43eb fix(tools): remove server screenshot fetching in anvil-daemon 2023-07-13 01:54:04 -04:00
Tsu-ba-me
d95eb699f9 chore: disable web VNC, screenshot pieces to avoid libvirt deadlock 2023-07-05 17:06:11 -04:00
Tsu-ba-me
d98df4b2a4 fix(tools): isolate non-striker tasks in anvil-daemon 2023-07-03 04:46:06 -04:00
Tsu-ba-me
560d60c7e8 fix(tools): get server screenshots every minute and punt to strikers WIP 2023-07-03 04:46:06 -04:00
digimer
1d12fb32b4 * Completed the new anvil-watch-drbd which replaces watch_drbd.
* Updated Email->get_current_server() to always load mail server data from the database.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 20:43:46 -04:00
digimer
c9e11fbbfc * Added checks to anvil-provision-server to fail out if either of the SN IPs are not found when generating a DRBD resource config.
* Added logging to anvil-provision-server and anvil-daemon to try to find the cause of jobs being re-run after completing. May have fixed with a fix to job_progress updates going to 100 too early in some cases.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-19 21:44:45 -04:00
digimer
156a0ca201 Updated anvil-daemon's new job launching logic to allow the restart of a running job that failed out early.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-16 11:43:49 -04:00
digimer
47f7a35df3 The main purpose of this commit is to add serial execution of similar jobs to help reduce race conditions for scripted jobs, like multiple server creation.
* Fixed a small logging bug in DRBD->allow_two_primaries().
* Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded.
* Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-15 21:13:53 -04:00
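
Note on 47f7a35df3: the serial execution of similar jobs can be sketched as below. This is a simplified Python illustration, not the daemon's actual Perl logic. The job dictionaries, the `id` field and both function names are assumptions; only the ordering by modified_date and the rule that jobs sharing a command run one after another come from the commit message.

```python
def plan_job_runs(jobs):
    """Group jobs by command, oldest first.

    jobs: iterable of dicts with 'command' and 'modified_date' keys
    (a hypothetical shape for this sketch).
    Returns a dict mapping each command to its ordered queue of jobs.
    """
    queues = {}
    for job in sorted(jobs, key=lambda j: j["modified_date"]):
        queues.setdefault(job["command"], []).append(job)
    return queues


def next_runnable(queues, running_commands):
    """Return the head of each queue whose command is not already running.

    Jobs with different commands may start together; a second job with the
    same command waits until the first has completed.
    """
    return [
        queue[0]
        for command, queue in queues.items()
        if queue and command not in running_commands
    ]
```

For example, with two queued server-provisioning jobs and one update job, the first call to `next_runnable(queues, set())` would return the older provisioning job and the update job; the second provisioning job becomes eligible only after the first is removed from its queue and its command is no longer in `running_commands`.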