* Added a sync call to Tools->nice_exit() to ensure logs are flushed.
* Updated Database->quote() to be in an eval block to better handle
cases where the DB handle is lost.
* Added an hourly check to anvil-daemon and moved the memory in use
check to run only once per hour.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added an explicit 'sync' call when writing to logs. TO BE REMOVED!
* Disabled anvil-monitor-daemons and anvil-monitor-performance in case
this is somehow trigging program exits.
* Converted prints to Log->entry calls in anvil-change-password
* Added PID state info logging for running jobs.
Signed-off-by: digimer <mkelly@alteeve.ca>
* This is meant to deal with a case where, when a DB is added to
anvil.conf but that new entry is not yet in hosts, the program crashes
because of a duplicate key when calling insert_or_update_hosts for all
DBs.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Renamed the old ->wait_for_networks() to be ->wait_for_nm_online().
* The new ->wait_for_networks() waits for all interfaces we manage to be
'activated' before returning.
Signed-off-by: digimer <mkelly@alteeve.ca>
Added the call to Network->wait_for_network to pause scancore and
anvil-daemon startups until NetworkManager says it's up and running.
Signed-off-by: digimer <mkelly@alteeve.ca>
This is required as we need to be able to ssh into peer strikers and
into nodes and DR hosts during initialization.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Network->find_matches() was trying to compare two IPs when the second
IP wasn't actually defined.
* Disabled scancore's blocking of running before the host is configured.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Database->get_ip_addresses() was marking IPs that weren't on a network
we managed, the IP would be marked as DELETEd, which caused problems
with initializing targets, and it generated a lot of repeat alerts.
* Updated logging in Network.pm to help with debugging.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created new anvil-monitor-network daemon to trigger scan-server via
anvil-monitor-network on network events.
* Moved functionality into scan-network
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Striker->generate_manifest() to add pod and make the prefix,
sequence and domain parameters required.
* Created the check_for_broken_manifests() function for anvil-daemon to
detect/remove broken manifests.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added a called to Database->_check_for_duplicates to Database->resync_databases
* Added 'check_for_resync => 1' to anvil-configure-host.
Signed-off-by: digimer <mkelly@alteeve.ca>
Moved the logic to a new private method, and call it now from the active
Striker in the once per minute loop. The duplicate variable issue seems
to be not entirely uncommon.
Signed-off-by: digimer <mkelly@alteeve.ca>
With this new system, a 'primary_db' is chosen (first connected DB UUID when sorted) and only it does resyncs. Further, resyncs have been pulled from all tools except anvil-daemon. So with this new system, the chances of duplicate, simultaneous resyncs should be removed (hopefully for real this time).
* Database->check_agent_data() no longer calls a resync after loading a
schema.
* Removed the Database->coonnect() 'all' parameter
* The database used to read from is now always the same as the primary,
even if there is a local DB.
* Database->connect() 'check_for_resync' parameter can now be set to
'2', which means "check for resync _if_ I am primary", where '1' still
checks for resync no matter what.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key.
* Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running.
Signed-off-by: digimer <mkelly@alteeve.ca>
Note that work has started it reworking anvil-update-system, but it is incomplete (and broken) in this commit.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-update-system to reboot a target whose kernel updated using an anvil-manage-power job,
* Started making striker-update-cluster run as a job (not at all complete). Fixed a bug where the wrong IP was being used when finding access to a target.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in anvil-safe-stop where it wouldn't trigger a migration when the peer is online.
* Updated anvil-update-system to set job_data to 'failed' and exit with rc 4 if the os update failed.
* Got striker-update-cluster to error out and exit if a called 'anvil-update-system' job failed.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added logging to anvil-provision-server and anvil-daemon to try to find the cause of jobs being re-run after completing. May have fixed with a fix to job_progress updates going to 100 too early in some cases.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a small logging bug in DRBD->allow_two_primaries().
* Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded.
* Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes.
Signed-off-by: digimer <mkelly@alteeve.ca>