Commit Graph

4181 Commits

Author SHA1 Message Date
Madison Kelly
cb6346f468 Fixed errors that broke compile.
Signed-off-by: digimer <mkelly@alteeve.ca>
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-06 15:34:44 -04:00
Madison Kelly
4b82c5f2bf Added 'timeout' logging to help debug SIGALRM exits.
Signed-off-by: digimer <mkelly@alteeve.ca>
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-06 15:33:56 -04:00
digimer
cfa3432e78 Added a catch for SIGALRM
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-06-06 15:21:02 -04:00
Digimer
99eb177da2 Merge pull request #660 from ClusterLabs/net-config
Net config
2024-06-06 15:10:48 -04:00
Madison Kelly
5495a82595 Improved handling of lost DB connections.
* Updated Database->reconnect() to take 'lost_uuid' and, if passed,
  deletes the cached file handle before calling ->disconnect().
* Updated Database->query() to return an empty hash reference instead of
  '!!error!!', since callers almost always do an array count on the
  result, which triggered errors because that string is not a hash
  reference. Updated the docs to reflect this.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-05 20:26:18 -04:00
Madison Kelly
c00fd62ea6 Removed the lock release in Database->reconnect().
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-05 14:46:13 -04:00
Madison Kelly
d3ddbd395f Added logging for DB connection test bug
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-05 03:34:21 -04:00
Madison Kelly
52643885d2 Added a check to avoid deep recursions when testing DB access
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-05 01:16:45 -04:00
Madison Kelly
9cb2446bea Cleaned up handling of lost DB access
* Updated Database->query() to track when a specific DB to read from is
  passed. If one is passed and that DB is lost, return an error. If not,
  and another DB is available, switch to it.
* Updated Database->write() to skip trying to write to a lost DB.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-04 17:13:31 -04:00
Madison Kelly
9db9f81104 Reworked Database->_test_access to do a general reconnect
* Before, it would try to reconnect to just the lost DB, which could
  trigger an error.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-04 16:39:24 -04:00
Madison Kelly
574b2dccae Updated Database->query to better handle a lost DB connection.
* Created Database->reconnect to clean up reconnecting to the DBs

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-03 12:18:26 -04:00
digimer
8c1c0597da Updated anvil-daemon to run anvil-configure-host in the foreground.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-30 14:49:02 -04:00
digimer
f7082c930b Fixed a bug in parsing the fence agent for multi-device fence methods.
* Updated the fence_ipmilan timeouts to 30 seconds to help debug fence
  config failures.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-30 00:31:40 -04:00
digimer
25a0454dce Better handling of lost DB connections.
* Added a sync call to Tools->nice_exit() to ensure logs are flushed.
* Updated Database->quote() to be in an eval block to better handle
  cases where the DB handle is lost.
* Added an hourly check to anvil-daemon and moved the memory in use
  check to run only once per hour.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-29 20:41:12 -04:00
digimer
b86493fff4 More logging to debug apparent hang
* Added an explicit 'sync' call when writing to logs. TO BE REMOVED!
* Disabled anvil-monitor-daemons and anvil-monitor-performance in case
  this is somehow triggering program exits.
* Converted prints to Log->entry calls in anvil-change-password
* Added PID state info logging for running jobs.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-29 13:40:57 -04:00
digimer
4766ceff70 Added logging to debug network config issue.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-29 00:35:27 -04:00
digimer
8dc3a8262f Updated pod on requiring 'new' for manifest_uuid when creating new
manifests.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-28 22:22:47 -04:00
digimer
566887462e Fixed parameter names being sent to Striker->generate_manifest().
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-28 20:23:52 -04:00
digimer
3c52d1e28e Changed how parameters are picked up in Striker->generate_manifest
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-28 18:16:32 -04:00
digimer
a3ac5cf7f8 Fixed a bug that prevented install manifests from being saved.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-28 14:27:16 -04:00
digimer
f08df75384 Made resync checks happen on any striker running for less than two
hours.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-27 23:27:27 -04:00
digimer
d6c5aa3903 Added a timeout to Database->query() calls.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-27 21:11:54 -04:00
digimer
368673eac2 Added a flag for when NM is changed and, if set, NM is restarted.
* Also bumped nmcli sleeps to 5s.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-27 00:07:18 -04:00
digimer
acf30229ef Added code to restart NetworkManager if needed
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-25 22:13:44 -04:00
digimer
b990d21dc3 Fixed a bug where migrations would needlessly fail memory checks.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-25 20:20:51 -04:00
digimer
ab33c716cb Created a specific check that there's a hosts entry for each DB
* This is meant to deal with a case where, when a DB is added to
  anvil.conf but that new entry is not yet in hosts, the program crashes
  because of a duplicate key when calling insert_or_update_hosts for all
  DBs.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-25 20:19:26 -04:00
digimer
3d50f45984 Added a 1 second delay to nmcli calls
* Also fixed a bug in Database->get_storage_group_data() where a column
  was missing when adding members.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-24 14:20:41 -04:00
digimer
033052f449 Shortened the time to reboot when no DBs come back after net reconfig
* Also updated to directly call a reboot.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-17 20:12:04 -04:00
digimer
8e53993f67 Shortened the anvil-daemon job start up delay.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-15 23:00:31 -04:00
digimer
6826b12188 Added a start for configured interfaces found to be down after boot.
* Added the 'up' parameter to Network->collect_data() that will bring up
  an interface we configured that is down.
* Updated scan-network to call Network->collect_data() with 'up' if the
  uptime is less than ten minutes.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-08 13:25:56 -04:00
digimer
6d121dc0c0 Mapped each interface name in match.interface-name to a UUID lookup.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-08 00:17:16 -04:00
digimer
7925a3f42c * Added more man pages.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-07 22:34:18 -04:00
Fabio M. Di Nitto
9cfadcf096 Merge pull request #648 from ClusterLabs/fix-fence-opts-parsing
fence: do not load switches for deprecated agents options
2024-04-27 18:25:44 +02:00
Fabio M. Di Nitto
ef8bb19e60 fence: do not load switches for deprecated agents options
loading deprecated options causes switches to be overwritten during
xml parsing, generating incorrect pacemaker configs

Closes: https://github.com/ClusterLabs/anvil/issues/636

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2024-04-20 13:33:26 +02:00
Fabio M. Di Nitto
494e538257 Merge pull request #647 from ClusterLabs/fix-distcheck-as-user
build: fix make distcheck as user vs root
2024-04-20 13:14:15 +02:00
Fabio M. Di Nitto
def90f2daa build: fix make distcheck as user vs root
use proper autotool way to install / uninstall files

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2024-04-20 07:48:04 +02:00
Digimer
e4bd962715 Merge pull request #644 from ClusterLabs/upgrade-tools
Upgrade tools
2024-04-19 11:02:48 -04:00
digimer
5c3d1860c8 Made the host_key check conditional on an available DB
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-19 01:44:42 -04:00
digimer
9775612de7 Added an explicit check that IPs for a hostname are added to known_hosts
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-17 23:17:22 -04:00
digimer
1152c50f3a Added pcsd config, and -y support.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-16 00:43:58 -04:00
digimer
3e63b726d3 Added node 2 joining an Anvil! node if not started by node 1.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-14 01:36:28 -04:00
digimer
e00dec7cba Added loading existing corosync/authkey from peer during rebuild.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 17:46:19 -04:00
digimer
ec6acdd6d8 Reworked host validation to avoid warnings in logs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 17:14:08 -04:00
digimer
bd2e4c46ae Updated Network->load_ips() to use the device_name when available.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 16:55:18 -04:00
digimer
45e3a1e8a9 Updated Remote->_check_known_hosts_for_target() to replace updated keys
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 16:52:33 -04:00
digimer
9999d6f522 Fixed a bug where NICs were not being found by their NM device name
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 01:54:09 -04:00
digimer
7ecd0a4d70 Starting work on rejoining a replacement subnode to an Anvil! node
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 01:35:20 -04:00
Digimer
84e321ff7d Merge pull request #635 from ClusterLabs/tools-dev
Tools dev
2024-04-10 21:06:48 -04:00
digimer
863a7b1b07 Added missing data to what the crm_mon parser records
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-10 17:10:52 -04:00
digimer
014136ddd0 Added manual parsing of crm_mon XML when parsing resource states.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-10 10:39:17 -04:00