Commit Graph

850 Commits

Author SHA1 Message Date
Madison Kelly
3a41639baa Added the cleanup parameter to Database->disconnect()
* This was added so that, in Database->reconnect(), no attempt is made
  to update or disconnect from the DBs, preventing problems when the
  target DB was unexpectedly lost.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-10 17:33:16 -04:00
Madison Kelly
420445d875 Altered nmcli sleeps and bumped logging to debug DB connection issue.
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-10 15:04:54 -04:00
Madison Kelly
8b8be39717 Finished System->check_if_configured({thorough => 1}) support.
* Updated Database->get_variables() to store columns with
  variable_source_table values in a more useful hash.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-08 15:55:49 -04:00
Madison Kelly
94dacd08d8 Created Database->get_variables().
* Updated (NOT COMPLETE!) System->check_if_configured to take the new
  'thorough' parameter to see if the network is no longer configured.
  When used, the method attempts to detect if a host has been
  rebuilt and is, thus, no longer configured.
* Started work on having 'anvil-join-anvil --rejoin' try to see if the
  network needs to be reconfigured prior to rejoining the cluster.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-08 14:18:28 -04:00
Madison Kelly
cb6346f468 Fixed errors that broke compile.
Signed-off-by: digimer <mkelly@alteeve.ca>
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-06 15:34:44 -04:00
Madison Kelly
4b82c5f2bf Added 'timeout' logging to help debug SIGALRM exits.
Signed-off-by: digimer <mkelly@alteeve.ca>
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-06 15:33:56 -04:00
Madison Kelly
5495a82595 Improved handling of lost DB connections.
* Updated Database->reconnect() to take 'lost_uuid' and, if passed,
  deletes the cached file handle before calling ->disconnect().
* Updated Database->query() to return an empty hash reference instead of
  '!!error!!', as almost always, callers do an array count, which
  triggered errors as it's not a hash reference. Updated docs to reflect
  this.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-05 20:26:18 -04:00
Madison Kelly
c00fd62ea6 Removed the lock release in Database->reconnect().
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-05 14:46:13 -04:00
Madison Kelly
d3ddbd395f Added logging for DB connection test bug
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-05 03:34:21 -04:00
Madison Kelly
52643885d2 Added a check to avoid deep recursions when testing DB access
Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-05 01:16:45 -04:00
Madison Kelly
9cb2446bea Cleaned up handling of lost DB access
* Updated Database->query() to track when a specific DB to read from is
  passed. If so, and that is lost, return an error. If not, and another
  DB is available, switch to it.
* Updated Database->write() to skip trying to write to a lost DB.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-04 17:13:31 -04:00
Madison Kelly
9db9f81104 Reworked Database->_test_access to do a general reconnect
* Before, it would try to reconnect to just the lost DB, which could
  trigger an error.

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-04 16:39:24 -04:00
Madison Kelly
574b2dccae Updated Database->query to better handle a lost DB connection.
* Created Database->reconnect to clean up reconnecting to the DBs

Signed-off-by: Madison Kelly <mkelly@alteeve.com>
2024-06-03 12:18:26 -04:00
digimer
f7082c930b Fixed a bug in parsing the fence agent for multi-device fence methods.
* Updated the fence_ipmilan timeouts to 30 seconds to help debug fence
  config failures.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-30 00:31:40 -04:00
digimer
25a0454dce Better handling of lost DB connections.
* Added a sync call to Tools->nice_exit() to ensure logs are flushed.
* Updated Database->quote() to be in an eval block to better handle
  cases where the DB handle is lost.
* Added an hourly check to anvil-daemon and moved the memory in use
  check to run only once per hour.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-29 20:41:12 -04:00
digimer
b86493fff4 More logging to debug apparent hang
* Added an explicit 'sync' call when writing to logs. TO BE REMOVED!
* Disabled anvil-monitor-daemons and anvil-monitor-performance in case
  this is somehow triggering program exits.
* Converted prints to Log->entry calls in anvil-change-password
* Added PID state info logging for running jobs.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-29 13:40:57 -04:00
digimer
8dc3a8262f Updated POD on requiring 'new' for manifest_uuid when creating new
manifests.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-28 22:22:47 -04:00
digimer
566887462e Fixed parameter names being sent to Striker->generate_manifest().
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-28 20:23:52 -04:00
digimer
3c52d1e28e Changed how parameters are picked up in Striker->generate_manifest
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-28 18:16:32 -04:00
digimer
a3ac5cf7f8 Fixed a bug that prevented install manifests from being saved.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-28 14:27:16 -04:00
digimer
f08df75384 Made resync checks happen on any striker running for less than two
hours.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-27 23:27:27 -04:00
digimer
d6c5aa3903 Added a timeout to Database->query() calls.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-27 21:11:54 -04:00
digimer
ab33c716cb Created a specific check that there's a hosts entry for each DB
* This is meant to deal with a case where, when a DB is added to
  anvil.conf but that new entry is not yet in hosts, the program crashes
  because of a duplicate key when calling insert_or_update_hosts for all
  DBs.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-25 20:19:26 -04:00
digimer
3d50f45984 Added a 1 second delay to nmcli calls
* Also fixed a bug in Database->get_storage_group_data() to add a missing
  column when adding members.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-24 14:20:41 -04:00
digimer
8e53993f67 Shortened the anvil-daemon job start up delay.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-15 23:00:31 -04:00
digimer
6826b12188 Added a start for configured interfaces found to be down after boot.
* Added the 'up' parameter to Network->collect_data() that will bring up
  an interface we configured that is down.
* Updated scan-network to call Network->collect_data() with 'up' if the
  uptime is less than ten minutes.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-08 13:25:56 -04:00
digimer
6d121dc0c0 Mapped each interface name in match.interface-name to a UUID lookup.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-08 00:17:16 -04:00
Fabio M. Di Nitto
ef8bb19e60 fence: do not load switches for deprecated agents options
loading deprecated options causes switches to be overwritten during
xml parsing, generating incorrect pacemaker configs

Closes: https://github.com/ClusterLabs/anvil/issues/636

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2024-04-20 13:33:26 +02:00
digimer
5c3d1860c8 Made the host_key check conditional on an available DB
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-19 01:44:42 -04:00
digimer
9775612de7 Added an explicit check that IPs for a hostname are added in known_hosts
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-17 23:17:22 -04:00
digimer
ec6acdd6d8 Reworked host validation to avoid warnings in logs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 17:14:08 -04:00
digimer
bd2e4c46ae Updated Network->load_ips() to use the device_name when available.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 16:55:18 -04:00
digimer
45e3a1e8a9 Updated Remote->_check_known_hosts_for_target() to replace updated keys
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 16:52:33 -04:00
digimer
9999d6f522 Fixed a bug where nics were not being found by their NM device name
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 01:54:09 -04:00
digimer
863a7b1b07 Added missing data being recorded in crm_mon parser
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-10 17:10:52 -04:00
digimer
014136ddd0 Added manual parsing of crm_mon XML when parsing resource states.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-10 10:39:17 -04:00
digimer
76e66e6fa6 Added anvil.conf to log collection.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-10 10:39:17 -04:00
digimer
12bb45aa37 Added a secure check for password logging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-10 10:39:17 -04:00
digimer
caf5e9550e Made lanplus default, secondary for Fujitsu only.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-10 10:39:17 -04:00
digimer
82341df508 Added logging the PID
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-10 10:39:17 -04:00
digimer
259febeb5c Added password changing for IPMI back in.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-06 23:10:30 -04:00
digimer
e9ed7ed4d4 Prevent IPMI IP change on simengine-backed IPMI.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-06 16:40:10 -04:00
digimer
b706ffa195 Cleaned up logging
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-06 16:39:53 -04:00
digimer
f38b47f1e2 Reworked stonith levels; this should fix issues #522 and #613
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-05 00:25:31 -04:00
digimer
f65f760c8a Improved Convert->to_ipmi_password()
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-04 13:39:56 -04:00
digimer
8f090a1f43 Set IPMI passwords to always be 16 char long and special chars removed.
* Created Convert->to_ipmi_password() which takes a password string,
  strips special characters, and shortens the result to 16 characters
  long. This should work with all v1.5, v2 and newer IPMI BMCs.
* Updated System->configure_ipmi() to remove the attempts to step-down
  the password to find one that fits the current IPMI host, now simply
  using the Convert->to_ipmi_password() password.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-03 22:22:24 -04:00
digimer
ad0a353a89 Fixed a bug where unused interfaces were not being ignored.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-25 22:41:34 -04:00
digimer
36525cdeab Added a work-around for an LVM JSON formatting issue
* Related to https://issues.redhat.com/browse/RHEL-29680
* Updated Storage->manage_lvm_conf() to be stricter about when to add
  the filter to lvm.conf

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-19 15:33:43 -04:00
digimer
2d92f339c2 Fixed a bug related to changing the hostname during a manifest run
* The original hostname would be used to form the cluster, even though
  the hostname was updated.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-18 23:28:42 -04:00
digimer
870c990632 Added support for multiple IPs per interface
* Created Database->get_mac_to_ip()
* Updated Database->insert_or_update_mac_to_ip() to find an entry using
  both the IP and MAC address.
* Updated Network->get_ips() to store only the first IP it finds on an
  interface as the main IP (for use in /etc/hosts, etc) and to store it
  and any other IPs in a new hash.
* Updated scan-network to use the new hash above to record them in the
  'mac_to_ip' table. Similarly, before marking an IP as removed, it
  checks to see if it's an alternate IP.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-06 19:06:05 -05:00