Commit Graph

175 Commits

Author SHA1 Message Date
digimer
fef0d8f83f Fixed a spacing issue.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-27 21:54:05 -05:00
digimer
b8c73fd3f2 Replaced hosts management in anvil-join-anvil with System->update_hosts.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-26 18:29:55 -05:00
digimer
f40d25f2dd Fixed a bug with /etc/hosts generation
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-20 21:56:07 -05:00
digimer
091ded803c Added an attempt to assemble storage groups if not yet exist.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-15 17:41:26 -05:00
digimer
46b298e7e8 Added debugging of missing bond data
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-14 19:42:56 -05:00
Digimer
6c6216fb88
Merge branch 'main' into issues/561-correct-load-ifaces 2024-02-03 13:45:02 -05:00
digimer
e166d181c4 Disabled RAM counting as a debug step.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-02 23:42:36 -05:00
Tsu-ba-me
2306776c37 fix: rename load_interfces->load_interfaces of Network.pm 2024-01-30 16:16:14 -05:00
digimer
dd0175e05c Now check for/backup/remove ifcfg-X files on EL8 hosts.
* Added caching to System->check_network_type()
* Changed anvil-configure-host job progress steps to 1.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
827cf1f331 Fixed a bug that was crashing anvil-daemon
* Network->find_matches() was trying to compare two IPs when the second
  IP wasn't actually defined.
* Disabled scancore's blocking of running before the host is configured.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
92ed77e05b Fixed a bug blocking most jobs from running.
* Also updated a bunch of 'apache' ownership calls to now use
  'striker-ui-api'.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
ec11335197 Fixed DB initialization bugs.
* More work done on the new network stack also.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
207a014ae0 Got anvil-watch-servers showing the status of subnodes.
* Updated System->maintenance_mode() to take 'host_uuid' so that the
  maintenance mode of remote machines can be checked/set.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-11-27 23:47:29 -05:00
digimer
49f194eac6 Fixed issue #515; anvil-join-anvil updates hostnames properly now
* Updated Get->host_name() to accept the new 'refresh' parameter. This
  forces a reread of the hostname, instead of using the cached value.
* Updated System->host_name() so that, when it's updating the hostname,
  it updates the database and cached variables.
* Updated Words->center_text() to avoid undefinied parameter issues.
* Updated anvil-join-anvil to ensure the 'sys::host_name' variable.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-27 19:58:37 -04:00
digimer
be290bf561 This commit fixes a bug where the drbd kernel module build was being killed mid-compile, leaving DBRD unusable.
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 22:32:41 -04:00
digimer
66c82e5e22 * Fixed a bug in anvil-update-system where updating a single package with --reboot wouldn't request a reboot. Finished reworking it so that a check is made to see if the kernel or DRBD kmod will be updated and, if so, removes the kmod-drbd RPMs prior to doing the update (as opposed to the sloppier check-on-error method).
* Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key.
* Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 21:43:26 -04:00
digimer
7bd76c10dc Major thing in this commit is reworking striker-update-cluster to work without expecting anvil-daemon to be running on target machines. Similarly, they had to be able to work when the Striker DBs were not available. This is to account for cases where the Striker dashboards have updated, and the schema has changed, preventing the not-yet-updated DR hosts and subnodes from being able to use the DB. To do this, anvil-safe-stop, anvil-update-system, and anvil-shutdown-server had to be updated to use the new --no-db switch, which tells then to run without the database being available.
* Updated Server->shutdown_virsh() to work without a database connection.
* Updated System->reboot_needed() to store/read from a cache file when the database is not available.
* Updated anvil-safe-start to remove the old --enable/disable/status switches, now that we use anvil-safe-start.service systemd unit.
* Reworked anvil-safe-stop to work without a database connection, and to work on DR hosts.
* Updated anvil-special-operations to add new tasks, but it's likely these new tasks aren't needed and will be removed very shortly.
* Added/updated multiple man pages.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 18:09:01 -04:00
digimer
cf73d8ed36 * Updated System->configure_ipmi() to auto-configure DR hosts once they've been assigned a BCN IP address.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-05 15:04:39 -04:00
digimer
efebd135eb * Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes.
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 12:50:44 -04:00
digimer
fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell.
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-03 14:42:28 -05:00
digimer
7891c9b2b1 * Fixed a bug in Network->load_ips() where interfaces were being marked as type 'bridge' or 'bond'.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-27 12:31:33 -05:00
digimer
26a1fe1491 * Updated Database->connect() to allow local reads on strikers, regardless of the active DB.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 15:47:34 -05:00
digimer
dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See:
** https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
* Added explicity setting of $ENV{PATH} when it's null (as it is when pacemaker calls our tools).
* Updated the copyright to 2023.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-12 21:52:26 -05:00
Digimer
9194eb3d09 * Updated System->check_if_configured() to record that a host is configured in /etc/anvil to make the system auto-mark as configured if the host is removed from the DB (or, more specifically, variables -> system::configured is lost).
* Updated Database->get_anvils() to record dr_links to reference DR hosts to Anvil! systems.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-15 19:28:00 -05:00
Digimer
f6cbe7d1d2 * Fixed a bug in System->collect_ipmi_data() where double-quoted passwords were preventing reading of the sensor data.
* Added a new table to the main SQL schema to allow for more dynamic tracking of which Anvil! node pairs can use which DR hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-06 15:07:05 -05:00
Digimer
35cf0c37fb * Updated System->check_ram_use() to set the maximum RAM based on the host type, and set those values in _set_default() so that the user can override if they want.
* Got anvil-manage-alerts to the point where you can add, edit and delete mail servers.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-14 17:17:30 -05:00
Digimer
76f95d8e53 * Added a check to ignore IPMI sensors reporting "No Reading" in scan-ipmitool.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-24 14:38:21 -04:00
Digimer
d271ffec26 * Updated Cluster->parse_crm_mon() to record the role of stonith resources.
* Fixed a bug in System->parse_arguments() where a quoted password without spaces was returned without being recorded in the hash. Also updated logging to log 'suppressed' for passwords when secure logging is disabled.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-31 12:57:01 -04:00
Digimer
7fd6185445 * Disabled firewalling for now. There appears to be an issue starting up with DRBD.
* Updated Convert->time() to return whatever was passed in instead of '#!error!#'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-09 19:46:38 -04:00
Digimer
b2ea4f9adc * Moved System->manage_firewall() to Network->manage_firewall(). Started working on actually implementing it, which involves basically fully rewritting it.
* Updated tools/Makefile.am and scancore-agents/Makefile.am to add missing files.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-30 00:01:50 -04:00
Digimer
ab9b00a2f7 * Updated anvil-daemon, in its daily checks, to disable ksm and ksmtuned daemons.
* Updated scan-drbd to purge peer records that no longer have corresponding LVM data.
* Updated System->{en,dis}able-service to take the 'now' paramter which, when passed, causes the action to take immediate effect.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-21 22:25:07 -04:00
Digimer
24f5d39dff This is a set of changes all stemming from trying to debug frequent resyncs. More bugs still to be fixed.
* Updated Database->get_host_from_uuid() to cache results.
* Fixed a bug in Database->get_storage_group_data where a DELETE wasn't deleting from the history schema as well.
* In Database->resync_databases(), references to the old 'host_uuid' that we used to use to resync just the local host's data was removed. Added also a check where two or more entries in a given history schema had the same modified_date and, when found, the newest entry is preserved and the rest are deleted. Before this, a resync where two+ records had the same modified_time would only sync the last record, leaving a mismatch in history schema entries triggering repeated resyncs.
* Fixed a bug in Email->send_alerts() where the 'alerts' table was being updated without a modified_date being set.
* Fixed a bug in System->test_ipmi() where the 'hosts' table was being updated without a modified_date being set.
* Updated scan-network to clear up old deleted ip_addresses, bonds and bridges. Also fixed bugs where public schema records were being deleted without history records being deleted.
* Updated anvil-update-states to fix bugs where DELETEs were happening without setting the modified_date.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-12 23:14:49 -04:00
Digimer
e6dcff1cf1 * Added a missing modified_date to ip_addresses in Database->get_ip_addresses().
* Updated scan-network to purge old historical ip_addresses when clearing duplicates now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-21 15:52:25 -04:00
Digimer
1b70b49cf8 * Updated Network->find_matches() to try to populate the first and second parameters if they're not passed in.
* Updated Network->load_ips() to load extra information about the interfaces.
* Updated ocf:alteeve:server to not check libvirtd daemon state on server start.
* Updated scan-hardware to check for duplicate entries and purge if found.
* Updated scan-network to check for the 'default' virbr0 interface by checking if the config file exists instead of calling virsh.
* Updated scan-server to have better logging.
* Created the new (and incomplete) anvil-test-alerts tool
* Updated scancore to support --purge to pass to all agents and then exit.
* Updated ScanCore->call_scan_agents() to no longer use 'timeout' as it was causing issues with virsh calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-20 10:28:21 -04:00
Digimer
d26a16e711 * Updated anvil-provision-server to handle human-readable sizes for disk and ram.
* Updated Database->get_anvils() to make it possible to translate a file name to a file UUID.
* Updated System->test_ipmi() to quote passwords properly. Also dropped the timeouts to 2 seconds.
* Updated anvil-provision-server to support pure CLI switch server provisioning using the --ci-test (and optional --options {--machine}) to allow CI tests.
* Continued work of anvil-manage-server.
* Fixed a bug in striker-prep-database to fix a bug in writing the pg_hba.conf file.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-10 00:42:40 -04:00
Digimer
e9a9e0dd4b * Finished (but needs more testing) the new 'anvil-report-usage' tool.
* Updated System->_check_anvil_conf() to create the 'admin' user in a more normal way (old way caused the 'admin' group to be a system GID.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-25 23:56:58 -04:00
Digimer
aa7d9bdf14 * Fixed a bug where resync'ing the database was missing tables.
* Updated Network->find_matches() to take 'source' and 'line' parameters to help identify the source of issues with missing hashes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Digimer
74b7719cf5 * Created the new anvil-manage-host that can check/set if a host is configured. On Strikers, it can age out data, resync data, and check/set if the local database is active.
* Updated striker-prep-database to again enable the postgresql service.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Digimer
d70b9a4956 Updated scancore and anvil-daemon to check their RAM use at the end of each loop and, if it's using more than 1 GiB of RAM, it sends an alert and exits.
* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from it's status output.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-05 22:08:06 -05:00
Digimer
831abd1981 Updated Striker to allow the DR host to not have an IP assisned.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-21 16:57:10 -05:00
Digimer
e37f487704 Fixed a bug in System->check_ssh_keys where the 'admin' user's RSA keys were owned by root.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-20 14:13:27 -05:00
Digimer
2c76103a96 Fixed a bug where, if the host IPMI BMC wouldn't allow spaces in the password and the user had a space, IPMI would never configure or get used as a fence method.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-17 17:54:46 -05:00
Digimer
e8dcb8b24c Fixed a bug in System->configure_ipmi() where it would fail to find the IPMI BMC admin username in some cases.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-17 11:41:51 -05:00
Digimer
3346d31194 * Created Get->kernel_release() that returns the current kernel release (version) in use on the host or on a remote system.
* Created DRBD->_initialize_drbd() to makes sure the DRBD kernel module can load and tries to build the module, if necessary. This is meant to provide support for clients that can't access needed internet resource (or the internet at all).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-12-07 20:03:39 -05:00
Digimer
8e436ffec7 WIP: Started work on a new Storage->copy_device() method that will do 'dd' calls.
Fixed a bug in System->update_hosts() that was causing hosts to be constantly rewritten. (Well, I hope fixed, this has been a notoriously buggy part of the program...)

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-25 17:58:20 -04:00
Digimer
72b17ff1f9 * Reworked how databases are stopped, now being handled in anvil-daemon. This way, initial starts will still do traditional resyncs, then shut down. This should allow the best of both worlds, where data is not lost on striker start/stop loss/recovery, but operate normally otherwise without delays.
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 22:33:31 -04:00
Digimer
221f468b6c * Fixed a bug where duplicate IPMI sensor names of type 'Volts' wasn't being processed properly, causing sensor data to not be recorded.
* Added a comment about disabling /etc/hosts management in anvil.conf.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-31 14:42:30 -04:00
Digimer
06f679d7e7 * Added the ability to disable anvil-daemon management of /etc/hosts.
* Updated the OS variant list.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-30 22:40:35 -04:00
Digimer
4c7bb45ab9 Fixed a race condition where configuring the IPMI BMC would appear to fail because the BMC wouldn't report the user list after a cold reset.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-25 21:02:00 -04:00
Digimer
6cbdc388d4 Fixed a bug where corosync's configuration of a backup ring was broken.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-24 15:52:44 -04:00