Commit Graph

21 Commits

Author SHA1 Message Date
digimer
de4bb0d001 Bumped logging for debugging.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
476b285607 Added a wait_for_access() function to anvil-join-anvil
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
a81a110261 * Remove forced log level and secure logging. This addresses issue #386
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-09 18:20:14 -04:00
digimer
0471fb90ea * Upped the logging in these three tools to help diagnose run errors. To be removed before tagging beta
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-25 13:07:38 -04:00
digimer
be290bf561 This commit fixes a bug where the drbd kernel module build was being killed mid-compile, leaving DBRD unusable.
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 22:32:41 -04:00
digimer
b0c54b6dae * Updated anvil-update-system to check if another instance of anvil-update-system is running and, if so, exit.
* Removed the new tasks from anvil-special-operations.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 20:03:39 -04:00
digimer
7bd76c10dc Major thing in this commit is reworking striker-update-cluster to work without expecting anvil-daemon to be running on target machines. Similarly, they had to be able to work when the Striker DBs were not available. This is to account for cases where the Striker dashboards have updated, and the schema has changed, preventing the not-yet-updated DR hosts and subnodes from being able to use the DB. To do this, anvil-safe-stop, anvil-update-system, and anvil-shutdown-server had to be updated to use the new --no-db switch, which tells then to run without the database being available.
* Updated Server->shutdown_virsh() to work without a database connection.
* Updated System->reboot_needed() to store/read from a cache file when the database is not available.
* Updated anvil-safe-start to remove the old --enable/disable/status switches, now that we use anvil-safe-start.service systemd unit.
* Reworked anvil-safe-stop to work without a database connection, and to work on DR hosts.
* Updated anvil-special-operations to add new tasks, but it's likely these new tasks aren't needed and will be removed very shortly.
* Added/updated multiple man pages.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 18:09:01 -04:00
digimer
1d12fb32b4 * Completed the new anvil-watch-drbd which replaces watch_drbd.
* Updated Email->get_current_server() to always load mail server data from the database.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 20:43:46 -04:00
digimer
0874ad571a Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 14:33:58 -04:00
Digimer
892a475881 * Fixed a bug in Convert->format_mmddyy_to_yymmdd() where being passed '--' didn't return the same.
* Fixed a divide-by-zero bug in anvil-boot-server when no servers exist yet.
* Fixed a bug in anvil-daemon where the local databsae engine was being started when it shouldn't.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-18 02:38:50 -05:00
Digimer
32d47f70f1 * Fixed bugs around ScanCore->check_power() so that it now returns time on batteries and highest charge are returned properly.
* Created Network->is_our_interface() which returns '1' if an interface is one managed by an Anvil!. Also updated scan-network to use this to determine when an interface alert should be a warning or notice level alert.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-11 21:57:35 -04:00
Digimer
41cd1e0319 * Several bugs fixed and enhancements;
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 00:09:32 -04:00
Digimer
fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources.
* Changed the alteeve repo RPM to the new cimmunity/enterprise repo
* Fixed a bug where 'fence_data::updated' was causing the fences web page to break.
* Fixed a bug in Database->insert_or_update_network_interfaces() where certain interfaces were being repeatedly added to the database.
* Fixed a bug in Database->_find_behind_databases() was marking DBs as behind even though they had less than 10 columns off.
* Fixed a bug in Get->host_name() where, if the host name was changed on disk but the environment variable was still the old name, it would cause the hostname to waffle back and forth and cause constant updated to /etc/hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-20 00:16:09 -04:00
Digimer
ad4a1ecc78 * Increaded the scancore agent run timeout to 60 seconds.
* Updated anvil-safe-start to start DRBD resources when the peer's DRBD resourcs is 'Connecting',
* Updated fence_pacemaker to more intelligently check the list of host names related to an IP address when looking for the peer host name

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-15 00:12:43 -04:00
Digimer
44864ce321 * Updated Database->resync_databases() to set a default schema of 'public'. Also fixed a bug where, when the difference in record numbers between two line was > 999, it would not trigger a resync.
* Updated the scan agent timeout to 60 seconds. Also made the scan agent exit code log entries more helpful.
* Updated System->collect_ipmi_data() to now better handle duplicate sensor names. Now, instead of simply appending an integer, we find the hex address and use that in the sensor name when duplicates exist. This solves the problem of the sensor names not being consistently shown in order.
* Fixed message bugs (bad variable insertions) in scan-apc-pdu and scan-apc-ups.
* Fixed schema procedure bugs in the 'temperature' and 'ip_address' tables where the columns were in bad order, causing constanty updates.

Incomplete work;
* Create the shell of 'anvil-manage-storage', but virtually no logic exists in it yet.
* Started work on anvil-safe-start to deal with an issue where DRBD resources don't start when a server is running on a peer.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-13 23:27:38 -04:00
Digimer
3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed).
* Updated Server->shutdown_virsh() to change the parameter 'wait' to 'wait_time' to clarify it's use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 00:04:20 -04:00
Digimer
711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there.
* Fixed a bug in Cluster->migrate_server() where waiting for the server to migate would never exit.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-19 00:32:13 -04:00
Digimer
798518ba5e * While working on the boot/shutdown server tools, ran into and fixed a bug where files uploaded before an Anvil! was added could not have those files sync'ed. This was fixed though the new Database->check_file_locations() method.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-14 22:56:18 -04:00
Digimer
e036515df3 * Got anvil-safe-start to the point where is starts the cluster stack. Need to create the 'anvil-boot-server' and 'anvil-shutdown-server' before it can be completed, so those files have been added.
* Created Cluster->parse_quorum() to check if a node is quorate as 'have-quorum' in the pacemaker CIB doesn't appear to be super accurate during startup.
* Fixed a bug in striker-manage-install-target where if a node didn't have any registered IPs, it would break before generating the repo data.
* Fixed a bug in anvil-join-anvil where if the database had to be reconnected, the job data was lost.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-14 00:26:06 -04:00
Digimer
faf1399440 * Continued work on anvil-safe-start. Got it to the point where it detects shared networks with its peer node and waits for all networks to be up.
* Fixed a bug in scan-drbd where the volume_uuid wasn't being stored in the proper hash, breaking insertions into scan_drbd_peers in some cases.
* Updated System->pids() to work with remote targets (will be used later to check for parallel runs of anvil-safe-start).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-12 20:46:30 -04:00
Digimer
15e71768a1 * Started work on anvil-safe-start. The enable/disable logic and how it runs automatically is controlled by the database and the tool can be used to control anvil-safe-start on both the local and peer node. It will be started by ScanCore, if scancore starts within 10 minutes of the node booting. It will always be able to run manually.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-12 00:28:24 -04:00