* Updated Get->host_name() to accept the new 'refresh' parameter. This
forces a reread of the hostname, instead of using the cached value.
* Updated System->host_name() so that, when it's updating the hostname,
it updates the database and cached variables.
* Updated Words->center_text() to avoid undefinied parameter issues.
* Updated anvil-join-anvil to ensure the 'sys::host_name' variable.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key.
* Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Server->shutdown_virsh() to work without a database connection.
* Updated System->reboot_needed() to store/read from a cache file when the database is not available.
* Updated anvil-safe-start to remove the old --enable/disable/status switches, now that we use anvil-safe-start.service systemd unit.
* Reworked anvil-safe-stop to work without a database connection, and to work on DR hosts.
* Updated anvil-special-operations to add new tasks, but it's likely these new tasks aren't needed and will be removed very shortly.
* Added/updated multiple man pages.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added a new table to the main SQL schema to allow for more dynamic tracking of which Anvil! node pairs can use which DR hosts.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in System->parse_arguments() where a quoted password without spaces was returned without being recorded in the hash. Also updated logging to log 'suppressed' for passwords when secure logging is disabled.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated scan-drbd to purge peer records that no longer have corresponding LVM data.
* Updated System->{en,dis}able-service to take the 'now' paramter which, when passed, causes the action to take immediate effect.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->get_host_from_uuid() to cache results.
* Fixed a bug in Database->get_storage_group_data where a DELETE wasn't deleting from the history schema as well.
* In Database->resync_databases(), references to the old 'host_uuid' that we used to use to resync just the local host's data was removed. Added also a check where two or more entries in a given history schema had the same modified_date and, when found, the newest entry is preserved and the rest are deleted. Before this, a resync where two+ records had the same modified_time would only sync the last record, leaving a mismatch in history schema entries triggering repeated resyncs.
* Fixed a bug in Email->send_alerts() where the 'alerts' table was being updated without a modified_date being set.
* Fixed a bug in System->test_ipmi() where the 'hosts' table was being updated without a modified_date being set.
* Updated scan-network to clear up old deleted ip_addresses, bonds and bridges. Also fixed bugs where public schema records were being deleted without history records being deleted.
* Updated anvil-update-states to fix bugs where DELETEs were happening without setting the modified_date.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Network->load_ips() to load extra information about the interfaces.
* Updated ocf:alteeve:server to not check libvirtd daemon state on server start.
* Updated scan-hardware to check for duplicate entries and purge if found.
* Updated scan-network to check for the 'default' virbr0 interface by checking if the config file exists instead of calling virsh.
* Updated scan-server to have better logging.
* Created the new (and incomplete) anvil-test-alerts tool
* Updated scancore to support --purge to pass to all agents and then exit.
* Updated ScanCore->call_scan_agents() to no longer use 'timeout' as it was causing issues with virsh calls.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->get_anvils() to make it possible to translate a file name to a file UUID.
* Updated System->test_ipmi() to quote passwords properly. Also dropped the timeouts to 2 seconds.
* Updated anvil-provision-server to support pure CLI switch server provisioning using the --ci-test (and optional --options {--machine}) to allow CI tests.
* Continued work of anvil-manage-server.
* Fixed a bug in striker-prep-database to fix a bug in writing the pg_hba.conf file.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated System->_check_anvil_conf() to create the 'admin' user in a more normal way (old way caused the 'admin' group to be a system GID.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Network->find_matches() to take 'source' and 'line' parameters to help identify the source of issues with missing hashes.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from it's status output.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created DRBD->_initialize_drbd() to makes sure the DRBD kernel module can load and tries to build the module, if necessary. This is meant to provide support for clients that can't access needed internet resource (or the internet at all).
Signed-off-by: Digimer <digimer@alteeve.ca>
Fixed a bug in System->update_hosts() that was causing hosts to be constantly rewritten. (Well, I hope fixed, this has been a notoriously buggy part of the program...)
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a problem with Database->insert_or_update_variables() where variable_source_uuid being set to an empty string wasn't converted to NULL.
* Fixed Database->locking() where the way the lock variable was set was rather broken.
* Created Striker->check_httpd_conf() which configured apache to handle the integration of the new WebUI for Anvil! management with the existing WebUI.
* Updated System->update_hosts() to specifically set the 127.0.0.1 and ::1 lines to handle how cloud-init overrides /etc/hosts and breaks CI/CD tests.
* Removed the old index.html as it's now used for the new WebUI.
* Began work on removing DB connection requirements from ocf:alteeve:server.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->_age_out_data() to check for certain scan agent tables and, for those found, purge out old records. This should go a long way to keeping the database data responsive.
* Fixed a bug in Jobs->update_progress() where the 'job_picked_up_by' column was being set to '0' instead of '$$' when clearing the job.
* Fixed a bug in System->update_hosts() where '127.0.0.1' would be used in hosts for the actual host name.
* Updated the default trigger, count and division values in anvil.conf to 100,000, 50,000 and 75,000 respectively. In combination with the aging of data, this should go a long way to minimizing database sizes and overheads.
* Updated anvil-daemon to call $anvil->Database->_age_out_data(); in it's daily tasks.
* Updated various striker-X tools to specifically request a DB resync on Database->connect calls.
Signed-off-by: Digimer <digimer@alteeve.ca>
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the scan agent timeout to 60 seconds. Also made the scan agent exit code log entries more helpful.
* Updated System->collect_ipmi_data() to now better handle duplicate sensor names. Now, instead of simply appending an integer, we find the hex address and use that in the sensor name when duplicates exist. This solves the problem of the sensor names not being consistently shown in order.
* Fixed message bugs (bad variable insertions) in scan-apc-pdu and scan-apc-ups.
* Fixed schema procedure bugs in the 'temperature' and 'ip_address' tables where the columns were in bad order, causing constanty updates.
Incomplete work;
* Create the shell of 'anvil-manage-storage', but virtually no logic exists in it yet.
* Started work on anvil-safe-start to deal with an issue where DRBD resources don't start when a server is running on a peer.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added a new parameter to Database->connect() called 'no_resync' that, if set, prevents a resync check being performed. Updated ->resync_databases() to find a uuid_column where the table name ends in 'ies' and the UUID column is 'y_uuid'. Updated ->resync_databases() to not fire on updated table age anymore, and to trigger only if the number of rows differ in a given table by more than 10.
* Updated Log->entry() to prefix a tool's name, when the new 'log::scan_agent' value is set. Also set this value in ScanCore->agent_startup(), to help differentiate log entries.
* Fixed a bug in scancore's main loop where it logged the sleep message at the start of the run.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Cluster->check_stonith_config() that checks and, if needed, reconfigures a cluster's fencing (stonith) config.
* Updated scan-cluster to call Cluster->check_stonith_config() at the end of each call.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug where, in rare cases, $anvil->hostname() would call 'hostnamectl' and get a dbus error during shutdown, which would then cause the hostname to be changed to the error in the database.
* Fixed a bug in Cluster->boot_server() where it would never verify that a server has started successfully.
* Updated Database->get_ip_addresses() to store the IPs we manage in 'ip_addresses::<ip_address_address>::X'.
* Updated ocf:alteeve:server to work from command line calls, though more testing is still needed.
* Started work on 'anvil-rename-server', but haven't gotten far with it yet.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Cluster->parse_quorum() to check if a node is quorate as 'have-quorum' in the pacemaker CIB doesn't appear to be super accurate during startup.
* Fixed a bug in striker-manage-install-target where if a node didn't have any registered IPs, it would break before generating the repo data.
* Fixed a bug in anvil-join-anvil where if the database had to be reconnected, the job data was lost.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in scan-drbd where the volume_uuid wasn't being stored in the proper hash, breaking insertions into scan_drbd_peers in some cases.
* Updated System->pids() to work with remote targets (will be used later to check for parallel runs of anvil-safe-start).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Storage->parse_df that, shock!, parses 'df' output. Finished the long-ago started ->parse_lsblk as well.
* Created the shell of System->parse_lshw(), but there's no plans to finish it any time soon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Storage->parse_df that, shock!, parses 'df' output. Finished the long-ago started ->parse_lsblk as well.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Cluster->add_server() to now set failure timeouts to actual numbers instead of INFINITY after discovering that INFINITY doesn't work in those cases.
* Updated Databsae->get_hosts to now check if other entries have the same host name, and if so, to set their host_key to 'DELETED'. This should make it easier to handle when a hardware machine is replaced by new hardware but uses the same host_name.
* Updated Email->check_queue() to start and enable postfix.service if it's found to not be running.
* Updated Get->available_resources() to return '!!no_data!!' when a given host hasn't got any data in scan_lvm_vgs. Now use this in anvil-provision-server to exit if a node or dr host hasn't run scancore yet.
* Fixed a bug in scan-lvm where the pvs_uuid wasn't being loaded properly, preventing lost PVs, VGs and LVs from being flagged as deleted.
* Started work on anvil-migate-server, though it's far from complete.
Signed-off-by: Digimer <digimer@alteeve.ca>