* Updated striker-auto-initialize-all() to reconnect to DBs during waits to better detect when a DB is marked as offline.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Also removed the variables for the database name and DB user name, setting them statically now.
* Created Database->shutdown() to more kindly stop a local database server.
* Added 'check_db_in_use_states()' to anvil-daemon to clean any stale entries marking a database as in use.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from it's status output.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a divide-by-zero bug in anvil-boot-server when no servers exist yet.
* Fixed a bug in anvil-daemon where the local databsae engine was being started when it shouldn't.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-daemon to check for files in /mnt/shared/incoming on striker dashboards and add them to the media library if needed.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the copywrite date to 2022.
* Updated the database resync to not run on machines host VMs to help reduce the chance of oom-killer terminating a VM.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created DRBD->_initialize_drbd() to makes sure the DRBD kernel module can load and tries to build the module, if necessary. This is meant to provide support for clients that can't access needed internet resource (or the internet at all).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated striker-prep-database to always set the user's password, independent of whether the database user was created.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added 'configure_firewall()' to 'striker-prep-database' to explicitely open the postgresql service for all active zones.
* Did some general logging changes and cleanup around the same.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->insert_or_update_states() to switch to an active UUID if the passed in UUID is not an active handle.
* Updated Database->query() to swutch to 'sys::database::read_uuid' if the passed in 'uuid' is not an active handle.
* Updated Database->_test_access() to return immediately if the passed in uuid is not an active handle.
* Started working on a Storage->get_storage_group_from_path() bug where the storage group isn't being returned.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Network->is_our_interface() which returns '1' if an interface is one managed by an Anvil!. Also updated scan-network to use this to determine when an interface alert should be a warning or notice level alert.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in anvil-daemon where striker-prep-database was always being called, when it shouldn't in some cases.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated scan-filesystems to set swap usage alerts to notice level only.
* Updated scan-network to pull the permanent MAC address from an 'ethtool -P <iface>' call to deal with the fact that wireless interfaces don't have their real MAC in the sysfs address file.
* Updated anvil-provision-server to set the rtc_tickpolicy to catchup.
Signed-off-by: Digimer <digimer@alteeve.ca>
Fixed a bug in System->update_hosts() that was causing hosts to be constantly rewritten. (Well, I hope fixed, this has been a notoriously buggy part of the program...)
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-manage-dr to use the TCP ports already configured for a resource when re-configuring a DR resource that has been previously configured.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked where and how Database->configure_pgsql() is called, and boosted logging around it (trying to debug a build test issues).
* Updated Database->configure_pgsql() to only check if the Anvil! user and DB exists if another step of the config happened.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-daemon->prep_database() to only run if the database dump file doesn't exist. (If it does, it's clearly configured).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Database->backup_database() that creates a pg_dump of the active database.
* Created Database->load_database() that loads the database from a flat file, optionally creating a backup before doing so, and using iptables to block access during the process.
* Updated Database->configure_pgsql() to not start the postgresql daemon unless it just initialized the DB.
* Much work, not yet complete, to Database->connect() to stop after the first successful connection. Added logic that, if not connection was established and the host is a Striker, to load a peer's backup, if it exists, and then start the local daemon.
* Updated anvil-daemon to now have a section to run tasks on a ten minute cycle, which will later be used for the primary Striker to dump / copy its database to peer(s).
Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
* More bugs fixed in anvil-manage-dr, tested repeatedly as a job and so far, so good. Other functionality still to come.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Got anvil-manage-dr to the point where it writes the updated resource configuration to enable DR support. (untexted)
Signed-off-by: Digimer <digimer@alteeve.ca>
Created Storage->get_vg_name() to assist with anvil-manage-dr, which is still a WIP.
Continued work on anvil-manage-dr (which exposed the issue that required the update to Database->get_storage_group_data().
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug where anvil-safe-stop was not recording the stop-reason. Also made '--poweroff' an alias for '--power-off'.
Signed-off-by: Digimer <digimer@alteeve.ca>
Created Storage->get_storage_group_from_path() that takes a block device path and tried to find the Storage Group it belongs to.
Updated Storage->get_storage_group_data() to make it possible to look up a storage group UUID using the SG's name.
Updated DRBD->gather_data() to take a pre-generated XML via the new 'xml' parameter.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Server->parse_definition() to call DRBD->get_devices() so that referenced LVs can be loaded properly.
* Continued WIP in anvil-manage-server
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Cluster->parse_cib() where the local machine's ready state was being set to the node name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in anvil-delete-server where, if a server was off already, the server would not be removed from pacemaker.
* WIP - continuing on scan-network
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Jobs->get_job_uuid() to accept the new 'incomplete' parameter that, when set, will look for jobs whose progress is > 1 and < 100.
* Updated ScanCore-agent_startup() to take the new 'no_db_ok' parameter which returns with '0' if no DB is available and that parameter is set to '1'.
* Fixed a logging bug in 'anvil-join-anvil'.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated scan-cluster to get the CIB from pcs instead of reading the CIB from disk.
* Updated anvil-daemon to always call striker-prep-database at log level 2 while trying to find the cause of rare postgres config failures. Also updated striker-prep-database to use the new method of initializing the DB.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed the special job status 'scancore_startup' to 'anvil_startup', given it's handled by anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-configure-host to not reboot on network chane (will verify when this commit is function tested).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added the 'print' parameter to Log->variables() to allow printing to STDOUT when set.
* Renamed Network->check_bonds() to Network->check_networks() in anticipation of adding bridge monitoring / repair to it later.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Upped the aging of jobs and alerts data from 2 to 24 hours. Also added a check to prevent deleting a job of any age that is incomplete.
* Major update to anvil-configure-host to not touch the network unless something has actually changed. Not yet tested on a fresh system, will verify nothing broke in the CI tests this commit will trigger. Also changed it so that, if after reconfiguring the network it times out trying to reconnect to a database, it calls a reboot instead of simply exiting. Further, a reboot is now not called on exit unless something changed to require it.
* Updated Network->check_bonds() to return '1' if anything was done to heal a bond.
* Updated anvil-update-states to be more careful about clearing virsh bridges. Specifically, it checks to see if virsh is running and that the returned bridges aren't actually error codes.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-daemon to call Network->check_bonds() with 'all' on startup, then woth 'down_only' once per minute to try to heal down'ed bonds.
* Updated anvil-watch-bonds to take a 'run-once' switch and exit after one report, if set.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->archive_table() and ->_find_behind_databases() to loop through connected databases, instead of configured databases.
* Updated Network->get_ips() to only record the real MAC addresses on network interfaces (not bonds or bridges) in the "network::${host}::interface::${in_iface}::mac_address" hash. This should help avoid reboot loops caused by anvil-configure-host thinking the network needs to be reconfigured when it doesn't actually need to be.
Signed-off-by: Digimer <digimer@alteeve.ca>
* WIP - Continuing work on the new anvil-manage-server tool.
* Updated Database->get_anvils() to load information on the files available on each Anvil! system.
* Updated Database->insert_or_update_network_interfaces() to no longer take the 'timestamp' parameter.
* Removed all logging from Database->refresh_timestamp() to speed it up, given how often it will be called now.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-daemon to have a new function called "handle_special_cases" called during startup that does any weird bug mitigation required. For now, this is used to mitigate against rhbz#1961562, though certainly it will be used for other reasons later.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug with anvil-manage-keys where a state_uuid entry recorded on one database may not be read from a machine reading from another database.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Cluster->parse_cib() to not require a database connection (part of the work to make ocf:alteeve:server run without a DB)
* WIP: Continuing work on the ocf:alteeve:server RA to run without database connections.
* Updated the scancore daemon to explcitely check that all scan agent schemas are loaded in all databases on startup. This is to resolve resync issues on rebuilt strikers that may not yet have some schemas loaded when a DB resync runs.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a problem with Database->insert_or_update_variables() where variable_source_uuid being set to an empty string wasn't converted to NULL.
* Fixed Database->locking() where the way the lock variable was set was rather broken.
* Created Striker->check_httpd_conf() which configured apache to handle the integration of the new WebUI for Anvil! management with the existing WebUI.
* Updated System->update_hosts() to specifically set the 127.0.0.1 and ::1 lines to handle how cloud-init overrides /etc/hosts and breaks CI/CD tests.
* Removed the old index.html as it's now used for the new WebUI.
* Began work on removing DB connection requirements from ocf:alteeve:server.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the scancore interval to 60 seconds.
* Updated Database->insert_or_update_health() so that 'delete' can find the health_uuid.
* Updated Convert->time() to return silently when passed '-1'.
* Fixed a bug scan-hardware to call Convert->round(). Also fixed it so it didn't set health scores of 0 for mismatch RAM when the RAM was not mismatched.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Bumped scancore's scan delay from 30 seconds to 60.
* Shorted the age-out time to 24 hours and again boosted the archive thresholds. As we get a feel for the amount of data collected on multi-Anvil! systems over time, we may continue to tune this.l
* Moved Database->archive_database() to be called daily by anvil-daemon, instead of during '->connect' calls.
* Added locking to Database->_age_out_data to avoid resyncs mid-purge. Also moved the power, temperature and ip_address columns into the same 'to_clean' hash as it was duplicate logic.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->_age_out_data() to check for certain scan agent tables and, for those found, purge out old records. This should go a long way to keeping the database data responsive.
* Fixed a bug in Jobs->update_progress() where the 'job_picked_up_by' column was being set to '0' instead of '$$' when clearing the job.
* Fixed a bug in System->update_hosts() where '127.0.0.1' would be used in hosts for the actual host name.
* Updated the default trigger, count and division values in anvil.conf to 100,000, 50,000 and 75,000 respectively. In combination with the aging of data, this should go a long way to minimizing database sizes and overheads.
* Updated anvil-daemon to call $anvil->Database->_age_out_data(); in it's daily tasks.
* Updated various striker-X tools to specifically request a DB resync on Database->connect calls.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Database->_age_out_data() to delete records from the database that are old enough to no longer be useful. This is designed to significantly reduce the size of the database, allowing a better focus on performance.
* Changed Database->connect() to default to NOT check for resync, reworking the old 'no_resync' to 'check_for_resync', so that resync checks happen on demard, instead of by default.
* Updated get_tables_from_schema() to now allow 'schema_file' to be set to 'all', which then loads the schema files of all scan agents as well as the core anvil schema file. Fixed a bug where commented out tables were being counted.
* Re-enabled triggering resyncs on 'last_updated' differences.
* Fixed a bug in scan-ipmitool where the history_id column in history.scan_ipmitool_value was incorrect.
* Created a new tool called striker-show-db-counts that shows the number of records in all public and history schema tables for all databases.
* Updated anvil-update-states to detect when a libvirtd NAT'ed bridge exists and to delete it when found.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Words->center_text that takes a string (or string key) and centers it to a given string length, padding white spaces on either side of the string as needed.
Signed-off-by: Digimer <digimer@alteeve.ca>