* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from it's status output.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a divide-by-zero bug in anvil-boot-server when no servers exist yet.
* Fixed a bug in anvil-daemon where the local databsae engine was being started when it shouldn't.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the copywrite date to 2022.
* Updated the database resync to not run on machines host VMs to help reduce the chance of oom-killer terminating a VM.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the logic of when to boot a node or DR host that was found to be off for unknown reasons to require both poewr and temperature to be OK, and checks against the new 'feature::scancore::disable::boot-unknown-stop' config variable.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated striker-prep-database to always set the user's password, independent of whether the database user was created.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added 'configure_firewall()' to 'striker-prep-database' to explicitely open the postgresql service for all active zones.
* Did some general logging changes and cleanup around the same.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->insert_or_update_states() to switch to an active UUID if the passed in UUID is not an active handle.
* Updated Database->query() to swutch to 'sys::database::read_uuid' if the passed in 'uuid' is not an active handle.
* Updated Database->_test_access() to return immediately if the passed in uuid is not an active handle.
* Started working on a Storage->get_storage_group_from_path() bug where the storage group isn't being returned.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in anvil-daemon where striker-prep-database was always being called, when it shouldn't in some cases.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-manage-dr to use the TCP ports already configured for a resource when re-configuring a DR resource that has been previously configured.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked where and how Database->configure_pgsql() is called, and boosted logging around it (trying to debug a build test issues).
* Updated Database->configure_pgsql() to only check if the Anvil! user and DB exists if another step of the config happened.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-daemon->prep_database() to only run if the database dump file doesn't exist. (If it does, it's clearly configured).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Database->backup_database() that creates a pg_dump of the active database.
* Created Database->load_database() that loads the database from a flat file, optionally creating a backup before doing so, and using iptables to block access during the process.
* Updated Database->configure_pgsql() to not start the postgresql daemon unless it just initialized the DB.
* Much work, not yet complete, to Database->connect() to stop after the first successful connection. Added logic that, if not connection was established and the host is a Striker, to load a peer's backup, if it exists, and then start the local daemon.
* Updated anvil-daemon to now have a section to run tasks on a ten minute cycle, which will later be used for the primary Striker to dump / copy its database to peer(s).
Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
Created Storage->get_vg_name() to assist with anvil-manage-dr, which is still a WIP.
Continued work on anvil-manage-dr (which exposed the issue that required the update to Database->get_storage_group_data().
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created ScanCore->agent_shutdown() that writes out the time the scan agent last ran, and how many databases were available when it last ran.
* Updated ScanCore->agent_startup() to read the the last run data created above.
* Updated Database->connect() to set 'sys::database::last_db_count' to the scan agent's recorded last DB count.
* Updated all agents to call ScanCore->agent_shutdown() at the end of their run.
Signed-off-by: Digimer <digimer@alteeve.ca>
Created Storage->get_storage_group_from_path() that takes a block device path and tried to find the Storage Group it belongs to.
Updated Storage->get_storage_group_data() to make it possible to look up a storage group UUID using the SG's name.
Updated DRBD->gather_data() to take a pre-generated XML via the new 'xml' parameter.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Server->parse_definition() to call DRBD->get_devices() so that referenced LVs can be loaded properly.
* Continued WIP in anvil-manage-server
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Alert-register so that, if 'sort_position' is not set (or set to 9999), an internal counter for each alert level is created and used so that alert entries sort naturally by the order they're registered.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed the special job status 'scancore_startup' to 'anvil_startup', given it's handled by anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Upped the aging of jobs and alerts data from 2 to 24 hours. Also added a check to prevent deleting a job of any age that is incomplete.
* Major update to anvil-configure-host to not touch the network unless something has actually changed. Not yet tested on a fresh system, will verify nothing broke in the CI tests this commit will trigger. Also changed it so that, if after reconfiguring the network it times out trying to reconnect to a database, it calls a reboot instead of simply exiting. Further, a reboot is now not called on exit unless something changed to require it.
* Updated Network->check_bonds() to return '1' if anything was done to heal a bond.
* Updated anvil-update-states to be more careful about clearing virsh bridges. Specifically, it checks to see if virsh is running and that the returned bridges aren't actually error codes.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->archive_table() and ->_find_behind_databases() to loop through connected databases, instead of configured databases.
* Updated Network->get_ips() to only record the real MAC addresses on network interfaces (not bonds or bridges) in the "network::${host}::interface::${in_iface}::mac_address" hash. This should help avoid reboot loops caused by anvil-configure-host thinking the network needs to be reconfigured when it doesn't actually need to be.
Signed-off-by: Digimer <digimer@alteeve.ca>
* WIP - Continuing work on the new anvil-manage-server tool.
* Updated Database->get_anvils() to load information on the files available on each Anvil! system.
* Updated Database->insert_or_update_network_interfaces() to no longer take the 'timestamp' parameter.
* Removed all logging from Database->refresh_timestamp() to speed it up, given how often it will be called now.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Cluster->parse_cib() to not require a database connection (part of the work to make ocf:alteeve:server run without a DB)
* WIP: Continuing work on the ocf:alteeve:server RA to run without database connections.
* Updated the scancore daemon to explcitely check that all scan agent schemas are loaded in all databases on startup. This is to resolve resync issues on rebuilt strikers that may not yet have some schemas loaded when a DB resync runs.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a problem with Database->insert_or_update_variables() where variable_source_uuid being set to an empty string wasn't converted to NULL.
* Fixed Database->locking() where the way the lock variable was set was rather broken.
* Created Striker->check_httpd_conf() which configured apache to handle the integration of the new WebUI for Anvil! management with the existing WebUI.
* Updated System->update_hosts() to specifically set the 127.0.0.1 and ::1 lines to handle how cloud-init overrides /etc/hosts and breaks CI/CD tests.
* Removed the old index.html as it's now used for the new WebUI.
* Began work on removing DB connection requirements from ocf:alteeve:server.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the scancore interval to 60 seconds.
* Updated Database->insert_or_update_health() so that 'delete' can find the health_uuid.
* Updated Convert->time() to return silently when passed '-1'.
* Fixed a bug scan-hardware to call Convert->round(). Also fixed it so it didn't set health scores of 0 for mismatch RAM when the RAM was not mismatched.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Bumped scancore's scan delay from 30 seconds to 60.
* Shorted the age-out time to 24 hours and again boosted the archive thresholds. As we get a feel for the amount of data collected on multi-Anvil! systems over time, we may continue to tune this.l
* Moved Database->archive_database() to be called daily by anvil-daemon, instead of during '->connect' calls.
* Added locking to Database->_age_out_data to avoid resyncs mid-purge. Also moved the power, temperature and ip_address columns into the same 'to_clean' hash as it was duplicate logic.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->_age_out_data() to check for certain scan agent tables and, for those found, purge out old records. This should go a long way to keeping the database data responsive.
* Fixed a bug in Jobs->update_progress() where the 'job_picked_up_by' column was being set to '0' instead of '$$' when clearing the job.
* Fixed a bug in System->update_hosts() where '127.0.0.1' would be used in hosts for the actual host name.
* Updated the default trigger, count and division values in anvil.conf to 100,000, 50,000 and 75,000 respectively. In combination with the aging of data, this should go a long way to minimizing database sizes and overheads.
* Updated anvil-daemon to call $anvil->Database->_age_out_data(); in it's daily tasks.
* Updated various striker-X tools to specifically request a DB resync on Database->connect calls.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Database->_age_out_data() to delete records from the database that are old enough to no longer be useful. This is designed to significantly reduce the size of the database, allowing a better focus on performance.
* Changed Database->connect() to default to NOT check for resync, reworking the old 'no_resync' to 'check_for_resync', so that resync checks happen on demard, instead of by default.
* Updated get_tables_from_schema() to now allow 'schema_file' to be set to 'all', which then loads the schema files of all scan agents as well as the core anvil schema file. Fixed a bug where commented out tables were being counted.
* Re-enabled triggering resyncs on 'last_updated' differences.
* Fixed a bug in scan-ipmitool where the history_id column in history.scan_ipmitool_value was incorrect.
* Created a new tool called striker-show-db-counts that shows the number of records in all public and history schema tables for all databases.
* Updated anvil-update-states to detect when a libvirtd NAT'ed bridge exists and to delete it when found.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Removed the checks for resync that limited resyncs on jobs and variables tables. That approach to minimize unnecessary resyncshas proven faulty, will find another way later.
Signed-off-by: Digimer <digimer@alteeve.ca>
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in that archiving defaulting to not store on disk was not working properly. Now acts as described in anvil.conf.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Changed the alteeve repo RPM to the new cimmunity/enterprise repo
* Fixed a bug where 'fence_data::updated' was causing the fences web page to break.
* Fixed a bug in Database->insert_or_update_network_interfaces() where certain interfaces were being repeatedly added to the database.
* Fixed a bug in Database->_find_behind_databases() was marking DBs as behind even though they had less than 10 columns off.
* Fixed a bug in Get->host_name() where, if the host name was changed on disk but the environment variable was still the old name, it would cause the hostname to waffle back and forth and cause constant updated to /etc/hosts.
Signed-off-by: Digimer <digimer@alteeve.ca>