* In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing.
* Updated scan-server to check for / add DRBD fence rules as needed.
Scancore APC agent bugs;
* For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents.
* Fixed a bug to set/clear alerts related to phases disappearing to deal with concurrent logins from different hosts triggering false phase loss alerts.
* Fixed missing variables not being passed to alerts/log entries.
Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Got anvil-manage-alerts managing alert overrides.
* Created, but for now commented out, the new 'audit' table.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Get->anvil_from_switch and Get->server_from_switch() (both need testing) that takes a string that could be either a name or UUID, figures out which it is, finds the entry in the DB and started the X_uuid and X_name switch variables.
* Started work on a second attempt at anvil-manage-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated striker-initialize-host to support calls from command line switches, and wrote the man page for it.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created DRBD->check_proxy_license() to do (some level of) sanity checks on the DRBD proxy license file.
* Updated DRBD->gather_data() to parse out the inside and outside ports for resource configs using proxy.
* Reworked DRBD->get_next_resource() to return 1, 3 or 7 TCP ports depending, with the new long_throw_ports parameter triggering the 7 ports.
* Added 'tcpdump' to the anvil-core requires list.
* Reworked scan-drbd to record the ports used in proxy configs. This required adding a check to change the 'scan_drbd_peer_tcp_port' column type to 'text' to support CSVs.
* Reworked anvil-manage-dr (needs testing!) to support "long-throw" DR configs.
* Updated anvil-safe-stop to check if the nodes are in the cluster before trying to migrate.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-safe-stop to check for VMs running, even if the cluster is stopped, when --stop-servers is used.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in System->parse_arguments() where a quoted password without spaces was returned without being recorded in the hash. Also updated logging to log 'suppressed' for passwords when secure logging is disabled.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created the anvil-manage-dr man page.
* Reworked anvil-manage-dr's --protect logic to search for which network works with the DR host, instead of assuming it's the SN.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Server->parse_definition() to check if a failed 'virsh list' output was passed in. Also changed it to not exit if the XML can't be parsed.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked Network->bridge_info() to use 'ip' to get the list of bridges, and 'bridge' to find interfaces connected to the bridge.
* Added 'test' messages to Words->string().
* Fixed a bug in scan-lvm where mdadm based PVs didn't read the sector size properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-daemon to call anvil-manage-files on a per-minute basis to handle files added outside of the WebUI.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added missing always-available switchs in Get->switches
* Create Storage->_wait_if_changing() to check to see if a file's size is changing and, if so, not return until it stops.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in scancore-agents/Makefile.am where scan-network was missing.
* Started work on anvil-delete-server.8. Incomplete at this time.
* Updated Network->get_ips() to record the interface status.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Disabled striker-parse-oui until it can be reworked to store the the OUI data in a flat file instead of in the database.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-boot-server when called with '--all' to honour boot ordering, delays and condtions.
* Updated Database->get_servers() to collect the server's XML as well as data from the 'servers' table.
* Updated anvil-provision-server to make a new DRBD resource 'secondary' after forcing it to primary to begin the initial sync.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added multiple new private methods to Network that help in managing the firewall.
* Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update it's firewall as soon after the server appears as possible.
* Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly.
* Updated scan-server to check that the firewall is managed when a server state has changed.
* Updated anvil-daemon to run Network->manage_firewall on startup.
* Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host.)
* Removed firewall management from striker-prep-database.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated scan-drbd to purge peer records that no longer have corresponding LVM data.
* Updated System->{en,dis}able-service to take the 'now' paramter which, when passed, causes the action to take immediate effect.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Continued work on fixing issues with striker-purge-target (which led the the discovery of the above bug). Added expliit checks to purge file_location and storage_group data when purging an sub-anvil from the database.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added an 'eval' wrapper around 'Database->write()' where it calls the given DB so that failures log properly instead of crash the program.
* Updated Database->_find_column() to no longer restrict to 'not null' calumn types.
* Fixed a couple typos in Database->read_state().
Signed-off-by: Digimer <digimer@alteeve.ca>
* Changes Database->connect to always use the first DB connected to, not the local one if that applies. This treats the first DB (sorted by UUID) as "primary" and the second (or third...) as more of a backup.
* Moved db_in_use and lock_request to use the 'states' table instead of the variables table. These are set and removed so often that it was messing up things with resync's when the data is transient anyway. Fixed multiple bugs with both to better set and clear properly.
* Created Database->read_state() to assist with the above changes.
* Updated Database->refresh_timestamp() to specifically check that the returned time stamp differs from the previously used one, looping until they differ if needed.
* Disabled striker-manage-install-target when called to update the repos, as the Install Target function doesn't work at this point.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->get_host_from_uuid() to cache results.
* Fixed a bug in Database->get_storage_group_data where a DELETE wasn't deleting from the history schema as well.
* In Database->resync_databases(), references to the old 'host_uuid' that we used to use to resync just the local host's data was removed. Added also a check where two or more entries in a given history schema had the same modified_date and, when found, the newest entry is preserved and the rest are deleted. Before this, a resync where two+ records had the same modified_time would only sync the last record, leaving a mismatch in history schema entries triggering repeated resyncs.
* Fixed a bug in Email->send_alerts() where the 'alerts' table was being updated without a modified_date being set.
* Fixed a bug in System->test_ipmi() where the 'hosts' table was being updated without a modified_date being set.
* Updated scan-network to clear up old deleted ip_addresses, bonds and bridges. Also fixed bugs where public schema records were being deleted without history records being deleted.
* Updated anvil-update-states to fix bugs where DELETEs were happening without setting the modified_date.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Network->load_ips() to load extra information about the interfaces.
* Updated ocf:alteeve:server to not check libvirtd daemon state on server start.
* Updated scan-hardware to check for duplicate entries and purge if found.
* Updated scan-network to check for the 'default' virbr0 interface by checking if the config file exists instead of calling virsh.
* Updated scan-server to have better logging.
* Created the new (and incomplete) anvil-test-alerts tool
* Updated scancore to support --purge to pass to all agents and then exit.
* Updated ScanCore->call_scan_agents() to no longer use 'timeout' as it was causing issues with virsh calls.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->get_anvils() to make it possible to translate a file name to a file UUID.
* Updated System->test_ipmi() to quote passwords properly. Also dropped the timeouts to 2 seconds.
* Updated anvil-provision-server to support pure CLI switch server provisioning using the --ci-test (and optional --options {--machine}) to allow CI tests.
* Continued work of anvil-manage-server.
* Fixed a bug in striker-prep-database to fix a bug in writing the pg_hba.conf file.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated System->_check_anvil_conf() to create the 'admin' user in a more normal way (old way caused the 'admin' group to be a system GID.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated ocf:alteeve:server to always try to bring up the peer's DRBD resource, even when the local resource is up.
* Fixed a bug in scan-network where purging duplicate bridges failed in some cases.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated DRBD->get_devices() to store information about the nodes for each resource.
* Got more work done on anvil-report-usage.
Signed-off-by: Digimer <digimer@alteeve.ca>
Created, but not finished, tools/anvil-report-usage that will print a report of server resource allocation and Anvil! resource availability.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated scan_network to properly mark virtio network interfaces as being full duplex. Also updated it to purge interfaces flagged as 'DELETED'.
* Updated the VM OS list.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Network->find_matches() to take 'source' and 'line' parameters to help identify the source of issues with missing hashes.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Also removed the variables for the database name and DB user name, setting them statically now.
* Created Database->shutdown() to more kindly stop a local database server.
* Added 'check_db_in_use_states()' to anvil-daemon to clean any stale entries marking a database as in use.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from it's status output.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a divide-by-zero bug in anvil-boot-server when no servers exist yet.
* Fixed a bug in anvil-daemon where the local databsae engine was being started when it shouldn't.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the copywrite date to 2022.
* Updated the database resync to not run on machines host VMs to help reduce the chance of oom-killer terminating a VM.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created DRBD->_initialize_drbd() to makes sure the DRBD kernel module can load and tries to build the module, if necessary. This is meant to provide support for clients that can't access needed internet resource (or the internet at all).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the logic of when to boot a node or DR host that was found to be off for unknown reasons to require both poewr and temperature to be OK, and checks against the new 'feature::scancore::disable::boot-unknown-stop' config variable.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated striker-prep-database to always set the user's password, independent of whether the database user was created.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added 'configure_firewall()' to 'striker-prep-database' to explicitely open the postgresql service for all active zones.
* Did some general logging changes and cleanup around the same.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->insert_or_update_states() to switch to an active UUID if the passed in UUID is not an active handle.
* Updated Database->query() to swutch to 'sys::database::read_uuid' if the passed in 'uuid' is not an active handle.
* Updated Database->_test_access() to return immediately if the passed in uuid is not an active handle.
* Started working on a Storage->get_storage_group_from_path() bug where the storage group isn't being returned.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Network->is_our_interface() which returns '1' if an interface is one managed by an Anvil!. Also updated scan-network to use this to determine when an interface alert should be a warning or notice level alert.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in anvil-daemon where striker-prep-database was always being called, when it shouldn't in some cases.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in scan-hardware where the raw bytes free for swap was used to see if the high / cleared thresholds were passed, instead of the percentage as it should have been.
* Fixed a bug in scan-network where a new-line wasn't be cleared off the MAC address.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated scan-filesystems to set swap usage alerts to notice level only.
* Updated scan-network to pull the permanent MAC address from an 'ethtool -P <iface>' call to deal with the fact that wireless interfaces don't have their real MAC in the sysfs address file.
* Updated anvil-provision-server to set the rtc_tickpolicy to catchup.
Signed-off-by: Digimer <digimer@alteeve.ca>
Fixed a bug in System->update_hosts() that was causing hosts to be constantly rewritten. (Well, I hope fixed, this has been a notoriously buggy part of the program...)
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-manage-dr to use the TCP ports already configured for a resource when re-configuring a DR resource that has been previously configured.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked where and how Database->configure_pgsql() is called, and boosted logging around it (trying to debug a build test issues).
* Updated Database->configure_pgsql() to only check if the Anvil! user and DB exists if another step of the config happened.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-daemon->prep_database() to only run if the database dump file doesn't exist. (If it does, it's clearly configured).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Database->backup_database() that creates a pg_dump of the active database.
* Created Database->load_database() that loads the database from a flat file, optionally creating a backup before doing so, and using iptables to block access during the process.
* Updated Database->configure_pgsql() to not start the postgresql daemon unless it just initialized the DB.
* Much work, not yet complete, to Database->connect() to stop after the first successful connection. Added logic that, if not connection was established and the host is a Striker, to load a peer's backup, if it exists, and then start the local daemon.
* Updated anvil-daemon to now have a section to run tasks on a ten minute cycle, which will later be used for the primary Striker to dump / copy its database to peer(s).
Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
* More bugs fixed in anvil-manage-dr, tested repeatedly as a job and so far, so good. Other functionality still to come.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Got anvil-manage-dr to the point where it writes the updated resource configuration to enable DR support. (untexted)
Signed-off-by: Digimer <digimer@alteeve.ca>
Created Storage->get_vg_name() to assist with anvil-manage-dr, which is still a WIP.
Continued work on anvil-manage-dr (which exposed the issue that required the update to Database->get_storage_group_data().
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created ScanCore->agent_shutdown() that writes out the time the scan agent last ran, and how many databases were available when it last ran.
* Updated ScanCore->agent_startup() to read the the last run data created above.
* Updated Database->connect() to set 'sys::database::last_db_count' to the scan agent's recorded last DB count.
* Updated all agents to call ScanCore->agent_shutdown() at the end of their run.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug where anvil-safe-stop was not recording the stop-reason. Also made '--poweroff' an alias for '--power-off'.
Signed-off-by: Digimer <digimer@alteeve.ca>
Created Storage->get_storage_group_from_path() that takes a block device path and tried to find the Storage Group it belongs to.
Updated Storage->get_storage_group_data() to make it possible to look up a storage group UUID using the SG's name.
Updated DRBD->gather_data() to take a pre-generated XML via the new 'xml' parameter.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Server->parse_definition() to call DRBD->get_devices() so that referenced LVs can be loaded properly.
* Continued WIP in anvil-manage-server
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Cluster->parse_cib() where the local machine's ready state was being set to the node name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Alert-register so that, if 'sort_position' is not set (or set to 9999), an internal counter for each alert level is created and used so that alert entries sort naturally by the order they're registered.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in anvil-delete-server where, if a server was off already, the server would not be removed from pacemaker.
* WIP - continuing on scan-network
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Jobs->get_job_uuid() to accept the new 'incomplete' parameter that, when set, will look for jobs whose progress is > 1 and < 100.
* Updated ScanCore-agent_startup() to take the new 'no_db_ok' parameter which returns with '0' if no DB is available and that parameter is set to '1'.
* Fixed a logging bug in 'anvil-join-anvil'.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated DRBD->allow_two_primaries() to take the 'set_to' parameter which can be 'yes' to all and 'no' to disallow dual-primary.
* Updated ocf:alteeve:server to call allow_two_primaries() with 'set_to' = 'no' instead of calling 'adjust' after a migration completes.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated scan-cluster to get the CIB from pcs instead of reading the CIB from disk.
* Updated anvil-daemon to always call striker-prep-database at log level 2 while trying to find the cause of rare postgres config failures. Also updated striker-prep-database to use the new method of initializing the DB.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed the special job status 'scancore_startup' to 'anvil_startup', given it's handled by anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated scan-cluster to check / set which node should be preferred if a netsplit causes a fence race.
* Fixed a bug in Server->shutdown_virsh() where a shutdown timeout would go into a loop.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-configure-host to not reboot on network chane (will verify when this commit is function tested).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added the 'print' parameter to Log->variables() to allow printing to STDOUT when set.
* Renamed Network->check_bonds() to Network->check_networks() in anticipation of adding bridge monitoring / repair to it later.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Upped the aging of jobs and alerts data from 2 to 24 hours. Also added a check to prevent deleting a job of any age that is incomplete.
* Major update to anvil-configure-host to not touch the network unless something has actually changed. Not yet tested on a fresh system, will verify nothing broke in the CI tests this commit will trigger. Also changed it so that, if after reconfiguring the network it times out trying to reconnect to a database, it calls a reboot instead of simply exiting. Further, a reboot is now not called on exit unless something changed to require it.
* Updated Network->check_bonds() to return '1' if anything was done to heal a bond.
* Updated anvil-update-states to be more careful about clearing virsh bridges. Specifically, it checks to see if virsh is running and that the returned bridges aren't actually error codes.
Signed-off-by: Digimer <digimer@alteeve.ca>