* The get_mmeory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.
To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, in there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib colume.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Update DRBD->gather_data() to get the size data from /sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disabled ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in ScanCore->agent_startup() where a (thankfully broken) check to append tables to the 'sys::database::check_tables' would cause an infinite loop as both were pointers to the same anonymous array.
* Fixed a bug in scan-ipmitool where the scan_ipmitool_variables table didn't use a host_uuid reference, causing resyncs of that table to sync for all hosts and cause DB errors when the scan_ipmitool record from another host wasn't sync'ed yet.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Database->_find_behind_databases() _find_behind_databases() where the logic to figure out which column was the host_uuid reference was too liberal, causing the wrong column to be selected in some cases. Also added a check to not look for host_uuid columns on specific tables where a match would be made, but the column is allowed to be null (like server_host_uuid that indicates the host of the server).
* Started work on the scan-filesystems scan agent, which is needed to record the data that the file system UI endpoints will need.
* Removed the 'not null' constraint from 'servers' -> 'server_host_uuid'.
* Fixed a bug in 'anvil-provision-server' where the driver ISO being 'none' caused the provision script to use the driver ISO switch.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->insert_or_update_storage_group_members() to use the host_uuid when trying to find existing members.
* Added the skeleton of a bunch of new json endpoints for the new UI features.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Remote->test_access() to not used cached SSH access.
* Updated anvil-configure-host to abort if the host is in a cluster.
* Updated anvil-join-anvil to clean up some variable checks to help avoid unitialized variable messages.
* Updated striker-initialize-host to check if an anvil RPM is installed and, if so, not install the Anvil! repo.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added a check to striker-initialize-host the see if anvil-X RPM is already installed. If so, it will not install the Alteeve repo, even if it's not found.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Remote->call() to return ('!!error!!', '!!error!!', 9999) when an error hits. Made Remote->test_access() explicitely check for '1' to be returned in order to confirm access, fixing a bug where bad target value caused false positives. Updated ->_check_known_hosts_for_target() to no longer explicitely check for 'ssh-rsa' so that machine keys using different cyphers are detected as being in known_hosts properly.
* Updated striker-auto-initialize-all to initialize nodes and DR hosts networks before trying to form them into an Anvil!. Fixed several other bugs as well. More testing is needed, but it works now.
* Updated striker-initialize-host to check for the alteeve repo and, it not found, check for accress to alteeve.com. If access, it will install our repo now.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Remote->call() where the output of the call not ending in a newline wasn't having the return code parsed off properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Finished initial Striker setup in tools/striker-auto-initialize-all. Started working on peering.
* Cleaned up the handling of converting UIDs to user names in Remote->add_target_to_known_hosts() and ->_call_ssh_keyscan().
* Did a bunch of white-space/alignment cleanup.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Made the error reported by Remote->call() more verbose when called without 'target' being set.
* Updated anvil-daemon to not call jobs more that once per minute.
* Started work on striker-auto-initialize-all, still very far from complete.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Get->os_type() to use 'cat' instead of Storage->read_file() because 'rsync' may not be available when it is called during striker-initialize-host calls.
* Updated Database methods to skip 'oui' and 'state' during resync.
* Updatedb striker-initialize-host to detect when it's initializing a CentOS Stream Node / DR Host and enable the HA repo.
* Created the tools/striker-auto-initialize-all tool, which is very much incomplete, that will allow for the rapid creation of a full Anvil! from freshly installed machines autonomously.
Signed-off-by: Digimer <digimer@alteeve.ca>
* This commit includes two unrelated test files for UI work, cgi-bin/get_anvil_status and cgi-bin/get_anvils.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated pxe.txt to start support for UEFI boot target.
* Updated update_install_source to be smarter about moving html and tftp directory names to better reflect the host OS, and to make it support converting OSes on the fly. Also added support to the package list for CentOS Stream.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Cleaned up a log of logging to reduce the amount of log entries when running at log level 1.
* Bumped the scan-ipmitool default 'jump' range to 10c.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked scan-ipimitool so that on nodes and dr hosts, it only scans itself. On strikers, it scans all hosts found in active Anvil! systems with a host_ipmi entry. `
* For all agents, reduced log verbosity to not push too much noise into anvil.log while scancore is running in the background.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
Note: These changes below shouldn't have been in this branch... *sigh*
* Fixed an issue with tools/anvil-provision-server where a VM would be created but didn't boot. When this happens, an explicit boot is sent via virsh. Also bumped up the time it waits for a new server to start up.
* Added an explicit call to scan-drbd after a new resource is created to ensure that if any calls come after looking for the next free DRBD minor or port, they don't use the ones just used.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in anvil-provision-server where forcing initialization of a new DRBD resource when running on node 2 would fail because the node ID in the drbdsetup command was hard-coded to be run from node 1.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Cluster->add_server() to now set failure timeouts to actual numbers instead of INFINITY after discovering that INFINITY doesn't work in those cases.
* Updated Databsae->get_hosts to now check if other entries have the same host name, and if so, to set their host_key to 'DELETED'. This should make it easier to handle when a hardware machine is replaced by new hardware but uses the same host_name.
* Updated Email->check_queue() to start and enable postfix.service if it's found to not be running.
* Updated Get->available_resources() to return '!!no_data!!' when a given host hasn't got any data in scan_lvm_vgs. Now use this in anvil-provision-server to exit if a node or dr host hasn't run scancore yet.
* Fixed a bug in scan-lvm where the pvs_uuid wasn't being loaded properly, preventing lost PVs, VGs and LVs from being flagged as deleted.
* Started work on anvil-migate-server, though it's far from complete.
Signed-off-by: Digimer <digimer@alteeve.ca>