Commit Graph

612 Commits

Author SHA1 Message Date
digimer
c50a1936c0 * This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch or a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon.
* Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 00:05:56 -04:00
digimer
895f1ec262 This fixes a race condition when multiple servers are provisioned at (nearly) the same time.
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute. So subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race confitions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-28 00:19:53 -04:00
digimer
0874ad571a Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 14:33:58 -04:00
digimer
83a527f4fa * Removed enabling anvil-safe-start out of the RPM and into anvil-join-anvil.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 11:18:42 -04:00
digimer
89eae7098e NOTE: This updates the reserved RAM to 8 GiB from 4 GiB!
* Adds support for 'anvil_resources:🐏:reserved' that can be set to a number of MiB to override the default 8192.
* Adds support for 'anvil::<anvil_uuid>::resources:🐏:reserved' to allow for per-Anvil! node override on the reserved RAM default, and over the 'anvil_resources:🐏:reserved' option.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-17 20:43:28 -04:00
digimer
f9689a7106 Updated ocf:alteeve:server to look for /tmp/<resource>.fail' and, if that file exists, exits with rc:1. This is done to allow for testing.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-10 17:40:46 -04:00
digimer
cf73d8ed36 * Updated System->configure_ipmi() to auto-configure DR hosts once they've been assigned a BCN IP address.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-05 15:04:39 -04:00
digimer
efebd135eb * Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes.
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 12:50:44 -04:00
Fabio M. Di Nitto
a6f2c2271e Fix typo in log message
Resolves: https://github.com/ClusterLabs/anvil/issues/294

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2023-03-22 07:38:10 +01:00
digimer
b144976853 This resolves Issue #310.
* Updated Database->get_file_locations() to record files available on Anvil! nodes by tracking hosts in Anvil! systems (needed after reworking how DR hosts are linked).
* Updated Get->available_resources() to call Database->get_files() and ->get_file_locations() to restore tracking files available on Anvil! nodes.
* Fixed a couple display bugs in anvil-provision-server when called with --ci-test --options.
* Continued work on anvil-manage-server-storage.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-20 23:43:40 -04:00
digimer
645f54ab89 This commit has more changes than I would normally like, but it's all linked to changing file uploads to rsync serially.
* To update file handling for the new DR host linking mechanism, file_locations -> file_location_anvil_uuid was changed to file_location_host_uuid.
  This required a fair number of changes elsewhere to handle this, with a particular noted change to Database->get_anvils() to look at host_uuid's for the subnodes in an Anvil! and, if either is marked as needing a file, make sure the peer is as well. Similarly, any linked DRs are set to have the file as well.
* Created a new Network->find_access that simply takes a target host name or UUID, and it returns a list of networks and IPs that the target can be accessed by.
* Updated Network->load_ips() to find the network interface being used for traffic so that things like the interface speed can be recorded, even when an IP is on a bridge or bond.

Unrelated, but in this commit, is a restoration of calling scan agents with a timeout now that the virsh hang issue has been resolved.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-14 02:29:40 -05:00
digimer
7773e5f9b8 * Updated logging in DRBD->get_devices().
* Added a check and exit if anvil-manage-dr is asked to protect a server on a machine that doesn't know about that server.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-30 11:30:36 -05:00
digimer
e012d6016c Tha major point of this commit is to add the new 'anvil-manage-storage-groups' program that, well, manages storage groups.
* Updated the storage_group_members table to add the 'storage_group_member_note' that can be set to 'DELETED' to track when a member is deleted. Updated anvil-version-changes to check for and add this column as needed. Updated the anvil.sql schema for the same.
* Updated Cluster->insert_or_update_storage_group_members to add the new column.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:10:15 -05:00
digimer
f8743a7435 * Further work on anvil-manage-dr. Now properly sanity checks that a valid server is passed.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-19 22:14:17 -05:00
digimer
1a217d21cf * Updated anvil-manage-dr to provide the ability to link anvil nodes to dr hosts. Also began work on making it work with the new DR links system.
* Created Database->get_anvil_uuid_from_string(), Database->get_host_uuid_from_string() and Database->get_server_uuid_from_string() to simplify the process of converting --anvil <string>, --host <string> and --server <string> respectively.
* Fixed bugs in Database->get_dr_links() and Database->insert_or_update_dr_links().
* Updated Database->insert_or_update_states() to make direct calls to hosts instead of using get_hosts to drop out if a host_uuid doesn't yet exist in a DB.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-19 19:41:02 -05:00
digimer
17863404e3 * Updated Database->_age_out_data() to only run once per day, unless explicitely called with --age-out-database.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 20:33:28 -05:00
digimer
ff69916a85 * Applied typo fixed from PR #286 (thanks, Deezzir!). Also moved all the raw prints into words.xml.
* Updated Convert->human_readable_to_bytes() to return an empty string if passed an empty string.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-16 20:23:29 -05:00
digimer
9d2f9c4d88 * Fixed a string key name typo.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-15 19:53:57 -05:00
digimer
b8b4352117 * Added support for Migration Network configs in old striker and anvil-configure-host
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-15 01:24:26 -05:00
digimer
b27a43eaf7 * Updated striker to only require 6 interfaces when configuring a node.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-14 16:59:41 -05:00
digimer
0fa6ddebc5 Updated scan-network to see an interface state of 'activated' as up (used to check specifically for 'active').
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-14 16:22:51 -05:00
digimer
a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately.
* Added call to System->configure_logind() to anvil-join-anvil and anvil-version-changes.
* Updated fence_pacemaker to add '--reboot' to the 'stonith_admin' call to ensure DRBD-triggered fence requests reboot instead of just turning nodes off.
This commit address issue #279.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-13 21:42:10 -05:00
digimer
dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See:
** https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
* Added explicity setting of $ENV{PATH} when it's null (as it is when pacemaker calls our tools).
* Updated the copyright to 2023.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-12 21:52:26 -05:00
digimer
b666caec64 * Updated anvil-provision-server to handle startup when the peer doesn't create/connect it's DRBD resource (ie: node is offline).
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-06 03:00:38 -05:00
digimer
a5cee52153 * Fixed a bug in DRBD->get_devices() where old test host UUIDs were left hard-coded.
* Fixed a duplicate header in words.xml
* Fixed display bugs in anvil-report-usage and removed the old DR host display info.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-04 22:58:28 -05:00
Digimer
6d59399c73 * Updated the short OS list.
* Created Get->virsh_list_net() and Get->virsh_list_os() that call and parse osinfo-query directly to create lists of supported network interfaces and OS optimization options used when provisioning VMs. The later of which is used to replace the old language list of OSes, which was clunky and prone to missing valid options.
* Updated Get->available_resources() to remove the old anvil_dr1_host_uuid mechanism of finding and referencing DR resources.
* Started adding --network support to anvil-provision-server to allow users to specify a specific network bridge, MAC address and model to use for a new VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-24 10:08:06 -05:00
Digimer
f9ca6fb170 * This adds the new anvil-version-change tool which anvil-daemon will call on startup to handle checks for changes made over releases/updates.
* Added the new 'dr_link_note" column to the dr_links tables so that links can be marked as DELETED.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-13 17:36:43 -05:00
Digimer
02e371ac56 Updated virsh OS list.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-06 18:08:52 -05:00
Digimer
f6cbe7d1d2 * Fixed a bug in System->collect_ipmi_data() where double-quoted passwords were preventing reading of the sensor data.
* Added a new table to the main SQL schema to allow for more dynamic tracking of which Anvil! node pairs can use which DR hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-06 15:07:05 -05:00
Digimer
4ba1982183 This is the start of a set of changes needed to rework how we handle DRBD fence requests, so that they create location constraints instead of triggering a full stonith fence.
* In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing.
* Updated scan-server to check for / add DRBD fence rules as needed.

Scancore APC agent bugs;
* For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents.
* Fixed a bug to set/clear alerts related to phases disappearing to deal with concurrent logins from different hosts triggering false phase loss alerts.
* Fixed missing variables not being passed to alerts/log entries.

Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-29 22:17:12 -05:00
Digimer
6eb99a2168 * FInished the anvil-manage-alerts tool. It can now send test alerts at a user-requested alert level.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-22 01:10:53 -05:00
Digimer
8b7a44cf75 * Finished cleaning up the output of Machines.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-22 00:19:00 -05:00
Digimer
3e53c87a6b Formatted the output of anvil-manage-alerts data (not yet machines) to be more presentable.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 23:28:50 -05:00
Digimer
622fb84652 * Renamed the 'notifications' table to 'alert-override', better reflecting what it does.
* Got anvil-manage-alerts managing alert overrides.
* Created, but for now commented out, the new 'audit' table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 00:34:52 -05:00
Digimer
586ce6e5b9 * Got recipints working in anvil-manage-alerts().
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-15 22:17:12 -05:00
Digimer
35cf0c37fb * Updated System->check_ram_use() to set the maximum RAM based on the host type, and set those values in _set_default() so that the user can override if they want.
* Got anvil-manage-alerts to the point where you can add, edit and delete mail servers.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-14 17:17:30 -05:00
Digimer
a6cd5c6604 * Starting work in the new anvil-manage-alerts, which will (when done), allow for management of mail servers, alert recipients, notification over-rides and to trigger test alerts.
* Updated Database->get_recipients() to record recipients by name for better sorting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-28 20:00:53 -04:00
Digimer
bde0b2e7ec * Fixed a bug where deleting ports from a fence device in an Install Manifest would not cause the fence methods to be removed from the associated cluster.
* Created Get->anvil_from_switch and Get->server_from_switch() (both need testing) that takes a string that could be either a name or UUID, figures out which it is, finds the entry in the DB and started the X_uuid and X_name switch variables.
* Started work on a second attempt at anvil-manage-server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-20 22:33:41 -04:00
Digimer
93427a7a38 * Updated Get->switches() to always support job-uuid.
* Updated striker-initialize-host to support calls from command line switches, and wrote the man page for it.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-18 19:16:32 -04:00
Digimer
c23c79cdf0 Added 'system::all::configured' to anvil-join-anvil to mark an explicit end of config.
Started updating striker-initialize-host to handle the new anvil repo config.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-18 10:56:58 -04:00
Digimer
596855405f * Added variables to record when pacemaker and DRBD are configured.
* Added verify-alg to DRBD configs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-17 21:57:00 -04:00
Digimer
3b721b849c * Fixed a bug in anvil-configure-host where if the same MAC address was assigned to two interfaces, it would cause an endless reboot loop.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-28 19:20:23 -04:00
Digimer
599373816f * Fixed bugs that came up in testing. Was now able to setup long-throw DR!
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-22 16:40:40 -04:00
Digimer
2fab7bc1b7 This adds support (testing needed) for "Long-Throw" DR; which is a wrapper for using 'drbd-proxy' to provide larger transmit buffers so slow/high-latency DR hosts.
* Created DRBD->check_proxy_license() to do (some level of) sanity checks on the DRBD proxy license file.
* Updated DRBD->gather_data() to parse out the inside and outside ports for resource configs using proxy.
* Reworked DRBD->get_next_resource() to return 1, 3 or 7 TCP ports depending, with the new long_throw_ports parameter triggering the 7 ports.
* Added 'tcpdump' to the anvil-core requires list.
* Reworked scan-drbd to record the ports used in proxy configs. This required adding a check to change the 'scan_drbd_peer_tcp_port' column type to 'text' to support CSVs.
* Reworked anvil-manage-dr (needs testing!) to support "long-throw" DR configs.
* Updated anvil-safe-stop to check if the nodes are in the cluster before trying to migrate.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-21 23:35:06 -04:00
Digimer
c8ee75420d * Updated anvil-manage-dr to check if a server is protected before processing a --connect or --disconnect request. Also made it smarter if an attempt to connect a resource fails.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-01 16:09:37 -04:00
Digimer
e90dae96f7 * In Server->shutdown_virsh(), disabled trying to resume a paused VM. Also updated the logging around not waiting for a VM to stop.
* Updated anvil-safe-stop to check for VMs running, even if the cluster is stopped, when --stop-servers is used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-31 18:12:07 -04:00
Digimer
d271ffec26 * Updated Cluster->parse_crm_mon() to record the role of stonith resources.
* Fixed a bug in System->parse_arguments() where a quoted password without spaces was returned without being recorded in the hash. Also updated logging to log 'suppressed' for passwords when secure logging is disabled.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-31 12:57:01 -04:00
Digimer
89121a2b3b * Fixed a bug in Alert->check_condition_age() where not setting a host_uuid caused the returned age to always be 0.
* Updated scan_apc_pdu to not report a lost PDU unless it's been gone for ten minutes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-29 17:30:52 -04:00
Digimer
99a6593fe6 * Fixed a bug when connecting to databases when one DB has no variable entries, making it seem like a DB was disabled.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-25 21:43:21 -04:00
Digimer
b8bb7cc423 * Changed the default trigger of live migrations to require a health score difference of 2 or higher. This can be user-adjusted using the new 'feature::scancore::threshold::preventative-live-migration' anvil.conf option.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-25 12:43:51 -04:00
Digimer
9675ebf986 * Added --remove support to anvil-manage-dr, completing all the features for this tool.
* Updated DRBD.pm to move the logic to wipe and delete an LV into a new method called 'remove_backing_lv'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-24 22:08:48 -04:00
Digimer
93e6a59841 * Added 'vnc-server' to the list of firewall services enabled on strikers.
* Created the anvil-manage-dr man page.
* Reworked anvil-manage-dr's --protect logic to search for which network works with the DR host, instead of assuming it's the SN.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-22 13:38:46 -04:00
Digimer
29a28ee97a * Fixed a bug with anvil-provision-server where running the command line menu from a Striker would not assign the job to the target Anvil!.
* Updated Server->parse_definition() to check if a failed 'virsh list' output was passed in. Also changed it to not exit if the XML can't be parsed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-16 19:01:36 -04:00
Digimer
cbb441759e * Fixed a couple bugs in anvil-manage-files where a file moved from incoming to files or definitions wasn't having the directory updated properly in the database. Also made an explicit check when looking for missing files to check to see if the file exists in another managed directory and, if so and if a striker, update the DB.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-15 23:27:40 -04:00
Digimer
a81478f2bc * Updated 'db_in_use' state to add the caller's name to the state name. This is pulled out when logging stale locks that are being reaped, to help debug where stale locks are coming from.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 00:29:03 -04:00
Digimer
e7cf8ac789 * Got more work done on anvil-manage-files. It now picks up new files on nodes/dr hosts in an Anvil! and downloads them if needed.
* Updated anvil-daemon to call anvil-manage-files on a per-minute basis to handle files added outside of the WebUI.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 00:08:19 -04:00
Digimer
be84a23924 * There were still references in anvil-manage-files to 'file_locations' -> 'file_location_host_uuid'. Had to rework some logic to get things working. More testing needed, but so far at least the "missing file" function is working again.
* Added missing always-available switchs in Get->switches
* Create Storage->_wait_if_changing() to check to see if a file's size is changing and, if so, not return until it stops.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-08 21:31:56 -04:00
Digimer
15aadc3a4e * Updated scan-network to check for inactive or activating interfaces and manually bring them up, if the uptime is less than 10 minutes.
* Fixed a bug in scancore-agents/Makefile.am where scan-network was missing.
* Started work on anvil-delete-server.8. Incomplete at this time.
* Updated Network->get_ips() to record the interface status.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-03 23:38:56 -04:00
Digimer
171ea74000 * There is a fix in this commit to resolve a race condition where, when reconfiguring the network, the request to set a job to reboot would fail because the connections to all Strikers could be lost, causing Database->_test_access() would error out, blocking the reboot. When restarted, the network would not be changed, so no reboot would be requested, leaving the machine in an innaccesible state.
* Updated anvil-boot-server when called with '--all' to honour boot ordering, delays and condtions.
* Updated Database->get_servers() to collect the server's XML as well as data from the 'servers' table.
* Updated anvil-provision-server to make a new DRBD resource 'secondary' after forcing it to primary to begin the initial sync.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-06 19:22:28 -04:00
Digimer
bce9e2caaf This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work.
* Added multiple new private methods to Network that help in managing the firewall.
* Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update it's firewall as soon after the server appears as possible.
* Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly.
* Updated scan-server to check that the firewall is managed when a server state has changed.
* Updated anvil-daemon to run Network->manage_firewall on startup.
* Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host.)
* Removed firewall management from striker-prep-database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-02 17:06:04 -04:00
Digimer
b2ea4f9adc * Moved System->manage_firewall() to Network->manage_firewall(). Started working on actually implementing it, which involves basically fully rewritting it.
* Updated tools/Makefile.am and scancore-agents/Makefile.am to add missing files.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-30 00:01:50 -04:00
Digimer
f2d06fa9b1 * Updated striker-parse-oui to only run if/when the system has been running for at least one hour.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-23 21:21:38 -04:00
Digimer
ab9b00a2f7 * Updated anvil-daemon, in its daily checks, to disable ksm and ksmtuned daemons.
* Updated scan-drbd to purge peer records that no longer have corresponding LVM data.
* Updated System->{en,dis}able-service to take the 'now' paramter which, when passed, causes the action to take immediate effect.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-21 22:25:07 -04:00
Digimer
b77bb81343 * Found a bug where, if a record was deleted from the public schema but not from the history schema, and then later a resync was performed, the record would be added to the peer database's public schema (while still not existing locally). This condition should never occur as data in history should only exist to track the public record. This update checks for this condition and purges those records prior to resync'ng a database table.
* Continued work on fixing issues with striker-purge-target (which led the the discovery of the above bug). Added expliit checks to purge file_location and storage_group data when purging an sub-anvil from the database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-20 20:52:07 -04:00
Digimer
3caf43ed42 Updated striker-purge-target to check for problems on write of DELETEs.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-18 10:57:50 -04:00
Digimer
911f7cfb6a This is another big commit with a lot of DB work. Getting closer to sorting out the frequent resyncs.
* Changes Database->connect to always use the first DB connected to, not the local one if that applies. This treats the first DB (sorted by UUID) as "primary" and the second (or third...) as more of a backup.
* Moved db_in_use and lock_request to use the 'states' table instead of the variables table. These are set and removed so often that it was messing up things with resync's when the data is transient anyway. Fixed multiple bugs with both to better set and clear properly.
* Created Database->read_state() to assist with the above changes.
* Updated Database->refresh_timestamp() to specifically check that the returned time stamp differs from the previously used one, looping until they differ if needed.
* Disabled striker-manage-install-target when called to update the repos, as the Install Target function doesn't work at this point.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-16 20:10:43 -04:00
Digimer
24f5d39dff This is a set of changes all stemming from trying to debug frequent resyncs. More bugs still to be fixed.
* Updated Database->get_host_from_uuid() to cache results.
* Fixed a bug in Database->get_storage_group_data where a DELETE wasn't deleting from the history schema as well.
* In Database->resync_databases(), references to the old 'host_uuid' that we used to use to resync just the local host's data was removed. Added also a check where two or more entries in a given history schema had the same modified_date and, when found, the newest entry is preserved and the rest are deleted. Before this, a resync where two+ records had the same modified_time would only sync the last record, leaving a mismatch in history schema entries triggering repeated resyncs.
* Fixed a bug in Email->send_alerts() where the 'alerts' table was being updated without a modified_date being set.
* Fixed a bug in System->test_ipmi() where the 'hosts' table was being updated without a modified_date being set.
* Updated scan-network to clear up old deleted ip_addresses, bonds and bridges. Also fixed bugs where public schema records were being deleted without history records being deleted.
* Updated anvil-update-states to fix bugs where DELETEs were happening without setting the modified_date.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-12 23:14:49 -04:00
Digimer
1b70b49cf8 * Updated Network->find_matches() to try to populate the first and second parameters if they're not passed in.
* Updated Network->load_ips() to load extra information about the interfaces.
* Updated ocf:alteeve:server to not check libvirtd daemon state on server start.
* Updated scan-hardware to check for duplicate entries and purge if found.
* Updated scan-network to check for the 'default' virbr0 interface by checking if the config file exists instead of calling virsh.
* Updated scan-server to have better logging.
* Created the new (and incomplete) anvil-test-alerts tool
* Updated scancore to support --purge to pass to all agents and then exit.
* Updated ScanCore->call_scan_agents() to no longer use 'timeout' as it was causing issues with virsh calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-20 10:28:21 -04:00
Digimer
7ec4cee143 Created the new anvil-show-local-ips that shows the IPs on the host in an easier to read format, compared to 'ip addr list'.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-26 20:24:15 -04:00
Digimer
e9a9e0dd4b * Finished (but needs more testing) the new 'anvil-report-usage' tool.
* Updated System->_check_anvil_conf() to create the 'admin' user in a more normal way (old way caused the 'admin' group to be a system GID.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-25 23:56:58 -04:00
Digimer
d2973e603b Updated anvil-update-states to make the speed of links to 10000 when they are virtio interfaces.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-13 10:55:35 -04:00
Digimer
4751c6e747 Updated DRBD->get_devices() and Server->parse_definition() to take 'anvil_uuid' so that server data can be parsed from anywhere.
Created, but not finished, tools/anvil-report-usage that will print a report of server resource allocation and Anvil! resource availability.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-06 00:16:32 -04:00
Digimer
c9633aa3b0 * Updated Database->_find_behind_databases() to not run unless it's on a Striker.
* Updated scan_network to properly mark virtio network interfaces as being full duplex. Also updated it to purge interfaces flagged as 'DELETED'.
* Updated the VM OS list.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-04 16:38:14 -04:00
Digimer
aa7d9bdf14 * Fixed a bug where resync'ing the database was missing tables.
* Updated Network->find_matches() to take 'source' and 'line' parameters to help identify the source of issues with missing hashes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Digimer
74b7719cf5 * Created the new anvil-manage-host that can check/set if a host is configured. On Strikers, it can age out data, resync data, and check/set if the local database is active.
* Updated striker-prep-database to again enable the postgresql service.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Digimer
422d248cbe * Updated Database->insert_or_update_states() to not actually record unless the state_host_uuid exists in all available databases.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-15 23:42:31 -04:00
Digimer
7b090e1623 * Updated Database->shutdown() to disconnect, stop the postgresql daemon, then reconnect.
* Updated anvil-daemon to not stop a database until both/all DB hosts are in both/all DB's hosts table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-15 22:33:42 -04:00
Digimer
3fd0db15bf * This rather heavily reworks how database shutdowns works. It adds much more intelligent shutdown, tracking who is using the database, being able to mark a database as "offline" and waiting for users of the database to disconnect before it shuts down.
* Also removed the variables for the database name and DB user name, setting them statically now.
* Created Database->shutdown() to more kindly stop a local database server.
* Added 'check_db_in_use_states()' to anvil-daemon to clean any stale entries marking a database as in use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-14 16:41:37 -04:00
Digimer
b234b79544 Updated anvil-daemon to check if anvil-sync-shared is running if the reported RAM use is too high. If so, it doesn't exit. This fixes an issue where anvil-sync-shared would loop forever as it would constantly be killed when downloading large files.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-27 21:29:30 -05:00
Digimer
7023ffb56b Further improved startup DRBD logic in ocf:alteeve:server. Specifically, it will startup if a local resource/volume is sync'ing.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-23 12:41:49 -05:00
Digimer
bc39c3fe5c Updated ocf:alteeve:server to better handle multi-peer DRBD configurations.
Cleaned up some logging in DRBD->get_status.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-21 19:18:28 -05:00
Digimer
e62e5d7b0c * Updated ocf:alteeve:server to better handle starting up DRBD resources before trying to boot a VM.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-21 19:18:28 -05:00
Digimer
87a2454a09 Moved anvil-configure-host reboot logging to use log_0687 to help grep for reboot causes.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-21 19:18:28 -05:00
Digimer
dc989f0950 Added more logging to track when and how reboots happen in systems.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-16 21:55:33 -05:00
Digimer
d70b9a4956 Updated scancore and anvil-daemon to check their RAM use at the end of each loop and, if it's using more than 1 GiB of RAM, it sends an alert and exits.
* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from it's status output.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-05 22:08:06 -05:00
Digimer
a886653af1 * Updated scan-network to purge duplicate bridges and bonds.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-01 20:15:01 -05:00
Digimer
a633ab7f63 Added a periodic check to ensure all users can ping. This fixes a bug where a local striker dashboard whose DB was stopped wouldn't work.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-21 14:27:58 -05:00
Digimer
e37f487704 Fixed a bug in System->check_ssh_keys where the 'admin' user's RSA keys were owned by root.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-20 14:13:27 -05:00
Digimer
892a475881 * Fixed a bug in Convert->format_mmddyy_to_yymmdd() where being passed '--' didn't return the same.
* Fixed a divide-by-zero bug in anvil-boot-server when no servers exist yet.
* Fixed a bug in anvil-daemon where the local databsae engine was being started when it shouldn't.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-18 02:38:50 -05:00
Digimer
0a9f81d852 * Created the new striker-db-report that shows how many records are in each database table, and if passed a table name, report how many times each record has been recorded in the history schema.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-13 20:00:37 -05:00
Digimer
72038e8358 * Fixed a bug where ethtool's Media type contained tab characters that broke JSON when configuring the netowrk interfaces.
* Updated the copywrite date to 2022.
* Updated the database resync to not run on machines host VMs to help reduce the chance of oom-killer terminating a VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-03 17:06:56 -05:00
Digimer
3346d31194 * Created Get->kernel_release() that returns the current kernel release (version) in use on the host or on a remote system.
* Created DRBD->_initialize_drbd() to makes sure the DRBD kernel module can load and tries to build the module, if necessary. This is meant to provide support for clients that can't access needed internet resource (or the internet at all).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-12-07 20:03:39 -05:00
Digimer
65dfc22a38 Added an eval{} call around Database->query()'s ->prepare() DBI call to better handle lost database handle.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-30 17:58:29 -05:00
Digimer
9eec6c4977 * Created ScanCore->check_temperature_direct() based around that start logic from ScanCore->post_scan_analysis_striker() temperature check, and updated the later to use the former.
* Updated the logic of when to boot a node or DR host that was found to be off for unknown reasons to require both poewr and temperature to be OK, and checks against the new 'feature::scancore::disable::boot-unknown-stop' config variable.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-29 22:43:23 -05:00
Digimer
958267e38f * Enabled scancore in the .spec file. Disabled calling striker-prep-database and anvil-update-state in the same.
* Updated striker-prep-database to check / wait until postgresql-server is installed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-24 01:22:33 -05:00
Digimer
6225ce1943 Updated striker-prep-database to not configure the firewall if firewalld isn't running.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 23:32:07 -05:00
Digimer
b517117bc1 * Did more work on trying to figure out why iniital setup of the database was failing. I believe it was because, in anvil-daemon, after calling 'prep_database' we called ->connect() _without_ 'check_if_configured' set. Next round of function testing should help confirm is this was the case.
* Added 'configure_firewall()' to 'striker-prep-database' to explicitely open the postgresql service for all active zones.
* Did some general logging changes and cleanup around the same.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 20:41:29 -05:00
Digimer
090c59a873 Updated striker-prep-database to enable extra logging to help diagnose a function test build failure problem.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-16 03:21:33 -05:00
Digimer
257a998743 * Updated Database->configure_pgsql() to use 'postgresql-setup --initdb --unit postgresql' instead of the deprecaded 'initdb' switch.
* Updated Database->insert_or_update_states() to switch to an active UUID if the passed in UUID is not an active handle.
* Updated Database->query() to swutch to 'sys::database::read_uuid' if the passed in 'uuid' is not an active handle.
* Updated Database->_test_access() to return immediately if the passed in uuid is not an active handle.
* Started working on a Storage->get_storage_group_from_path() bug where the storage group isn't being returned.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-28 12:07:36 -04:00
Digimer
63c45430bb * Updated scan-network to clear duplicate IP addresses.
* Fixed a bug in anvil-daemon where striker-prep-database was always being called, when it shouldn't in some cases.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-10 22:27:54 -04:00
Digimer
0fc394b294 Updated ocf:akteeve:server to see in the target for a migration has a '<shortname>.mn1' host name, and if so, and if the target can be reached on that address, it will be used for the live migration. This is to allow for inexpensive 10 Gbps live migration speeds.
Removed the stub Server->provision method that was never used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-25 10:01:03 -04:00
Digimer
ea368a942b Finished the '--update' switch function in anvil-manage-dr.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-21 23:51:41 -04:00
Digimer
5ee7b2ccaf Got the '--connect' and '--disconnect' functions working in anvil-manage-dr.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-21 15:14:46 -04:00
Digimer
0fcde483be
Merge branch 'master' into anvil-tools-dev 2021-09-20 11:44:30 -04:00
Digimer
fbe9adc306 Resolve a words key conflict.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-20 11:42:47 -04:00
Digimer
5c07179aa6 * Resolved a words.xml conflict.
* Reworked where and how Database->configure_pgsql() is called, and boosted logging around it (trying to debug a build test issues).
* Updated Database->configure_pgsql() to only check if the Anvil! user and DB exists if another step of the config happened.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-20 11:32:45 -04:00
Digimer
e60a1b46b3 Fixed bugs related to automatic database startup and conditional backup loading.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-19 14:06:18 -04:00
Digimer
72b17ff1f9 * Reworked how databases are stopped, now being handled in anvil-daemon. This way, initial starts will still do traditional resyncs, then shut down. This should allow the best of both worlds, where data is not lost on striker start/stop loss/recovery, but operate normally otherwise without delays.
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 22:33:31 -04:00
Madison Kelly
922899ea78 * WIP: Working on a new method of failing over between which Striker is the active database, instead of running N-number of databases all the time.
* Created Database->backup_database() that creates a pg_dump of the active database.
* Created Database->load_database() that loads the database from a flat file, optionally creating a backup before doing so, and using iptables to block access during the process.
* Updated Database->configure_pgsql() to not start the postgresql daemon unless it just initialized the DB.
* Much work, not yet complete, to Database->connect() to stop after the first successful connection. Added logic that, if not connection was established and the host is a Striker, to load a peer's backup, if it exists, and then start the local daemon.
* Updated anvil-daemon to now have a section to run tasks on a ten minute cycle, which will later be used for the primary Striker to dump / copy its database to peer(s).

Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
2021-09-16 23:10:55 -07:00
Digimer
6664c5b77f * Fixed a bug where scan-drbd, with DR configured, was not recording TCP ports assigned to connections properly.
* More bugs fixed in anvil-manage-dr, tested repeatedly as a job and so far, so good. Other functionality still to come.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-12 23:34:25 -04:00
Digimer
da9dc03d04 Updated anvil-manage-dr to update the job progress and convert prints into strings.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-12 15:14:03 -04:00
Digimer
5b35204af4 * Updated DRBD->get_next_resource() to take the new 'dr_tcp_ports' ports which, if set, returns two free TCP ports.
* Got anvil-manage-dr to the point where it writes the updated resource configuration to enable DR support. (untexted)

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-09 23:07:03 -04:00
Digimer
9edf698c37 Updated Database->get_storage_group_data() to determine when a node or DR host needs to be removed from a Storage group, or when a member of an Anvil! needs to be added to a storage group.
Created Storage->get_vg_name() to assist with anvil-manage-dr, which is still a WIP.
Continued work on anvil-manage-dr (which exposed the issue that required the update to Database->get_storage_group_data().

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-08 00:50:45 -04:00
Tsu-ba-me
d195a53ba2 feat(cgi-bin): add endpoint for fetching server screenshot 2021-09-03 16:21:55 -04:00
Tsu-ba-me
1014299d38 fix(tools): enable anvil-get-server-screenshot to be a job 2021-09-03 16:21:55 -04:00
Digimer
2f8b1fb72e Updated anvil-provision-server so that when the OS type is 'win7', set the disk to sata and the NIC to e1000e. Also updated it to store the virt-install call in the 'variables' table and write it out to /mnt/shared/provision.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-31 00:32:15 -04:00
Digimer
06f679d7e7 * Added the ability to disable anvil-daemon management of /etc/hosts.
* Updated the OS variant list.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-30 22:40:35 -04:00
Digimer
4c7bb45ab9 Fixed a race condition where configuring the IPMI BMC would appear to fail because the BMC wouldn't report the user list after a cold reset.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-25 21:02:00 -04:00
Digimer
04cb116c1b Updated anvil-parse-fence-agents to validate each fence agent's metadata is valid before adding it to the unified XML.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-19 00:58:26 -04:00
Digimer
3674a47179 WIP - Working a tool to manually load updated server definition files.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-16 11:39:16 -04:00
Digimer
aec22bb79c Added a check in scan-network that finds/removes duplicate network interface names.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-11 12:17:01 -04:00
Digimer
4800f7181f * Updated ScanCore to boot a node that is off without a stop reason.
* Fixed a bug where anvil-safe-stop was not recording the stop-reason. Also made '--poweroff' an alias for '--power-off'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-07 14:01:14 -04:00
Tsu-ba-me
2b6e1bb51b chore(share): add missing labels to error messages 2021-08-04 13:40:02 -04:00
Tsu-ba-me
bb155a5786 fix(tools): update job progress in catch-all case 2021-08-04 13:40:00 -04:00
Tsu-ba-me
1d61c8fff7 fix(cgi-bin): modify manage_vnc_pipes endpoint to trigger a job 2021-08-04 13:38:28 -04:00
Tsu-ba-me
7d9013a60b fix(tools): allow striker-manage-vnc-pipes to be executed as a job 2021-08-04 13:38:26 -04:00
Digimer
8d2e454d69 * Updated fence_delay to set the ownership of the log file to 'hacluster:haclient'. This should address https://github.com/digimer/fence_delay/issues/1
* WIP - COntinuing work on anvil-manage-server, far from done yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-26 15:35:04 -04:00
Digimer
bc8b9274cb WIP; Reworked anvil-manage-server to have a more interactive menu system (for the sections done so far).
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-26 15:35:04 -04:00
Digimer
28865780f8 * Updated Database->get_server_definitions() to take a specific server UUID, allowing just the one definition to be loaded. Also had it clear previous loads.
* Updated Server->parse_definition() to call DRBD->get_devices() so that referenced LVs can be loaded properly.
* Continued WIP in anvil-manage-server

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-20 23:19:29 -04:00
Digimer
623dbb0863 WIP; Restarted work on anvil-manage-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-18 16:21:00 -04:00
Digimer
cebae28716 * WIP - Fixing a bug in scan-network where vnet devices aren't being recorded against their bridge.
* Updated scan-server to record the VNC port it is using in the database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-15 00:42:47 -04:00
Digimer
7e7b91b286 * Updates anvil-join-anvil to update corosync.conf to use the BCN1 link as the main knet network with the SN1 link as the backup link.
* Fixed a bug in Cluster->parse_cib() where the local machine's ready state was being set to the node name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-14 12:17:19 -04:00
Digimer
fd5d3c0434 * Finished (though testing still needed) scan-network.
* Updated Alert-register so that, if 'sort_position' is not set (or set to 9999), an internal counter for each alert level is created and used so that alert entries sort naturally by the order they're registered.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-09 01:20:25 -04:00
Digimer
a697011b08 * Disabled debug logging in anvil-daemon.
* WIP - working on new scan-network scan agent.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-30 02:36:06 -04:00
Tsu-ba-me
e8f18d7aa1 fix(cgi-bin): enable sending power on/off server VM jobs 2021-06-25 21:40:55 -04:00
Digimer
607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration.
* Updated DRBD->allow_two_primaries() to take the 'set_to' parameter which can be 'yes' to all and 'no' to disallow dual-primary.
* Updated ocf:alteeve:server to call allow_two_primaries() with 'set_to' = 'no' instead of calling 'adjust' after a migration completes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-25 00:45:52 -04:00
Digimer
d3052c0229 * Finished Cluster->check_server_constraints() and added it to scan-cluster. This now makes sure servers don't roll back to their old host after it has been fenced and recovers.
* Completely disabled Network->check_network(), it's causing more problems than it solves.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-23 14:19:58 -04:00
Digimer
30f478267a * Forced anvil-daemon to log-level 2 and to enable secure logging to continue debugging setup issues.
* Fixed a undefined variable warning.
* Removed a debugging die from Database->resync_databases().

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-22 19:41:00 -04:00
Digimer
023f43eda9 * In the never-ending attempt to resolve the build consistency issues, this commit enables extra debugging logging and, hopefully, implements a fix in anvil-daemon where a job could be started repeatedly.
* Renamed the special job status 'scancore_startup' to 'anvil_startup', given it's handled by anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-22 16:12:12 -04:00
Digimer
b71ed28f64 * Added Cluster->manage_fence_delay() that reports back and, optionally, sets a preferred node in a fence race.
* Updated scan-cluster to check / set which node should be preferred if a netsplit causes a fence race.
* Fixed a bug in Server->shutdown_virsh() where a shutdown timeout would go into a loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-21 21:27:24 -04:00
Digimer
08a958ec60 * Finished updating Network->check_network() to check/heal bridges.
* Updated anvil-configure-host to not reboot on network chane (will verify when this commit is function tested).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-18 22:42:10 -04:00
Digimer
bd24c1c5bb * I _might_ have fixed the network configuration issue in anvil-configure-host... Updated it so that if 'nmcli' doesn't report a valid device name, it looks for it in the ifcfg-X file, and uses 'X' if not found there.
* Added the 'print' parameter to Log->variables() to allow printing to STDOUT when set.
* Renamed Network->check_bonds() to Network->check_networks() in anticipation of adding bridge monitoring / repair to it later.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-18 19:37:37 -04:00
Digimer
11b1900e1b Note: Continuing to resolve the build issues with network startup. Expect breakage.
* Upped the aging of jobs and alerts data from 2 to 24 hours. Also added a check to prevent deleting a job of any age that is incomplete.
* Major update to anvil-configure-host to not touch the network unless something has actually changed. Not yet tested on a fresh system, will verify nothing broke in the CI tests this commit will trigger. Also changed it so that, if after reconfiguring the network it times out trying to reconnect to a database, it calls a reboot instead of simply exiting. Further, a reboot is now not called on exit unless something changed to require it.
* Updated Network->check_bonds() to return '1' if anything was done to heal a bond.
* Updated anvil-update-states to be more careful about clearing virsh bridges. Specifically, it checks to see if virsh is running and that the returned bridges aren't actually error codes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-14 01:58:25 -04:00
Digimer
3f32a56d0c * Created Network->check_bonds() that checks to see if any bonds are down, or if any interfaces configured to be in a bond are not actually in it. It accepts a 'heal' parameter that, by default, will bring up a bond with no active links, but leaves degraded bonds alone. It call also take 'all' and will try to bring up any missing interfaces. This distinction exists so that if a link is flaky and someone takes it down manually until it can be repaired, it doesn't get turned back on.
* Updated anvil-daemon to call Network->check_bonds() with 'all' on startup, then woth 'down_only' once per minute to try to heal down'ed bonds.
* Updated anvil-watch-bonds to take a 'run-once' switch and exit after one report, if set.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-13 13:33:51 -04:00
Digimer
0b6a9e37fa * Added scan_lvm_pv_sector_size to the scan_lvm_pvs table in the scan-lvm. This will be used later for growing a requested disk size for the DRBD metadata.
* Added a 1 minute delay to anvil-configure-host before calling a reboot.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-11 19:57:30 -04:00
Digimer
5b4bfa747c * Reworked the anvil-join-anvil job parsing to help diagnose occassional faults. Also changed a fatal parse error to one that allows the run to be retried.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-06 01:54:28 -04:00
Tsu-ba-me
9fda3af2ce fix(cgi-bin): move error string key to 0307 2021-06-04 19:00:34 -04:00
Digimer
3b1afa0d92
Merge branch 'master' into anvil-tools-dev 2021-06-03 22:32:40 -04:00
Digimer
24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancilliary checks are done, minimizing connect time.
* Fixed a problem with Database->insert_or_update_variables() where variable_source_uuid being set to an empty string wasn't converted to NULL.
* Fixed Database->locking() where the way the lock variable was set was rather broken.
* Created Striker->check_httpd_conf() which configured apache to handle the integration of the new WebUI for Anvil! management with the existing WebUI.
* Updated System->update_hosts() to specifically set the 127.0.0.1 and ::1 lines to handle how cloud-init overrides /etc/hosts and breaks CI/CD tests.
* Removed the old index.html as it's now used for the new WebUI.
* Began work on removing DB connection requirements from ocf:alteeve:server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-03 22:25:36 -04:00
Tsu-ba-me
419ec52d2b fix(cgi-bin): add job title and description to set_membership 2021-06-02 15:27:31 -04:00