Commit Graph

115 Commits

Author SHA1 Message Date
digimer
3d50f45984 Added a 1 second delay to nmcli calls
* Also fixed a bug in Database->get_storage_group_data() to add a missing
  column when adding members.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-24 14:20:41 -04:00
digimer
8e53993f67 Shortened the anvil-daemon job start up delay.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-15 23:00:31 -04:00
digimer
6826b12188 Added a start for configured interfaces found to be down after boot.
* Added the 'up' parameter to Network->collect_data() that will bring up
  an interface we configured that is down.
* Updated scan-network to call Network->collect_data() with 'up' if the
  uptime is less than ten minutes.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-08 13:25:56 -04:00
digimer
6d121dc0c0 Mapped each interface name in match.interface-name to a UUID lookup.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-05-08 00:17:16 -04:00
digimer
bd2e4c46ae Updated Network->load_ips() to use the device_name when available.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 16:55:18 -04:00
digimer
9999d6f522 Fixed a bug where nics were not being found by their NM device name
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-04-13 01:54:09 -04:00
digimer
ad0a353a89 Fixed a bug where unused interfaces were not being ignored.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-25 22:41:34 -04:00
digimer
870c990632 Added support for multiple IPs per interface
* Created Database->get_mac_to_ip()
* Updated Database->insert_or_update_mac_to_ip() to find an entry using
  both the IP and MAC address.
* Updated Network->get_ips() to store only the first IP it finds on an
  interface as the main IP (for use in /etc/hosts, etc) and to store it
  and any other IPs in a new hash.
* Updated scan-network to use the new hash above to record them in the
  'mac_to_ip' table. Similarly, before marking an IP as removed, it
  checks to see if it's an alternate IP.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-03-06 19:06:05 -05:00
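
The main/alternate IP split described in the commit above can be sketched roughly as follows. This is an illustration in Python, not the project's Perl code. The names `split_ips` and `is_alternate_ip` and the dictionary layout are assumptions made for the example.

```python
from typing import Dict, List, Optional


def split_ips(ips_by_interface: Dict[str, List[str]]) -> Dict[str, dict]:
    """For each interface, treat the first IP found as the main IP and
    keep every IP (main and alternates) in a separate list.

    Hypothetical helper; the layout of the returned dict is an assumption.
    """
    result: Dict[str, dict] = {}
    for interface, ips in ips_by_interface.items():
        main_ip: Optional[str] = ips[0] if ips else None
        result[interface] = {
            "main_ip": main_ip,       # e.g. the address used in /etc/hosts
            "all_ips": list(ips),     # main IP plus any alternates
        }
    return result


def is_alternate_ip(record: dict, ip: str) -> bool:
    """True if 'ip' is on the interface but is not its main IP."""
    return ip in record["all_ips"] and ip != record["main_ip"]
```

A caller could check `is_alternate_ip()` before marking an address as removed, which mirrors the check the commit message describes for scan-network.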
digimer
ab0b1a262b Reworked Network->wait_for_bonds() to be ->wait_for_networks()
* Renamed the old ->wait_for_networks() to be ->wait_for_nm_online().
* The new ->wait_for_networks() waits for all interfaces we manage to be
  'activated' before returning.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-29 01:32:32 -05:00
digimer
fe2806b5df Bumped up timeouts
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-28 23:38:50 -05:00
digimer
480745c889 Disabled printing of the countdown.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-28 16:37:31 -05:00
digimer
495cb90ca6 Created Network->wait_for_network to hold startup for NM to be up.
Added the call to Network->wait_for_network to pause scancore and
anvil-daemon startups until NetworkManager says it's up and running.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-24 17:16:46 -05:00
digimer
b3d1e53623 Added a log message if waiting for bonds times out.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-23 19:15:50 -05:00
digimer
6a193bf710 Added extra checks to Network->wait_for_bonds()
* Added a default timeout of 180 seconds, and updated
  anvil-configure-host to reduce this to 60 seconds while configuring
  the host.
* Added a check for interfaces configured under a bond. If none are
  found, the bond is ignored.
* Updated Storage->update_config() to take the new 'append' attribute to
  allow adding a variable if it wasn't found already in the config.
* Added the new 'network::wait_for_bonds::timeout' variable to enable
  changing the default timeout for Network->wait_for_bonds().

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 23:46:49 -05:00
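
A wait with a configurable timeout, as described in the commit above, might look roughly like this. It is a generic Python sketch, not the actual Network->wait_for_bonds() logic. The function name `wait_for` and its parameters are assumptions; only the 180 second default and the 60 second override come from the commit message.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool],
             timeout: float = 180.0,
             interval: float = 1.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll 'check' until it returns True or 'timeout' seconds elapse.

    Returns True if the condition was met, False if the wait timed out.
    Hypothetical helper for illustration only.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            # The caller can log a message here when the wait times out.
            return False
        sleep(interval)
```

A caller that wants the shorter wait used while configuring a host would pass `timeout=60`.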
digimer
d5ceca3dc6 Added a message when holding on a bond to activate.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 02:06:44 -05:00
digimer
05de34c7bc Scancore and anvil-daemon now hold for bonds to be up.
Created Network->wait_for_bonds(), and added it to the startup for
scancore and anvil-daemon.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-02-22 02:01:33 -05:00
Tsu-ba-me
2306776c37 fix: rename load_interfces->load_interfaces of Network.pm
2024-01-30 16:16:14 -05:00
digimer
943bf2e8d3 Removed the no-longer-needed Network->check_network() method
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
827cf1f331 Fixed a bug that was crashing anvil-daemon
* Network->find_matches() was trying to compare two IPs when the second
  IP wasn't actually defined.
* Disabled scancore's blocking of running before the host is configured.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
282fdbe7e0 Fixed a bug where IPs were being marked repeatedly as DELETEd.
* Database->get_ip_addresses() was marking IPs that weren't on a network
  we managed as DELETEd, which caused problems with initializing targets
  and generated a lot of repeat alerts.
* Updated logging in Network.pm to help with debugging.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
d7aa7966dc Fixed a couple bugs
* Network->collect_data() wasn't deleting old data before rescans.
* anvil-configure-host wasn't checking links that should be in a bond if
  the bond already existed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
ff0e6c3575 Updated anvil-daemon to call scan-network if no interfaces exist.
Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
518fddfa82 More progress on the new NM version of anvil-configure-host
* It's technically done, but I know bugs remain.
* Updated Jobs->update_progress() to take 'file' and 'line' to make it
  easier in the logs to see the origin of the message, when logging the
  update.
* Created Network->modify_connection() to update network manager
  variables. Created ->reset_connection() to take an interface down and
  bring it back up again.
* Fixed a bug in scan-network where the device_to_uuid hash wasn't being
  stored.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
83057d0b45 Fixed several bugs around renaming interfaces
* Also fixed problems with scan-network related to the new network
  naming / NM system.
* Updated Database->insert_or_update_network_interfaces() to better
  search for a network_interface_uuid when not specified.
* Updated Network->collect_data() to take the new 'start' parameter
  which, when set, brings up unconfigured connections/devices.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
71735947dc Created Job->bump_progress() to make advancing job progress easier
* Updated Network->collect_data() to find the GENERAL.DEVICES and
  GENERAL.IP-IFACE from match.interface-name when the link is down.
* More work done on anvil-configure-host.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
9c67b97fdd Fixed a bug in initializing DROP'ed DBs.
* Got more work done on adding network_interfaces to the database in
  scan-server.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
ec11335197 Fixed DB initialization bugs.
* More work done on the new network stack also.

Signed-off-by: digimer <mkelly@alteeve.ca>
2024-01-27 15:39:01 -05:00
digimer
201cd53265 Improved the logic behind Network->find_target_ip()
* This adds the new 'networks' parameter to allow restricting/ordering
  matched networks, and the new 'test_access' parameter to validate the
  link is working.
* Continued work on anvil-manage-server-system

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-11 17:22:06 -04:00
digimer
580980717d This commit covers the conversion of 'virsh' shell calls to using the 'Sys::Virt' module, and fixes several small bugs related to scan-server;
* Switched all calls to virsh to use Sys::Virt to deal with contention of simultaneous virsh calls.
* Removed collecting screenshots from scan-server.
* Fixed a bad variable substitution in an alert.
* Fixed a bug where a server's boot time wasn't being recorded properly.
* Reworked how we determine which server definition was most recently updated and propagated.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-09-21 15:59:43 -04:00
digimer
556e91238d * Updated Network->find_access() to clear the data from previous scans, which fixes a bug where checking multiple hosts could return stale data for the previous host.
* Updated anvil-manage-server-storage, striker-collect-debug, and striker-update-cluster to be able to find a connection on an interface when none were found on preferred networks.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 15:43:54 -04:00
digimer
458cb267da * Fixed a bug in Cluster->get_primary_host_uuid() where servers were not loaded before trying to calculate RAM use.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-15 00:04:12 -04:00
digimer
4dc1b0e117 * Added a check to Network->get_company_from_mac() to manually set the company to KVM/qemu if the prefix is 52:54:00.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-14 23:00:16 -04:00
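
The prefix check described in the commit above can be sketched as follows. This is a Python illustration, not the Perl code in Network->get_company_from_mac(). The function name, the `oui_table` parameter, and the empty-string fallback are assumptions; the 52:54:00 prefix and the "KVM/qemu" label come from the commit message.

```python
from typing import Dict, Optional


def company_from_mac(mac: str, oui_table: Optional[Dict[str, str]] = None) -> str:
    """Return the vendor for a MAC address, special-casing 52:54:00.

    'oui_table' is a hypothetical mapping of 'xx:xx:xx' prefixes to
    company names, standing in for whatever lookup the caller uses.
    """
    normalized = mac.strip().lower().replace("-", ":")
    prefix = normalized[:8]
    if prefix == "52:54:00":
        # Virtual interfaces use this prefix, so set the company manually.
        return "KVM/qemu"
    if oui_table is not None:
        return oui_table.get(prefix, "")
    return ""
```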
digimer
3016fb875b * Reworded striker-update-cluster to use anvil-update-system for on-system OS updates.
* Updated DRBD->get_status() to take the new 'host' parameter to allow the caller to define the hash key string used in the stored data.
* Updated Get->anvil_version() (and a few other places) to use the new 'striker-ui-api' shell user, replacing the 'apache' user.
* Updated Remote->test_access() to take the new 'close' parameter to close the SSH session used when testing access to the target.
* Fixed a logging bug in anvil-manage-power.
* Updated anvil-update-system to take the '--no-reboot' and 'clear-cache' command line switches.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-14 22:29:07 -04:00
digimer
fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell.
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-03 14:42:28 -05:00
digimer
7891c9b2b1 * Fixed a bug in Network->load_ips() where interfaces were being marked as type 'bridge' or 'bond'.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-27 12:31:33 -05:00
digimer
5dbdd20d7e * Fixed a bug in Network->load_ips() where the IP address on a bridge or bond was having the device name recorded incorrectly.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-23 00:39:43 -05:00
digimer
645f54ab89 This commit has more changes than I would normally like, but it's all linked to changing file uploads to rsync serially.
* To update file handling for the new DR host linking mechanism, file_locations -> file_location_anvil_uuid was changed to file_location_host_uuid.
  This required a fair number of changes elsewhere to handle this, with a particular noted change to Database->get_anvils() to look at host_uuid's for the subnodes in an Anvil! and, if either is marked as needing a file, make sure the peer is as well. Similarly, any linked DRs are set to have the file as well.
* Created a new Network->find_access that simply takes a target host name or UUID, and it returns a list of networks and IPs that the target can be accessed by.
* Updated Network->load_ips() to find the network interface being used for traffic so that things like the interface speed can be recorded, even when an IP is on a bridge or bond.

Unrelated, but in this commit, is a restoration of calling scan agents with a timeout now that the virsh hang issue has been resolved.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-14 02:29:40 -05:00
Digimer
93e6a59841 * Added 'vnc-server' to the list of firewall services enabled on strikers.
* Created the anvil-manage-dr man page.
* Reworked anvil-manage-dr's --protect logic to search for which network works with the DR host, instead of assuming it's the SN.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-22 13:38:46 -04:00
Digimer
4ecc6097d3 * Cleaned up some old 'die' calls with better nice_exit() calls to help avoid dangling db_in_use flags.
* Reworked Network->bridge_info() to use 'ip' to get the list of bridges, and 'bridge' to find interfaces connected to the bridge.
* Added 'test' messages to Words->string().
* Fixed a bug in scan-lvm where mdadm based PVs didn't read the sector size properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-12 16:32:20 -04:00
Digimer
ddd28de112 * Fixed a couple typos that broke compilation.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-04 09:45:34 -04:00
Digimer
15aadc3a4e * Updated scan-network to check for inactive or activating interfaces and manually bring them up, if the uptime is less than 10 minutes.
* Fixed a bug in scancore-agents/Makefile.am where scan-network was missing.
* Started work on anvil-delete-server.8. Incomplete at this time.
* Updated Network->get_ips() to record the interface status.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-03 23:38:56 -04:00
Digimer
cf8198ac9a Fixed a typo causing a compilation error.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-20 15:05:27 -04:00
Digimer
3343ecaf9f * Added check to disable/stop firewalld if running when anvil-daemon starts.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-19 21:21:48 -04:00
Digimer
7fd6185445 * Disabled firewalling for now. There appears to be an issue starting up with DRBD.
* Updated Convert->time() to return whatever was passed in instead of '#!error!#'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-09 19:46:38 -04:00
Digimer
0b3d282a2c * Updated Network->manage_firewall() to restore a debug level set to 1 for testing.
* Added firewall config rules for DR hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-02 18:23:20 -04:00
Digimer
bce9e2caaf This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work.
* Added multiple new private methods to Network that help in managing the firewall.
* Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update its firewall as soon after the server appears as possible.
* Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly.
* Updated scan-server to check that the firewall is managed when a server state has changed.
* Updated anvil-daemon to run Network->manage_firewall on startup.
* Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host.)
* Removed firewall management from striker-prep-database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-02 17:06:04 -04:00
Digimer
9028611afb * Fixed a bug in Get->bridges where the bridge was not marked as found when parsing json output, breaking ocf:alteeve:server
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-30 14:09:12 -04:00
Digimer
f55d605270 * Disabled the new firewall management to prepare for a new merge with main. Needed to resolve in-field client issues.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-30 10:04:55 -04:00
Digimer
b2ea4f9adc * Moved System->manage_firewall() to Network->manage_firewall(). Started working on actually implementing it, which involves basically fully rewriting it.
* Updated tools/Makefile.am and scancore-agents/Makefile.am to add missing files.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-30 00:01:50 -04:00
Digimer
24f5d39dff This is a set of changes all stemming from trying to debug frequent resyncs. More bugs still to be fixed.
* Updated Database->get_host_from_uuid() to cache results.
* Fixed a bug in Database->get_storage_group_data where a DELETE wasn't deleting from the history schema as well.
* In Database->resync_databases(), references to the old 'host_uuid' that we used to use to resync just the local host's data were removed. Also added a check for cases where two or more entries in a given history schema have the same modified_date; when found, the newest entry is preserved and the rest are deleted. Before this, a resync where two or more records had the same modified_date would only sync the last record, leaving a mismatch in history schema entries that triggered repeated resyncs.
* Fixed a bug in Email->send_alerts() where the 'alerts' table was being updated without a modified_date being set.
* Fixed a bug in System->test_ipmi() where the 'hosts' table was being updated without a modified_date being set.
* Updated scan-network to clear up old deleted ip_addresses, bonds and bridges. Also fixed bugs where public schema records were being deleted without history records being deleted.
* Updated anvil-update-states to fix bugs where DELETEs were happening without setting the modified_date.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-12 23:14:49 -04:00
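
The duplicate check on history records described in the commit above can be sketched as follows. This is a Python illustration only, not the logic in Database->resync_databases(). The column names `uuid` and `history_id`, and the use of the highest `history_id` to decide which entry is "newest", are assumptions; only `modified_date` and the keep-one, delete-the-rest behaviour come from the commit message.

```python
from typing import Dict, List, Tuple


def dedupe_history(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split history rows into (keep, delete).

    When several rows for the same record share a modified_date, only the
    one with the highest history_id (assumed to be the newest) is kept.
    """
    newest: Dict[tuple, dict] = {}
    for row in rows:
        key = (row["uuid"], row["modified_date"])
        current = newest.get(key)
        if current is None or row["history_id"] > current["history_id"]:
            newest[key] = row

    keep_ids = {row["history_id"] for row in newest.values()}
    keep = [row for row in rows if row["history_id"] in keep_ids]
    delete = [row for row in rows if row["history_id"] not in keep_ids]
    return keep, delete
```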