Commit Graph

1088 Commits

Author SHA1 Message Date
digimer
a0cb791f47 This contains fixes needed for beta from additional testing.
* Updated the pcs wrapper to flock anything but status calls.
* Updated scan-apc-pdu to purge regardless of which host it's called on.
* Fixed a bug in striker-purge-target where it wouldn't purge Anvil! nodes in various cases.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-09 18:07:03 -04:00
digimer
cc71df686b Added a pcs wrapper to serialize pcs constraint calls.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-08 14:32:33 -04:00
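The log does not show the wrapper's code. A serializing wrapper usually takes an exclusive lock before it invokes the real binary, so concurrent callers queue instead of colliding. Below is a minimal Python sketch using `fcntl.flock()`. Three details are assumptions, not the actual wrapper: the lock-file path, the location of the real `pcs`, and the rule that only calls whose first argument is `status` skip the lock (following a0cb791f47 above). The function names are hypothetical.

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/pcs-wrapper.lock"  # hypothetical lock file
REAL_PCS = "/usr/sbin/pcs"           # assumed location of the real binary


def needs_lock(args):
    # Read-only 'status' calls run immediately; everything else is serialized.
    return not (args and args[0] == "status")


def run_serialized(args, real_binary=REAL_PCS, lock_path=LOCK_PATH):
    cmd = [real_binary, *args]
    if not needs_lock(args):
        return subprocess.run(cmd).returncode
    with open(lock_path, "a") as lock_file:
        # Blocks until no other wrapper instance holds the exclusive lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    sys.exit(run_serialized(sys.argv[1:]))
```

The kernel releases a `flock()` lock automatically when its holder exits. A crashed wrapper therefore cannot leave a stale lock behind, which a PID-file scheme would have to detect.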
Fabio M. Di Nitto
824e3e07e3 virsh: add wrapper to serialize calls to virsh list
Avoid a storm of virsh list calls that overloads the libvirtd API, causing
unnecessary timeouts during pcmk monitoring operations.

Resolves: https://github.com/ClusterLabs/anvil/issues/395

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2023-08-07 08:35:08 +02:00
digimer
6ee2ad75db * Updated anvil-delete-server to actively check for and delete any drbd-fenced attributes left over in the CIB after a server is deleted. This addresses issue #374.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-25 21:45:34 -04:00
digimer
ed480cf1cb * Fixed a double-$ bug in Remote->_check_known_hosts_for_target()
* Updated striker-update-cluster to take '--timeout' and a number of seconds, or 'Xm' or 'Xh' for minutes or hours, respectively. Also updated it to show the remaining time while waiting, and added a wait timeout to the rest of the while loops that previously had no time limit. This addresses issue #383 and issue #382.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-25 19:13:41 -04:00
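The '--timeout' format above has three forms: a bare number of seconds, 'Xm', and 'Xh'. Below is a minimal Python sketch of those parsing rules, plus a bounded wait loop that reports the remaining time. `parse_timeout` and `wait_for` are hypothetical names, and the one-second poll interval is an assumption. This is not striker-update-cluster's Perl code.

```python
import re
import time

_UNITS = {"": 1, "m": 60, "h": 3600}


def parse_timeout(value):
    """Convert '300', '5m' or '2h' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([mh]?)", value.strip())
    if match is None:
        raise ValueError(f"invalid timeout: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def wait_for(condition, timeout_seconds, poll_seconds=1.0):
    """Poll condition() until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while not condition():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        print(f"Waiting, {int(remaining)} seconds remaining")
        time.sleep(min(poll_seconds, remaining))
    return True
```

Using `time.monotonic()` keeps the deadline correct even if the wall clock is adjusted during a long update.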
digimer
0471fb90ea * Upped the logging in these three tools to help diagnose run errors. To be removed before tagging beta
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-25 13:07:38 -04:00
digimer
88e8978305 * Fixed a bug where getting the job_uuid after a no-db run wouldn't actually update the job progress.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-25 12:43:28 -04:00
digimer
be290bf561 This commit fixes a bug where the drbd kernel module build was being killed mid-compile, leaving DBRD unusable.
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 22:32:41 -04:00
digimer
d68adb5b4e * Updated anvil-manage-power to not reboot if anvil-version-changes is running (which, if it's taking time, is generating new kmods).
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 20:44:40 -04:00
digimer
8b3d472b9c Updated striker-update-cluster to set primary_host_uuid to node 1 if not returned from Cluster->get_primary_host_uuid.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 19:00:19 -04:00
digimer
556e91238d * Updated Network->find_access() to clear the data from previous scans, which fixes a bug where checking multiple hosts could return stale data for the previous host.
* Updated anvil-manage-server-storage, striker-collect-debug, and striker-update-cluster to be able to find a connection on an interface when none were found on preferred networks.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 15:43:54 -04:00
digimer
9a5e617a2d * Test fix for the issue #379
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-24 00:22:43 -04:00
digimer
f57ab1a78c * Updated anvil-daemon to not hold jobs at startup is the host isn't configured yet.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 23:34:39 -04:00
digimer
66c82e5e22 * Fixed a bug in anvil-update-system where updating a single package with --reboot wouldn't request a reboot. Finished reworking it so that a check is made to see if the kernel or DRBD kmod will be updated and, if so, removes the kmod-drbd RPMs prior to doing the update (as opposed to the sloppier check-on-error method).
* Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key.
* Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 21:43:26 -04:00
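The log does not say how anvil-update-system decides that the kernel or the DRBD kmod will be updated. Below is a minimal Python sketch of one way to make that check. It assumes the check parses `dnf check-update`-style output, where package lines have the form `name.arch version repo`, and that any `kernel`, `kernel-*` or `kmod-drbd*` package triggers removal of the kmod-drbd RPMs before the update. Both function names are hypothetical.

```python
def pending_packages(check_update_output):
    """Return package names from `dnf check-update`-style output."""
    names = []
    for line in check_update_output.splitlines():
        fields = line.split()
        # Package lines look like "name.arch  version  repo"; others are skipped.
        if len(fields) == 3 and "." in fields[0]:
            names.append(fields[0].rsplit(".", 1)[0])
    return names


def kmod_drbd_must_be_removed(names):
    # A new kernel or a new DRBD kmod means the installed kmod won't match.
    return any(
        n == "kernel" or n.startswith("kernel-") or n.startswith("kmod-drbd")
        for n in names
    )
```

Deciding before the update, instead of reacting to a failed transaction, is the "check first" approach the commit contrasts with the older check-on-error method.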
digimer
e278de4b5a The main change in this commit deals with anvil-daemon startup. During OS updates, it would pick up the queued update job and run it while the other --no-db one was still running. This could become an issue for other tasks in the future, so updated anvil-daemon to not run any jobs for the first minute after startup. Also updated it to see if an OS update is underway (given how it can start mid-RPM update, before packages like kmod-drbd are ready to build). While doing this, implemented caching of daily tasks (like agine out data, archiving data, network scans, etc) to only run once per day, period. As it was before, they would always run on anvil-daemon startup, then wait 24 hours.
Note that work has started it reworking anvil-update-system, but it is incomplete (and broken) in this commit.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 21:43:26 -04:00
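The commit above describes two timing rules: no jobs during the first minute after startup, and daily tasks that run at most once per day even across restarts. Below is a minimal Python sketch of both. It assumes the last-run times are cached as JSON in a file. The cache format, file path and function names are hypothetical; the commit does not say how anvil-daemon stores its cache.

```python
import json
import os
import time

DAY = 24 * 60 * 60
STARTUP_DELAY = 60  # seconds after startup before any job may run


def jobs_allowed(start_time, now=None):
    now = time.time() if now is None else now
    return now - start_time >= STARTUP_DELAY


def due_daily_tasks(cache_path, tasks, now=None):
    """Return tasks not run in the last 24 hours and record them as run now."""
    now = time.time() if now is None else now
    try:
        with open(cache_path) as f:
            last_run = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        last_run = {}
    due = [t for t in tasks if now - last_run.get(t, 0) >= DAY]
    for task in due:
        last_run[task] = now
    # Write to a temporary file, then rename, so a crash can't truncate the cache.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(last_run, f)
    os.replace(tmp_path, cache_path)
    return due
```

Because the last-run times survive a restart, restarting the daemon no longer re-triggers every daily task.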
Tsu-ba-me
714ccdb5b6 chore(tools): log start/stop pipe errors in manage vnc pipe 2023-07-23 21:43:26 -04:00
Tsu-ba-me
1f13a416b9 chore(tools): remove unused anvil-manage-tunnel; missing from d2a61da 2023-07-23 21:43:26 -04:00
Tsu-ba-me
9163cf4513 chore: remove anvil-manage-tunnel 2023-07-23 21:43:26 -04:00
Tsu-ba-me
47f7c71e95 fix(tools): stop ws with source port when target port is unavailable 2023-07-23 21:43:26 -04:00
Tsu-ba-me
99dc4ba6ba fix(tools): handle possible remnant websockify daemon wrapper 2023-07-23 21:43:26 -04:00
Tsu-ba-me
ba335cc411 fix(tools): match server vncinfo variable name 2023-07-23 21:43:26 -04:00
Tsu-ba-me
25f5c38ade chore: rename striker-manage-vnc-pipes->anvil-manage-vnc-pipe 2023-07-23 21:43:26 -04:00
Tsu-ba-me
c29041d2f7 fix(tools): start websockify as daemon, re-find its pid and ports 2023-07-23 21:43:26 -04:00
Tsu-ba-me
cb98d28eb0 fix(tools): add target host to vnc info variable 2023-07-23 21:43:26 -04:00
Tsu-ba-me
0b91ee0314 fix(tools): remove all tunnel-related tasks 2023-07-23 21:43:26 -04:00
Tsu-ba-me
084394c66f fix(tools): hoist find server vnc port 2023-07-23 21:43:26 -04:00
Tsu-ba-me
ee091d4e7b fix(tools): format output of existing tunnel 2023-07-23 21:43:26 -04:00
Tsu-ba-me
d42f202609 fix(tools): flip ports based on forward type 2023-07-23 21:43:26 -04:00
Tsu-ba-me
0210323730 fix(tools): pass tunnel list to start, stop tunnel 2023-07-23 21:43:26 -04:00
Tsu-ba-me
19f6cefd8d fix(tools): isolate prepare tunnel parents 2023-07-23 21:43:26 -04:00
Tsu-ba-me
9f8a153fe0 fix(tools): replace tilde with home path 2023-07-23 21:43:26 -04:00
Tsu-ba-me
f8e65416c4 fix(tools): correct receiving find tunnels output 2023-07-23 21:43:26 -04:00
Tsu-ba-me
1854cf4872 fix(tools): enable search full command in find tunnel parent processes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
02f89b24b1 fix(tools): pass debug level to find, start, stop tunnel in start pipe 2023-07-23 21:43:26 -04:00
Tsu-ba-me
6dbec289a1 fix(tools): treat empty tunnel list as no tunnels 2023-07-23 21:43:26 -04:00
Tsu-ba-me
3debdb846d fix(tools): start background processes with system call 2023-07-23 21:43:26 -04:00
Tsu-ba-me
b695414c86 fix(tools): improve debug hashes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
8c37b49132 fix(tools): remove extra space in find tunnel parent sed 2023-07-23 21:43:26 -04:00
Tsu-ba-me
55bc44fc6e fix(tools): correct typos in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
bedbf576ab fix(tools): correct loop over tunnel list 2023-07-23 21:43:26 -04:00
Tsu-ba-me
4de0b675f1 fix(tools): don't find when tunnel list doesn't exist 2023-07-23 21:43:26 -04:00
Tsu-ba-me
2eb96f9d10 fix(tools): reuse existing tunnels in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
834e1a568a fix(tools): improve debug log of start processes in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
7baa52e37d fix(tools): ignore mismatches when find websockify, tunnel parent processes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
41abd4f9e4 fix(tools): correct websockify command in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
75b6ab94df fix(tools): correct reversed set, delete operations in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
6906551851 fix(tools): remove repeated UUIDv4 test in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
6ec2dea741 fix(tools): correct renamed call variable in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
b1e7b0e244 fix(tools): correct input to keys in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
9431d89b61 fix(tools): correct brackets of set_ws_process in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
393782cf83 fix(tools): log inputs in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
ff7fe8b3a3 fix(tools): add missing grep path in manage vnc pipes 2023-07-23 21:43:26 -04:00
Tsu-ba-me
d192356c5a fix(tools): remove unused database connection in manage tunnel 2023-07-23 21:43:26 -04:00
Tsu-ba-me
afade80f39 fix(tools): manage all VNC pipe components on subnodes/dr 2023-07-23 21:43:26 -04:00
Tsu-ba-me
ba467ccfa7 fix(tools): manage forward list of parent connection in manage tunnel 2023-07-23 21:43:26 -04:00
Tsu-ba-me
db06747513 fix(tools): make target optional when using external parent in manage tunnel 2023-07-23 21:43:26 -04:00
Tsu-ba-me
d29dac4fa9 fix(tools): return code after port forward fails in manage tunnel 2023-07-23 21:43:25 -04:00
Tsu-ba-me
711cb5b696 refactor: rename striker-open-ssh-tunnel->anvil-manage-tunnel 2023-07-23 21:43:25 -04:00
Tsu-ba-me
40e94cda46 fix(tools): enable open parent connection, child tunnel in open ssh tunnel 2023-07-23 21:43:25 -04:00
Tsu-ba-me
f2d3b06a10 fix(tools): remove all remote calls in manage vnc pipes 2023-07-23 21:43:25 -04:00
Tsu-ba-me
6c776e5a6a fix(tools): enable remote forward in open ssh tunnel 2023-07-23 21:43:25 -04:00
digimer
b24b81c17c Removed outer double-quotes from Anvil! node description in XML usage reporting. Related to issue #321.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 11:02:45 -04:00
digimer
942c4c94bf Escaped double-quotes in Anvil! node descriptions when reporting usage as XML format. Should resolve issue #321.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-23 11:00:59 -04:00
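A double quote inside a double-quoted XML attribute ends the value early, which makes the document invalid. Below is a minimal Python sketch of the escaping using the standard library's `xml.sax.saxutils.escape()` with an extra entity mapping. The `anvil` element and its attribute names are hypothetical; the real `--machine` output schema is not shown in this log.

```python
from xml.sax.saxutils import escape

_ATTR_ENTITIES = {'"': "&quot;"}


def anvil_element(name, description):
    # escape() handles &, < and >; the extra mapping covers double quotes,
    # which would otherwise terminate the double-quoted attribute.
    return (
        f'<anvil name="{escape(name, _ATTR_ENTITIES)}" '
        f'description="{escape(description, _ATTR_ENTITIES)}" />'
    )
```

`xml.sax.saxutils.quoteattr()` is an alternative from the same module that also adds the surrounding quotes itself.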
digimer
9b90647cc0 Fixed a bug where the XML output was not valid.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 22:28:25 -04:00
digimer
d07933a31c * Updated anvil-report-usage to accept the new '--machine' which reports the usage information in XML format.
* Added the anvil-report-usage.8 man page
* Updated anvil-update-system to enable scancore when the OS update is complete.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 22:21:48 -04:00
digimer
01b714f3b3 Fixed typo from issue #369.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 20:09:13 -04:00
digimer
b0c54b6dae * Updated anvil-update-system to check if another instance of anvil-update-system is running and, if so, exit.
* Removed the new tasks from anvil-special-operations.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 20:03:39 -04:00
digimer
7bd76c10dc Major thing in this commit is reworking striker-update-cluster to work without expecting anvil-daemon to be running on target machines. Similarly, they had to be able to work when the Striker DBs were not available. This is to account for cases where the Striker dashboards have updated, and the schema has changed, preventing the not-yet-updated DR hosts and subnodes from being able to use the DB. To do this, anvil-safe-stop, anvil-update-system, and anvil-shutdown-server had to be updated to use the new --no-db switch, which tells then to run without the database being available.
* Updated Server->shutdown_virsh() to work without a database connection.
* Updated System->reboot_needed() to store/read from a cache file when the database is not available.
* Updated anvil-safe-start to remove the old --enable/disable/status switches, now that we use anvil-safe-start.service systemd unit.
* Reworked anvil-safe-stop to work without a database connection, and to work on DR hosts.
* Updated anvil-special-operations to add new tasks, but it's likely these new tasks aren't needed and will be removed very shortly.
* Added/updated multiple man pages.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-22 18:09:01 -04:00
digimer
541381e317 * Finished getting anvil-manage-server-storage to add new volumes to running servers.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-18 13:31:52 -04:00
digimer
afaf129733 * Updated anvil-manage-server-storage to connect the new drive to the VM. Still need to update the on-disk and in-DB definitions though.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-17 21:33:46 -04:00
digimer
de86cf88fe * Updated anvil-manage-server-storage to now handle a new volume stuck in 'Negotiating', and to do the initial sync when there are three connected peers.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-17 21:04:36 -04:00
digimer
9bc78860a6 * Updated anvil-update-system to detect kmod-drbd upgrade problems and fix them.
* Updated striker-update-cluster and anvil-update-system to take '--reboot' to request a reboot if any packages are updated.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-16 20:45:47 -04:00
digimer
f262da544d Removed '--best --allowerasing' from dnf update.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-16 00:18:29 -04:00
digimer
d741f4aa6f * Updated anvil-daemon to not exit on high RAM use is any job is running.
* Updated anvil-update-system to reboot a target whose kernel updated using an anvil-manage-power job,
* Started making striker-update-cluster run as a job (not at all complete). Fixed a bug where the wrong IP was being used when finding access to a target.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-15 22:23:30 -04:00
digimer
751687129a * Updated anvil-daemon to not exit on RAM use if anvil-update-system is running.
* Fixed a bug in anvil-safe-stop where it wouldn't trigger a migration when the peer is online.
* Updated anvil-update-system to set job_data to 'failed' and exit with rc 4 if the OS update failed.
* Got striker-update-cluster to error out and exit if a called 'anvil-update-system' job failed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-15 16:23:38 -04:00
Digimer
c1e4380a64
Merge branch 'main' into anvil-tools-dev 2023-07-15 00:06:49 -04:00
digimer
02c3d204ea * Updated anvil-update-system to set 'job_data' to track reboots, and striker-update-cluster to read it.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-14 22:52:51 -04:00
digimer
3016fb875b * Reworded striker-update-cluster to use anvil-update-system for on-system OS updates.
* Updated DRBD->get_status() to take the new 'host' paramter to allow the caller to define the hash key string used in the stored data.
* Updated Get->anvil_version() (and a few other places) to use the new 'striker-ui-api' shell user, replacing the 'apache' user.
* Updated Remote->test_access() to take the new 'close' parameter to close the SSH session used when testing access to the target.
* Fixed a logging bug in anvil-manage-power.
* Updated anvil-update-system to take the '--no-reboot' and 'clear-cache' command line switches.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-14 22:29:07 -04:00
Tsu-ba-me
4f46bb43eb fix(tools): remove server screenshot fetching in anvil-daemon 2023-07-13 01:54:04 -04:00
Tsu-ba-me
b549ff2c1f fix(tools): reduce unnecessary operations in anvil-get-server-screenshot 2023-07-13 00:40:39 -04:00
Tsu-ba-me
4647062111 fix(tools): set script source in anvil-access-module 2023-07-12 18:24:21 -04:00
digimer
d56b7f9a84 * Created (but not finished!) the new striker-update-cluster tool.
* Updated Cluster->get_primary_host_uuid() to only load anvils if not already loaded.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-07 17:54:57 -04:00
digimer
3215e178ef * Updated striker-collect-debug to support '--output-file /path/to/file.tar.bz2'.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-06 13:02:59 -04:00
digimer
a7ebe45f76 This adds the new 'striker-collect-debug' tool that collects all potentially useful debug info into a single tarball.
* Fixed a bug in Get->anvil_from_switch() so that it works when the Anvil! name is passed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-05 21:04:05 -04:00
Tsu-ba-me
d95eb699f9 chore: disable web VNC, screenshot pieces to avoid libvirt deadlock 2023-07-05 17:06:11 -04:00
Tsu-ba-me
54197a2f2c fix(tools): wrap guest name with quotes when get vncdisplay in manage vnc pipes 2023-07-03 04:46:07 -04:00
Tsu-ba-me
64093d42a0 fix(tools): allow pass libvirt domain XML info to manage vnc pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
d64e5ff17f chore(tools): hide open all components in manage vnc pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
d9d0244f3f docs(tools): identify most variable outputs in manage vnc pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
ce637cbf71 fix(tools): select 0/1 ws instance for given server 2023-07-03 04:46:06 -04:00
Tsu-ba-me
be82c6e267 fix(tools): print forward port after open SSH tunnel in manage vnc pipe 2023-07-03 04:46:06 -04:00
Tsu-ba-me
a48c6997fe fix(tools): include server host UUID when open VNC SSH tunnel 2023-07-03 04:46:06 -04:00
Tsu-ba-me
ecaa38cfd1 fix(tools): add multiple repairs to manage-vnc-pipes
* ensure valid server UUID with pattern
* allow specify known server host UUID
* combine server UUID and server host UUID (a.k.a. ws host UUID) as
  unique record in table
* remove unnecessary checks for ws source port
2023-07-03 04:46:06 -04:00
Tsu-ba-me
b92627dd5d fix(tools): simplify kill logic in manage-vnc-pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
a7b2f7c9e1 fix(tools): pass server vnc port as flag in manage-vnc-pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
17bef8b415 fix(tools): allow manage-vnc-pipes to accept server name 2023-07-03 04:46:06 -04:00
Tsu-ba-me
324bbaf141 fix(tools): always end with nice exit in open-ssh-tunnel 2023-07-03 04:46:06 -04:00
Tsu-ba-me
8da4033607 fix(tools): separate open/close websockify and ssh tunnel 2023-07-03 04:46:06 -04:00
Tsu-ba-me
9457986659 fix(tools): simplify accessing switches in manage-vnc-pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
bf0e75109f fix(tools): simplify selection between local/remote call in manage-vnc-pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
d98df4b2a4 fix(tools): isolate non-striker tasks in anvil-daemon 2023-07-03 04:46:06 -04:00
Tsu-ba-me
560d60c7e8 fix(tools): get server screenshots every minute and punt to strikers WIP 2023-07-03 04:46:06 -04:00
Tsu-ba-me
ffd41b1dfa fix(tools): enable anvil-get-server-screenshot to send screenshot to multiple hosts 2023-07-03 04:46:06 -04:00
digimer
bf1ccc8bee * Finally got the creation of new DRBD volumes under existing resources work!
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-30 22:36:27 -04:00
digimer
1b8b0bc493 * Created the new 'anvil-manage-server-storage' with the first role of reload a DRBD resource.
* Updated Remote->call() to remove the 'background' parameter as it wasn't working.
* Updated anvil-manage-server-storage to use 'anvil-manage-server-storage' to adjust resources in a way that doesn't block.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-30 21:02:30 -04:00
digimer
7fbed10864 * Updated Remote->call() to take the new 'background' parameter.
* Continued work on adding new disks (DRBD volumes) to anvil-manage-server-storage.
* Updated DRBD->get_status() to record the peer-role.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-29 22:17:58 -04:00
digimer
ea95d26cc5 * Fixed a bug in DRBD->get_next_resource() where reserved minor numbers were not being released. Also added a new parameter, "minor_only", that returns the next minor number but doesn't bother processing TCP ports.
* Did more work on adding support for adding new disk drives to servers in anvil-manage-server-storage.
* Updated anvil-manage-storage-groups to check for / delete duplicate storage groups with the same name.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-26 23:55:19 -04:00
digimer
88cc76914d This is an attempt to fix issue #341. It replaces the search for SN IPs from Network->find_matches() to Network->find_access(). The later of which doesn't care about the interface the IP was found on.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-24 21:24:37 -04:00
digimer
e0316da88b * Got anvil-manage-server-storage working enough to grow existing disk's hard drive sizes, and to insert/eject optical disks.
* Hit a bug where a server's definition file was written to disk while not being valid. Added logging in case it happens again, and additional safe-guards to help avoid it from recurring.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-23 23:09:55 -04:00
digimer
376660a120 * Removed the EXTRA_DIST argument from tools/Makefile.am
* Added a sanity check that a valid optical device was passed to anvil-manage-server-storage

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 21:20:10 -04:00
digimer
7a32d219fc Removed the old watch_drbd tool and added the new anvil-watch-drbd to the Makefiles.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 20:47:24 -04:00
digimer
1d12fb32b4 * Completed the new anvil-watch-drbd which replaces watch_drbd.
* Updated Email->get_current_server() to always load mail server data from the database.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 20:43:46 -04:00
digimer
336699a0f2 Added logging to help debug a DRBD resource config issue related to finding matching SN IPs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-21 10:29:44 -04:00
Digimer
8f491e01ed
Merge branch 'main' into anvil-tools-dev 2023-06-20 20:00:10 -04:00
digimer
0aa72498db * This adds the new tool 'striker-check-machines' which simply walks through all known physical machines and checks to see if they're accessible and powered on.
* Updated Get->uptime() to work on remote targets.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-20 19:57:21 -04:00
digimer-bot
82114120be
Merge branch 'main' into build-login 2023-06-20 10:47:38 -04:00
Tsu-ba-me
8a8b2cbc4b fix(tools): identify line(s) with UUID in interactive/script anvil-access-module 2023-06-20 00:48:21 -04:00
Tsu-ba-me
fe9c4a758f docs(tools): explain the interactive/script function of anvil-access-database 2023-06-20 00:48:21 -04:00
Tsu-ba-me
b494f79ffe fix(tools): anvil-access-module: default interactive, handle non-existing on class object 2023-06-20 00:48:21 -04:00
Tsu-ba-me
d9bc73ec2d feat(tools): add script capability to anvil-access-module 2023-06-20 00:48:21 -04:00
digimer
c9e11fbbfc * Added checks to anvil-provision-server to fail out if either of the SN IPs are not found when generating a DRBD resource config.
* Added logging to anvil-provision-server and anvil-daemon to try to find the cause of jobs being re-run after completing. This may already be fixed by a fix for job_progress updates going to 100 too early in some cases.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-19 21:44:45 -04:00
digimer
156a0ca201 Updated anvil-daemon's new job launching logic to allow the restart of a running job that failed out early.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-16 11:43:49 -04:00
digimer
cc15eca6fb * Added anvil-watch-power to git.
* Added a check to clean up size input to Convert->human_readable_to_bytes() when passed pre-processed strings.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-15 21:35:42 -04:00
digimer
47f7a35df3 The main purpose of this commit is to add serial execution of similar jobs to help reduce race conditions for scripted jobs, like multiple server creation.
* Fixed a small logging bug in DRBD->allow_two_primaries().
* Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded.
* Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-15 21:13:53 -04:00
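The scheduling rule above has two parts: jobs are taken in modified_date order, and a second job with the same command waits until the first one finishes. Below is a minimal Python sketch of selecting which pending jobs to start on one pass of the daemon loop. The `Job` type, its fields, and the idea that "same command" means an exact string match are assumptions for illustration; this is not anvil-daemon's Perl code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    uuid: str
    command: str
    modified_date: float


def jobs_to_start(pending, running_commands):
    """Pick at most one job per command, oldest first, skipping busy commands."""
    busy = set(running_commands)
    selected = []
    for job in sorted(pending, key=lambda j: j.modified_date):
        if job.command in busy:
            continue  # same command already running or picked; wait for next pass
        selected.append(job)
        busy.add(job.command)
    return selected
```

Different commands still run in parallel. Only repeats of the same command, such as several queued server provisions, are forced into sequence.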
digimer
38d088a998 * Added anvil-watch-power to the makefile.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-15 10:17:02 -04:00
digimer
b6a249d5e7 * Updated Cluster->add_server() to set the preferred host based first on if the server is running on a node, and if not, on the primary node (where before it defaulted to node 1).
* Updated DRBD->delete_resource() to call scan-drbd and scan-lvm to ensure that the database is updated with the newly freed resources.
* Updated anvil-delete-server and anvil-provision-server to call select scan agents to ensure freed resources are immediately recorded.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-11 23:46:21 -04:00
digimer
5bb1c631cf * Updated anvil-delete-server to accept '--server' and '--force' to allow direct deletion of a server without interacting with the menu system.
This partially addresses issue #321.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-06 16:23:28 -04:00
digimer
bc3d04ad2e * Updated Cluster->add_server() to wait up to 15 seconds for a server to appear to ensure that the pcs call to add the server with the right requested running state.
* Updated Cluster->recover_server() to set the desired recovery state before calling the crm_resource refresh.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-06 14:34:02 -04:00
digimer
284a2957d6 * Fixes issue #329; When multiple attributes exist when checking if we're in maintenance mode in fence_pacemaker, the expected hash reference was actually an array reference.
* Fixed a bug in anvil-version-changes where update_file_location_ready() needed to be called before update_file_locations().
* Added a bit more logging for future debugging.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-08 15:03:29 -04:00
digimer
8f375c58a9 * Fixed a typo in anvil-daemon that prevented compiling.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 11:14:23 -04:00
digimer
110dceb55e * Added a check to make sure files were ready before provisioning a server.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 01:15:08 -04:00
digimer
c50a1936c0 * This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch or a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon.
* Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 00:05:56 -04:00
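The readiness rule above is: the file exists, it has the expected size, and it has the expected md5sum. Below is a minimal Python sketch of that check. It compares the size first because that is cheap, and only then hashes the file in chunks. The function name is hypothetical, and how anvil-daemon schedules the check is not shown in this log.

```python
import hashlib
import os


def file_location_ready(path, expected_size, expected_md5):
    """True when path exists with the expected size and md5sum."""
    try:
        if os.path.getsize(path) != expected_size:
            return False  # missing bytes: still copying, or truncated
    except OSError:
        return False  # not on disk (yet)
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large ISOs don't have to fit in RAM.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5
```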
digimer
1bba56a5b1 Hard coded anvil-provision-server to log level 2 while chasing a race condition is storage groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-01 10:54:51 -04:00
digimer
9a58f4d1ff * This is a small commit to increase logging while chasing down a race condition issue with assembling storage groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-30 19:47:58 -04:00
digimer
895f1ec262 This fixes a race condition when multiple servers are provisioned at (nearly) the same time.
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute, so that subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race conditions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-28 00:19:53 -04:00
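The "hold" system above has one rule: each minor/port pair handed out is reserved for one minute, so a second provisioning call that runs before the first resource exists gets different numbers. Below is a minimal Python sketch of that rule. Several details are simplifying assumptions: there is one TCP port per resource, ports start at 7788, and the holds live in memory (a real implementation would need shared storage such as the database). The class and method names are hypothetical.

```python
import time

HOLD_SECONDS = 60  # held numbers are reserved for one minute


class ResourceAllocator:
    """Hand out DRBD minors and TCP ports, holding each for HOLD_SECONDS."""

    def __init__(self, used_minors, used_ports, first_port=7788):
        self.used_minors = set(used_minors)
        self.used_ports = set(used_ports)
        self.first_port = first_port
        self.held_minors = {}  # minor -> hold expiry time
        self.held_ports = {}   # port -> hold expiry time

    def _release_expired(self, now):
        # Expired holds must be dropped, or numbers would never be reused.
        self.held_minors = {m: t for m, t in self.held_minors.items() if t > now}
        self.held_ports = {p: t for p, t in self.held_ports.items() if t > now}

    def next_resource(self, now=None):
        now = time.time() if now is None else now
        self._release_expired(now)
        minor = 0
        while minor in self.used_minors or minor in self.held_minors:
            minor += 1
        port = self.first_port
        while port in self.used_ports or port in self.held_ports:
            port += 1
        self.held_minors[minor] = now + HOLD_SECONDS
        self.held_ports[port] = now + HOLD_SECONDS
        return minor, port
```

Without `_release_expired()`, every number ever handed out would stay reserved. That is the kind of leak fixed later in ea95d26cc5.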
digimer
c11be1ad1a Added a skip to ignore dot files when looking at new files.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-19 12:36:05 -04:00
digimer
dc7b909bfc More logging to debug storage group race condition
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 19:14:59 -04:00
digimer
bd575c6a7d Bumped logging for storage group management.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 19:02:51 -04:00
digimer
0874ad571a Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 14:33:58 -04:00
digimer
8ba613952c Typo fix.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 12:32:52 -04:00
digimer
83a527f4fa * Removed enabling anvil-safe-start out of the RPM and into anvil-join-anvil.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 11:18:42 -04:00
digimer
f086c1be39 Fixed a bug where the total RAM was shown instead of the free RAM.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 13:02:50 -04:00
digimer
fdf49c696f Updated anvil-report-usage to ignore deleted servers. Also added a check to ensure hosts are loaded if not.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 12:23:21 -04:00
digimer
fb70836126 This moves the call of anvil-safe-start out of scancore and into a new, dedicated systemd unit that runs on boot only.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-12 22:26:15 -04:00
digimer
9bf0f50084 Added a check to see if the server's UUID exists and looping if not to prevent unitialized variable warnings.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-09 23:38:39 -04:00
digimer
1c274ba58d * Fixed a bug in anvil-delete-server that was preventing the complete deletion of a server if the DRBD resource had already been removed.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-03 12:40:58 -04:00
digimer
ddc6965b60 * Fixed a bug where references to files on Anvil! nodes was broken in anvil-provision-server and anvil-manage-files.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 17:33:49 -04:00
digimer
efebd135eb * Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes.
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 12:50:44 -04:00
digimer
8ff40ec42c * Fixed a SQL query bug in Database->get_drbd_data().
* Got more work done on anvil-manage-server-storage; it now shows the DRBD resource size, the backing LV and its size, and calculates/displays the metadata size.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-26 02:09:52 -04:00
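The log does not show how anvil-manage-server-storage computes the DRBD metadata size. Below is a minimal Python sketch that assumes the estimate given in the DRBD 9 user's guide. In that estimate, Cs is the data device size in 512-byte sectors, N is the number of peers, and the metadata size in sectors is ceil(Cs / 2^18) * 8 * N + 72. The function names are hypothetical, and the tool may well use a different calculation.

```python
import math

SECTOR_BYTES = 512


def drbd_metadata_sectors(data_sectors, peers):
    # DRBD 9 user's guide estimate: ceil(Cs / 2^18) * 8 * N + 72 sectors.
    return math.ceil(data_sectors / 2**18) * 8 * peers + 72


def drbd_metadata_bytes(data_bytes, peers):
    data_sectors = math.ceil(data_bytes / SECTOR_BYTES)
    return drbd_metadata_sectors(data_sectors, peers) * SECTOR_BYTES
```

For example, a 100 GiB volume with two peers needs 12,872 sectors (6,590,464 bytes, about 6.3 MiB) of metadata. That space is subtracted from the backing LV when metadata is internal.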
digimer
040bc02e26 * This adds the new Database->get_drbd_data() that, like ->get_lvm_data, collates the DRBD data collected by scan-drbd into more readibly parsable data structure.
* Updated DRBD->parse_resource() to add references to a resource name and volume for a given backing disk.
* Continued work on anvil-manage-server-storage.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-24 19:45:47 -04:00
Fabio M. Di Nitto
b75512e540 virt-install should not --wait on VM to be provisioned
Resolves: https://github.com/ClusterLabs/anvil/issues/277

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2023-03-24 01:27:15 -04:00
digimer
8e0e51544c * Continued work on anvil-manage-server-storage.
* Created the new Database->get_lvm_data to compile LVM data from scan-lvm
* Updated DRBD->parse_resource to call Database->get_lvm_data if needed, and to track backing devices to Storage Groups.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-22 22:57:26 -04:00
digimer
b144976853 This resolves Issue #310.
* Updated Database->get_file_locations() to record files available on Anvil! nodes by tracking hosts in Anvil! systems (needed after reworking how DR hosts are linked).
* Updated Get->available_resources() to call Database->get_files() and ->get_file_locations() to restore tracking files available on Anvil! nodes.
* Fixed a couple of display bugs in anvil-provision-server when called with --ci-test --options.
* Continued work on anvil-manage-server-storage.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-20 23:43:40 -04:00
digimer
fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell.
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-03 14:42:28 -05:00
digimer
147f31aeeb * Added a retry loop around calls to 'anvil-change-password', as there appears to be an unknown condition during setup where it is called but never actually runs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 18:37:13 -05:00
digimer
ab3e8afe6e Fixed a bug in Storage->push_file() where file path wasn't updated from incoming to files, preventing the push to other hosts from working. Also fixed a minor issue where the file size was sometimes 0, making transfer calculations useless.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 13:21:29 -05:00
Digimer
d59034a488
Merge branch 'main' into anvil-tools-dev 2023-02-22 02:21:50 -05:00
digimer
254f7ef4e2 This should fix the tracking of what files belong where, using the new DR links system. It also should finish (though testing is still needed) the serial rsync issue.
* Created Database->track_files() as a dedicated method as trying to verify the existence of file_locations during Database->load_anvils() was fragile and prone to recursive loops.
* Updated Database->insert_or_update_file_locations() to take an anvil_uuid and recursively call for each host, to maintain compatibility with the old ways, and make it simpler to add an entry for both sub-nodes in an Anvil!.
* Created Storage->push_file() that takes a file and rsyncs it to all other machines, or creates a job for the file to be pulled if the target can't be accessed.
* Updated anvil-manage-files and anvil-sync-shared to use the new Storage->push_files and Database->track_files methods.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 02:13:19 -05:00
digimer
645f54ab89 This commit has more changes than I would normally like, but it's all linked to changing file uploads to rsync serially.
* To update file handling for the new DR host linking mechanism, file_locations -> file_location_anvil_uuid was changed to file_location_host_uuid.
  This required a fair number of changes elsewhere to handle this, with a particular noted change to Database->get_anvils() to look at the host_uuids for the subnodes in an Anvil! and, if either is marked as needing a file, make sure the peer is as well. Similarly, any linked DR hosts are set to have the file as well.
* Created a new Network->find_access that simply takes a target host name or UUID, and it returns a list of networks and IPs that the target can be accessed by.
* Updated Network->load_ips() to find the network interface being used for traffic so that things like the interface speed can be recorded, even when an IP is on a bridge or bond.

Unrelated, but in this commit, is a restoration of calling scan agents with a timeout now that the virsh hang issue has been resolved.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-14 02:29:40 -05:00
digimer
7710d9d109 * Created the new anvil-manage-server-storage tool which will specifically handle managing a server's disks.
* Created DRBD->parse_resource() to parse a specific DRBD resource's XML data.
* Fixed a bug in Get->available_resources() so that if the thread count is lower than the CPU core count, the core count is used as the total available to VMs.
* Fixed bugs in Get->server_from_switch() where it just wasn't working properly.
* Updated scan_drbd to not reset a resource's size to 0-bytes when a resource goes offline.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-03 22:05:34 -05:00
Tsu-ba-me
58d09cb08c fix(tools): make server screenshot write to named pipe non-blocking 2023-02-02 23:18:26 -05:00
Tsu-ba-me
eb561d6d39 fix(tools): always send to pipe when given request host 2023-02-02 19:21:56 -05:00
Tsu-ba-me
3802c72912 fix(tools): check server state before getting screenshot 2023-02-02 18:25:33 -05:00
Tsu-ba-me
a9cc123300 fix(tools): exit at end of anvil-get-server-screenshot 2023-02-02 17:11:56 -05:00
digimer
9751c883cb * Updated Cluster->assemble_storage_groups() to remove references to anvil_dr1_host_uuid. Also added the logic for auto-adding DR hosts' VGs to a storage group, but commented it out for now, as this might be a bad idea. Needs more thought.
* Fixed a bug in Database->get_storage_group_data() to load hosts data when needed. Also fixed a bug where new members didn't return the new storage_group_member_uuid.
* Updated anvil-manage-host to use the new switch handler.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-01 23:19:38 -05:00
digimer
7773e5f9b8 * Updated logging in DRBD->get_devices().
* Added a check and exit if anvil-manage-dr is asked to protect a server on a machine that doesn't know about that server.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-30 11:30:36 -05:00
digimer
56cf100b09 * Added a check to ensure a storage group actually exists before trying to present it to the user. This should resolve issue #299.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-22 11:08:55 -05:00
digimer
695b274d78 * Fixed a bug where anvil-provision-server wasn't loading the available OS list when provisioning servers. This should resolve issue #296.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 23:47:04 -05:00
digimer
053e5312e1 * Fixed a bug in anvil-manage-dr where protect jobs with multiple potential targets wouldn't know which to use during job runs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:51:21 -05:00
digimer
e9f390b65b * Updated the RPM spec to add new core requires and to call 'anvil-version-changes' in core's %post.
* Added missing man pages and the new anvil-manage-storage-groups to the Makefile.am's.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:24:24 -05:00
digimer
e012d6016c The major point of this commit is to add the new 'anvil-manage-storage-groups' program that, well, manages storage groups.
* Updated the storage_group_members table to add the 'storage_group_member_note' that can be set to 'DELETED' to track when a member is deleted. Updated anvil-version-changes to check for and add this column as needed. Updated the anvil.sql schema for the same.
* Updated Cluster->insert_or_update_storage_group_members to add the new column.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:10:15 -05:00
digimer
355e5c2c0a * More work done on anvil-manage-dr. It now properly validates a DR host.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 00:11:35 -05:00
digimer
f8743a7435 * Further work on anvil-manage-dr. Now properly sanity checks that a valid server is passed.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-19 22:14:17 -05:00
digimer
1a217d21cf * Updated anvil-manage-dr to provide the ability to link anvil nodes to dr hosts. Also began work on making it work with the new DR links system.
* Created Database->get_anvil_uuid_from_string(), Database->get_host_uuid_from_string() and Database->get_server_uuid_from_string() to simplify the process of converting --anvil <string>, --host <string> and --server <string> respectively.
* Fixed bugs in Database->get_dr_links() and Database->insert_or_update_dr_links().
* Updated Database->insert_or_update_states() to make direct calls to hosts instead of using get_hosts to drop out if a host_uuid doesn't yet exist in a DB.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-19 19:41:02 -05:00
digimer
16fc4e131c * Fixed a bug where, if a specific request to do a DB resync was made but the active_uuid wasn't matching the host, it wouldn't resync. This broke peering Strikers when the peer source was not the active_uuid.
* Updated anvil-manage-dr to check and delete duplicate dr_link entries.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 22:53:15 -05:00
digimer
985338a064 Fixed typo that broke compilation.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 21:15:09 -05:00
digimer
98c3868870 * Updated fence_pacemaker to no longer use stonith_admin and instead use pcs. This should resolve the main part of issue #279
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 00:22:03 -05:00
digimer
0318b4bbe9 * Fixed (the very incomplete) anvil-manage-firewall so that it would clear a job, if a job was assigned to it.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-17 22:52:57 -05:00
digimer
ff69916a85 * Applied typo fixes from PR #286 (thanks, Deezzir!). Also moved all the raw prints into words.xml.
* Updated Convert->human_readable_to_bytes() to return an empty string if passed an empty string.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-16 20:23:29 -05:00
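The Convert->human_readable_to_bytes() change above makes an empty input return an empty string instead of failing or returning 0. A minimal Python sketch of that guard follows; the function name only mirrors the Perl method and is hypothetical, and the 1024-based multipliers and accepted suffix forms (K, KB, KiB, ...) are assumptions, not the method's actual rules.

```python
import re
from typing import Union

# Assumed binary (1024-based) multipliers; the real method may treat SI suffixes differently.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4, "p": 1024 ** 5}

# A number, an optional unit prefix, then an optional "B" or "iB".
_PATTERN = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([kmgtp]?)(?:i?b)?\s*$", re.IGNORECASE)


def human_readable_to_bytes(value: str) -> Union[str, int]:
    """Convert strings like '1.5 GiB' to bytes; an empty string passes through unchanged."""
    # The guard described in the commit: empty in, empty out.
    if value == "":
        return ""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, prefix = match.groups()
    return int(float(number) * _UNITS[prefix.lower()])
```

Returning "" rather than 0 lets a caller tell "no value given" apart from "zero bytes".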
digimer
64bb5ab8e1 * Updated striker to only complain about unconfigured networks on nodes, not DR hosts.
* Updated anvil-configure-host to gracefully ignore unconfigured networks.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-15 01:41:55 -05:00
digimer
b8b4352117 * Added support for Migration Network configs in old striker and anvil-configure-host
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-15 01:24:26 -05:00
digimer
a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately.
* Added call to System->configure_logind() to anvil-join-anvil and anvil-version-changes.
* Updated fence_pacemaker to add '--reboot' to the 'stonith_admin' call to ensure DRBD-triggered fence requests reboot instead of just turning nodes off.
This commit addresses issue #279.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-13 21:42:10 -05:00
digimer
dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See:
** https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
* Added explicit setting of $ENV{PATH} when it's null (as it is when pacemaker calls our tools).
* Updated the copyright to 2023.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-12 21:52:26 -05:00
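The two fixes above (running virsh via 'setsid' so it never waits on a controlling terminal, and filling in $ENV{PATH} when pacemaker leaves it empty) can be sketched in Python as follows. This is an illustration, not the Perl code in the Anvil! tools: the helper name run_detached() and the fallback PATH value are assumptions. Python's start_new_session=True makes the child call setsid() before exec, and subprocess.run() waits for it, which is roughly what 'setsid --wait' does.

```python
import os
import subprocess
from typing import List

# Assumed fallback; the PATH the Anvil! tools actually set may differ.
DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def run_detached(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run cmd in a new session (like `setsid --wait cmd`) and wait for it."""
    env = dict(os.environ)
    # Callers such as pacemaker may start us with an empty PATH; fill it in
    # so that bare command names like "virsh" can still be found.
    if not env.get("PATH"):
        env["PATH"] = DEFAULT_PATH
    # start_new_session=True makes the child call setsid() before exec, so it
    # runs without a controlling terminal; run() still waits for it to exit.
    return subprocess.run(
        cmd,
        env=env,
        start_new_session=True,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, `run_detached(["virsh", "list", "--all"]).stdout` would return the domain list even when called from a resource agent with no terminal attached.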
digimer
192cee090b * Removed an unused code block.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-06 03:02:32 -05:00
digimer
b666caec64 * Updated anvil-provision-server to handle startup when the peer doesn't create/connect its DRBD resource (ie: node is offline).
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-06 03:00:38 -05:00
digimer
a5cee52153 * Fixed a bug in DRBD->get_devices() where old test host UUIDs were left hard-coded.
* Fixed a duplicate header in words.xml
* Fixed display bugs in anvil-report-usage and removed the old DR host display info.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-04 22:58:28 -05:00
digimer
65a483273e * Updated anvil-version-changes to connect to the database with 'sensitive' so that the connection is unlikely to fail if schema changes are needed for normal operation.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-04 11:54:23 -05:00
Digimer
4d5dd8c6fa * Finished adding support for manually selecting a network with --network in anvil-provision-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-26 12:48:25 -05:00
Digimer
6d59399c73 * Updated the short OS list.
* Created Get->virsh_list_net() and Get->virsh_list_os() that call and parse osinfo-query directly to create lists of supported network interfaces and OS optimization options used when provisioning VMs. The latter is used to replace the old language list of OSes, which was clunky and prone to missing valid options.
* Updated Get->available_resources() to remove the old anvil_dr1_host_uuid mechanism of finding and referencing DR resources.
* Started adding --network support to anvil-provision-server to allow users to specify a specific network bridge, MAC address and model to use for a new VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-24 10:08:06 -05:00
Digimer
9194eb3d09 * Updated System->check_if_configured() to record that a host is configured in /etc/anvil to make the system auto-mark as configured if the host is removed from the DB (or, more specifically, variables -> system::configured is lost).
* Updated Database->get_anvils() to record dr_links to reference DR hosts to Anvil! systems.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-15 19:28:00 -05:00
Digimer
f9ca6fb170 * This adds the new anvil-version-change tool which anvil-daemon will call on startup to handle checks for changes made over releases/updates.
* Added the new 'dr_link_note' column to the dr_links table so that links can be marked as DELETED.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-13 17:36:43 -05:00
Digimer
561fa1a9ec
Merge branch 'main' into anvil-tools-dev 2022-12-13 01:21:40 -05:00
Digimer
33b4516dea Fix a variable quoting bug in Database->locking().
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-07 18:52:51 -05:00
Digimer
4fa8d7a446 * This completes the rework of DRBD triggered fencing to use / clear location constraints instead of triggering a power fence.
* Added the new unfence_pacemaker DRBD unfence handler.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-30 16:13:38 -05:00
Digimer
4ba1982183 This is the start of a set of changes needed to rework how we handle DRBD fence requests, so that they create location constraints instead of triggering a full stonith fence.
* In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing.
* Updated scan-server to check for / add DRBD fence rules as needed.

Scancore APC agent bugs:
* For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents.
* Fixed a bug to set/clear alerts related to phases disappearing to deal with concurrent logins from different hosts triggering false phase loss alerts.
* Fixed missing variables not being passed to alerts/log entries.

Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-29 22:17:12 -05:00
Tsu-ba-me
9d418b276a build(tools): remove renamed striker-access-database script from Makefile 2022-11-29 14:39:40 -05:00
Tsu-ba-me
44acfe3e28 docs(tools): add in-script documentation to anvil-access-module 2022-11-28 20:09:41 -05:00
Tsu-ba-me
2e5edfdcf0 fix(tools): return complete subroutine results in anvil-access-module 2022-11-28 14:37:21 -05:00
Tsu-ba-me
0284434815 fix(tools): allow subroutine execution before reading $anvil->data 2022-11-28 14:37:19 -05:00
Tsu-ba-me
e988dcedde fix(tools): expose $anvil->data given specific target structure 2022-11-28 14:37:19 -05:00
Tsu-ba-me
809a7e2951 fix(tools): add anvil-access-module switch to output $anvil->data 2022-11-28 14:37:19 -05:00
Tsu-ba-me
a7b80b2e36 fix(tools): parse switches in anvil-configure-host 2022-11-28 14:37:19 -05:00
Tsu-ba-me
bb02d556d4 fix(tools): add output file id switch to anvil-get-server-screenshot 2022-11-28 14:37:18 -05:00
Tsu-ba-me
e5fc75f306 fix(tools): fetch and send server screenshot from node to striker that made the request 2022-11-28 14:37:18 -05:00
Tsu-ba-me
e14b1fc93e fix(tools): use absolute paths in anvil-get-server-screenshot 2022-11-28 14:37:18 -05:00
Tsu-ba-me
4b03be4bc3 fix(tools): restrict get server screenshot output to stdout 2022-11-28 14:37:18 -05:00
Tsu-ba-me
2c1f400222 fix(tools): avoid using undef resize args when getting server screenshot 2022-11-28 14:37:18 -05:00
Tsu-ba-me
7b14433588 fix(tools): always convert server screenshot to PNG 2022-11-28 14:37:18 -05:00
Tsu-ba-me
374f88acb7 fix(tools): use --quiet when getting server screenshot 2022-11-28 14:37:18 -05:00
Tsu-ba-me
a7a2cc70d7 fix(tools): striker-access-database->anvil->access->module; execute any sub on any module 2022-11-28 14:37:17 -05:00
Digimer
6eb99a2168 * Finished the anvil-manage-alerts tool. It can now send test alerts at a user-requested alert level.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-22 01:10:53 -05:00
Digimer
8b7a44cf75 * Finished cleaning up the output of Machines.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-22 00:19:00 -05:00
Digimer
3e53c87a6b Formatted the output of anvil-manage-alerts data (not yet machines) to be more presentable.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 23:28:50 -05:00
Digimer
622fb84652 * Renamed the 'notifications' table to 'alert-override', better reflecting what it does.
* Got anvil-manage-alerts managing alert overrides.
* Created, but for now commented out, the new 'audit' table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 00:34:52 -05:00
Digimer
586ce6e5b9 * Got recipients working in anvil-manage-alerts.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-15 22:17:12 -05:00
Digimer
35cf0c37fb * Updated System->check_ram_use() to set the maximum RAM based on the host type, and set those values in _set_default() so that the user can override if they want.
* Got anvil-manage-alerts to the point where you can add, edit and delete mail servers.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-14 17:17:30 -05:00
Digimer
1fba964a24
Merge branch 'main' into install-striker-access-db 2022-10-28 22:36:41 -04:00
Digimer
a6cd5c6604 * Started work on the new anvil-manage-alerts, which will (when done) allow for management of mail servers, alert recipients and notification overrides, and for triggering test alerts.
* Updated Database->get_recipients() to record recipients by name for better sorting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-28 20:00:53 -04:00
Tsu-ba-me
0e2f119fef build(tools): add striker-access-db to tools/Makefile.am 2022-10-24 16:32:49 -04:00
Digimer
bde0b2e7ec * Fixed a bug where deleting ports from a fence device in an Install Manifest would not cause the fence methods to be removed from the associated cluster.
* Created Get->anvil_from_switch() and Get->server_from_switch() (both need testing) that take a string that could be either a name or UUID, figure out which it is, find the entry in the DB and set the X_uuid and X_name switch variables.
* Started work on a second attempt at anvil-manage-server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-20 22:33:41 -04:00
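Get->anvil_from_switch() and Get->server_from_switch() accept a string that may be either a name or a UUID. A minimal Python sketch of that name-or-UUID resolution follows, assuming the DB rows are already loaded into a dict of lowercase UUID to name; resolve_switch() and looks_like_uuid() are hypothetical helpers, not the Perl methods.

```python
import uuid
from typing import Dict, Optional, Tuple


def looks_like_uuid(value: str) -> bool:
    """True only for canonical 8-4-4-4-12 hex UUID strings (any case)."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID() also accepts braces, 'urn:uuid:' and bare hex; require canonical form.
    return str(parsed) == value.lower()


def resolve_switch(value: str, names_by_uuid: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (uuid, name) for a switch value that is either a UUID or a name."""
    if looks_like_uuid(value):
        key = value.lower()
        if key in names_by_uuid:
            return key, names_by_uuid[key]
        return None
    # Otherwise treat the value as a name and search for a matching entry.
    for entry_uuid, name in names_by_uuid.items():
        if name == value:
            return entry_uuid, name
    return None
```

Checking the UUID form first keeps the lookup to a single dict access in the common case; a name that happens to look like a UUID would be misclassified, which this sketch does not handle.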
Digimer
93427a7a38 * Updated Get->switches() to always support job-uuid.
* Updated striker-initialize-host to support calls from command line switches, and wrote the man page for it.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-18 19:16:32 -04:00
Digimer
c23c79cdf0 Added 'system::all::configured' to anvil-join-anvil to mark an explicit end of config.
Started updating striker-initialize-host to handle the new anvil repo config.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-18 10:56:58 -04:00
Digimer
596855405f * Added variables to record when pacemaker and DRBD are configured.
* Added verify-alg to DRBD configs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-17 21:57:00 -04:00
Digimer
13b0f5bdcc Bumped 'Exhaust Temp' jump threshold to 30c in scan-ipmitool.
Adjusted some logging.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-08 20:34:09 -04:00
Digimer
03f0cdad84 Updated anvil-manage-files to only remove files from /mnt/shared/files
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-05 20:56:57 -04:00
Digimer
a4ef93404c * Fixed a bug in DRBD->gather_data() to remove trailing commas for existing TCP ports.
* Added the missing 'clear-mapping' switch to Get->switches in anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-05 20:15:32 -04:00
Digimer
3b721b849c * Fixed a bug in anvil-configure-host where if the same MAC address was assigned to two interfaces, it would cause an endless reboot loop.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-28 19:20:23 -04:00
Digimer
ac8135709a Fixed a bug where scan-server faulted with a divide by zero error when the host had no swap.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-27 00:40:30 -04:00
Digimer
599373816f * Fixed bugs that came up in testing. Now able to set up long-throw DR!
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-22 16:40:40 -04:00
Digimer
2fab7bc1b7 This adds support (testing needed) for "Long-Throw" DR, which is a wrapper for using 'drbd-proxy' to provide larger transmit buffers for slow/high-latency DR hosts.
* Created DRBD->check_proxy_license() to do (some level of) sanity checks on the DRBD proxy license file.
* Updated DRBD->gather_data() to parse out the inside and outside ports for resource configs using proxy.
* Reworked DRBD->get_next_resource() to return 1, 3 or 7 TCP ports depending on the configuration, with the new long_throw_ports parameter triggering the 7 ports.
* Added 'tcpdump' to the anvil-core requires list.
* Reworked scan-drbd to record the ports used in proxy configs. This required adding a check to change the 'scan_drbd_peer_tcp_port' column type to 'text' to support CSVs.
* Reworked anvil-manage-dr (needs testing!) to support "long-throw" DR configs.
* Updated anvil-safe-stop to check if the nodes are in the cluster before trying to migrate.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-21 23:35:06 -04:00
Digimer
c8ee75420d * Updated anvil-manage-dr to check if a server is protected before processing a --connect or --disconnect request. Also made it smarter if an attempt to connect a resource fails.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-01 16:09:37 -04:00
Digimer
e90dae96f7 * In Server->shutdown_virsh(), disabled trying to resume a paused VM. Also updated the logging around not waiting for a VM to stop.
* Updated anvil-safe-stop to check for VMs running, even if the cluster is stopped, when --stop-servers is used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-31 18:12:07 -04:00
Digimer
99a6593fe6 * Fixed a bug when connecting to databases when one DB has no variable entries, making it seem like a DB was disabled.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-25 21:43:21 -04:00
Digimer
9675ebf986 * Added --remove support to anvil-manage-dr, completing all the features for this tool.
* Updated DRBD.pm to move the logic to wipe and delete an LV into a new method called 'remove_backing_lv'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-24 22:08:48 -04:00
Digimer
93e6a59841 * Added 'vnc-server' to the list of firewall services enabled on strikers.
* Created the anvil-manage-dr man page.
* Reworked anvil-manage-dr's --protect logic to search for which network works with the DR host, instead of assuming it's the SN.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-22 13:38:46 -04:00
Digimer
29a28ee97a * Fixed a bug with anvil-provision-server where running the command line menu from a Striker would not assign the job to the target Anvil!.
* Updated Server->parse_definition() to check if a failed 'virsh list' output was passed in. Also changed it to not exit if the XML can't be parsed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-16 19:01:36 -04:00
Digimer
cbb441759e * Fixed a couple of bugs in anvil-manage-files where a file moved from incoming to files or definitions wasn't having its directory updated properly in the database. Also added an explicit check, when looking for missing files, to see if the file exists in another managed directory and, if so and if running on a Striker, update the DB.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-15 23:27:40 -04:00
Digimer
7b1771e498 Updated anvil-provision-server to wait until the local machine is a full cluster member before proceeding.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-15 13:59:35 -04:00
Digimer
4ecc6097d3 * Replaced some old 'die' calls with nice_exit() calls to help avoid dangling db_in_use flags.
* Reworked Network->bridge_info() to use 'ip' to get the list of bridges, and 'bridge' to find interfaces connected to the bridge.
* Added 'test' messages to Words->string().
* Fixed a bug in scan-lvm where mdadm based PVs didn't read the sector size properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-12 16:32:20 -04:00
Digimer
ef3ac86162 * Fixed a bug when setting the db_in_use flag without a valid $ENV{_}.
* Added a nice_exit call to tools/striker-access-database

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 15:45:10 -04:00
Digimer
21738ab0d4 Added a bit more logging to the Database->mark_active method.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 01:02:59 -04:00
Digimer
a81478f2bc * Updated 'db_in_use' state to add the caller's name to the state name. This is pulled out when logging stale locks that are being reaped, to help debug where stale locks are coming from.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 00:29:03 -04:00
Digimer
e7cf8ac789 * Got more work done on anvil-manage-files. It now picks up new files on nodes/dr hosts in an Anvil! and downloads them if needed.
* Updated anvil-daemon to call anvil-manage-files on a per-minute basis to handle files added outside of the WebUI.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 00:08:19 -04:00
Digimer
be84a23924 * There were still references in anvil-manage-files to 'file_locations' -> 'file_location_host_uuid'. Had to rework some logic to get things working. More testing needed, but so far at least the "missing file" function is working again.
* Added missing always-available switches in Get->switches
* Created Storage->_wait_if_changing() to check to see if a file's size is changing and, if so, not return until it stops.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-08 21:31:56 -04:00
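Storage->_wait_if_changing(), described above, blocks while a file is still growing (for example, mid-upload). A minimal Python sketch of that polling approach follows; the helper name and the 2-second default interval are assumptions, and, like the commit's description, it has no timeout and simply waits until two consecutive size readings match.

```python
import os
import time


def wait_if_changing(path: str, interval: float = 2.0) -> int:
    """Block until the size of `path` is the same on two consecutive checks.

    Returns the final (stable) size in bytes.
    """
    last_size = os.path.getsize(path)
    while True:
        time.sleep(interval)
        size = os.path.getsize(path)
        if size == last_size:
            # No growth since the last check; treat the file as complete.
            return size
        last_size = size
```

A longer interval reduces false "stable" readings on slow writers at the cost of a longer minimum wait.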
Digimer
15aadc3a4e * Updated scan-network to check for inactive or activating interfaces and manually bring them up, if the uptime is less than 10 minutes.
* Fixed a bug in scancore-agents/Makefile.am where scan-network was missing.
* Started work on anvil-delete-server.8. Incomplete at this time.
* Updated Network->get_ips() to record the interface status.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-03 23:38:56 -04:00
Digimer
55dd28e7f1 * Added the anvil-configure-host man page.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 22:38:04 -04:00
Digimer
7eff8f0801 * Added the man page for anvil-check-memory
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 20:26:54 -04:00
Digimer
5fea8ff46a * Adds the anvil-boot-server man page.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 19:09:57 -04:00
Digimer
d8f31d9d84 * Added the anvil-boot-server man page.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 17:25:28 -04:00
Digimer
b3b185a43c * Added the alteeve-repo-setup man page and updated it to show that when called with '-h'.
* Updated scancore to use the new Get->switches() list parameter.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 14:31:46 -04:00