Commit Graph

1057 Commits

Author SHA1 Message Date
digimer
d56b7f9a84 * Created (but not finished!) the new striker-update-cluster tool.
* Updated Cluster->get_primary_host_uuid() to only load anvils if not already loaded.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-07 17:54:57 -04:00
digimer
3215e178ef * Updated striker-collect-debug to support '--output-file /path/to/file.tar.bz2'.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-06 13:02:59 -04:00
digimer
a7ebe45f76 This adds the new 'striker-collect-debug' tool that collects all potentially useful debug info into a single tarball.
* Fixed a bug in Get->anvil_from_switch() to work when the Anvil! name is passed.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-07-05 21:04:05 -04:00
Tsu-ba-me
d95eb699f9 chore: disable web VNC, screenshot pieces to avoid libvirt deadlock 2023-07-05 17:06:11 -04:00
Tsu-ba-me
54197a2f2c fix(tools): wrap guest name with quotes when get vncdisplay in manage vnc pipes 2023-07-03 04:46:07 -04:00
Tsu-ba-me
64093d42a0 fix(tools): allow pass libvirt domain XML info to manage vnc pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
d64e5ff17f chore(tools): hide open all components in manage vnc pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
d9d0244f3f docs(tools): identify most variable outputs in manage vnc pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
ce637cbf71 fix(tools): select 0/1 ws instance for given server 2023-07-03 04:46:06 -04:00
Tsu-ba-me
be82c6e267 fix(tools): print forward port after open SSH tunnel in manage vnc pipe 2023-07-03 04:46:06 -04:00
Tsu-ba-me
a48c6997fe fix(tools): include server host UUID when open VNC SSH tunnel 2023-07-03 04:46:06 -04:00
Tsu-ba-me
ecaa38cfd1 fix(tools): add multiple repairs to manage-vnc-pipes
* ensure valid server UUID with pattern
* allow specify known server host UUID
* combine server UUID and server host UUID (a.k.a. ws host UUID) as
  unique record in table
* remove unnecessary checks for ws source port
2023-07-03 04:46:06 -04:00
Tsu-ba-me
b92627dd5d fix(tools): simplify kill logic in manage-vnc-pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
a7b2f7c9e1 fix(tools): pass server vnc port as flag in manage-vnc-pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
17bef8b415 fix(tools): allow manage-vnc-pipes to accept server name 2023-07-03 04:46:06 -04:00
Tsu-ba-me
324bbaf141 fix(tools): always end with nice exit in open-shh-tunnel 2023-07-03 04:46:06 -04:00
Tsu-ba-me
8da4033607 fix(tools): separate open/close websockify and ssh tunnel 2023-07-03 04:46:06 -04:00
Tsu-ba-me
9457986659 fix(tools): simplify accessing switches in manage-vnc-pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
bf0e75109f fix(tools): simplify selection between local/remote call in manage-vnc-pipes 2023-07-03 04:46:06 -04:00
Tsu-ba-me
d98df4b2a4 fix(tools): isolate non-striker tasks in anvil-daemon 2023-07-03 04:46:06 -04:00
Tsu-ba-me
560d60c7e8 fix(tools): get server screenshots every minute and punt to strikers WIP 2023-07-03 04:46:06 -04:00
Tsu-ba-me
ffd41b1dfa fix(tools): enable anvil-get-server-screenshot to send screenshot to multiple hosts 2023-07-03 04:46:06 -04:00
digimer
bf1ccc8bee * Finally got the creation of new DRBD volumes under existing resources working!
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-30 22:36:27 -04:00
digimer
1b8b0bc493 * Created the new 'anvil-manage-server-storage' with the first role of reload a DRBD resource.
* Updated Remote->call() to remove the 'background' parameter as it wasn't working.
* Updated anvil-manage-server-storage to use 'anvil-manage-server-storage' to adjust resources in a way that doesn't block.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-30 21:02:30 -04:00
digimer
7fbed10864 * Updated Remote->call() to take the new 'background' parameter.
* Continues work on adding new disks (DRBD volumes) to anvil-manage-server-storage.
* Updated DRBD->get_status() to record the peer-role.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-29 22:17:58 -04:00
digimer
ea95d26cc5 * Fixed a bug in DRBD->get_next_resource() where reserved minor numbers were not being released. Also added a new parameter, "minor_only", that returns the next minor number but doesn't bother processing TCP ports.
* Did more work on adding support for adding new disk drives to servers in anvil-manage-server-storage.
* Updated anvil-manage-storage-groups to check for / delete duplicate storage groups with the same name.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-26 23:55:19 -04:00
digimer
88cc76914d This is an attempt to fix issue #341. It switches the search for SN IPs from Network->find_matches() to Network->find_access(), the latter of which doesn't care about the interface the IP was found on.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-24 21:24:37 -04:00
digimer
e0316da88b * Got anvil-manage-server-storage working enough to grow existing disk's hard drive sizes, and to insert/eject optical disks.
* Hit a bug where a server's definition file was written to disk while not being valid. Added logging in case it happens again, and additional safeguards to help prevent it from recurring.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-23 23:09:55 -04:00
digimer
376660a120 * Removed the EXTRA_DIST argument from tools/Makefile.am
* Added a sanity check that a valid optical device was passed to anvil-manage-server-storage

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 21:20:10 -04:00
digimer
7a32d219fc Removed the old watch_drbd tool and added the new anvil-watch-drbd to the Makefiles.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 20:47:24 -04:00
digimer
1d12fb32b4 * Completed the new anvil-watch-drbd which replaces watch_drbd.
* Updated Email->get_current_server() to always load mail server data from the database.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-22 20:43:46 -04:00
digimer
336699a0f2 Added logging to help debug a DRBD resource config issue related to finding matching SN IPs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-21 10:29:44 -04:00
Digimer
8f491e01ed
Merge branch 'main' into anvil-tools-dev 2023-06-20 20:00:10 -04:00
digimer
0aa72498db * This adds the new tool 'striker-check-machines' which simply walks through all known physical machines and checks to see if they're accessible and powered on.
* Updated Get->uptime() to work on remote targets.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-20 19:57:21 -04:00
digimer-bot
82114120be
Merge branch 'main' into build-login 2023-06-20 10:47:38 -04:00
Tsu-ba-me
8a8b2cbc4b fix(tools): identify line(s) with UUID in interactive/script anvil-access-module 2023-06-20 00:48:21 -04:00
Tsu-ba-me
fe9c4a758f docs(tools): explain the interactive/script function of anvil-access-database 2023-06-20 00:48:21 -04:00
Tsu-ba-me
b494f79ffe fix(tools): anvil-access-module: default interactive, handle non-existing on class object 2023-06-20 00:48:21 -04:00
Tsu-ba-me
d9bc73ec2d feat(tools): add script capability to anvil-access-module 2023-06-20 00:48:21 -04:00
digimer
c9e11fbbfc * Added checks to anvil-provision-server to fail out if either of the SN IPs are not found when generating a DRBD resource config.
* Added logging to anvil-provision-server and anvil-daemon to try to find the cause of jobs being re-run after completing. May have fixed with a fix to job_progress updates going to 100 too early in some cases.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-19 21:44:45 -04:00
digimer
156a0ca201 Updated anvil-daemon's new job launching logic to allow the restart of a running job that failed out early.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-16 11:43:49 -04:00
digimer
cc15eca6fb * Added anvil-watch-power to git.
* Added a check to cleanup size input to Convert->human_readable_to_bytes() when passed pre-processed strings.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-15 21:35:42 -04:00
digimer
47f7a35df3 The main purpose of this commit is to add serial execution of similar jobs to help reduce race conditions for scripted jobs, like multiple server creation.
* Fixed a small logging bug in DRBD->allow_two_primaries().
* Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded.
* Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-15 21:13:53 -04:00
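The serial-execution idea in the commit above can be illustrated with a minimal Python sketch. This is not the anvil-daemon code (which is Perl); the job fields `job_uuid`, `command` and `modified_date`, and both function names, are assumptions made only for this illustration. Jobs are sorted by modification date, grouped by command, and each group is run one job at a time.

```python
from collections import OrderedDict


def group_jobs_by_command(jobs):
    """Sort jobs by modified_date, then group them by command.

    Each job is a dict with hypothetical keys 'job_uuid', 'command'
    and 'modified_date'. Insertion order is preserved, so every queue
    lists its jobs in the order they were recorded.
    """
    ordered = sorted(jobs, key=lambda job: job["modified_date"])
    queues = OrderedDict()
    for job in ordered:
        queues.setdefault(job["command"], []).append(job)
    return queues


def run_serially(queues, run):
    """Run every queue one job at a time.

    'run' is a caller-supplied callable that blocks until the job is
    done, so a later job with the same command never starts before the
    previous one completes. For simplicity this sketch also runs the
    different commands one after another.
    """
    finished = []
    for queue in queues.values():
        for job in queue:
            run(job)
            finished.append(job["job_uuid"])
    return finished
```

A real scheduler could dispatch queues for different commands concurrently; only jobs sharing a command need to wait on each other.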
digimer
38d088a998 * Added anvil-watch-power to the makefile.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-15 10:17:02 -04:00
digimer
b6a249d5e7 * Updated Cluster->add_server() to set the preferred host based first on if the server is running on a node, and if not, on the primary node (where before it defaulted to node 1).
* Updated DRBD->delete_resource() to call scan-drbd and scan-lvm to ensure that the database is updated with the newly freed resources.
* Updated anvil-delete-server and anvil-provision-server to call select scan agents to ensure freed resources are immediately recorded.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-11 23:46:21 -04:00
digimer
5bb1c631cf * Updated anvil-delete-server to accept '--server' and '--force' to allow direct deletion of a server without interacting with the menu system.
This partially addresses issue #321.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-06 16:23:28 -04:00
digimer
bc3d04ad2e * Updated Cluster->add_server() to wait up to 15 seconds for a server to appear to ensure that the pcs call adds the server with the right requested running state.
* Updated Cluster->recover_server() to set the desired recovery state before calling the crm_resource refresh.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-06-06 14:34:02 -04:00
digimer
284a2957d6 * Fixes issue #329; When multiple attributes exist when checking if we're in maintenance mode in fence_pacemaker, the expected hash reference was actually an array reference.
* Fixed a bug in anvil-version-changes where update_file_location_ready() needed to be called before update_file_locations().
* Added a bit more logging for future debugging.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-08 15:03:29 -04:00
digimer
8f375c58a9 * Fixed a typo in anvil-daemon that prevented compiling.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 11:14:23 -04:00
digimer
110dceb55e * Added a check to make sure files were ready before provisioning a server.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 01:15:08 -04:00
digimer
c50a1936c0 * This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch for a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon.
* Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-04 00:05:56 -04:00
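As a rough illustration of the readiness test described in the commit above (file present on disk, expected size, expected md5sum), here is a minimal Python sketch. It is not the project's implementation; the function name `file_is_ready` and its parameters are assumptions introduced for this example.

```python
import hashlib
import os


def file_is_ready(path, expected_size, expected_md5):
    """Return True only if the file exists, has the expected size in
    bytes, and its MD5 digest matches the expected hex string."""
    if not os.path.isfile(path):
        return False
    # Cheap check first: skip hashing when the size is already wrong,
    # for example while the file is still being transferred.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files are not loaded into RAM.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5
```

A caller that needs the file could poll this check periodically and proceed once it returns True.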
digimer
1bba56a5b1 Hard coded anvil-provision-server to log level 2 while chasing a race condition in storage groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-05-01 10:54:51 -04:00
digimer
9a58f4d1ff * This is a small commit to increase logging while chasing down a race condition issue with assembling storage groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-30 19:47:58 -04:00
digimer
895f1ec262 This fixes a race condition when multiple servers are provisioned at (nearly) the same time.
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute. So subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race conditions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-28 00:19:53 -04:00
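The "hold" approach described in the commit above can be sketched in Python as a small reservation table with an expiry time. This is only an illustration under stated assumptions: the class `MinorReservations`, its method `next_free`, and the injectable `clock` are hypothetical names, and the real DRBD->get_next_resource() (Perl) also handles TCP ports, which this sketch omits.

```python
import time


class MinorReservations:
    """Hand out the lowest free minor number and hold it for a while,
    so near-simultaneous callers do not receive the same number."""

    def __init__(self, hold_seconds=60, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock  # injectable so the expiry can be tested
        self.holds = {}     # minor number -> time the hold expires

    def next_free(self, in_use):
        """Return the lowest minor not in 'in_use' and not on hold."""
        now = self.clock()
        # Release holds that have expired before searching.
        self.holds = {m: t for m, t in self.holds.items() if t > now}
        minor = 0
        while minor in in_use or minor in self.holds:
            minor += 1
        self.holds[minor] = now + self.hold_seconds
        return minor
```

The hold only needs to outlive the window between choosing a number and actually creating the resource; once the resource exists, it shows up in `in_use` and the expired hold no longer matters.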
digimer
c11be1ad1a Added a skip to ignore dot files when looking at new files.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-19 12:36:05 -04:00
digimer
dc7b909bfc More logging to debug storage group race condition
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 19:14:59 -04:00
digimer
bd575c6a7d Bumped logging for storage group management.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 19:02:51 -04:00
digimer
0874ad571a Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 14:33:58 -04:00
digimer
8ba613952c Typo fix.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 12:32:52 -04:00
digimer
83a527f4fa * Moved enabling anvil-safe-start out of the RPM and into anvil-join-anvil.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-18 11:18:42 -04:00
digimer
f086c1be39 Fixed a bug where the total RAM was shown instead of the free RAM.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 13:02:50 -04:00
digimer
fdf49c696f Updated anvil-report-usage to ignore deleted servers. Also added a check to ensure hosts are loaded if not.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-14 12:23:21 -04:00
digimer
fb70836126 This moves the call of anvil-safe-start out of scancore and into a new, dedicated systemd unit that runs on boot only.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-12 22:26:15 -04:00
digimer
9bf0f50084 Added a check to see if the server's UUID exists and looping if not to prevent uninitialized variable warnings.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-09 23:38:39 -04:00
digimer
1c274ba58d * Fixed a bug in anvil-delete-server that was preventing the complete deletion of a server if the DRBD resource had already been removed.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-04-03 12:40:58 -04:00
digimer
ddc6965b60 * Fixed a bug where references to files on Anvil! nodes were broken in anvil-provision-server and anvil-manage-files.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 17:33:49 -04:00
digimer
efebd135eb * Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes.
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-30 12:50:44 -04:00
digimer
8ff40ec42c * Fixed a SQL query bug in Database->get_drbd_data().
* Got more work done on anvil-manage-server-storage; Now shows DRBD resource size, backing LV and size, and calculates/displays metadata size.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-26 02:09:52 -04:00
digimer
040bc02e26 * This adds the new Database->get_drbd_data() that, like ->get_lvm_data, collates the DRBD data collected by scan-drbd into a more readily parsable data structure.
* Updated DRBD->parse_resource() to add references to a resource name and volume for a given backing disk.
* Continued work on anvil-manage-server-storage.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-24 19:45:47 -04:00
Fabio M. Di Nitto
b75512e540 virt-install should not --wait on VM to be provisioned
Resolves: https://github.com/ClusterLabs/anvil/issues/277

Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2023-03-24 01:27:15 -04:00
digimer
8e0e51544c * Continued work on anvil-manage-server-storage.
* Created the new Database->get_lvm_data to compile LVM data from scan-lvm
* Updated DRBD->parse_resource to call Database->get_lvm_data if needed, and to track backing devices to Storage Groups.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-22 22:57:26 -04:00
digimer
b144976853 This resolves Issue #310.
* Updated Database->get_file_locations() to record files available on Anvil! nodes by tracking hosts in Anvil! systems (needed after reworking how DR hosts are linked).
* Updated Get->available_resources() to call Database->get_files() and ->get_file_locations() to restore tracking files available on Anvil! nodes.
* Fixed a couple display bugs in anvil-provision-server when called with --ci-test --options.
* Continued work on anvil-manage-server-storage.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-20 23:43:40 -04:00
digimer
fea10e5bb1 * Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell.
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-03-03 14:42:28 -05:00
digimer
147f31aeeb * Added a loop when calling 'anvil-change-password' as there appears to be an unknown condition where during setup, this is called but never actually runs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 18:37:13 -05:00
digimer
ab3e8afe6e Fixed a bug in Storage->push_file() where file path wasn't updated from incoming to files, preventing the push to other hosts from working. Also fixed a minor issue where the file size was sometimes 0, making transfer calculations useless.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 13:21:29 -05:00
Digimer
d59034a488
Merge branch 'main' into anvil-tools-dev 2023-02-22 02:21:50 -05:00
digimer
254f7ef4e2 This should fix the tracking of what files belong where, using the new DR links system. It also should finish (though testing is still needed) the serial rsync issue.
* Created Database->track_files() as a dedicated method as trying to verify the existence of file_locations during Database->load_anvils() was fragile and prone to recursive loops.
* Updated Database->insert_or_update_file_locations() to take an anvil_uuid and recursively call for each host, to maintain compatibility with the old ways, and make it simpler to add an entry for both sub-nodes in an Anvil!.
* Created Storage->push_file() that takes a file and rsync's it to all other machines, or creates a job for the file to be pulled if the target can't be accessed.
* Updated anvil-manage-files and anvil-sync-shared to use the new Storage->push_files and Database->track_files methods.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-22 02:13:19 -05:00
digimer
645f54ab89 This commit has more changes than I would normally like, but it's all linked to changing file uploads to rsync serially.
* To update file handling for the new DR host linking mechanism, file_locations -> file_location_anvil_uuid was changed to file_location_host_uuid.
  This required a fair number of changes elsewhere to handle this, with a particular noted change to Database->get_anvils() to look at host_uuid's for the subnodes in an Anvil! and, if either is marked as needing a file, make sure the peer is as well. Similarly, any linked DRs are set to have the file as well.
* Created a new Network->find_access that simply takes a target host name or UUID, and it returns a list of networks and IPs that the target can be accessed by.
* Updated Network->load_ips() to find the network interface being used for traffic so that things like the interface speed can be recorded, even when an IP is on a bridge or bond.

Unrelated, but in this commit, is a restoration of calling scan agents with a timeout now that the virsh hang issue has been resolved.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-14 02:29:40 -05:00
digimer
7710d9d109 * Created the new anvil-manage-server-storage tool which will specifically handle managing a server's disks.
* Created DRBD->parse_resource() to pass a specific DRBD resource's XML data.
* Fixed a bug in Get->available_resources() so that if the threads is lower than CPU cores, the cores are used as the total available to VMs.
* Fixed bugs in Get->server_from_switch() where it just wasn't working properly.
* Updated scan_drbd to not reset a resource's size to 0-bytes when a resource goes offline.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-03 22:05:34 -05:00
Tsu-ba-me
58d09cb08c fix(tools): make server screenshot write to named pipe non-blocking 2023-02-02 23:18:26 -05:00
Tsu-ba-me
eb561d6d39 fix(tools): always send to pipe when given request host 2023-02-02 19:21:56 -05:00
Tsu-ba-me
3802c72912 fix(tools): check server state before getting screenshot 2023-02-02 18:25:33 -05:00
Tsu-ba-me
a9cc123300 fix(tools): exit at end of anvil-get-server-screenshot 2023-02-02 17:11:56 -05:00
digimer
9751c883cb * Updated Cluster->assemble_storage_groups() to remove references to anvil_dr1_host_uuid. Also added the logic for auto-adding DR host's VGs to a storage group. Commented it out though as, for now, this might be a bad idea. Needs more thought.
* Fixed a bug in Database->get_storage_group_data() to load hosts data when needed. Also fixed a bug where new members didn't return the new storage_group_member_uuid.
* Updated anvil-manage-host to use the new switch handler.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-01 23:19:38 -05:00
digimer
7773e5f9b8 * Updated logging in DRBD->get_devices().
* Added a check and exit if anvil-manage-dr is asked to protect a server on a machine that doesn't know about that server.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-30 11:30:36 -05:00
digimer
56cf100b09 * Added a check to ensure a storage group actually exists before trying to present it to the user. This should resolve issue #299.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-22 11:08:55 -05:00
digimer
695b274d78 * Fixed a bug where anvil-provision-server wasn't loading the available OS list when provisioning servers. This should resolve issue #296.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 23:47:04 -05:00
digimer
053e5312e1 * Fixed a bug in anvil-manage-dr where protect jobs with multiple potential targets wouldn't know which to use during job runs.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:51:21 -05:00
digimer
e9f390b65b * Updated RPM spec to add new core requires and add calling 'anvil-version-changes' to core's %post.
* Added missing man pages and the new anvil-manage-storage-groups to the Makefile.am's.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:24:24 -05:00
digimer
e012d6016c The major point of this commit is to add the new 'anvil-manage-storage-groups' program that, well, manages storage groups.
* Updated the storage_group_members table to add the 'storage_group_member_note' that can be set to 'DELETED' to track when a member is deleted. Updated anvil-version-changes to check for and add this column as needed. Updated the anvil.sql schema for the same.
* Updated Cluster->insert_or_update_storage_group_members to add the new column.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 22:10:15 -05:00
digimer
355e5c2c0a * More work done on anvil-manage-dr. It now properly validates a DR host.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-20 00:11:35 -05:00
digimer
f8743a7435 * Further work on anvil-manage-dr. Now properly sanity checks that a valid server is passed.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-01-19 22:14:17 -05:00
digimer
1a217d21cf * Updated anvil-manage-dr to provide the ability to link anvil nodes to dr hosts. Also began work on making it work with the new DR links system.
* Created Database->get_anvil_uuid_from_string(), Database->get_host_uuid_from_string() and Database->get_server_uuid_from_string() to simplify the process of converting --anvil <string>, --host <string> and --server <string> respectively.
* Fixed bugs in Database->get_dr_links() and Database->insert_or_update_dr_links().
* Updated Database->insert_or_update_states() to make direct calls to hosts instead of using get_hosts to drop out if a host_uuid doesn't yet exist in a DB.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-19 19:41:02 -05:00
digimer
16fc4e131c * Fixed a bug where, if a specific request to do a DB resync was made but the active_uuid wasn't matching the host, it wouldn't resync. This broke peering Strikers when the peer source was not the active_uuid.
* Updated anvil-manage-dr to check and delete duplicate dr_link entries.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 22:53:15 -05:00
digimer
985338a064 Fixed typo that broke compilation.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 21:15:09 -05:00
digimer
98c3868870 * Updated fence_pacemaker to no longer use stonith_admin and instead use pcs. This should resolve the main part of issue #279
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-18 00:22:03 -05:00
digimer
0318b4bbe9 * Fixed (the very incomplete) anvil-manage-firewall so that it would clear a job, if a job was assigned to it.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-17 22:52:57 -05:00
digimer
ff69916a85 * Applied typo fixes from PR #286 (thanks, Deezzir!). Also moved all the raw prints into words.xml.
* Updated Convert->human_readable_to_bytes() to return an empty string if passed an empty string.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-16 20:23:29 -05:00
digimer
64bb5ab8e1 * Updated striker to only complain about unconfigured networks on nodes, not DR hosts.
* Updated anvil-configure-host to ignore gracefully unconfigured networks.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-15 01:41:55 -05:00
digimer
b8b4352117 * Added support for Migration Network configs in old striker and anvil-configure-host
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-15 01:24:26 -05:00
digimer
a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately.
* Added call to System->configure_logind() to anvil-join-anvil and anvil-version-changes.
* Updated fence_pacemaker to add '--reboot' to the 'stonith_admin' call to ensure DRBD-triggered fence requests reboot instead of just turning nodes off.
This commit addresses issue #279.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-13 21:42:10 -05:00
digimer
dfa93a1837 * Added 'setsid' to all 'virsh' calls as nested calls (ie: crm_resource -> ocf:alteeve:server -> virsh) would fail because virsh couldn't connect to a terminal. See:
** https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
* Added explicit setting of $ENV{PATH} when it's null (as it is when pacemaker calls our tools).
* Updated the copyright to 2023.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-12 21:52:26 -05:00
digimer
192cee090b * Removed an unused code block.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-06 03:02:32 -05:00
digimer
b666caec64 * Updated anvil-provision-server to handle startup when the peer doesn't create/connect its DRBD resource (ie: node is offline).
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-06 03:00:38 -05:00
digimer
a5cee52153 * Fixed a bug in DRBD->get_devices() where old test host UUIDs were left hard-coded.
* Fixed a duplicate header in words.xml
* Fixed display bugs in anvil-report-usage and removed the old DR host display info.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-04 22:58:28 -05:00
digimer
65a483273e * Updated anvil-version-changes to connect to the database with 'sensitive' so that the connection is unlikely to fail if schema changes are needed for normal operation.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-04 11:54:23 -05:00
Digimer
4d5dd8c6fa * Finished adding support for manually selecting a network with --network in anvil-provision-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-26 12:48:25 -05:00
Digimer
6d59399c73 * Updated the short OS list.
* Created Get->virsh_list_net() and Get->virsh_list_os() that call and parse osinfo-query directly to create lists of supported network interfaces and OS optimization options used when provisioning VMs. The latter of which is used to replace the old language list of OSes, which was clunky and prone to missing valid options.
* Updated Get->available_resources() to remove the old anvil_dr1_host_uuid mechanism of finding and referencing DR resources.
* Started adding --network support to anvil-provision-server to allow users to specify a specific network bridge, MAC address and model to use for a new VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-24 10:08:06 -05:00
Digimer
9194eb3d09 * Updated System->check_if_configured() to record that a host is configured in /etc/anvil to make the system auto-mark as configured if the host is removed from the DB (or, more specifically, variables -> system::configured is lost).
* Updated Database->get_anvils() to record dr_links to reference DR hosts to Anvil! systems.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-15 19:28:00 -05:00
Digimer
f9ca6fb170 * This adds the new anvil-version-change tool which anvil-daemon will call on startup to handle checks for changes made over releases/updates.
* Added the new 'dr_link_note' column to the dr_links tables so that links can be marked as DELETED.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-13 17:36:43 -05:00
Digimer
561fa1a9ec
Merge branch 'main' into anvil-tools-dev 2022-12-13 01:21:40 -05:00
Digimer
33b4516dea Fix a variable quoting bug in Database->locking().
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-07 18:52:51 -05:00
Digimer
4fa8d7a446 * This completes the rework of DRBD triggered fencing to use / clear location constraints instead of triggering a power fence.
* Added the new unfence_pacemaker DRBD unfence handler.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-30 16:13:38 -05:00
Digimer
4ba1982183 This is the start of a set of changes needed to rework how we handle DRBD fence requests, so that they create location constraints instead of triggering a full stonith fence.
* In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing.
* Updated scan-server to check for / add DRBD fence rules as needed.

Scancore APC agent bugs;
* For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents.
* Fixed a bug to set/clear alerts related to phases disappearing to deal with concurrent logins from different hosts triggering false phase loss alerts.
* Fixed missing variables not being passed to alerts/log entries.

Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-29 22:17:12 -05:00
Tsu-ba-me
9d418b276a build(tools): remove renamed striker-access-database script from Makefile 2022-11-29 14:39:40 -05:00
Tsu-ba-me
44acfe3e28 docs(tools): add in-script documentation to anvil-access-module 2022-11-28 20:09:41 -05:00
Tsu-ba-me
2e5edfdcf0 fix(tools): return complete subroutine results in anvil-access-module 2022-11-28 14:37:21 -05:00
Tsu-ba-me
0284434815 fix(tools): allow subroutine execution before reading $anvil->data 2022-11-28 14:37:19 -05:00
Tsu-ba-me
e988dcedde fix(tools): expose $anvil->data given specific target structure 2022-11-28 14:37:19 -05:00
Tsu-ba-me
809a7e2951 fix(tools): add anvil-access-module switch to output $anvil->data 2022-11-28 14:37:19 -05:00
Tsu-ba-me
a7b80b2e36 fix(tools): parse switches in anvil-configure-host 2022-11-28 14:37:19 -05:00
Tsu-ba-me
bb02d556d4 fix(tools): add output file id switch to anvil-get-server-screenshot 2022-11-28 14:37:18 -05:00
Tsu-ba-me
e5fc75f306 fix(tools): fetch and send server screenshot from node to striker that made the request 2022-11-28 14:37:18 -05:00
Tsu-ba-me
e14b1fc93e fix(tools): use absolute paths in anvil-get-server-screenshot 2022-11-28 14:37:18 -05:00
Tsu-ba-me
4b03be4bc3 fix(tools): restrict get server screenshot output to stdout 2022-11-28 14:37:18 -05:00
Tsu-ba-me
2c1f400222 fix(tools): avoid using undef resize args when getting server screenshot 2022-11-28 14:37:18 -05:00
Tsu-ba-me
7b14433588 fix(tools): always convert server screenshot to PNG 2022-11-28 14:37:18 -05:00
Tsu-ba-me
374f88acb7 fix(tools): use --quiet when getting server screenshot 2022-11-28 14:37:18 -05:00
Tsu-ba-me
a7a2cc70d7 fix(tools): striker-access-database->anvil->access->module; execute any sub on any module 2022-11-28 14:37:17 -05:00
Digimer
6eb99a2168 * Finished the anvil-manage-alerts tool. It can now send test alerts at a user-requested alert level.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-22 01:10:53 -05:00
Digimer
8b7a44cf75 * Finished cleaning up the output of Machines.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-22 00:19:00 -05:00
Digimer
3e53c87a6b Formatted the output of anvil-manage-alerts data (not yet machines) to be more presentable.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 23:28:50 -05:00
Digimer
622fb84652 * Renamed the 'notifications' table to 'alert-override', better reflecting what it does.
* Got anvil-manage-alerts managing alert overrides.
* Created, but for now commented out, the new 'audit' table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 00:34:52 -05:00
Digimer
586ce6e5b9 * Got recipients working in anvil-manage-alerts().
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-15 22:17:12 -05:00
Digimer
35cf0c37fb * Updated System->check_ram_use() to set the maximum RAM based on the host type, and set those values in _set_default() so that the user can override if they want.
* Got anvil-manage-alerts to the point where you can add, edit and delete mail servers.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-14 17:17:30 -05:00
Digimer
1fba964a24
Merge branch 'main' into install-striker-access-db 2022-10-28 22:36:41 -04:00
Digimer
a6cd5c6604 * Started work on the new anvil-manage-alerts, which will (when done) allow for management of mail servers, alert recipients and notification overrides, and for triggering test alerts.
* Updated Database->get_recipients() to record recipients by name for better sorting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-28 20:00:53 -04:00
Tsu-ba-me
0e2f119fef build(tools): add striker-access-db to tools/Makefile.am 2022-10-24 16:32:49 -04:00
Digimer
bde0b2e7ec * Fixed a bug where deleting ports from a fence device in an Install Manifest would not cause the fence methods to be removed from the associated cluster.
* Created Get->anvil_from_switch and Get->server_from_switch() (both need testing) that takes a string that could be either a name or UUID, figures out which it is, finds the entry in the DB and started the X_uuid and X_name switch variables.
* Started work on a second attempt at anvil-manage-server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-20 22:33:41 -04:00
Digimer
93427a7a38 * Updated Get->switches() to always support job-uuid.
* Updated striker-initialize-host to support calls from command line switches, and wrote the man page for it.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-18 19:16:32 -04:00
Digimer
c23c79cdf0 Added 'system::all::configured' to anvil-join-anvil to mark an explicit end of config.
Started updating striker-initialize-host to handle the new anvil repo config.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-18 10:56:58 -04:00
Digimer
596855405f * Added variables to record when pacemaker and DRBD are configured.
* Added verify-alg to DRBD configs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-17 21:57:00 -04:00
Digimer
13b0f5bdcc Bumped 'Exhaust Temp' jump threshold to 30c in scan-ipmitool.
Adjusted some logging.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-08 20:34:09 -04:00
Digimer
03f0cdad84 Updated anvil-manage-files to only remove files from /mnt/shared/files
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-05 20:56:57 -04:00
Digimer
a4ef93404c * Fixed a bug in DRBD->gather_data() to remove trailing commas for existing TCP ports.
* Added the missing 'clear-mapping' switch to Get->switches in anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-05 20:15:32 -04:00
Digimer
3b721b849c * Fixed a bug in anvil-configure-host where if the same MAC address was assigned to two interfaces, it would cause an endless reboot loop.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-28 19:20:23 -04:00
Digimer
ac8135709a Fixed a bug where scan-server faulted with a divide by zero error when the host had no swap.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-27 00:40:30 -04:00
Digimer
599373816f * Fixed bugs that came up in testing. Was now able to setup long-throw DR!
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-22 16:40:40 -04:00
Digimer
2fab7bc1b7 This adds support (testing needed) for "Long-Throw" DR, which is a wrapper for using 'drbd-proxy' to provide larger transmit buffers for slow/high-latency DR hosts.
* Created DRBD->check_proxy_license() to do (some level of) sanity checks on the DRBD proxy license file.
* Updated DRBD->gather_data() to parse out the inside and outside ports for resource configs using proxy.
* Reworked DRBD->get_next_resource() to return 1, 3 or 7 TCP ports depending, with the new long_throw_ports parameter triggering the 7 ports.
* Added 'tcpdump' to the anvil-core requires list.
* Reworked scan-drbd to record the ports used in proxy configs. This required adding a check to change the 'scan_drbd_peer_tcp_port' column type to 'text' to support CSVs.
* Reworked anvil-manage-dr (needs testing!) to support "long-throw" DR configs.
* Updated anvil-safe-stop to check if the nodes are in the cluster before trying to migrate.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-21 23:35:06 -04:00
Digimer
c8ee75420d * Updated anvil-manage-dr to check if a server is protected before processing a --connect or --disconnect request. Also made it smarter if an attempt to connect a resource fails.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-09-01 16:09:37 -04:00
Digimer
e90dae96f7 * In Server->shutdown_virsh(), disabled trying to resume a paused VM. Also updated the logging around not waiting for a VM to stop.
* Updated anvil-safe-stop to check for VMs running, even if the cluster is stopped, when --stop-servers is used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-31 18:12:07 -04:00
Digimer
99a6593fe6 * Fixed a bug when connecting to databases when one DB has no variable entries, making it seem like a DB was disabled.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-25 21:43:21 -04:00
Digimer
9675ebf986 * Added --remove support to anvil-manage-dr, completing all the features for this tool.
* Updated DRBD.pm to move the logic to wipe and delete an LV into a new method called 'remove_backing_lv'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-24 22:08:48 -04:00
Digimer
93e6a59841 * Added 'vnc-server' to the list of firewall services enabled on strikers.
* Created the anvil-manage-dr man page.
* Reworked anvil-manage-dr's --protect logic to search for which network works with the DR host, instead of assuming it's the SN.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-22 13:38:46 -04:00
Digimer
29a28ee97a * Fixed a bug with anvil-provision-server where running the command line menu from a Striker would not assign the job to the target Anvil!.
* Updated Server->parse_definition() to check if a failed 'virsh list' output was passed in. Also changed it to not exit if the XML can't be parsed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-16 19:01:36 -04:00
Digimer
cbb441759e * Fixed a couple bugs in anvil-manage-files where a file moved from incoming to files or definitions wasn't having the directory updated properly in the database. Also made an explicit check when looking for missing files to check to see if the file exists in another managed directory and, if so and if a striker, update the DB.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-15 23:27:40 -04:00
Digimer
7b1771e498 Updated anvil-provision-server to wait until the local machine is a full cluster member before proceeding.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-15 13:59:35 -04:00
Digimer
4ecc6097d3 * Cleaned up some old 'die' calls with better nice_exit() calls to help avoid dangling db_in_use flags.
* Reworked Network->bridge_info() to use 'ip' to get the list of bridges, and 'bridge' to find interfaces connected to the bridge.
* Added 'test' messages to Words->string().
* Fixed a bug in scan-lvm where mdadm based PVs didn't read the sector size properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-12 16:32:20 -04:00
Digimer
ef3ac86162 * Fixed a bug when setting the db_in_use flag without a valid $ENV{_}.
* Added a nice_exit call to tools/striker-access-database

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 15:45:10 -04:00
Digimer
21738ab0d4 Added a bit more logging to the Database->mark_active method.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 01:02:59 -04:00
Digimer
a81478f2bc * Updated 'db_in_use' state to add the caller's name to the state name. This is pulled out when logging stale locks that are being reaped, to help debug where stale locks are coming from.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 00:29:03 -04:00
Digimer
e7cf8ac789 * Got more work done on anvil-manage-files. It now picks up new files on nodes/dr hosts in an Anvil! and downloads them if needed.
* Updated anvil-daemon to call anvil-manage-files on a per-minute basis to handle files added outside of the WebUI.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 00:08:19 -04:00
Digimer
be84a23924 * There were still references in anvil-manage-files to 'file_locations' -> 'file_location_host_uuid'. Had to rework some logic to get things working. More testing needed, but so far at least the "missing file" function is working again.
* Added missing always-available switches in Get->switches
* Created Storage->_wait_if_changing() to check to see if a file's size is changing and, if so, not return until it stops.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-08 21:31:56 -04:00
Digimer
15aadc3a4e * Updated scan-network to check for inactive or activating interfaces and manually bring them up, if the uptime is less than 10 minutes.
* Fixed a bug in scancore-agents/Makefile.am where scan-network was missing.
* Started work on anvil-delete-server.8. Incomplete at this time.
* Updated Network->get_ips() to record the interface status.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-03 23:38:56 -04:00
Digimer
55dd28e7f1 * Added the anvil-configure-host man page.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 22:38:04 -04:00
Digimer
7eff8f0801 * Added the man page for anvil-check-memory
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 20:26:54 -04:00
Digimer
5fea8ff46a * Adds the anvil-boot-server man page.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 19:09:57 -04:00
Digimer
d8f31d9d84 * Added the anvil-boot-server man page.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 17:25:28 -04:00
Digimer
b3b185a43c * Added the alteeve-repo-setup man page and updated it to show that when called with '-h'.
* Updated scancore to use the new Get->switches() list parameter.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 14:31:46 -04:00
Digimer
d9910fc951 Finished the man page for anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-29 17:43:53 -04:00
Digimer
be612ff878 * Updated Get->switches() to take 'list' and 'man' parameters. With 'list', the passed-in switches can be checked to ensure they're valid. With 'man', if set to the name of a man page (usually $THIS_FILE), that man page will be displayed if --help, -h or -? are used.
* Disabled striker-parse-oui until it can be reworked to store the OUI data in a flat file instead of in the database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-29 16:56:40 -04:00
Digimer
cd220e97dc Disabled striker-prep-database and set Database->configure_pgsql() calls to use debug => 2.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-20 20:32:18 -04:00
Digimer
508e278359 Added the new 'anvil-network-profiler' tool.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-19 21:16:46 -04:00
Digimer
7fd6185445 * Disabled firewalling for now. There appears to be an issue starting up with DRBD.
* Updated Convert->time() to return whatever was passed in instead of '#!error!#'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-09 19:46:38 -04:00
Digimer
171ea74000 * There is a fix in this commit to resolve a race condition where, when reconfiguring the network, the request to set a job to reboot would fail because the connections to all Strikers could be lost, causing Database->_test_access() to error out, blocking the reboot. When restarted, the network would not be changed, so no reboot would be requested, leaving the machine in an inaccessible state.
* Updated anvil-boot-server when called with '--all' to honour boot ordering, delays and conditions.
* Updated Database->get_servers() to collect the server's XML as well as data from the 'servers' table.
* Updated anvil-provision-server to make a new DRBD resource 'secondary' after forcing it to primary to begin the initial sync.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-06 19:22:28 -04:00
Digimer
bce9e2caaf This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work.
* Added multiple new private methods to Network that help in managing the firewall.
* Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update its firewall as soon after the server appears as possible.
* Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly.
* Updated scan-server to check that the firewall is managed when a server state has changed.
* Updated anvil-daemon to run Network->manage_firewall on startup.
* Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host.)
* Removed firewall management from striker-prep-database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-02 17:06:04 -04:00
Digimer
b2ea4f9adc * Moved System->manage_firewall() to Network->manage_firewall(). Started working on actually implementing it, which involves basically fully rewriting it.
* Updated tools/Makefile.am and scancore-agents/Makefile.am to add missing files.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-30 00:01:50 -04:00
Digimer
f2d06fa9b1 * Updated striker-parse-oui to only run if/when the system has been running for at least one hour.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-23 21:21:38 -04:00
Digimer
ab9b00a2f7 * Updated anvil-daemon, in its daily checks, to disable ksm and ksmtuned daemons.
* Updated scan-drbd to purge peer records that no longer have corresponding LVM data.
* Updated System->{en,dis}able-service to take the 'now' parameter which, when passed, causes the action to take immediate effect.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-21 22:25:07 -04:00
Digimer
b154ec816a * Added network_interfaces, bonds, bridges and ip_addresses tables to the age-out list.
* Confirmed that striker-purge-target works again.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-20 21:21:44 -04:00
Digimer
b77bb81343 * Found a bug where, if a record was deleted from the public schema but not from the history schema, and then later a resync was performed, the record would be added to the peer database's public schema (while still not existing locally). This condition should never occur as data in history should only exist to track the public record. This update checks for this condition and purges those records prior to resync'ng a database table.
* Continued work on fixing issues with striker-purge-target (which led to the discovery of the above bug). Added explicit checks to purge file_location and storage_group data when purging a sub-anvil from the database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-20 20:52:07 -04:00
Digimer
3caf43ed42 Updated striker-purge-target to check for problems on write of DELETEs.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-18 10:57:50 -04:00
Digimer
cdc23ad490 Small logging fix to striker-auto-initialize-all.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-17 16:43:50 -04:00
Digimer
6c5f48e8ca * Fixed a bug (I think) where initial synchronization was failing because the new locking system tried to register a lock against the peer striker before the peer striker was in the DB.
* Added an 'eval' wrapper around 'Database->write()' where it calls the given DB so that failures log properly instead of crash the program.
* Updated Database->_find_column() to no longer restrict to 'not null' column types.
* Fixed a couple typos in Database->read_state().

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-17 13:41:26 -04:00
Digimer
911f7cfb6a This is another big commit with a lot of DB work. Getting closer to sorting out the frequent resyncs.
* Changed Database->connect to always use the first DB connected to, not the local one if that applies. This treats the first DB (sorted by UUID) as "primary" and the second (or third...) as more of a backup.
* Moved db_in_use and lock_request to use the 'states' table instead of the variables table. These are set and removed so often that it was messing up things with resync's when the data is transient anyway. Fixed multiple bugs with both to better set and clear properly.
* Created Database->read_state() to assist with the above changes.
* Updated Database->refresh_timestamp() to specifically check that the returned time stamp differs from the previously used one, looping until they differ if needed.
* Disabled striker-manage-install-target when called to update the repos, as the Install Target function doesn't work at this point.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-16 20:10:43 -04:00
Digimer
24f5d39dff This is a set of changes all stemming from trying to debug frequent resyncs. More bugs still to be fixed.
* Updated Database->get_host_from_uuid() to cache results.
* Fixed a bug in Database->get_storage_group_data where a DELETE wasn't deleting from the history schema as well.
* In Database->resync_databases(), references to the old 'host_uuid' that we used to use to resync just the local host's data were removed. Also added a check for two or more entries in a given history schema having the same modified_date; when found, the newest entry is preserved and the rest are deleted. Before this, a resync where two+ records had the same modified_date would only sync the last record, leaving a mismatch in history schema entries and triggering repeated resyncs.
* Fixed a bug in Email->send_alerts() where the 'alerts' table was being updated without a modified_date being set.
* Fixed a bug in System->test_ipmi() where the 'hosts' table was being updated without a modified_date being set.
* Updated scan-network to clear up old deleted ip_addresses, bonds and bridges. Also fixed bugs where public schema records were being deleted without history records being deleted.
* Updated anvil-update-states to fix bugs where DELETEs were happening without setting the modified_date.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-12 23:14:49 -04:00
Digimer
1770e9e0e0 * Fixed a bug where Database resyncs were trying to resync tables without history schema entries.
* Updated fence_delay to move the log filehandle close to a saner spot.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-08 21:27:41 -04:00
Digimer
e6dcff1cf1 * Added a missing modified_date to ip_addresses in Database->get_ip_addresses().
* Updated scan-network to purge old historical ip_addresses when clearing duplicates now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-21 15:52:25 -04:00
Digimer
1b70b49cf8 * Updated Network->find_matches() to try to populate the first and second parameters if they're not passed in.
* Updated Network->load_ips() to load extra information about the interfaces.
* Updated ocf:alteeve:server to not check libvirtd daemon state on server start.
* Updated scan-hardware to check for duplicate entries and purge if found.
* Updated scan-network to check for the 'default' virbr0 interface by checking if the config file exists instead of calling virsh.
* Updated scan-server to have better logging.
* Created the new (and incomplete) anvil-test-alerts tool
* Updated scancore to support --purge to pass to all agents and then exit.
* Updated ScanCore->call_scan_agents() to no longer use 'timeout' as it was causing issues with virsh calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-20 10:28:21 -04:00
Digimer
572167d034 * Updated Database->get_storage_group_data() to record the VG name for a given host's VG in a given storage group.
* Updated anvil-provision-server to fix more bugs related to --ci-test.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-10 18:00:37 -04:00
Digimer
d26a16e711 * Updated anvil-provision-server to handle human-readable sizes for disk and ram.
* Updated Database->get_anvils() to make it possible to translate a file name to a file UUID.
* Updated System->test_ipmi() to quote passwords properly. Also dropped the timeouts to 2 seconds.
* Updated anvil-provision-server to support pure CLI switch server provisioning using the --ci-test (and optional --options {--machine}) to allow CI tests.
* Continued work of anvil-manage-server.
* Fixed a bug in striker-prep-database when writing the pg_hba.conf file.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-10 00:42:40 -04:00
Digimer
929544bb90 Removed the '--' gateway holder to make it more consistent with the rest of the output.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-26 20:29:37 -04:00
Digimer
7ec4cee143 Created the new anvil-show-local-ips that shows the IPs on the host in an easier to read format, compared to 'ip addr list'.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-26 20:24:15 -04:00
Digimer
e9a9e0dd4b * Finished (but needs more testing) the new 'anvil-report-usage' tool.
* Updated System->_check_anvil_conf() to create the 'admin' user in a more normal way (the old way caused the 'admin' group to be a system GID).

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-25 23:56:58 -04:00
Digimer
d2973e603b Updated anvil-update-states to set the speed of links to 10000 when they are virtio interfaces.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-13 10:55:35 -04:00
Digimer
1dbca79dde * Created Network->get_ip_from_mac() which takes a MAC address and returns an IP address.
* Updated ocf:alteeve:server to always try to bring up the peer's DRBD resource, even when the local resource is up.
* Fixed a bug in scan-network where purging duplicate bridges failed in some cases.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-08 23:09:34 -04:00
Digimer
142be7674e * Fixed a bug in striker-scan-network where the scan wasn't running properly when no network was specifically given.
* Updated DRBD->get_devices() to store information about the nodes for each resource.
* Got more work done on anvil-report-usage.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-08 18:54:55 -04:00
Digimer
4751c6e747 Updated DRBD->get_devices() and Server->parse_definition() to take 'anvil_uuid' so that server data can be parsed from anywhere.
Created, but not finished, tools/anvil-report-usage that will print a report of server resource allocation and Anvil! resource availability.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-06 00:16:32 -04:00
Digimer
ce4d6cdcf0 Updated striker-parse-os-list to now take '--all' and '--xml' to show the list of OSes available to optimize VMs for in a simple machine-parsable format or XML, and to show only the OSes not in the words file yet, or all OSes.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-01 17:34:34 -04:00
Digimer
0b41029db2 Reworked Database->_find_behind_databases to loop through tables, then databases when evaluating for resync. This is still racy but should be less racy as the time between counts of columns for a given table should be a lot shorter. Also re-enabled triggering resyncs based on the age of the most recent record.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-31 21:19:32 -04:00
Digimer
7212ea1c2f Fixed a bug where reaping db_in_use states wasn't restricted to the caller's host_uuid.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Digimer
aa7d9bdf14 * Fixed a bug where resync'ing the database was missing tables.
* Updated Network->find_matches() to take 'source' and 'line' parameters to help identify the source of issues with missing hashes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Digimer
74b7719cf5 * Created the new anvil-manage-host that can check/set if a host is configured. On Strikers, it can age out data, resync data, and check/set if the local database is active.
* Updated striker-prep-database to again enable the postgresql service.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Tsu-ba-me
c268d345fc fix(tools): allow striker-access-database to process and execute DB subs 2022-03-18 22:50:40 -04:00
Tsu-ba-me
3e8baf4b3e chore: add dev tool to generate dispatch table from perl modules 2022-03-18 22:50:40 -04:00
Tsu-ba-me
e909a715a3 feat(tools): add cli tool for accessing database with SQL query 2022-03-18 22:50:40 -04:00
Digimer
8fbf594002 Updated striker-prep-database to stop -> start postgres post-configure, and to connect -> disconnect to run the schema load logic.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-16 13:59:21 -04:00
Digimer
edf51adaec * Changed 'anvil-manage-power' to no longer set the job progress to 50 prior to calling a reboot. It now sets to 100 immediately. Also reduced the uptime timer to five minutes from ten.
* Updated striker-auto-initialize-all() to reconnect to DBs during waits to better detect when a DB is marked as offline.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-16 00:35:26 -04:00
Digimer
7b090e1623 * Updated Database->shutdown() to disconnect, stop the postgresql daemon, then reconnect.
* Updated anvil-daemon to not stop a database until both/all DB hosts are in both/all DB's hosts table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-15 22:33:42 -04:00
Fabio M. Di Nitto
0ebe589c93 Ship striker-db-status
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2022-03-15 05:50:32 +01:00
Digimer
513ce3b74e Created 'striker-db-status' that reports the status of the databases to external tools. It's basic, but it works.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-14 16:41:37 -04:00
Digimer
3fd0db15bf * This rather heavily reworks how database shutdowns work. It adds much more intelligent shutdown, tracking who is using the database, being able to mark a database as "offline" and waiting for users of the database to disconnect before it shuts down.
* Also removed the variables for the database name and DB user name, setting them statically now.
* Created Database->shutdown() to more kindly stop a local database server.
* Added 'check_db_in_use_states()' to anvil-daemon to clean any stale entries marking a database as in use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-14 16:41:37 -04:00
Digimer
b234b79544 Updated anvil-daemon to check if anvil-sync-shared is running if the reported RAM use is too high. If so, it doesn't exit. This fixes an issue where anvil-sync-shared would loop forever as it would constantly be killed when downloading large files.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-27 21:29:30 -05:00
Digimer
68b1d12545 Updated anvil-daemon to not shutdown a striker DB until the striker host has been running for at least an hour.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-24 22:25:37 -05:00
Digimer
763821a21d Fixed a variable substitution bug.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-24 22:25:37 -05:00
Digimer
87a2454a09 Moved anvil-configure-host reboot logging to use log_0687 to help grep for reboot causes.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-21 19:18:28 -05:00
Digimer
dc989f0950 Added more logging to track when and how reboots happen in systems.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-16 21:55:33 -05:00
Digimer
f77f486775 Fixed a typo in scan-network
Fixed a missing 'next' to prevent the first DB from disconnecting when down'ing excess DBs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-09 15:52:21 -05:00
Digimer
d70b9a4956 Updated scancore and anvil-daemon to check their RAM use at the end of each loop and, if it's using more than 1 GiB of RAM, it sends an alert and exits.
* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from its status output.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-05 22:08:06 -05:00
Digimer
a633ab7f63 Added a periodic check to ensure all users can ping. This fixes a bug where a local striker dashboard whose DB was stopped wouldn't work.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-21 14:27:58 -05:00
Digimer
e37f487704 Fixed a bug in System->check_ssh_keys where the 'admin' user's RSA keys were owned by root.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-20 14:13:27 -05:00
Digimer
892a475881 * Fixed a bug in Convert->format_mmddyy_to_yymmdd() where being passed '--' didn't return the same.
* Fixed a divide-by-zero bug in anvil-boot-server when no servers exist yet.
* Fixed a bug in anvil-daemon where the local database engine was being started when it shouldn't.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-18 02:38:50 -05:00
Digimer
796814531e Fixed a bug in Alert->check_condition_age() where, when the 'clear' parameter was set and the value was already 'clear', it would flip to 'set' erroneously.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-13 21:07:25 -05:00
Digimer
0a9f81d852 * Created the new striker-db-report that shows how many records are in each database table, and if passed a table name, report how many times each record has been recorded in the history schema.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-13 20:00:37 -05:00
Digimer
652f87ec74 * Updated scan-network to also clean up the media type.
* Updated anvil-daemon to check for files in /mnt/shared/incoming on striker dashboards and add them to the media library if needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-12 23:27:44 -05:00
Digimer
72038e8358 * Fixed a bug where ethtool's Media type contained tab characters that broke JSON when configuring the network interfaces.
* Updated the copyright date to 2022.
* Updated the database resync to not run on machines that host VMs to help reduce the chance of oom-killer terminating a VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-03 17:06:56 -05:00
Digimer
3346d31194 * Created Get->kernel_release() that returns the current kernel release (version) in use on the host or on a remote system.
* Created DRBD->_initialize_drbd() to make sure the DRBD kernel module can load, and to try to build the module if necessary. This is meant to provide support for clients that can't access needed internet resources (or the internet at all).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-12-07 20:03:39 -05:00
Digimer
9cfd7b9b94 Created the new (and still in development) striker-file-manager to manage files from a Striker dashboard's command line. So far, it will add files only.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-12-01 18:43:50 -05:00
Digimer
65dfc22a38 Added an eval{} call around Database->query()'s ->prepare() DBI call to better handle lost database handle.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-30 17:58:29 -05:00
Digimer
521633f3eb
Merge branch 'master' into anvil-tools-dev 2021-11-25 01:50:20 -05:00
Digimer
75a4c8d709 * Moved the logic to add the local database to a Striker's anvil.conf from striker-prep-database to Database->_add_to_local_config().
* Updated striker-prep-database to always set the user's password, independent of whether the database user was created.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-25 01:47:55 -05:00
Fabio M. Di Nitto
36bcaa587f [build] fix FASEXECPREFIX handling and ship fence_ in -core rpm
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2021-11-24 21:58:16 +01:00
Digimer
958267e38f * Enabled scancore in the .spec file. Disabled calling striker-prep-database and anvil-update-state in the same.
* Updated striker-prep-database to check / wait until postgresql-server is installed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-24 01:22:33 -05:00
Digimer
034c38fdeb Disabled calling striker-prep-database from the spec file, and enabled scancore.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-24 00:55:02 -05:00
Digimer
6225ce1943 Updated striker-prep-database to not configure the firewall if firewalld isn't running.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 23:32:07 -05:00
Digimer
8e41814ca2 * Updated anvil-daemon->prep_database() to start the postgresql daemon if it's not running and no databases are available.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 21:49:24 -05:00
Digimer
b517117bc1 * Did more work on trying to figure out why initial setup of the database was failing. I believe it was because, in anvil-daemon, after calling 'prep_database' we called ->connect() _without_ 'check_if_configured' set. Next round of function testing should help confirm if this was the case.
* Added 'configure_firewall()' to 'striker-prep-database' to explicitly open the postgresql service for all active zones.
* Did some general logging changes and cleanup around the same.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 20:41:29 -05:00
Digimer
090c59a873 Updated striker-prep-database to enable extra logging to help diagnose a function test build failure problem.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-16 03:21:33 -05:00
Digimer
257a998743 * Updated Database->configure_pgsql() to use 'postgresql-setup --initdb --unit postgresql' instead of the deprecated 'initdb' switch.
* Updated Database->insert_or_update_states() to switch to an active UUID if the passed in UUID is not an active handle.
* Updated Database->query() to switch to 'sys::database::read_uuid' if the passed in 'uuid' is not an active handle.
* Updated Database->_test_access() to return immediately if the passed in uuid is not an active handle.
* Started working on a Storage->get_storage_group_from_path() bug where the storage group isn't being returned.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-28 12:07:36 -04:00
Digimer
32d47f70f1 * Fixed bugs around ScanCore->check_power() so that time on batteries and highest charge are now returned properly.
* Created Network->is_our_interface() which returns '1' if an interface is one managed by an Anvil!. Also updated scan-network to use this to determine when an interface alert should be a warning or notice level alert.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-11 21:57:35 -04:00
Digimer
226d1de6b5 Updated anvil-update-states to use the permanent MAC addresses, as done in scan-network. Updated Network->get_ips() to do the same.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-11 02:57:45 -04:00
Digimer
3445d008d2 Removed a stray debug die.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-10 22:31:23 -04:00
Digimer
63c45430bb * Updated scan-network to clear duplicate IP addresses.
* Fixed a bug in anvil-daemon where striker-prep-database was always being called, when it shouldn't in some cases.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-10 22:27:54 -04:00
Digimer
47c832bc0e * Updated Network->get_ips() to check for 'permaddr' when processing 'ip addr list' to ensure the permanent MAC is used.
* Updated scan-filesystems to set swap usage alerts to notice level only.
* Updated scan-network to pull the permanent MAC address from an 'ethtool -P <iface>' call to deal with the fact that wireless interfaces don't have their real MAC in the sysfs address file.
* Updated anvil-provision-server to set the rtc_tickpolicy to catchup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-07 17:10:25 -04:00
Digimer
8e436ffec7 WIP: Started work on a new Storage->copy_device() method that will do 'dd' calls.
Fixed a bug in System->update_hosts() that was causing hosts to be constantly rewritten. (Well, I hope fixed, this has been a notoriously buggy part of the program...)

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-25 17:58:20 -04:00
Digimer
0fc394b294 Updated ocf:alteeve:server to see if the target for a migration has a '<shortname>.mn1' host name, and if so, and if the target can be reached on that address, it will be used for the live migration. This is to allow for inexpensive 10 Gbps live migration speeds.
Removed the stub Server->provision method that was never used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-25 10:01:03 -04:00
Digimer
ea368a942b Finished the '--update' switch function in anvil-manage-dr.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-21 23:51:41 -04:00
Digimer
5ee7b2ccaf Got the '--connect' and '--disconnect' functions working in anvil-manage-dr.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-21 15:14:46 -04:00
Digimer
a034583213 * Updated DRBD->gather_data() to record TCP/IP data between connections of two hosts.
* Updated anvil-manage-dr to use the TCP ports already configured for a resource when re-configuring a DR resource that has been previously configured.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-20 22:34:36 -04:00
Digimer
0fcde483be
Merge branch 'master' into anvil-tools-dev 2021-09-20 11:44:30 -04:00
Digimer
fbe9adc306 Resolve a words key conflict.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-20 11:42:47 -04:00
Digimer
5c07179aa6 * Resolved a words.xml conflict.
* Reworked where and how Database->configure_pgsql() is called, and boosted logging around it (trying to debug a build test issue).
* Updated Database->configure_pgsql() to only check if the Anvil! user and DB exist if another step of the config happened.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-20 11:32:45 -04:00
Digimer
e60a1b46b3 Fixed bugs related to automatic database startup and conditional backup loading.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-19 14:06:18 -04:00
Digimer
4e9882812d * Fixed a bug where the periodic database dumps on the primary database Striker were not sync'ing to peers. Also fixed a bug where these periodic dumps weren't running at all.
* Updated anvil-daemon->prep_database() to only run if the database dump file doesn't exist. (If it does, it's clearly configured).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 23:18:06 -04:00
Digimer
72b17ff1f9 * Reworked how databases are stopped, now being handled in anvil-daemon. This way, initial starts will still do traditional resyncs, then shut down. This should allow the best of both worlds, where data is not lost on striker start/stop loss/recovery, but operate normally otherwise without delays.
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 22:33:31 -04:00
Madison Kelly
922899ea78 * WIP: Working on a new method of failing over between which Striker is the active database, instead of running N-number of databases all the time.
* Created Database->backup_database() that creates a pg_dump of the active database.
* Created Database->load_database() that loads the database from a flat file, optionally creating a backup before doing so, and using iptables to block access during the process.
* Updated Database->configure_pgsql() to not start the postgresql daemon unless it just initialized the DB.
* Much work, not yet complete, to Database->connect() to stop after the first successful connection. Added logic that, if no connection was established and the host is a Striker, loads a peer's backup, if it exists, and then starts the local daemon.
* Updated anvil-daemon to now have a section to run tasks on a ten minute cycle, which will later be used for the primary Striker to dump / copy its database to peer(s).

Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
2021-09-16 23:10:55 -07:00
Digimer
6664c5b77f * Fixed a bug where scan-drbd, with DR configured, was not recording TCP ports assigned to connections properly.
* More bugs fixed in anvil-manage-dr, tested repeatedly as a job and so far, so good. Other functionality still to come.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-12 23:34:25 -04:00
Digimer
da9dc03d04 Updated anvil-manage-dr to update the job progress and convert prints into strings.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-12 15:14:03 -04:00
Digimer
ffd15406e0 * anvil-manage-dr can now protect a server! Still lots to do though.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-11 19:42:55 -04:00
Digimer
20a784baa2 * Continuing work on anvil-manage-dr. Got it to the point where it should (but doesn't yet) create the new DRBD config and the LV(s) on DR.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-11 16:41:23 -04:00
Digimer
5b35204af4 * Updated DRBD->get_next_resource() to take the new 'dr_tcp_ports' ports which, if set, returns two free TCP ports.
* Got anvil-manage-dr to the point where it writes the updated resource configuration to enable DR support. (untested)

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-09 23:07:03 -04:00
Digimer
9edf698c37 Updated Database->get_storage_group_data() to determine when a node or DR host needs to be removed from a Storage group, or when a member of an Anvil! needs to be added to a storage group.
Created Storage->get_vg_name() to assist with anvil-manage-dr, which is still a WIP.
Continued work on anvil-manage-dr (which exposed the issue that required the update to Database->get_storage_group_data()).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-08 00:50:45 -04:00
Tsu-ba-me
f61527edf7 fix(tools): save screenshots to states table 2021-09-03 16:21:55 -04:00
Tsu-ba-me
c1859bc8d8 fix(tools): use netpbm tools instead of imagemagick 2021-09-03 16:21:55 -04:00
Tsu-ba-me
65613f501b fix(tools): add option to resize server screenshot 2021-09-03 16:21:55 -04:00
Tsu-ba-me
7467036054 build(tools): add anvil-get-server-screenshot script to build 2021-09-03 16:21:55 -04:00
Tsu-ba-me
da6b4d39c6 fix(tools): disable line wrap in image Base64 output 2021-09-03 16:21:55 -04:00
Tsu-ba-me
4ef231b567 fix(tools): prevent too frequent inserts of server VM screenshots 2021-09-03 16:21:55 -04:00
Tsu-ba-me
1014299d38 fix(tools): enable anvil-get-server-screenshot to be a job 2021-09-03 16:21:55 -04:00
Tsu-ba-me
f97a820b48 feat(tools): add script to take screenshot of server VM 2021-09-03 16:21:55 -04:00
Digimer
2f8b1fb72e Updated anvil-provision-server so that when the OS type is 'win7', set the disk to sata and the NIC to e1000e. Also updated it to store the virt-install call in the 'variables' table and write it out to /mnt/shared/provision.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-31 00:32:15 -04:00
Digimer
4427fe9f0d * Found the source of the vnet constantly cycling back to 'up' bug. The anvil-update-state tool was marking the vnet device operational state back to 'unknown' and scan-network was marking it back up.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-30 17:59:39 -04:00
Digimer
e40d0e2444 Fixed a bug where if a database is pingable but the pgsql database is down, and it's the first database tested (or local), then the DB handle used to read / quote fails.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-26 23:26:03 -04:00
Digimer
4c7bb45ab9 Fixed a race condition where configuring the IPMI BMC would appear to fail because the BMC wouldn't report the user list after a cold reset.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-25 21:02:00 -04:00
Digimer
6cbdc388d4 Fixed a bug where corosync's configuration of a backup ring was broken.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-24 15:52:44 -04:00
Digimer
04cb116c1b Updated anvil-parse-fence-agents to validate each fence agent's metadata is valid before adding it to the unified XML.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-19 00:58:26 -04:00
Digimer
8abb5b46e0 * Added support for setting per-agent log-level and log secure values in anvil.conf.
* Moved the check for an agent being disabled into ScanCore->agent_startup()

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-18 23:07:15 -04:00
Digimer
3674a47179 WIP - Working on a tool to manually load updated server definition files.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-16 11:39:16 -04:00
Digimer
aec22bb79c Added a check in scan-network that finds/removes duplicate network interface names.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-11 12:17:01 -04:00
Digimer
4800f7181f * Updated ScanCore to boot a node that is off without a stop reason.
* Fixed a bug where anvil-safe-stop was not recording the stop-reason. Also made '--poweroff' an alias for '--power-off'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-07 14:01:14 -04:00
Digimer
acaacd9a86 * Created Storage->get_size_of_block_device() that takes a block device path and returns the size of the path, if it's found in the database.
* More work on the storage management of anvil-manage-server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-06 22:46:02 -04:00
Digimer
7a504467ef
Merge branch 'master' into anvil-tools-dev 2021-08-06 14:51:44 -04:00
Digimer
606bd8f1f0 Continuing work on anvil-manage-server.
Created Storage->get_storage_group_from_path() that takes a block device path and tries to find the Storage Group it belongs to.
Updated Storage->get_storage_group_data() to make it possible to look up a storage group UUID using the SG's name.
Updated DRBD->gather_data() to take a pre-generated XML via the new 'xml' parameter.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-06 14:21:11 -04:00
Tsu-ba-me
063840ecb6 fix(tools): correct message_* string keys in striker-manage-vnc-pipes 2021-08-04 13:53:44 -04:00
Tsu-ba-me
8da318c933 fix(tools): patch failure to fix 2nd pipe after server migration 2021-08-04 13:40:02 -04:00
Tsu-ba-me
0f1c3d2435 chore(tools): remove unused function from striker-manage-vnc-pipes 2021-08-04 13:40:02 -04:00
Tsu-ba-me
cdb66019d3 fix(tools): avoid port conflict 2021-08-04 13:40:02 -04:00
Tsu-ba-me
7e447000b4 fix(cgi-bin): use unspecified instead of loopback address in SSH tunnel 2021-08-04 13:40:02 -04:00
Tsu-ba-me
b3b6da8259 chore(cgi-bin): remove debug log level from manage_vnc_pipes and its support scripts 2021-08-04 13:40:02 -04:00
Tsu-ba-me
549758b2f2 build(tools): include support scripts for manager_vnc_pipes endpoint into makefile 2021-08-04 13:40:02 -04:00
Tsu-ba-me
e50bfc7308 fix(tools): correct typo in passing server_uuid to get_vnc_info() 2021-08-04 13:40:02 -04:00
Tsu-ba-me
3a8f4c339b fix(tools): use VNC port in variables table if available 2021-08-04 13:40:02 -04:00
Tsu-ba-me
e4436be17b fix(tools): do checks and kills as root 2021-08-04 13:40:02 -04:00
Tsu-ba-me
bb155a5786 fix(tools): update job progress in catch-all case 2021-08-04 13:40:00 -04:00
Tsu-ba-me
ffc1fb096a fix(tools): correct switch name typo in striker-manage-vnc-pipes 2021-08-04 13:38:28 -04:00
Tsu-ba-me
1fec288ad0 fix(tools): make striker-manage-vnc-pipes executable 2021-08-04 13:38:28 -04:00
Tsu-ba-me
7d9013a60b fix(tools): allow striker-manage-vnc-pipes to be executed as a job 2021-08-04 13:38:26 -04:00
Tsu-ba-me
0935b9a990 feat(tools): move manage_vnc_pipes endpoint core logic to separate script 2021-08-04 13:34:58 -04:00
Tsu-ba-me
5459e610aa fix(tools): auto-end tunnel script when connection breaks 2021-08-04 13:34:58 -04:00
Tsu-ba-me
d5724c1457 chore(tools): rename striker-start-ssh-tunnel->striker-open-ssh-tunnel 2021-08-04 13:34:58 -04:00