Commit Graph

31 Commits

Author SHA1 Message Date
Digimer
8807915bb7 The theme of this commit is database cleanup and fixes.
* Updated Database->_age_out_data() to check for certain scan agent tables and, for those found, purge out old records. This should go a long way to keeping the database data responsive.
* Fixed a bug in Jobs->update_progress() where the 'job_picked_up_by' column was being set to '0' instead of '$$' when clearing the job.
* Fixed a bug in System->update_hosts() where '127.0.0.1' would be used in hosts for the actual host name.
* Updated the default trigger, count and division values in anvil.conf to 100,000, 50,000 and 75,000 respectively. In combination with the aging of data, this should go a long way to minimizing database sizes and overheads.
* Updated anvil-daemon to call $anvil->Database->_age_out_data(); in it's daily tasks.
* Updated various striker-X tools to specifically request a DB resync on Database->connect calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-30 15:16:25 -04:00
Digimer
eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server.
* Fixed a bug where, in rare cases, $anvil->hostname() would call 'hostnamectl' and get a dbus error during shutdown, which would then cause the hostname to be changed to the error in the database.
* Fixed a bug in Cluster->boot_server() where it would never verify that a server has started successfully.
* Updated Database->get_ip_addresses() to store the IPs we manage in 'ip_addresses::<ip_address_address>::X'.
* Updated ocf:alteeve:server to work from command line calls, though more testing is still needed.
* Started work on 'anvil-rename-server', but haven't gotten far with it yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-18 19:54:58 -04:00
Digimer
15fd0e5ce8 * Updated anvil-daemon (and Database->insert_or_update_jobs) to now recognize jobs with the job_status of 'scancore_startup' to run only when ScanCore starts.
* Finished initial Striker setup in tools/striker-auto-initialize-all. Started working on peering.
* Cleaned up the handling of converting UIDs to user names in Remote->add_target_to_known_hosts() and ->_call_ssh_keyscan().
* Did a bunch of white-space/alignment cleanup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-04 01:41:33 -05:00
Digimer
4b9ec56106 * Updated DRBD->delete_resource() to return a success if asked to delete a non-existent resource (as can happen when partial anvil-delete-server runs are re-run).
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-02 23:09:47 -05:00
Digimer
056f8edf48 * Moved the scan-drbd function 'gather_drbd_date' and moved it into DRBD->gather_date().
* Finished DRBD->get_next_resource() that returns the next available minor and the next free TCP port (with two free ports available after it).
* Created Storage->get_storage_group_details() that pulls together the LVM, storage group members and storage groups into one block of data.
* Made more progress on tools/anvil-provision-server. It now gets up to the point of creating LVs, creating DRBD resource files, loading them, creating metadata and up'ing the resource. It doesn't yet (successfully) force a new resource to primary.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-15 03:10:16 -05:00
Digimer
f30cce3c5a * Created the new tools/anvil-provision-server tool which will handle provisioning new servers, as well as having an interactive menu system to provision servers from the command line.
* Created Cluster->assemble_storage_groups() and moved the logic to auto-assemble groups out of Get->available_resources().
* Created Cluster->get_anvil_name() that will return an Anvil! name for a given anvil_uuid, or the name of the Anvil! if the host is a member of an Anvil!.
* Updated Cluster->get_anvil_uuid() to return the 'anvil_uuid' if passed a specific 'anvil_name'.
* Updated Jobs->clear() to use 'switches::job-uuid' when a job_uuid is not passed but the value exists in 'switches::job-uuid'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-12-30 04:35:39 -05:00
Digimer
fe7cdb18fb * Updated all methods to add (or fix) logging the method entry.
* More work done on Email->send_email() to, well, actually send email (which it isn't doing yet, but it's close).
* Updated Words->key() to include the bad key name when no entry for the requested key exists in the words.xml file.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-06 01:52:03 -04:00
Digimer
4f39272d9a * Fixed a big in Jobs->get_job_details() where jobs weren't being found via 'switches::job-uuid'.
* Enabled anvil-join-anvil debugging of stonith handling to later catch an 'uninitialized value' warning (despite seeming to complete configuration of the Anvil! successfully).

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-04 01:14:22 -04:00
Digimer
de43ea3ac1 * Renamed all Validate->is_X to Validate->X. Also created Validate->ipv6() to validate IPv6 addresses using Data::Validate::IP (and added it as a requirement to the .spec base RPM).
* Added the fix from the last commit for System->call to handle returned data without an ending newline to Remote->call.
* Got more work done on System->update_hosts(). It's able to add new hosts, but misses the short and FQDN host names. Need to fix that and the verify existing / manual entries aren't molested.

Signed-off-by: digimer <digimer@pulsar.alteeve.com>
2020-07-07 01:18:38 -04:00
Digimer
453f5c6223 * Fixed a bug where $anvil->nice_exit() was being passed 'exit' instead of 'exit_code' as a parameter.
* Update striker manifest run to add an entry into the 'anvils' table, and pass the anvil_uuid to the jobs rather than the various host_uuid's.
* Fixed a bug in the 'anvils' SQL procedure that copied data into the history schema (a few columns were missing).
* Updated anvil-configure-host to reboot when finished to be certain network changes have taken effect. Also updated the handling of virsh bridges to delete the autostart symlinks if libvirtd daemon isn't running.
* Added some logic to anvil-daemon to call 'anvil-update-states' with the -v{1,3} flag depending on the active debug level.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-24 00:39:56 -04:00
Digimer
4489111a65 * Fixed a bug in Job->clear() where it was not doing it's one job right.
* Updated System->generate_state_json() where when the full host name was short, it wouldn't set the short host name properly.
* Fixed a bug in 'tools/anvil-manage-power' where the node wouldn't mark the reboot as complete. Resolves issue #11.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-11 16:08:06 -04:00
Digimer
530fb31478 * Updated Jobs->get_job_details() to use --job-uuid switch or, failing that, look for an incomplete on this host with the same command as the calling program.
* Got anvil-join-anvil to the point that it reworks the network configs, updates MTUs and configured NTP.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-03 21:52:13 -04:00
Digimer
9bcad1c4bf * Did more work on adapting tools/anvil-configure-host to work on nodes and DR hosts.
* Fixed a bug in Job->get_job_details() when no job_uuid was passed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-11 00:38:48 -05:00
Digimer
a54e0d4e22 * Made a fair bit more progress on tools/striker-manage-install-target. It now registers a system with Red Hat and attaches appropriate subscriptions, updates the OS and (tries) to install anvil-{node,dr}.
* Fixed a bug in Remote->call where a shell call that ended in a newline would work, but throw an error and not get the return code.
* Created Database->get_job_details() which takes a job_uuid and returns the job details, if found.
* Fixed a bug in Jobs->update_progress() where 'clear' wasn't removing the old job_progress data.
* Added the parameters 'no_files' to skip stat'ing/recording non-directories, and 'search_for' which will set the parent directory in 'scan::searched' and stop scanning if found. This allows this method to act as a directory tree scanner and as a search engine.
* Created Striker->get_local_repo() that builds a repo file body suitable for adding to peers, nodes and DR hosts.
* Fixed bugs in the Striker WebUI related to initializing a target node / DR host.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-18 18:48:09 -04:00
Digimer
324ef351fe * Updated DRBD->get_devices() to properly identify the peer node, when run on an actual node in the cluster (not DR or Striker).
* Created System->active_lv() that, surprise, activates an inactive logical volume. Also created ->check_storage() that parses out the LVM data.
* Fixed a bug in tools/fence_pacemaker that was preventing it from compiling and running.
* Updated ocf:alteeve:server to validate the target server's storage.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-06 23:31:35 -04:00
Digimer
5d91211dff * Continued work on 'anvil-download-file', in-progress adding support for job processing.
* Updated Database->insert_or_update_jobs() to use job_data when looking for a job uuid. Also fixed logging and adapted 'jobs::X' variable feeding to prevent them being undefined. Also made the search find jobs with the program name anywhere in the string, instead of just the start of the strong.
* Started work on Jobs->update_progress() to handle updating downloading file information.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-04-04 12:27:27 -04:00
Digimer
27ba3dcbb9 * Created Database->read() to store and return the handle to whichever database is used for read operations. Also created Database->quote that uses ->read to access the DBI 'quote' method more cleanly. Updated all calls to use these new methods.
* anvil-manage-files now identifies peers on the same subnet(s) and stores them in a sortable hash.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-03-06 01:49:59 -05:00
Digimer
e3d11fd938 * Did some more work on anvil-manage-files.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-02-15 01:50:38 -05:00
Digimer
0d62a28fda * Created Database->insert_or_update_file_locations().
* Fixed some bugs in Database->insert_or_update_files() and the 'files' table/procedure in the SQL schema.
* Got more work done on anvil-manage-files.
* Created Job->get_job_details().
* Added an executable check for files in Storage->scan_directory().
* Cleaned up some logging and switched to Job->get_job_details() in anvil-update-states and striker-configure-host.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-02-14 04:25:52 -05:00
Digimer
d68b85fe9e * Working on a file upload with progress bar feature. Expect more changes as it is refined.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-02-08 03:21:51 -05:00
Digimer
9fd0aa3e0f * Finished getting the Jobs page working. Needs more testing, but seems to be working fine.
* Created Job->html_list() to return the list of running or recently completed jobs, and used it instead of the code formarly used in 'striker' when Striker was unavailable.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-02-01 01:39:11 -05:00
Digimer
3c980a5c6d * Created Job->running() to return '1' when one or more jobs are in progress on the host.
* Started work on a "Jobs" button on the Striker UI to be able to see the progress of jobs that are running in the background.
* Updated the Help icon and added the jobs (tasks) icons.
* Made logging around dhcpd more verbose to help figure out why it's auto-running after initial configuration.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-31 23:26:55 -05:00
Digimer
02c4fe1fa1 * Updated all perl module modes to remove the executable bit.
* Updated anvil.sql to add the new tables needed for alert mail delivery.
* Update anvil.sql and Database->initialize to now default the user to 'admin' and swap that out if needed, instead of using the #!variable!user!#' replacement variable.
* Started updating anvil.spec for EL8.
* Added support for 'striker::repo::extra-packages' which users can use to add additional packages to the Striker repositories.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-05 18:57:44 -07:00
Digimer
b5dc83d39c * Renamed anvil-configure-striker -> striker-configure-host and anvil-manage-install-target -> striker-manage-install-target as they're both Striker-specific.
* Fixed a bug in Words->parse_banged_string where some variable strings were not being cleared, causing infinite loops.
* Added job progress reporting in striker-manage-install-target, and made it only refresh the RPM repo when '--refresh' is specified (with --force now forcing the issue). This was done to allow adding it into anvil-daemon in such a way that it would only update the RPM repo once a day.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-06 04:14:58 -05:00
Digimer
6f0bc0d86f * Fixed a major bug where anvil-daemon would reset the job_progress to 0 when clearing the 'reboot_needed' flag, causing anvil-daemon to pick the job up and again reboot, repeatedly.
* Updated Jobs->update_progress to take 'picked_up_by' as an optional parameter, defaulting to '$$' (the caller's PID).
* Created System->get_uptime() to return the current uptime in seconds.
* Added a delay to anvil-manage-power to not proceed with a reboot if the uptime is less than 600 seconds. This way, if any future bug causes an infinite reboot, there will be more time to determine what's wrong and debug the system between reboots.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-10-06 03:16:08 -04:00
Digimer
f58979ad27 * Fixed a bug in Jobs->clear() where the 'job_uuid' wasn't being passed to Database->insert_or_update_jobs().
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-25 16:28:35 -04:00
Digimer
90bbcfe9f4 * Updated Database->insert_or_update_jobs() to have 'update_progress_only' conditionally update 'job_status', 'job_picked_up_by', 'job_picked_up_at' and 'job_data'.
* Updated Database methods that can take the caller's file or line to append the local file and line.
* Updated Jobs methods to call Database->insert_or_update_jobs().
* Updated the remaining UPDATEs of 'jobs' table to use 'Database->refresh_timestamp()' for the 'modified_date'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-25 13:50:48 -04:00
Digimer
9bd5dd9a18 Revert to bfc2204.
This reverts commit bfc2204352.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-25 02:05:07 -04:00
Digimer
a8369170b4 This is the start of a major change!
The resync of the databases was originally designed (on m2) with the expextation that any given column would have only one change per 'modified_date' time. That was never a great approach, but it worked in m2 and just bit me on m3. With job processing, for an example, the job_progress will change repeatedly in one pass, all with the same 'modified_date'. So only one record per run would resync. To fix this, the plan is to drop 'history_id' (and the procedure/trigger in pgsql to copy INSERT and UPDATEs to the history schema). The new plan is to use 'change_uuid' with a per-transaction UUID created in Database so that the per-DB 'history_id' is replaced with a per-update/insert UUID in 'change_uuid'. This will become the unique record used to sync databases, instead or 'modified_date'. To keep things consistent, 'modified_date' was renamed to 'change_date' to match 'change_uuid'. This work is very much "in progress" and not finished.

This commit also changes Get->uuid to use UUID::Tiny to create v4 UUIDs instead of making making a system call to 'uuidgen'. This sped up UUID generation by almost 100x.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-23 16:16:08 -04:00
Digimer
00565b123c * Updates Tools->nice_exit to add the caller name to the exit status.
* Created Job->clear() to clear the job_picked_up_by column. Created Job->get_job_uuid() to return the job_uuid of an unfinished job matching a given job_command string (if any found).
* Updated striker->process_power to log the user out after confirming a poweroff or reboot action.
* Added anvil-daemon --startup-only to not enter the main loop and exit.
* Finished getting poweroff and reboot working (though more testing needed).

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-06 01:37:08 -04:00
Digimer
545f9a9bb5 * Renamed tools/anvil-reboot-needed to tools/anvil-manage-power and started adding support for rebooting and powering off to it.
* Created the Anvil::Tools::Jobs module to handle general job processing task. Moved 'update_progress' from tools/anvil-update-system to it and generalized it.
* Added some missing CDATA wrappers to the words XML file strings with '>' in it.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-04 18:57:09 -04:00