Commit Graph

39 Commits

Author SHA1 Message Date
Digimer
d7d418ee1b * Fixed a bug in DRBD->gather_data() where the peer node's data was being recorded where the local node's data should have been saved.
* Fixed a bug in anvil-delete-server where, if a server was off already, the server would not be removed from pacemaker.
* WIP - continuing on scan-network

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-30 14:58:36 -04:00
Digimer
607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration.
* Updated DRBD->allow_two_primaries() to take the 'set_to' parameter which can be 'yes' to all and 'no' to disallow dual-primary.
* Updated ocf:alteeve:server to call allow_two_primaries() with 'set_to' = 'no' instead of calling 'adjust' after a migration completes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-25 00:45:52 -04:00
Digimer
d155c2eb66 * Fixed a bug where 'timeout' would repeatedly get added to drbd's global-common.conf file.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 20:03:17 -04:00
Digimer
41cd1e0319 * Several bugs fixed and enhancements;
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 00:09:32 -04:00
Digimer
49890762b9 * Fixed a missing semi-colon that broke Database.pm.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-22 12:53:49 -04:00
Digimer
ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :)
* Created DRBD->check_if_syncsource() and ->check_if_synctarget() that return '1' if the target host is currently SyncSource or SyncTarget for any resource, respectively.
* Updated DRBD->update_global_common() to return the unified-format diff if any changes were made to global-common.conf.
* Created ScanCore->check_health() that returns the health score for a host. Created ->count_servers() that returns the number of servers on a host, how much RAM is used by those servers and, if available, the estimated migration time of the servers. Updated ->check_temperature() to set/clear/return the time that a host has been in a warning or critical temperature state.
* Finished ScanCore->post_scan_analysis_node()!!! It certainly has bugs, and much testing is needed, but the logic is all in place! Oh what a slog that was... It should be far more intelligent than M2 though, once flushed out and tested.
* Created Server->active_migrations() that returns '1' if any servers are in a migration on an Anvil! system. Updated ->migrate_virsh() to record how long a migration took in the "server::migration_duration" variable, which is averaged by ScanCore->count_servers() to estimate migration times.
* Updated scan-drbd to check/update the global-common.conf file's config at the end of a scan.
* Updated ScanCore itself to not scan when in maintenance mode. Also updated it to call 'anvil-safe-start' when ScanCore starts, so long as it is within ten minutes of the host booting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-30 22:58:01 -04:00
Digimer
f202187c34 * anvil-safe-stop is complete! Testing still needed, of course.
* Updated DRBD->manage_resource() to call 'drbdadm adjust <res>' when starting a resource to help deal with a periodic issue where the 'allow-two-primary' option on the peer doesn't match the local setting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 11:56:11 -04:00
Digimer
2e37691116 * Updated DRBD->gather_data() to store data on peers so that the peer's LV path and backing disk is recorded. Also fixed a bug in ->get_status() where the return code for local calls was stored as a host name.
* Added the scan-hpacucli scan agent. It's been done for a while and should have been added ages ago.
* Updated anvil-rename-server to get to the point where it will take down the DRBD resources on all machines, but waits if there is a sync under way. It also verifies that the server is off on all systems from virsh's perspective.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 22:46:51 -04:00
Digimer
fb0836f912 * THe get_cpu endpoint was completed.
* The get_mmeory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.

To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, in there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib colume.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Update DRBD->gather_data() to get the size data from /sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disabled ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-09 20:51:29 -04:00
Digimer
59b867cc25 * Updated DRBD->gather_data() to check if drbdadm exists before trying to call it to avoid scary errors in the logs. Also moved some strings that pulled from the scan-drbd agent into the main words file.
* Fixed a bug in ScanCore->agent_startup() where a (thankfully broken) check to append tables to the 'sys::database::check_tables' would cause an infinite loop as both were pointers to the same anonymous array.
* Fixed a bug in scan-ipmitool where the scan_ipmitool_variables table didn't use a host_uuid reference, causing resyncs of that table to sync for all hosts and cause DB errors when the scan_ipmitool record from another host wasn't sync'ed yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-31 01:36:11 -04:00
Digimer
4b9ec56106 * Updated DRBD->delete_resource() to return a success if asked to delete a non-existent resource (as can happen when partial anvil-delete-server runs are re-run).
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-02 23:09:47 -05:00
Digimer
1081645893 * Added parameters to DRBD->get_next_resource to allow for a resource to be searched and either error out if a resource is found, or return the first DRBD minor and tcp port if found.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-31 02:32:12 -05:00
Fabio M. Di Nitto
8f9892650b [build] first pass at adding a build system to integrate with CI
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2021-01-30 20:16:30 +01:00
Digimer
89dec8e1f9 * Finished anvil-delete-server! (More testing needed though)
* Fixed a bug in Cluster->shutdown_server() where the wrong variable was being evaluated when checking the server state.
* Created DRBD->delete_resource() that deletes a resource's backing device and configuration. Note that this wipes the DRBD MD and and FS signatures before removing the LV. Updated DRBD->gather_data() to record the backing devices for volumes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-26 01:45:17 -05:00
Digimer
056f8edf48 * Moved the scan-drbd function 'gather_drbd_date' and moved it into DRBD->gather_date().
* Finished DRBD->get_next_resource() that returns the next available minor and the next free TCP port (with two free ports available after it).
* Created Storage->get_storage_group_details() that pulls together the LVM, storage group members and storage groups into one block of data.
* Made more progress on tools/anvil-provision-server. It now gets up to the point of creating LVs, creating DRBD resource files, loading them, creating metadata and up'ing the resource. It doesn't yet (successfully) force a new resource to primary.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-15 03:10:16 -05:00
Digimer
162f4913b1 * Started work on DRBD->get_next_resource(), that will eventually return the next free DRBD minor and TCP port numbers.
* Fixed a bug in scan-drbd that was still looking for the scan_drbd_resource_uuid from the resource config file. Also added a check to see if 'scan-drbd::resource_status' directory exists before trying to read the files in it.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-13 01:31:44 -05:00
Digimer
d8fc660a8b * Finished scan-drbd. Removed the code to track resources by UUID via resource config file. The UUID tracking it fairly easy messed up by manual edits/rsyncs, and the code to handle updating the UUID in the resource config files was complicated enough that it negated any benefit of avoiding resources being marked as removed / newly created on name change. Left the DRBD->resource_uuid() method though, in case we decide to revisit this.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-12-06 18:11:26 -05:00
Digimer
20b75310b7 * Created DRBD->resource_uuid() which reads (or adds) the 'scan_drbd_uuid' to the resource configuration file, which will allow us to track resources when they're renamed.
* Got scan-drbd's read_last_scan() and process_drbd() functions done (but untested so far).

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-12-01 23:59:54 -05:00
Digimer
0f7267eae1 * Moved the '_host_name', '_short_host_name', and '_domain_name' private methods in Tools.pm over to Get.pm (removing the leading '_' in the method names).
* Created 'Cluster->which_node' that returns 'node1' or 'node2' to indicate which node a host is.
* Continued working on scan_cluster; decided to make it not host-dependent.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-20 00:27:36 -04:00
Digimer
fe7cdb18fb * Updated all methods to add (or fix) logging the method entry.
* More work done on Email->send_email() to, well, actually send email (which it isn't doing yet, but it's close).
* Updated Words->key() to include the bad key name when no entry for the requested key exists in the words.xml file.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-06 01:52:03 -04:00
Digimer
767148b538 * Updated Database->get_mail_servers() to clear old stored data, and to pull out the list of when a mail server was last used.
* Got email server configuration under way. A mail server can now be configured via Email->_configure_for_server(), but more work is needed on when to switch between configs.
* Fixed some logging of passwords that wasn't being checked to see if secure logging was enabled or not.
* Fixed a bug in Striker where the back arrow in email config sub-sections weren't going back to the main email menu.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-27 02:09:21 -04:00
Digimer
1498e1b53c * Got server migration working using ocf:alteeve:server in a test environment!
* Converted most 'eval { }' calls to localize $@ and test the output of the eval, instead of checking to see if $@ was set.
* Converted all 'local' hash references to instead use the short host name of the local machine as a new standard.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-19 18:54:09 -04:00
Digimer
6eeb4e48c7 * Added 'timeout' and 'wfc-timeout' to drbd'd global-common.conf in DRBD->update_global_common().
* Fixed a minor bug in ocf:alteeve:server found in testing the RA.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-14 00:24:02 -04:00
Digimer
e35800c413 * Fixed up (though more testing/work needed) to ocf:alteeve:server to get it working with DRBD resources referenced using '/dev/drbd/by-res/...'.
* Added the anvil.conf option 'sys::privacy::strong' that controls if the Anvil! ever "calls home". Initially, this controls DRBD's usage flag.
* Updated DRBD->get_devices() to track resources by their 'by-res' names as well and by the normal '/dev/drbdX' devices.
* To mitigate https://bugzilla.redhat.com/show_bug.cgi?id=1868467, updated Get->bridges() to parse the normal (non-JSON) data if we get invalid JSON output.
* Updated anvil-join-anvil to not disable, and in fact enable, libvirtd on boot. With DRBD 9, the original fear of a user accidentally booting a VM that's running on the peer no longer is an issue. By enabling it and leaving it on, Striker dashboard users won't lose their virtual machine manager link unless the node powers off. Also enabled actually updating the job progress, completing this tool!

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-13 00:12:20 -04:00
Digimer
39b4a912af * Remember in the last commit how I said that DRBD->update_global_common() was done? Well that was cute, 'cause it was quite broken. Now it's working.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-10 15:42:05 -04:00
Digimer
d647014ad1 * Created (finished but not yet tested) DRBD->update_global_common() to update DRBD's global_common.conf file.
* Cleaned up a lot of log levels.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-07 23:12:11 -04:00
Digimer
32bcdbe6d3 * Removed Network->is_remote, standardized on Network->is_local, and flipped calls to it to be more sensible (is_local -> local call -> else remote call). Also fixed a deep recursion issue with ->is_local where, given that it logs (which calls Storage methods which have local/remote invocations), would loop.
* Fixed a bug where '$target' being preset to 'local' was causing bad calls to 'Remote->call'.
* Updated Storage->change_mode and -> change_owner to work locally and on remote hosts.
* Barely started work on striker->process_anvil_menu().

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-19 00:57:33 -04:00
Digimer
3a86bed694 * Fixed tools/striker-initialize-host so that it set the hostname on the target, not locally.
* Updated System->host_name to work locally and on remote targets.
* Renamed all 'hostname' instances to 'host_name' to standardize on a spelling throughout the program.
* Removed use of and dependency on 'hostname'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-02 15:39:21 -04:00
Digimer
b9a0cc4d56 * Finished the initial tools/striker-initialize-host!
* Created Tools->refresh to reload anvil.conf in one call.
* Created Anvil::Tools::Network to hold network-related tasks.
** Created Network->is_remote() that tests to see if a string (containing a target) refers to the remote machine (versus a local machine). Updated all previous checks to use this new method.
** Moved Get->network_details() and Get->network() to the new Network module. Renamed Get->network() to Network->get_network().
** Made Network->get_ips() work locally and remotely.
** Created Network->find_matches() that compares two scanned machines IPs (via two previous calls to Network->get_ips())
* Created Database->manage_anvil_conf() that will add, update or remove a given database connection in a local or remote anvil.conf file.
* Fixed bugs in Storage->backup() where the bash calls were quite broken. I'm not sure how it ever worked before... x_x
* Updated anvil-daemon to not initialize a database unless it's running on dashboard. Also added a check at the startup of anvil-daemon where it will go into a loop waiting for a database to become available, re-reading anvil.conf each loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-22 23:36:59 -04:00
Digimer
b8816382b8 * Created Log->is_secure() to more cleanly handle conditional logging of strings with passwords or passwords directly. Updated log entries that could benefit from this method to use it.
* Cleaned up the striker->add_sync_peer() function to more clearly differentiate the ssh port from the pgsql port.
* Improved the HTML form to not have the browser treat host login fields as credentials to autofill or save.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-08 22:54:47 -04:00
Digimer
f5caec52dc * Made DRBD->allow_two_primaries() smarter about finding the 'target_node_id' when it wasn't passed.
* Fixed a couple bugs, and now ocf:alteeve:server properly can pull and push servers between nodes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-20 00:59:41 -04:00
Digimer
113a44ecc6 * Got 'migrate_to' working in ocf:alteeve:server. 'migrate_from' still needs work.
* Created DRBD->allow_two_primaries() and ->reload_defaults() that enables (and resets/disables) dual-primary operation (allow-two-primaries=yes), used to enable live migration.
* Created Remote->test_access() that simply verifies that a remote target can be accessed (as a given user).
* Created Server->migrate() that actually migrates a server. It can push already, and pull will be added next.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-16 01:41:47 -04:00
Digimer
d224be9344 * Created DRBD->manage_resource() that allows for up/down/primary/secondary'ing a resource on a local or remote system.
* Created Server->boot() that starts a server and verifies that it did actually start.
* Got ocf:anvil:server smarter about starting DRBD resources, properly handing resources where auto-promote isn't enabled. The 'start' process is now complete (baring bugs).

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-08 22:58:21 -04:00
Digimer
b1ddf945e2 * Got ocf:alteeve:server working again to boot servers. It's now smarter, knowing when the server is running locally already (success), running on the other node (hard error) and running on DR (fatal error).
* Updated DRBD->get_devices() to store data all under 'drbd::config::<host>::x'.
* Created Server->find() that takes a target and collects the servers running on it.
* Updated System->check_storage() to redirect all calls STDERR to /dev/null to supress errors about failing to open /dev/drbdX when LVM's filter isn't setup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-08 01:10:38 -04:00
Digimer
324ef351fe * Updated DRBD->get_devices() to properly identify the peer node, when run on an actual node in the cluster (not DR or Striker).
* Created System->active_lv() that, surprise, activates an inactive logical volume. Also created ->check_storage() that parses out the LVM data.
* Fixed a bug in tools/fence_pacemaker that was preventing it from compiling and running.
* Updated ocf:alteeve:server to validate the target server's storage.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-06 23:31:35 -04:00
Digimer
7a7e3db0c1 * Created DRBD->get_devices() that finds and maps the /dev/drbdX devices to their resources and backing LVs.
* Got Server->get_status() and ->_parse_definition() pulling out all but the device bus data from the XML files. Under devices, started parsing, with 'channel' devices now being parsed.
* Fixed a bug in several Storage->X calls where the test to see if a 'target' was the local host or not wasn't smart enough.
* Purged 'ocs:alteeve:server' to prepare for the rewrite where a lot of the logic is being moved into the Anvil::Tools modules as their functionality will be needed elsewhere anyway.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-30 02:10:04 -04:00
Digimer
312c949648 * Actually added the new Anvil::Tools::DRBD module, as well as a new ::Server module that will handle anything related to the virtual servers.
* Started a re-write of the ocf:alteeve:server resource agent. It has bugs and is missing logic, and trying to clean it up given most of the bugs come from trying to clean up the original agent doesn't make sense. Moving much of the logic into module methods as the functions will be needed elsewhere anyway.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-25 01:12:25 -04:00
Digimer
63a86b685a * Finished Anvil::Tools::DRBD->get_status().
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-24 00:43:20 -04:00
Digimer
0f873c45b5 * Created the new Anvil::Tools::DRBD moduke to hold all DRBD related stuff. Started working of ->get_status, still very much a work in progress.
* Started working on ocf:alteeve:server to make it smarter (and more patient) when bringing a DRBD resource up so we don't get false failures when we hit a race.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-23 02:37:14 -04:00