814 Commits

Author SHA1 Message Date
Digimer
4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches.
* Created Cluster->check_stonith_config() that checks and, if needed, reconfigures a cluster's fencing (stonith) config.
* Updated scan-cluster to call Cluster->check_stonith_config() at the end of each call.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-02 21:40:48 -04:00
digimer-bot
46c9035e4b
Merge pull request #83 from ClusterLabs/webui_anvil_page
* Created tools/striker-boot-machine to, well, boot machines. It uses…
2021-05-01 20:16:05 -04:00
Digimer
416f51323a * Created tools/striker-boot-machine to, well, boot machines. It uses host_ipmi or, failing that, other fence methods when available to boot a node.
* Created Cluster->get_fence_methods() that parses all fence methods out of a recorded CIB and stores them in a hash for a given host_uuid.
* Fixed a bug in ScanCore->post_scan_analysis_striker() where the short_host_name was not being stored correctly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-01 19:49:27 -04:00
digimer-bot
159c4a1612
Merge pull request #82 from ClusterLabs/anvil-tools-dev
Anvil tools dev
2021-04-30 23:09:18 -04:00
Digimer
35e926c52b
Merge branch 'master' into anvil-tools-dev
2021-04-30 23:02:49 -04:00
Digimer
ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :)
* Created DRBD->check_if_syncsource() and ->check_if_synctarget() that return '1' if the target host is currently SyncSource or SyncTarget for any resource, respectively.
* Updated DRBD->update_global_common() to return the unified-format diff if any changes were made to global-common.conf.
* Created ScanCore->check_health() that returns the health score for a host. Created ->count_servers() that returns the number of servers on a host, how much RAM is used by those servers and, if available, the estimated migration time of the servers. Updated ->check_temperature() to set/clear/return the time that a host has been in a warning or critical temperature state.
* Finished ScanCore->post_scan_analysis_node()!!! It certainly has bugs, and much testing is needed, but the logic is all in place! Oh what a slog that was... It should be far more intelligent than M2 though, once fleshed out and tested.
* Created Server->active_migrations() that returns '1' if any servers are in a migration on an Anvil! system. Updated ->migrate_virsh() to record how long a migration took in the "server::migration_duration" variable, which is averaged by ScanCore->count_servers() to estimate migration times.
* Updated scan-drbd to check/update the global-common.conf file's config at the end of a scan.
* Updated ScanCore itself to not scan when in maintenance mode. Also updated it to call 'anvil-safe-start' when ScanCore starts, so long as it is within ten minutes of the host booting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-30 22:58:01 -04:00
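
Note on the migration-time estimate described in ca7052dd53: the sketch below shows one way recorded migration durations could be averaged into an estimate. It is an illustration only, not the Perl code in ScanCore->count_servers(); the function name, the use of seconds, and the handling of missing samples are all assumptions.

```python
from statistics import mean


def estimate_migration_seconds(durations):
    """Average the recorded migration durations (hypothetical helper).

    Returns None when no usable samples exist, mirroring the
    "if available" wording in the commit message.
    """
    # Ignore missing or non-positive samples (an assumption, not from the source).
    samples = [d for d in durations if d is not None and d > 0]
    if not samples:
        return None
    return mean(samples)
```

Returning None rather than 0 keeps "no estimate yet" distinct from "migrations are instant".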
Fabio M. Di Nitto
cc41424bb8
Merge pull request #81 from ClusterLabs/drbd
Update to kmod-drbd91
2021-04-28 06:20:24 +02:00
Fabio M. Di Nitto
2214866156 Update to kmod-drbd91
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2021-04-28 06:17:15 +02:00
Digimer
15dab8aab7 * Started working on the node post-scan logic in ScanCore. Created ScanCore->check_temperature() to get a thermal score against a node.
* Updated ScanCore->check_power() to not require the parameter values.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-26 21:42:57 -04:00
digimer-bot
51bf505d5f
Merge pull request #80 from ClusterLabs/anvil-tools-dev
* anvil-safe-stop is complete! Testing still needed, of course.
2021-04-23 18:12:56 -04:00
Digimer
f202187c34 * anvil-safe-stop is complete! Testing still needed, of course.
* Updated DRBD->manage_resource() to call 'drbdadm adjust <res>' when starting a resource to help deal with a periodic issue where the 'allow-two-primary' option on the peer doesn't match the local setting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 11:56:11 -04:00
digimer-bot
edf749ae78
Merge pull request #79 from ClusterLabs/anvil-tools-dev
* Made good progress on anvil-safe-stop. It will now stop or migrate …
2021-04-23 08:06:19 -04:00
Digimer
82929e28bf
Merge branch 'master' into anvil-tools-dev
2021-04-23 00:06:34 -04:00
Digimer
3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed).
* Updated Server->shutdown_virsh() to change the parameter 'wait' to 'wait_time' to clarify its use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 00:04:20 -04:00
digimer-bot
4304fbe6dd
Merge pull request #78 from ClusterLabs/anvil-tools-dev
* Finished anvil-rename-server!
2021-04-22 13:34:31 -04:00
Digimer
27259d1d53 * Finished anvil-rename-server!
* Created Storage->delete_file() that, well, deletes files (locally or on a peer).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-22 13:29:50 -04:00
digimer-bot
2f17a0d402
Merge pull request #77 from ClusterLabs/anvil-tools-dev
* Updated DRBD->gather_data() to store data on peers so that the peer…
2021-04-20 23:25:53 -04:00
Digimer
53cd0bdf3a * Now with 100% less typos.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 23:22:45 -04:00
Digimer
e3ba64cb83 * Fixed a typo in the Makefile.am.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 23:00:03 -04:00
Digimer
2e37691116 * Updated DRBD->gather_data() to store data on peers so that the peer's LV path and backing disk are recorded. Also fixed a bug in ->get_status() where the return code for local calls was stored as a host name.
* Added the scan-hpacucli scan agent. It's been done for a while and should have been added ages ago.
* Updated anvil-rename-server to get to the point where it will take down the DRBD resources on all machines, but waits if there is a sync under way. It also verifies that the server is off on all systems from virsh's perspective.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 22:46:51 -04:00
digimer-bot
591b550085
Merge pull request #76 from ClusterLabs/anvil-tools-dev
* Finished anvil-migrate-server and anvil-safe-start! Lots of testing…
2021-04-19 00:36:40 -04:00
Digimer
711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' doesn't run as a job yet, but the logic is all there.
* Fixed a bug in Cluster->migrate_server() where waiting for the server to migrate would never exit.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-19 00:32:13 -04:00
digimer-bot
a6a11abe01
Merge pull request #75 from ClusterLabs/anvil-tools-dev
Anvil tools dev
2021-04-18 20:02:50 -04:00
Digimer
eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server.
* Fixed a bug where, in rare cases, $anvil->hostname() would call 'hostnamectl' and get a dbus error during shutdown, which would then cause the hostname to be changed to the error in the database.
* Fixed a bug in Cluster->boot_server() where it would never verify that a server has started successfully.
* Updated Database->get_ip_addresses() to store the IPs we manage in 'ip_addresses::<ip_address_address>::X'.
* Updated ocf:alteeve:server to work from command line calls, though more testing is still needed.
* Started work on 'anvil-rename-server', but haven't gotten far with it yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-18 19:54:58 -04:00
Digimer
a480357049 * Fixed a bug in Cluster->assemble_storage_groups() where, if a group is created during an anvil-provision-server run, the group would get created multiple times.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-15 18:51:10 -04:00
digimer-bot
83b140511e
Merge pull request #74 from ClusterLabs/anvil-safe-start-work
Anvil safe start work
2021-04-15 02:43:51 -04:00
Digimer
b36093671b * Updated Database queries that were passing 'debug => $debug' to not do that, as it was causing far too much (useless) noise in the logs.
* Turned on print to console for logging in anvil-provision-server. Also updated it to check if the cluster is running and hold until it is.
* Cleaned up some code in Get->available_resources() that proved hard to debug.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-15 02:35:58 -04:00
Digimer
798518ba5e * While working on the boot/shutdown server tools, ran into and fixed a bug where files uploaded before an Anvil! was added could not be sync'ed. This was fixed through the new Database->check_file_locations() method.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-14 22:56:18 -04:00
digimer-bot
ff5cefd1c2
Merge pull request #73 from ClusterLabs/anvil-safe-start-work
* Got anvil-safe-start to the point where it starts the cluster stack…
2021-04-14 00:35:20 -04:00
Digimer
426e16fbdf
Merge branch 'master' into anvil-safe-start-work
2021-04-14 00:32:40 -04:00
Digimer
e036515df3 * Got anvil-safe-start to the point where it starts the cluster stack. Need to create the 'anvil-boot-server' and 'anvil-shutdown-server' before it can be completed, so those files have been added.
* Created Cluster->parse_quorum() to check if a node is quorate as 'have-quorum' in the pacemaker CIB doesn't appear to be super accurate during startup.
* Fixed a bug in striker-manage-install-target where if a node didn't have any registered IPs, it would break before generating the repo data.
* Fixed a bug in anvil-join-anvil where if the database had to be reconnected, the job data was lost.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-14 00:26:06 -04:00
digimer-bot
9ddd649383
Merge pull request #72 from ClusterLabs/anvil-safe-start-work
* Continued work on anvil-safe-start. Got it to the point where it de…
2021-04-12 20:52:53 -04:00
Digimer
faf1399440 * Continued work on anvil-safe-start. Got it to the point where it detects shared networks with its peer node and waits for all networks to be up.
* Fixed a bug in scan-drbd where the volume_uuid wasn't being stored in the proper hash, breaking insertions into scan_drbd_peers in some cases.
* Updated System->pids() to work with remote targets (will be used later to check for parallel runs of anvil-safe-start).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-12 20:46:30 -04:00
digimer-bot
c745a13991
Merge pull request #71 from ClusterLabs/anvil-safe-start-work
* Started work on anvil-safe-start. The enable/disable logic and how …
2021-04-12 00:37:38 -04:00
Digimer
15e71768a1 * Started work on anvil-safe-start. The enable/disable logic and how it runs automatically is controlled by the database and the tool can be used to control anvil-safe-start on both the local and peer node. It will be started by ScanCore, if scancore starts within 10 minutes of the node booting. It will always be able to run manually.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-12 00:28:24 -04:00
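
Note on the ten-minute boot window described in 15e71768a1: a minimal sketch of how "started within 10 minutes of the node booting" might be checked. This is not the project's Perl implementation; the function names, the 600-second default, and reading the uptime from /proc/uptime are assumptions made for illustration.

```python
def read_uptime_seconds(path="/proc/uptime"):
    """Return seconds since boot (Linux only; first field of /proc/uptime)."""
    with open(path) as handle:
        return float(handle.read().split()[0])


def booted_within(uptime_seconds, window_seconds=600):
    """True if the host has been up for no longer than the window.

    The 600-second default corresponds to the ten minutes named in
    the commit message; treating the limit as inclusive is an assumption.
    """
    return 0 <= uptime_seconds <= window_seconds
```

A caller would then do something like `if booted_within(read_uptime_seconds()): ...` before launching the start-up job, while manual runs would skip the check entirely.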
digimer-bot
75343aadff
Merge pull request #70 from ClusterLabs/webui_anvil_page
* Finished the 'get_X' endpoints so far defined. Added get_servers and…
2021-04-11 16:32:03 -04:00
Digimer
942e0f66bf * Finished the 'get_X' endpoints so far defined. Added get_servers and completed get_status.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-11 16:26:42 -04:00
digimer-bot
9a7d9f235c
Merge pull request #69 from ClusterLabs/webui_anvil_page
* Fixed a typo that broke compiling anvil-daemon in the last commit. …
2021-04-10 01:32:07 -04:00
Digimer
5f0b7740e2 * Fixed a typo that broke compiling anvil-daemon in the last commit. Yay for CI/CD!
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-10 01:28:12 -04:00
digimer-bot
9aabf27fe6
Merge pull request #68 from ClusterLabs/webui_anvil_page
* The get_cpu endpoint was completed.
2021-04-09 21:48:06 -04:00
Digimer
2384c44544
Merge branch 'master' into webui_anvil_page
2021-04-09 21:43:11 -04:00
Digimer
fb0836f912 * The get_cpu endpoint was completed.
* The get_memory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.

To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, if there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib column.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Updated DRBD->gather_data() to get the size data from '/sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size' so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disable the ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-09 20:51:29 -04:00
digimer-bot
b5aa81471c
Merge pull request #66 from ClusterLabs/webui_anvil_page
Webui anvil page
2021-04-02 22:26:41 -04:00
Digimer
c2fe3a2f0a * Finished (initial) get_shared_storage.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-02 22:22:07 -04:00
Digimer
fa3c861a97 * Started work again on get_shared_storage
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-02 18:31:54 -04:00
digimer-bot
59cee0185f
Merge pull request #65 from ClusterLabs/scancore-debugging
* Fixed a bug that caused striker-initialize-host to not compile / run.
2021-04-01 11:42:10 -04:00
Digimer
70aa6a7a5b
Merge branch 'master' into scancore-debugging
2021-04-01 11:39:00 -04:00
Digimer
cd87c0f521 * Fixed a bug that caused striker-initialize-host to not compile / run.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-01 11:35:44 -04:00
digimer-bot
f93a5eccc9
Merge pull request #64 from ClusterLabs/scancore-debugging
* Created Storage->manage_lvm_conf() that checks / updates lvm.conf t…
2021-04-01 00:04:29 -04:00
Digimer
70dc0598f2 * Created Storage->manage_lvm_conf() that checks / updates lvm.conf to add a filter to avoid seeing DRBD devices as LVM components. This is now called from striker-initialize-host and scan-drbd.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-31 23:59:19 -04:00