Commit Graph

638 Commits

Author SHA1 Message Date
Digimer
983e3ad114 * Updated the os_type regex to detect CentOS Stream properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-08 15:28:52 -05:00
digimer-bot
d3d81f865d
Merge pull request #33 from ClusterLabs/anvil-daemon-debugging
Anvil daemon debugging
2021-02-08 15:27:21 -05:00
digimer-bot
5b06cf5570
Merge branch 'master' into anvil-daemon-debugging 2021-02-08 15:23:31 -05:00
digimer-bot
28060d5ecf
Merge pull request #32 from ClusterLabs/scancore-debugging
Scancore debugging
2021-02-08 15:23:17 -05:00
Digimer
06506ba5df * Removing (again) test.pl from Makefile.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-08 13:50:36 -05:00
Digimer
e8e042f0ae * Removed anvil-jobs from Makefile.am
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-08 13:42:56 -05:00
Digimer
1a520b03d5 * Cleaned up a lot of logging in anvil-daemon and tools it calls.
* Deleted anvil-jobs as it never ended up being used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-08 13:39:34 -05:00
Digimer
482e4f41c2 * Removed 'test.pl' from Makefile.in/.am
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-07 19:00:03 -05:00
Digimer
a1eede2757 * Added new jumps to scan-ipmitool to make it less likely to trigger a jump alert for 'Temp{1..4}' sensors.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-07 18:54:12 -05:00
Digimer
1ec03c9718 * Removing 'test.pl' from git.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-07 18:42:34 -05:00
digimer-bot
827a3f2ee4
Merge pull request #31 from ClusterLabs/scancore-debugging
Scancore debugging
2021-02-07 18:13:50 -05:00
Digimer
6009590352 * Fixed a bug in scan-apc-ups where changes in the transfer reason were not being recorded.
* Cleaned up a lot of logging to reduce the number of log entries when running at log level 1.
* Bumped the scan-ipmitool default 'jump' range to 10c.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-07 18:11:29 -05:00
Digimer
b2dab95459 * Updated DRBD->delete_resource() to return a success if asked to delete a non-existent resource (as can happen when partial anvil-delete-server runs are re-run).
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take an 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait for the shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check whether the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-07 18:11:29 -05:00
Digimer
2be14d93a6 * Added a check to anvil-delete-server to remove the XML definition file.
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-07 18:11:29 -05:00
Digimer
ee6fcdde81
Merge branch 'master' into scancore-debugging 2021-02-05 23:49:20 -05:00
Digimer
569270541e * Added 'tar' as a dependency because somehow I went three years without this...
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-05 23:43:17 -05:00
Digimer
9dbb39da5b * Added support for manually setting the server's UUID in anvil-provision-server. Also, if a server name existed before but was deleted, the old UUID is re-used to provide better continuity. The user can override this behaviour with the new --uuid switch.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-05 23:43:17 -05:00
Digimer
0ec1bf6b6a * Updated DRBD->delete_resource() to return a success if asked to delete a non-existent resource (as can happen when partial anvil-delete-server runs are re-run).
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take an 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait for the shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check whether the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-05 23:41:48 -05:00
Digimer
8d0f873912 * Updated scan-storcli to check if a MegaRAID controller exists when neither storcli64 nor perccli64 is installed. If a controller is found but no RPM is installed, it checks whether the host is a Dell and then tries to install perccli or storcli accordingly.
* Reworked scan-ipmitool so that on nodes and DR hosts, it only scans itself. On Strikers, it scans all hosts found in active Anvil! systems with a host_ipmi entry.
* For all agents, reduced log verbosity to not push too much noise into anvil.log while scancore is running in the background.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-05 23:34:51 -05:00
digimer-bot
66e0fb4490
Merge pull request #30 from ClusterLabs/string_bugs
* Fixed a bug where Words->load_agent_strings() wouldn't process stri…
2021-02-04 00:20:46 -05:00
Digimer
db3cf4f344
Merge branch 'master' into string_bugs 2021-02-03 21:51:20 -05:00
Digimer
50d529e07c * Added a check to anvil-delete-server to remove the XML definition file.
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-03 19:21:15 -05:00
Digimer
e052c75e2f * Added a check to anvil-delete-server to remove the XML definition file.
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-03 19:20:35 -05:00
Digimer
ac6531ddf2 * Fixed a bug where Words->load_agent_strings() wouldn't process strings without new-lines in them (caused in the last fix).
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-03 18:53:56 -05:00
digimer-bot
eef1e4fb20
Merge pull request #29 from ClusterLabs/dependencies
* Added 'tar' as a dependency because somehow I went three years with…
2021-02-03 14:49:05 -05:00
Digimer
1998ad946c * Added 'tar' as a dependency because somehow I went three years without this...
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-03 14:34:08 -05:00
Digimer
68d007c1bb
Merge pull request #28 from ClusterLabs/anvil-provision-server-fixes
Anvil provision server fixes
2021-02-03 13:03:31 -05:00
Digimer
ff3681c913 * Added support for manually setting the server's UUID in anvil-provision-server. Also, if a server name existed before but was deleted, the old UUID is re-used to provide better continuity. The user can override this behaviour with the new --uuid switch.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-03 12:58:18 -05:00
Digimer
4b9ec56106 * Updated DRBD->delete_resource() to return a success if asked to delete a non-existent resource (as can happen when partial anvil-delete-server runs are re-run).
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take an 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait for the shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check whether the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-02 23:09:47 -05:00
Digimer
cb955f5370 Merge branch 'anvil-provision-server-fixes' of github.com:ClusterLabs/anvil into anvil-provision-server-fixes
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-01 13:11:32 -05:00
Digimer
3708575485 * Added a check to anvil-delete-server to remove the XML definition file.
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-01 12:56:46 -05:00
Digimer
42dc688d04
Merge pull request #27 from ClusterLabs/string_bugs
* Fixed a bug in Words->parse_banged_string() Where the flattened str…
2021-02-01 12:37:33 -05:00
Digimer
3f04c9031b * Fixed a bug in Words->parse_banged_string() where the flattened string wasn't being used for the variable substitutions.
* This resolves issue #23.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-01 12:27:05 -05:00
Digimer
86228e9d1d * Added a check to anvil-delete-server to remove the XML definition file.
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-01 12:06:40 -05:00
digimer-bot
030f4a7efc
Merge pull request #26 from ClusterLabs/string_bugs
String bugs
2021-01-31 05:27:31 -05:00
digimer-bot
5a720276d4
Merge pull request #25 from ClusterLabs/anvil-provision-server-fixes
* Typo fixed in striker-manage-install-target insertion variable.
2021-01-31 05:23:29 -05:00
Digimer
9ae6a6b00f
Merge pull request #24 from ClusterLabs/storage-groups-testing
* Finished fixing automatic building of Storage Groups on systems whe…
2021-01-31 05:19:22 -05:00
Digimer
1081645893 * Added parameters to DRBD->get_next_resource() to allow a resource to be searched for and, if it is found, either error out or return its first DRBD minor and TCP port.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-31 02:32:12 -05:00
Digimer
f4bf1fd54a * Removed some XML insertions into strings, as they break the string insertion.
Note: These changes below shouldn't have been in this branch... *sigh*
* Fixed an issue with tools/anvil-provision-server where a VM would be created but not boot. When this happens, an explicit boot is now sent via virsh. Also increased the time it waits for a new server to start up.
* Added an explicit call to scan-drbd after a new resource is created, to ensure that any later lookups of the next free DRBD minor or port don't reuse the ones just assigned.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-31 01:39:52 -05:00
Digimer
3d7ce84c38 * Fixed a bug in Get->host_from_ip_address() where hosts that are no longer in use were returned. After a node was replaced, two or more results could come back, which meant no host name was returned at all.
* Fixed a bug in anvil-provision-server where forcing initialization of a new DRBD resource when running on node 2 would fail, because the node ID in the drbdsetup command was hard-coded for node 1.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-31 00:00:23 -05:00
Digimer
218934bec8 * Fixed a bug with the path to anvil-provision-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-30 19:06:22 -05:00
Digimer
e25a424eb4 * Typo fixed in striker-manage-install-target insertion variable.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-30 18:49:15 -05:00
Digimer
864d67b0a7 * Finished fixing automatic building of Storage Groups on systems where VGs are deleted.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-30 18:25:24 -05:00
Digimer
4619c810d8
Merge pull request #20 from ClusterLabs/makefile
[build] first pass at adding a build system to integrate with CI
2021-01-30 14:33:50 -05:00
Fabio M. Di Nitto
8f9892650b [build] first pass at adding a build system to integrate with CI
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2021-01-30 20:16:30 +01:00
Digimer
413a4f73c2 * Updated Tools->_anvil_version() and Get->anvil_version() to now pick up a SchemaVersion from anvil.sql. This changes only when the schema changes, and is used when Database->connect() checks compatibility with other Anvil! database hosts. This way, connections are only broken when there is a reason to do so. The anvil_version remains as an informational version that will help when supporting users later.
* Updated Cluster->add_server() to now set failure timeouts to actual numbers instead of INFINITY after discovering that INFINITY doesn't work in those cases.
* Updated Database->get_hosts() to now check if other entries have the same host name and, if so, set their host_key to 'DELETED'. This should make it easier to handle cases where a machine is replaced by new hardware that uses the same host_name.
* Updated Email->check_queue() to start and enable postfix.service if it's found to not be running.
* Updated Get->available_resources() to return '!!no_data!!' when a given host has no data in scan_lvm_vgs. This is now used in anvil-provision-server to exit if a node or DR host hasn't run scancore yet.
* Fixed a bug in scan-lvm where the pvs_uuid wasn't being loaded properly, preventing lost PVs, VGs and LVs from being flagged as deleted.
* Started work on anvil-migrate-server, though it's far from complete.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-30 14:03:13 -05:00
Digimer
89dec8e1f9 * Finished anvil-delete-server! (More testing needed though)
* Fixed a bug in Cluster->shutdown_server() where the wrong variable was being evaluated when checking the server state.
* Created DRBD->delete_resource(), which deletes a resource's backing device and configuration. Note that this wipes the DRBD MD and FS signatures before removing the LV. Updated DRBD->gather_data() to record the backing devices for volumes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-26 01:45:17 -05:00
Digimer
549dbad635 * Created Cluster->delete_server(), which deletes a server resource from pacemaker (stopping it first, if needed).
* Fixed a bug in Cluster->parse_cib() where 'status' wasn't being set for a server that is off.
* Renamed 'server::location::<server>::host' to '...::host_name' in several places.
* Got more work done on anvil-delete-server, up to the point where it calls the new Cluster->delete_server() method.
* Updated fence_pacemaker to call 'drbdadm adjust all' to dampen an issue where in-memory fence configs seem to change, preventing reconnection of the peer after it reboots from the fence. More testing needed on this issue.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-25 01:00:55 -05:00
Digimer
d9d347ce63 * Updated .spec for the new source location.
* Created a log disable flag to avoid deep recursion when logging at level 3.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-22 00:37:30 -05:00
Digimer
05b1fccdb3 * Created Cluster->add_server() which, well, adds a server to a pacemaker cluster, including sorting out location constraints to favour the node the server is running on, if it's running.
* Removed the exit-if-no-DB check in ocf:alteeve:server so that (hopefully, needs testing), running servers won't be impacted if the nodes lost contact with both/all strikers.
* Updated scan-server to make an explicit check for missing XML definition files on startup and write them if needed.
* Very beginning work on anvil-delete-server has been started.
* Updated anvil-provision-server to wait when it's running in peer mode until the new XML definition is in the DB and then write it out to disk before exiting. Also updated it to add the new server to pacemaker before exiting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-18 00:38:06 -05:00