* Cleaned up a log of logging to reduce the amount of log entries when running at log level 1.
* Bumped the scan-ipmitool default 'jump' range to 10c.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked scan-ipimitool so that on nodes and dr hosts, it only scans itself. On strikers, it scans all hosts found in active Anvil! systems with a host_ipmi entry. `
* For all agents, reduced log verbosity to not push too much noise into anvil.log while scancore is running in the background.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added checks to anvil-provision-server to see if an existing server name is flagged as DELETED, instead of outright rejecting a given server name.
Signed-off-by: Digimer <digimer@alteeve.ca>
Note: These changes below shouldn't have been in this branch... *sigh*
* Fixed an issue with tools/anvil-provision-server where a VM would be created but didn't boot. When this happens, an explicit boot is sent via virsh. Also bumped up the time it waits for a new server to start up.
* Added an explicit call to scan-drbd after a new resource is created to ensure that if any calls come after looking for the next free DRBD minor or port, they don't use the ones just used.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in anvil-provision-server where forcing initialization of a new DRBD resource when running on node 2 would fail because the node ID in the drbdsetup command was hard-coded to be run from node 1.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Cluster->add_server() to now set failure timeouts to actual numbers instead of INFINITY after discovering that INFINITY doesn't work in those cases.
* Updated Databsae->get_hosts to now check if other entries have the same host name, and if so, to set their host_key to 'DELETED'. This should make it easier to handle when a hardware machine is replaced by new hardware but uses the same host_name.
* Updated Email->check_queue() to start and enable postfix.service if it's found to not be running.
* Updated Get->available_resources() to return '!!no_data!!' when a given host hasn't got any data in scan_lvm_vgs. Now use this in anvil-provision-server to exit if a node or dr host hasn't run scancore yet.
* Fixed a bug in scan-lvm where the pvs_uuid wasn't being loaded properly, preventing lost PVs, VGs and LVs from being flagged as deleted.
* Started work on anvil-migate-server, though it's far from complete.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Cluster->shutdown_server() where the wrong variable was being evaluated when checking the server state.
* Created DRBD->delete_resource() that deletes a resource's backing device and configuration. Note that this wipes the DRBD MD and and FS signatures before removing the LV. Updated DRBD->gather_data() to record the backing devices for volumes.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Cluster->parse_cib() when a server that is off wasn't setting 'status'.
* Renamed 'server::location::<server>::host' to '...::host_name' in several places.
* Got more work done on anvil-delete-server, up to the point where it calls the new Cluster->delete_server() method.
* Updated fence_pacemaker to call 'drbdadm adjust all' to dampen an issue where in-memory fence configs seem to change, preventing reconnection of the peer after it reboots from the fence. More testing needed on this issue.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Removed the exit-if-no-DB check in ocf:alteeve:server so that (hopefully, needs testing), running servers won't be impacted if the nodes lost contact with both/all strikers.
* Updated scan-server to make an explicit check for missing XML definition files on startup and write them if needed.
* Very beginning work on anvil-delete-server has been started.
* Updated anvil-provision-server to wait when it's running in peer mode until the new XML definition is in the DB and then write it out to disk before exiting. Also updated it to add the new server to pacemaker before exiting.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created a new tools/striker-parse-os-list tool that parses 'osinfo-query os' and prints out entries for words.xml for any new OSes.
Signed-off-by: Digimer <digimer@alteeve.ca>