seasharp/anvil - anvil - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
digimer	c9e11fbbfc	* Added checks to anvil-provision-server to fail out if either of the SN IPs is not found when generating a DRBD resource config. * Added logging to anvil-provision-server and anvil-daemon to try to find the cause of jobs being re-run after completing. This may already be fixed by a correction to job_progress updates, which were going to 100 too early in some cases. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	895f1ec262	This fixes a race condition when multiple servers are provisioned at (nearly) the same time. * In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute, so subsequent calls won't use the same numbers. * In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race conditions in general. * Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
Digimer	bce9e2caaf	This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work. * Added multiple new private methods to Network that help in managing the firewall. * Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update its firewall as soon after the server appears as possible. * Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly. * Updated scan-server to check that the firewall is managed when a server state has changed. * Updated anvil-daemon to run Network->manage_firewall on startup. * Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host). * Removed firewall management from striker-prep-database. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	3a6902d899	* Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed). * Updated Server->shutdown_virsh() to change the parameter 'wait' to 'wait_time' to clarify its use. Signed-off-by: Digimer <digimer@alteeve.ca>	4 years ago
Digimer	711a04999e	* Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does not run as a job yet, but the logic is all there. * Fixed a bug in Cluster->migrate_server() where waiting for the server to migrate would never exit. Signed-off-by: Digimer <digimer@alteeve.ca>	4 years ago
Digimer	413a4f73c2	* Updated Tools->_anvil_version() and Get->anvil_version() to now pick up a SchemaVersion from anvil.sql. This will change only when the schema changes and is used when Database->connect() is checking compatibility with other anvil database hosts. This will make it only break connection when there is a reason to do so. The anvil_version still remains as an informational version that will help when supporting users later. * Updated Cluster->add_server() to now set failure timeouts to actual numbers instead of INFINITY after discovering that INFINITY doesn't work in those cases. * Updated Database->get_hosts to now check if other entries have the same host name, and if so, to set their host_key to 'DELETED'. This should make it easier to handle when a hardware machine is replaced by new hardware but uses the same host_name. * Updated Email->check_queue() to start and enable postfix.service if it's found to not be running. * Updated Get->available_resources() to return '!!no_data!!' when a given host hasn't got any data in scan_lvm_vgs. Now use this in anvil-provision-server to exit if a node or dr host hasn't run scancore yet. * Fixed a bug in scan-lvm where the pvs_uuid wasn't being loaded properly, preventing lost PVs, VGs and LVs from being flagged as deleted. * Started work on anvil-migrate-server, though it's far from complete. Signed-off-by: Digimer <digimer@alteeve.ca>	4 years ago
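
The "hold" system described in commit 895f1ec262 can be illustrated with a small sketch. This is not the project's code (Anvil itself is written in Perl); it is a minimal Python illustration of the idea that a DRBD minor and TCP port, once handed out, are skipped by later callers for one minute. The class name `ResourceAllocator`, the in-memory dictionaries, the starting port 7788, and the injectable `clock` parameter are all assumptions made for this example. How the real DRBD->get_next_resource() stores its holds is not stated in the commit message.

```python
import time


class ResourceAllocator:
    """Hypothetical allocator: hands out DRBD minors and TCP ports with a timed hold."""

    def __init__(self, first_minor=0, first_port=7788, hold_seconds=60,
                 clock=time.monotonic):
        # first_port=7788 is an assumed starting point, not taken from the source.
        self.first_minor = first_minor
        self.first_port = first_port
        self.hold_seconds = hold_seconds
        self.clock = clock            # injectable so the hold can be tested
        self.used_minors = set()      # numbers already in a real resource config
        self.used_ports = set()
        self._held_minors = {}        # number -> time at which the hold expires
        self._held_ports = {}

    def _expire(self, held):
        """Drop holds whose expiry time has passed."""
        now = self.clock()
        for number in [n for n, expiry in held.items() if expiry <= now]:
            del held[number]

    def _next_free(self, start, used, held):
        """Return the lowest number that is neither in use nor held, and hold it."""
        candidate = start
        while candidate in used or candidate in held:
            candidate += 1
        held[candidate] = self.clock() + self.hold_seconds
        return candidate

    def get_next_resource(self, port_count=1):
        """Return (minor, [ports]); each returned number is held for hold_seconds."""
        self._expire(self._held_minors)
        self._expire(self._held_ports)
        minor = self._next_free(self.first_minor, self.used_minors, self._held_minors)
        ports = [
            self._next_free(self.first_port, self.used_ports, self._held_ports)
            for _ in range(port_count)
        ]
        return minor, ports
```

Two calls made in quick succession receive different numbers even though neither caller has written a resource config yet, which is the race the commit describes. Once the hold expires without the number being recorded as used, it becomes available again.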