anvil

Commit Graph

Author	SHA1	Message	Date
Digimer	58371d22b6	Merge pull request #335 from ClusterLabs/anvil-tools-dev Anvil tools dev	2 years ago
digimer	156a0ca201	Updated anvil-daemon's new job launching logic to allow the restart of a running job that failed out early. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	cc15eca6fb	* Added anvil-watch-power to git. * Added a check to cleanup size input to Convert->human_readable_to_bytes() when passed pre-processed strings. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	47f7a35df3	The main purpose of this commit is to add serial execution of similar jobs to help reduce race conditions for scripted jobs, like multiple server creation. * Fixed a small logging bug in DRBD->allow_two_primaries(). * Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded. * Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	38d088a998	* Added anvil-watch-power to the makefile. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	dda0fbd7d5	* Updated DRBD->allow_two_primaries() to be more careful at evaluating peer-node-id. * Updated DRBD->manage_resource() to set allow-two-primaries=no when up'ing a resource (as no migration can be in progress during an up command). * Updated scan-drbd to look for StandAlone resources and call DRBD->manage_resource({task = 'up'}) if a connection to a peer node is StandAlone or if the local disk state is detached. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	b6a249d5e7	* Updated Cluster->add_server() to set the preferred host based first on if the server is running on a node, and if not, on the primary node (where before it defaulted to node 1). * Updated DRBD->delete_resource() to call scan-drbd and scan-lvm to ensure that the database is updated with the newly freed resources. * Updated anvil-delete-server and anvil-provision-server to call select scan agents to ensure freed resources are immediately recorded. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	929806cef7	Fixed variable substitution names in scan-server. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	b03587967b	* Updated Cluster->add_server() to batch the creation of the server and the location constraints in one commit to the CIB. * Updated scan-lvm to look for and delete duplicate entries. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
Digimer	133dbb121b	Merge pull request #334 from ClusterLabs/anvil-tools-dev Updated scan-cluster to check to see that migrate_to and migrate_from…	2 years ago
digimer	b7abc481e6	Updated scan-cluster to check to see that migrate_to and migrate_from are given a timeout of 600s and an on-fail of "block". Updated Cluster->add_server() to set migrate_from to timeout=600s and on-fail=block as well. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer-bot	ef84e63a7a	Merge pull request #333 from ClusterLabs/anvil-tools-dev Anvil tools dev	2 years ago
digimer	c82bd9d73a	* Created the new anvil-watch-power tool that shows the status of UPSes known on the system, including their "on battery" state, charge percentage, estimated hold up time, etc. * Updated Database->get_power() and ->get_upses() to store both the time stamp and unix time stamps. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	5bb1c631cf	* Updated anvil-delete-server to accept '--server' and '--force' to allow direct deletion of a server without interacting with the menu system. This partially addresses issue #321. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	bc3d04ad2e	* Updated Cluster->add_server() to wait up to 15 seconds for a server to appear to ensure that the pcs call adds the server with the right requested running state. * Updated Cluster->recover_server() to set the desired recovery state before calling the crm_resource refresh. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
Digimer	b3f3c9b24e	Merge pull request #332 from ClusterLabs/anvil-tools-dev This commit addresses (hopefully) issue #329.	2 years ago
digimer	0e57836c8f	This commit addresses (hopefully) issue #329. * Updated DRBD->get_status() to attempt to recompile the drbd kernel module if the drbdsetup status fails. If it continues to fail, it exits gracefully now. * Updated ocf:alteeve:server to test access over a given IP before calling Server->find to avoid timeouts when the peer is down. Also updated it to set the constraints to keep the server on the new host when the old host returns to the cluster. * Fixed a bug in scan-cluster where a server that is FAILED but not running is now properly recovered. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
Fabio M. Di Nitto	7cee742b67	Merge pull request #331 from ClusterLabs/digimer-patch-1 Delete notes	2 years ago
Digimer	711322f273	Delete notes	2 years ago
Digimer	91de6bb30e	Merge pull request #330 from ClusterLabs/anvil-tools-dev * Fixes issue #329; When multiple attributes exist when checking if w…	2 years ago
digimer	284a2957d6	* Fixes issue #329; When multiple attributes exist when checking if we're in maintenance mode in fence_pacemaker, the expected hash reference was actually an array reference. * Fixed a bug in anvil-version-changes where update_file_location_ready() needed to be called before update_file_locations(). * Added a bit more logging for future debugging. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
Digimer	f3a65fc04d	Merge pull request #328 from ClusterLabs/anvil-tools-dev This should resolve issue #271.	2 years ago
digimer	8f375c58a9	* Fixed a typo in anvil-daemon that prevented compiling. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	110dceb55e	* Added a check to make sure files were ready before provisioning a server. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	c50a1936c0	* This adds the new 'file_locations' -> 'file_location_ready' column and associated methods. This is set to TRUE/1 when the file referenced is found on disk and it is the expected size and md5sum. This is meant to allow programs to wait/watch for a file to be ready if they need to use it. Files are now checked periodically via anvil-daemon. * Removed hard-coded log levels in anvil-provision-server and anvil-manage-storage-groups. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
Digimer	dfc0c2c492	Merge pull request #326 from ClusterLabs/anvil-tools-dev * Fixed a bug where, when DRBD->gather_data() calls 'drbdadm dump-xml…	2 years ago
digimer	26fa3c7e32	Fixed a bug where Get->available_resources() was missing LVM/storage group data in some cases. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	510db70253	Another attempt to resolve the storage group race condition. This moves the check for auto-assembly to scan-lvm. It only works for the first assemble, after that the user can/should use anvil-manage-storage-groups. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	e483840ceb	Second attempt to fix the storage group race condition. This time, we only let node 1 assemble storage groups. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	d64044c7d1	Test fix for storage group race condition. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	1bba56a5b1	Hard coded anvil-provision-server to log level 2 while chasing a race condition in storage groups. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	9a58f4d1ff	* This is a small commit to increase logging while chasing down a race condition issue with assembling storage groups. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	895f1ec262	This fixes a race condition when multiple servers are provisioned at (nearly) the same time. * In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute, so subsequent calls won't use the same numbers. * In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race conditions in general. * Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	e7537b0ca3	* Fixed a bug where, when DRBD->gather_data() calls 'drbdadm dump-xml' and the output includes usage data, it breaks XML parsing. * Fixed a bug in Get->available_resources() where DELETED servers were being counted in the used resources math. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer-bot	cc32d5b606	Merge pull request #320 from ClusterLabs/anvil-tools-dev Anvil tools dev	2 years ago
digimer	c11be1ad1a	Added a skip to ignore dot files when looking at new files. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	dc7b909bfc	More logging to debug storage group race condition Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	bd575c6a7d	Bumped logging for storage group management. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	0874ad571a	Updated anvil-safe-start to not give up on starting corosync/pacemaker if it fails on the first try. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	8ba613952c	Typo fix. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	83a527f4fa	* Removed enabling anvil-safe-start out of the RPM and into anvil-join-anvil. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	89eae7098e	NOTE: This updates the reserved RAM to 8 GiB from 4 GiB! * Adds support for 'anvil_resources::ram::reserved' that can be set to a number of MiB to override the default 8192. * Adds support for 'anvil::<anvil_uuid>::resources::ram::reserved' to allow for per-Anvil! node override on the reserved RAM default, and over the 'anvil_resources::ram::reserved' option. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	f086c1be39	Fixed a bug where the total RAM was shown instead of the free RAM. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	fdf49c696f	Updated anvil-report-usage to ignore deleted servers. Also added a check to ensure hosts are loaded if not. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	c956f75406	Enabled anvil-safe-start in '%post node'. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	025c2a6f54	* Updated Email->get_next_server() to ignore DELETED mail servers, and it now loads mail servers if not yet in memory. This resolves issue #306. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	fb70836126	This moves the call of anvil-safe-start out of scancore and into a new, dedicated systemd unit that runs on boot only. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
Digimer	6bce292969	Merge pull request #319 from ClusterLabs/anvil-tools-dev Anvil tools dev	2 years ago
digimer	83aa4e6a5f	Updated scan-cluster to check for FAILED resources (servers) and, if found, attempt to recover them. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	1afa7ce09e	* Created Cluster->recover_server() that uses crm_resource to try to recover a server that has entered a FAILED state. * Updated (but not yet completed) scan-cluster's check_resources() function to check if a FAILED server is ready to try to recover. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
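
Note on commit 895f1ec262: the "hold" system it describes for DRBD->get_next_resource() can be illustrated with a short sketch. This is not the project's code (which is Perl); it is a minimal Python illustration under stated assumptions. The class name `ResourceHolds`, the method `next_free`, and the lowest-free-number search are all hypothetical. Only the idea of holding a returned number for one minute, so that near-simultaneous callers get different values, comes from the commit message.

```python
import time


class ResourceHolds:
    """Hypothetical allocator: returns the lowest free number and holds it briefly."""

    def __init__(self, hold_seconds=60.0, clock=time.monotonic):
        # hold_seconds=60 mirrors the "held for one minute" wording in the commit.
        self.hold_seconds = hold_seconds
        self.clock = clock
        self._holds = {}  # number -> time at which the hold expires

    def next_free(self, in_use, start=0):
        """Return the lowest number >= start that is neither in use nor held."""
        now = self.clock()
        # Forget holds whose expiry time has passed.
        self._holds = {n: t for n, t in self._holds.items() if t > now}
        candidate = start
        while candidate in in_use or candidate in self._holds:
            candidate += 1
        # Hold the number so a second caller in the same window gets another one.
        self._holds[candidate] = now + self.hold_seconds
        return candidate
```

Two calls made before the first caller has actually created its resource return different numbers, which is the race the commit describes. Once a hold expires, the number is handed out again only if it is still absent from `in_use`.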

2775 Commits (58371d22b6e4c5a2df023a5c7b25988728019f87)