anvil

Commit Graph

Author	SHA1	Message	Date
digimer	51978e1609	Update scan-server to only alert on large boot time changes Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	a11b87458e	Gracefully handle errors from changed node host names in scan-cluster. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	5ec395c53a	Reworked DB resync logic. With this new system, a 'primary_db' is chosen (first connected DB UUID when sorted) and only it does resyncs. Further, resyncs have been pulled from all tools except anvil-daemon. So with this new system, the chances of duplicate, simultaneous resyncs should be removed (hopefully for real this time). * Database->check_agent_data() no longer calls a resync after loading a schema. * Removed the Database->connect() 'all' parameter * The database used to read from is now always the same as the primary, even if there is a local DB. * Database->connect() 'check_for_resync' parameter can now be set to '2', which means "check for resync _if_ I am primary", where '1' still checks for resync no matter what. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	b1f89c2723	Finished initial version of striker-show-jobs * Updated Database->get_jobs() to take 'job_host_uuid = all' to allow loading jobs from all cluster machines. Also updated it to record the 'job_host_uuid' and the unix timestamp version of 'modified_date'. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	829ae546a2	Beginning work on new Server->locate() method to find servers across an Anvil! cluster. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	122816255d	* Fixed a bug where a sensor value of '0' was being interpreted as the value not existing. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	580980717d	This commit covers the conversion of 'virsh' shell calls to using the 'Sys::Virt' module, and fixes several small bugs related to scan-server; * Switched all calls to virsh to use Sys::Virt to deal with contention of simultaneous virsh calls. * Removed collecting screenshots from scan-server. * Fixed a bad variable substitution in an alert. * Fixed a bug where a server's boot time wasn't being recorded properly. * Reworked how we determine which server definition was most recently updated and propagated. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	a81a110261	* Remove forced log level and secure logging. This addresses issue #386 Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	a0cb791f47	This contains fixes needed for beta from additional testing. * Updated the pcs wrapper to flock anything but status calls. * Updated scan-apc-pdu to purge regardless of the host it's called on. * Fixed a bug in striker-purge-target that wouldn't purge anvil nodes in various cases. Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	6ee2ad75db	* Updated anvil-delete-server to actively check for and delete any drbd-fenced attributes left over in the CIB after a server is deleted. This addresses issue #374 . Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	6a7c9923ad	* Fixed second variable replacement bug, re issue #338 . Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	9ebe192306	This fixes a variable substitution bug, addressing issue #338 . Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
digimer	7258781712	* Updated scan-cluster to detect stale drbd-fenced attributes in the CIB, generally left after a server is deleted. This addresses issue #374 . Signed-off-by: digimer <mkelly@alteeve.ca>	1 year ago
Tsu-ba-me	dac247f66e	fix(scancore-agents): get screenshot of server(s) running on local node in scan-server	1 year ago
digimer	e0316da88b	* Got anvil-manage-server-storage working enough to grow existing disk's hard drive sizes, and to insert/eject optical disks. * Hit a bug where a server's definition file was written to disk while not being valid. Added logging in case it happens again, and additional safe-guards to help avoid it from recurring. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	dda0fbd7d5	* Updated DRBD->allow_two_primaries() to be more careful at evaluating peer-node-id. * Updated DRBD->manage_resource() to set allow-two-primaries=no when up'ing a resource (as no migration can be in progress during an up command). * Updated scan-drbd to look for StandAlone resources and call DRBD->manage_resource({task = 'up'}) if a connection to a peer node is StandAlone or if the local disk state is detached. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	929806cef7	Fixed variable substitution names in scan-server. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	b03587967b	* Updated Cluster->add_server() to batch the creation of the server and the location constraints in one commit to the CIB. * Updated scan-lvm to look for and delete duplicate entries. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	b7abc481e6	Updated scan-cluster to check to see that migrate_to and migrate_from are given a timeout of 600s and an on-fail of "block". Updated Cluster->add_server() to set migrate_from to timeout=600s and on-fail=block as well. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	bc3d04ad2e	* Updated Cluster->add_server() to wait up to 15 seconds for a server to appear to ensure that the pcs call to add the server with the right requested running state. * Updated Cluster->recover_server() to set the desired recovery state before calling the crm_resource refresh. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	0e57836c8f	This commit addresses (hopefully) issue #329 . * Updated DRBD->get_status() to attempt to recompile the drbd kernel module if the drbdsetup status fails. If it continues to fail, it exits gracefully now. * Updated ocf:alteeve:server to test access over a given IP before calling Server->find to avoid timeouts when the peer is down. Also updated it to set the constraints to keep the server on the new host when the old host returns to the cluster. * Fixed a bug in scan-cluster where a server that is FAILED but not running is now properly recovered. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	510db70253	Another attempt to resolve the storage group race condition. This moves the check for auto-assembly to scan-lvm. It only works for the first assemble, after that the user can/should use anvil-manage-storage-groups. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	83aa4e6a5f	Updated scan-cluster to check for FAILED resources (servers) and, if found, attempt to recover it. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	1afa7ce09e	* Created Cluster->recover_server() that uses crm_resource to try to recover a server that has entered a FAILED state. * Updated (but not yet completed) scan-cluster's check_resources() function to check if a FAILED server is ready to try to recover. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	c7a923fdfb	* Fixed a bug in scan-server where DELETED servers were being set to 'shut off'. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	bf2e3e25fb	* Added a check for undefined variable/value pairs in cachevault data that were causing SQL UPDATE errors. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
Deezzir	7d5f18b20d	fix: introduced optional arg for clean_spaces	2 years ago
Deezzir	deac1fc6a8	fix: introduced optional arg for clean_spaces	2 years ago
digimer	efebd135eb	* Removed more references to 'dr1_host_uuid' from the old way of linking DR hosts to Anvil! nodes. * Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted. * Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host. * Updated anvil-version-change and anvil.sql to enable update_audits and the audits table. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	fea10e5bb1	* Prefixed all 'virsh' calls with 'setsid --wait' to help prevent future hangs if the call happens without a shell. * Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks! * Updated System->call to log parameters if 'shell_call' isn't set. * Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	7710d9d109	* Created the new anvil-manage-server-storage tool which will specifically handle managing a server's disks. * Created DRBD->parse_resource() to pass a specific DRBD resource's XML data. * Fixed a bug in Get->available_resources() so that if the threads is lower than CPU cores, the cores are used as the total available to VMs. * Fixed bugs in Get->server_from_switch() where it just wasn't working properly. * Updated scan_drbd to not reset a resource's size to 0-bytes when a resource goes offline. Signed-off-by: digimer <mkelly@alteeve.ca>	2 years ago
digimer	76c8088aee	* Updated scan-apc-pdu to only run on the active striker DB (as set during Database->connect()) to prevent contention from simultaneous scan agent runs from different machines. Signed-off-by: digimer <digimer@gravitar.alteeve.com>	2 years ago
digimer	0fa6ddebc5	Updated scan-network to see an interface state of 'activated' as up (used to check specifically for 'active'). Signed-off-by: digimer <digimer@gravitar.alteeve.com>	2 years ago
Digimer	eae2ab4d9f	* Undid the #!no_value!# -> !!no_value!! change as it broke language processing. * Fixed a bug in scan-apc-pdu that was preventing it from compiling. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	4528f07508	* Fixed a bug where fence-handler was repeatedly added by scan-drbd. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	4ba1982183	This is the start of a set of changes needed to rework how we handle DRBD fence requests, so that they create location constraints instead of triggering a full stonith fence. * In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing. * Updated scan-server to check for / add DRBD fence rules as needed. Scancore APC agent bugs; * For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents. * Fixed a bug to set/clear alerts related to phases disappearing to deal with concurrent logins from different hosts triggering false phase loss alerts. * Fixed missing variables not being passed to alerts/log entries. Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	13b0f5bdcc	Bumped 'Exhaust Temp' jump threshold to 30c in scan-ipmitool. Adjusted some logging. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	a4ef93404c	* Fixed a bug in DRBD->gather_data() to remove trailing commas for existing TCP ports. * Added the missing 'clear-mapping' switch to Get->switches in anvil-daemon. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	ac8135709a	Fixed a bug where scan-server faulted with a divide by zero error when the host had no swap. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	2fab7bc1b7	This adds support (testing needed) for "Long-Throw" DR; which is a wrapper for using 'drbd-proxy' to provide larger transmit buffers for slow/high-latency DR hosts. * Created DRBD->check_proxy_license() to do (some level of) sanity checks on the DRBD proxy license file. * Updated DRBD->gather_data() to parse out the inside and outside ports for resource configs using proxy. * Reworked DRBD->get_next_resource() to return 1, 3 or 7 TCP ports depending, with the new long_throw_ports parameter triggering the 7 ports. * Added 'tcpdump' to the anvil-core requires list. * Reworked scan-drbd to record the ports used in proxy configs. This required adding a check to change the 'scan_drbd_peer_tcp_port' column type to 'text' to support CSVs. * Reworked anvil-manage-dr (needs testing!) to support "long-throw" DR configs. * Updated anvil-safe-stop to check if the nodes are in the cluster before trying to migrate. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	89121a2b3b	* Fixed a bug in Alert->check_condition_age() where not setting a host_uuid caused the returned age to always be 0. * Updated scan_apc_pdu to not report a lost PDU unless it's been gone for ten minutes. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	4ecc6097d3	* Cleaned up some old 'die' calls with better nice_exit() calls to help avoid dangling db_in_use flags. * Reworked Network->bridge_info() to use 'ip' to get the list of bridges, and 'bridge' to find interfaces connected to the bridge. * Added 'test' messages to Words->string(). * Fixed a bug in scan-lvm where mdadm based PVs didn't read the sector size properly. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	ef3ac86162	* Fixed a bug where setting the db_in_use flag without a valid $ENV{_}. * Added a nice_exit call to tools/striker-access-database Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Fabio M. Di Nitto	7decdb2887	scan-network: fix path to script Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>	2 years ago
Digimer	15aadc3a4e	* Updated scan-network to check for inactive or activating interfaces and manually bring them up, if the uptime is less than 10 minutes. * Fixed a bug in scancore-agents/Makefile.am where scan-network was missing. * Started work on anvil-delete-server.8. Incomplete at this time. * Updated Network->get_ips() to record the interface status. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	e025f5b927	Fixed line wraps Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	7fd6185445	* Disabled firewalling for now. There appears to be an issue starting up with DRBD. * Updated Convert->time() to return whatever was passed in instead of '#!error!#'. Signed-off-by: Digimer <digimer@alteeve.ca>	2 years ago
Digimer	bce9e2caaf	This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work. * Added multiple new private methods to Network that help in managing the firewall. * Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update its firewall as soon after the server appears as possible. * Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly. * Updated scan-server to check that the firewall is managed when a server state has changed. * Updated anvil-daemon to run Network->manage_firewall on startup. * Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host.) * Removed firewall management from striker-prep-database. Signed-off-by: Digimer <digimer@alteeve.ca>	3 years ago
Digimer	b2ea4f9adc	* Moved System->manage_firewall() to Network->manage_firewall(). Started working on actually implementing it, which involves basically fully rewriting it. * Updated tools/Makefile.am and scancore-agents/Makefile.am to add missing files. Signed-off-by: Digimer <digimer@alteeve.ca>	3 years ago
Digimer	ab9b00a2f7	* Updated anvil-daemon, in its daily checks, to disable ksm and ksmtuned daemons. * Updated scan-drbd to purge peer records that no longer have corresponding LVM data. * Updated System->{en,dis}able-service to take the 'now' parameter which, when passed, causes the action to take immediate effect. Signed-off-by: Digimer <digimer@alteeve.ca>	3 years ago

172 Commits (51978e160931ffd804e5501fae074eac054ce434)