* Updated Cluster->parse_cib() to store DRBD fence node restrictions by
server/node. Also updated to make it easier to get the server's
preferred node.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated System->maintenance_mode() to take 'host_uuid' so that the
maintenance mode of remote machines can be checked/set.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Get->available_resources() to record the maximum cores that
can be allocated to a server. This is N-1 for hosts with 4 or less
cores, or N-2 cores otherwise.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Database->insert_or_update_servers() to error if the RAM being
recorded is less than 640 KiB. This is because, somewhere yet
undiscovered, the RAM is being recorded in KiB which breaks things.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Database->get_hosts() to store hosts in a host_type hash.
* Updated Database->get_servers() to store servers by name, regardless
of host Anvil! node.
Signed-off-by: digimer <mkelly@alteeve.ca>
This branch resolves issue #462; Auto growing PVs. Specifically, it looks at the LVM PVs on the host and checks to see if there is unused free space after the backing partition. If there is, it auto-grows the partition and then resizes the PV. This featu
re is designed to make life easier for users who deleted the auto-created '/home' partition during the anaconda disk partitioning tool.
* Created Storage->auto_grow_pv() that does the above.
* Added the missing hidden method name _create_rsync_wrapper in the Storage module POD.
* Added a call to Storage->auto_grow_pv() in anvil-configure-host and anvil-version-changes for nodes and DR.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Server->connect_to_libvirt() to check that the target URI's
SSH fingerprint is recorded before connecting. Also added an alarm
wrapper around the Sys::Virt->new() call.
* Continued work on anvil-manage-server-system, working on the boot
order section now.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Server->locate() to take the new 'anvil' parameter to speed up
searches.
* Updated Server->update_definition() to use Server->locate() to find
where updates are needed. It now also defines the server with the new
config.
Signed-off-by: digimer <mkelly@alteeve.ca>
* If the call to Remote-call() set the target that was actually the
local short hostname, it would fail to make the call at all. Now if
the 'target' is local, the shell call is instead passed to
System->call() instead.
* Cleaned up logging.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Get->host_name() to accept the new 'refresh' parameter. This
forces a reread of the hostname, instead of using the cached value.
* Updated System->host_name() so that, when it's updating the hostname,
it updates the database and cached variables.
* Updated Words->center_text() to avoid undefinied parameter issues.
* Updated anvil-join-anvil to ensure the 'sys::host_name' variable.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Striker->generate_manifest() to add pod and make the prefix,
sequence and domain parameters required.
* Created the check_for_broken_manifests() function for anvil-daemon to
detect/remove broken manifests.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added a called to Database->_check_for_duplicates to Database->resync_databases
* Added 'check_for_resync => 1' to anvil-configure-host.
Signed-off-by: digimer <mkelly@alteeve.ca>
Moved the logic to a new private method, and call it now from the active
Striker in the once per minute loop. The duplicate variable issue seems
to be not entirely uncommon.
Signed-off-by: digimer <mkelly@alteeve.ca>
With this new system, a 'primary_db' is chosen (first connected DB UUID when sorted) and only it does resyncs. Further, resyncs have been pulled from all tools except anvil-daemon. So with this new system, the chances of duplicate, simultaneous resyncs should be removed (hopefully for real this time).
* Database->check_agent_data() no longer calls a resync after loading a
schema.
* Removed the Database->coonnect() 'all' parameter
* The database used to read from is now always the same as the primary,
even if there is a local DB.
* Database->connect() 'check_for_resync' parameter can now be set to
'2', which means "check for resync _if_ I am primary", where '1' still
checks for resync no matter what.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Database->get_jobs() to take 'job_host_uuid = all' to allow
loading jobs from all cluster machines. Also updated it to record the
'job_host_uuid' and the unix timestamp version of 'modified_date'.
Signed-off-by: digimer <mkelly@alteeve.ca>
Added DB connections to ocf:alteeve:server when starting or stopping
servers. This is to ensure that the servers -> server_state are updated
properly.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-delete-server to use the new Server->locate method. This
was done as the old Server->locate() was failing to find the server
running on the peer when anvil-delete-server was running on the backup
subnode.
* Updated Server->locate() to search hosts for XML definition and DRBD
configs so that it can record where the server is recorded to run,
even if the server isn't running or defined at the time the locate ran.
Signed-off-by: digimer <mkelly@alteeve.ca>
This updates the /opt/alteeve/screenshot directories and the screenshots
in them to be readible by the WebUI.
Signed-off-by: digimer <mkelly@alteeve.ca>
* This takes a server and new definition XML and updated the database and any available hosts. Does not yet update defined or running servers.
Signed-off-by: digimer <mkelly@alteeve.ca>
* This adds the new 'networks' and 'test_access' parameters to allow
restricting/ordering matched networks, and adds 'test_access' to
validate the link is working.
* Continued work on anvil-manage-server-system
Signed-off-by: digimer <mkelly@alteeve.ca>
* On subnodes and DR hosts, a check is made now in Storage->check_files() for files not linked in file_locations. Any found are added, with a check to see if the file already exists locally and, if so, that the md5sum is accurate or not (to set if the file is ready for use or not).
Signed-off-by: digimer <mkelly@alteeve.ca>
* Switched all calls to virsh to use Sys::Virt to deal with contention of simultaneous virsh calls.
* Removed collecting screenshots from scan-server.
* Fixed a bad variable substitution in an alert.
* Fixed a bug where a server's boot time wasn't being recorded properly.
* Reworked how we determine which server definition was most recently updated and propogated.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed handling --eject and --insert to work without a device target specified when only one exists, or to find the file path when only the file name is given.
* Updated anvil-manage-server-storage to show files when processing an optical devices without a file being passed.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->manage_resource() to check if the host is StandAlone when asked to 'up' a resource and, if so, connect first. Also updated this to error out gracefully if the call to allow_two_primaries() returns non-zero.
* Update Server->migrate_virsh() to error out gracefully if the DRBD->allow_two_primaries() returns non-zero.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->manage_resource() to take the task 'adjust'.
* Updated ocf:alteeve:server's start_drbd_resource() to call adjust if startup of a resource isn't needd.
Signed-off-by: digimer <mkelly@alteeve.ca>
avoid storm of virsh list that overloads libvirtd API causing
unnecessary timeouts during pcmk monitoring operations.
Resolves: https://github.com/ClusterLabs/anvil/issues/395
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
* Updated striker-update-cluster to take '--timeout' and a number of seconds, or 'Xm' or 'Xh' for minutes or hourse, respectively. Also updated to show the remaining time while waiting, and added waiting timeout to the rest of the while loops that prior had no time limit. This addresses issue #383 and issue #382.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-manage-server-storage, striker-collect-debug, and striker-update-cluster to be able to find a connection on an interface when none were found on preferred networks.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key.
* Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Server->shutdown_virsh() to work without a database connection.
* Updated System->reboot_needed() to store/read from a cache file when the database is not available.
* Updated anvil-safe-start to remove the old --enable/disable/status switches, now that we use anvil-safe-start.service systemd unit.
* Reworked anvil-safe-stop to work without a database connection, and to work on DR hosts.
* Updated anvil-special-operations to add new tasks, but it's likely these new tasks aren't needed and will be removed very shortly.
* Added/updated multiple man pages.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->get_status() to take the new 'host' paramter to allow the caller to define the hash key string used in the stored data.
* Updated Get->anvil_version() (and a few other places) to use the new 'striker-ui-api' shell user, replacing the 'apache' user.
* Updated Remote->test_access() to take the new 'close' parameter to close the SSH session used when testing access to the target.
* Fixed a logging bug in anvil-manage-power.
* Updated anvil-update-system to take the '--no-reboot' and 'clear-cache' command line switches.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Remote->call() to remove the 'background' parameter as it wasn't working.
* Updated anvil-manage-server-storage to use 'anvil-manage-server-storage' to adjust resources in a way that doesn't block.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Continues work on adding new disks (DRBD volumes) to anvil-manage-server-storage.
* Updated DRBD->get_status() to record the peer-role.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Did more work on adding support for adding new disk drives to servers in anvil-manage-server-storage.
* Updated anvil-manage-storage-groups To check for / delete duplicate storage groups with the same name.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Hit a bug where a server's definition file was written to disk while not being valid. Added logging in case it happens again, and additional safe-guards to help avoid it from recurring.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added a check to cleanup size input to Convert->human_readable_to_bytes() when passed pre-processed strings.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a small logging bug in DRBD->allow_two_primaries().
* Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded.
* Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->manage_resource() to set allow-two-primaries=no when up'ing a resource (as no migration can be in progress during an up command).
* Updated scan-drbd to look for StandAlone resources and call DRBD->manage_resource({task = 'up'}) if a connection to a peer node is StandAlone or if the local disk state is detached.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->delete_resource() to call scan-drbd and scan-lvm to ensure that the database is updated with the newly freed resources.
* Updated anvil-delete-server and anvil-provision-server to call select scan agents to ensure freed resources are immediately recorded.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Cluster->recover_server() to set the desired recovery state before calling the crm_resource refresh.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->get_status() to attempt to recompile the drbd kernel module if the drbdsetup status fails. If it continues to fail, it exits gracefully now.
* Updated ocf:alteeve:server to test access over a given IP before calling Server->find to avoid timeouts when the peer is down. Also updated it to set the constraints to keep the server on the new host when the old host returns to the cluster.
* Fixed a bug in scan-cluster where a server that is FAILED but not running is now properly recovered.
Signed-off-by: digimer <mkelly@alteeve.ca>
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute. So subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race confitions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in Get->available_resources() where DELETED servers were being counted in the used resources math.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Adds support for 'anvil_resources:🐏:reserved' that can be set to a number of MiB to override the default 8192.
* Adds support for 'anvil::<anvil_uuid>::resources:🐏:reserved' to allow for per-Anvil! node override on the reserved RAM default, and over the 'anvil_resources:🐏:reserved' option.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated (not not yet completed) scan-cluster's check_resources() function to check if a FAILED server is ready to try to recover.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Got more work done on anvil-manage-server-storage; Now shows DRBD resource size, backing LV and size, and calculates/displayes metadata size.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->parse_resource() to add references to a resource name and volume for a given backing disk.
* Comtinued work on anvil-manage-server-storage.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created the new Database->get_lvm_data to compile LVM data from scan-lvm
* Updated DRBD->parse_resource to call Database->get_lvm_data if needed, and to track backing devices to Storage Groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Database->get_file_locations() to record files available on Anvil! nodes by tracking hosts in Anvil! systems (needed after reworking how DR hosts are linked).
* Updated Get->available_resources() to call Database->get_files() and ->get_file_locations() to restore tracking files available on Anvil! nodes.
* Fixed a couple display bugs in anvil-provision-server when called with --ci-test --options.
* Continued work on anvil-manage-server-storage.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created Database->track_files() as a dedicated method as trying to verify the existence of file_locations during Database->load_anvils() was fragile and prone to recursive loops.
* Updated Database->insert_or_update_file_locations() to take an anvil_uuid and recursively call for each host, to maintain compatibility with the old ways, and make it simpler to add an entry for both sub-nodes in an Anvil!.
* Created Storage->push_file() that takes a file and rsync's it to all other machines, or creates a job for the file to be pulled if the target can't be accessed.
* Updated anvil-manage-files and anvil-sync-shared to use the new Storage->push_files and Database->track_files methods.
Signed-off-by: digimer <mkelly@alteeve.ca>
* To update file handling for the new DR host linking mechanism, file_locations -> file_location_anvil_uuid was changed to file_location_host_uuid.
This required a fair number of changes elsewhere to handle this, with a particular noted change to Database->get_anvils() to look at host_uuid's for the subnodes in an Anvil! and, if either is marked as needing a file, make sure the peer is as well. Similarly, any linked DRs are set to have the file as well.
* Created a new Network->find_access that simply takes a target host name or UUID, and it returns a list of networks and IPs that the target can be accessed by.
* Updated Network->load_ips() to find the network interface being used for traffic so that things like the interface speed can be recorded, even when an IP is on a bridge or bond.
Unrelated, but in this commit, is a restoration of calling scan agents with a timeout now that the virsh hang issue has been resolved.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created DRBD->parse_resource() to pass a specific DRBD resource's XML data.
* Fixed a bug in Get->available_resources() so that if the threads is lower than CPU cores, the cores are used as the total available to VMs.
* Fixed bugs in Get->server_from_switch() where it just wasn't working properly.
* Updated scan_drbd to not reset a resource's size to 0-bytes when a resource goes offline.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in Database->get_storage_group_data() to load hosts data when needed. Also fixed a bug where new members didn't return the new storage_group_member_uuid.
* Updated anvil-manage-host to use the new switch handler.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added a check and exit if anvil-manage-dr is asked to protect a server on a machine that doesn't know about that server.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated the storage_group_members table to add the 'storage_group_member_note' that can be set to 'DELETED' to track when a member is deleted. Updated anvil-version-changes to check for and add this column as needed. Updated the anvil.sql schema for the same.
* Updated Cluster->insert_or_update_storage_group_members to add the new column.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created Database->get_anvil_uuid_from_string(), Database->get_host_uuid_from_string() and Database->get_server_uuid_from_string() to simplify the process of converting --anvil <string>, --host <string> and --server <string> respectively.
* Fixed bugs in Database->get_dr_links() and Database->insert_or_update_dr_links().
* Updated Database->insert_or_update_states() to make direct calls to hosts instead of using get_hosts to drop out if a host_uuid doesn't yet exist in a DB.
Signed-off-by: digimer <digimer@gravitar.alteeve.com>