* Updated Get->available_resources() to record the maximum cores that
can be allocated to a server. This is N-1 for hosts with 4 or fewer
cores, or N-2 cores otherwise.
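A minimal sketch of the rule above, in Python rather than the project's Perl. The function name is hypothetical and is not part of Get->available_resources():

```python
def max_allocatable_cores(host_cores: int) -> int:
    """Return the most cores a server may be given on a host with host_cores cores."""
    if host_cores < 1:
        raise ValueError("host_cores must be at least 1")
    # Hosts with 4 or fewer cores hold back one core; larger hosts hold back two.
    reserved = 1 if host_cores <= 4 else 2
    return host_cores - reserved
```

Under this rule a 4-core host and a 5-core host both allow at most 3 cores per server.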
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Database->insert_or_update_servers() to error if the RAM being
recorded is less than 640 KiB. This is because, somewhere yet
undiscovered, the RAM is being recorded in KiB which breaks things.
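The guard described above might look roughly like this Python sketch. It assumes the value is meant to be stored in bytes; the function and constant names are hypothetical, and the real check lives in the Perl method:

```python
MIN_RAM_BYTES = 640 * 1024  # 640 KiB expressed in bytes (assumed storage unit)


def validate_server_ram(ram_bytes: int) -> int:
    """Reject implausibly small RAM values before they are recorded."""
    if ram_bytes < MIN_RAM_BYTES:
        raise ValueError(
            f"RAM of {ram_bytes} bytes is below 640 KiB; "
            "the value was probably recorded in the wrong unit"
        )
    return ram_bytes
```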
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-manage-dr to handle DR hosts without a VG in a given SG
* Fixed up minor display issues in anvil-manage-storage-groups
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Database->get_hosts() to store hosts in a host_type hash.
* Updated Database->get_servers() to store servers by name, regardless
of host Anvil! node.
Signed-off-by: digimer <mkelly@alteeve.ca>
This branch resolves issue #462; Auto growing PVs. Specifically, it looks at the LVM PVs on the host and checks to see if there is unused free space after the backing partition. If there is, it auto-grows the partition and then resizes the PV. This feature is designed to make life easier for users who deleted the auto-created '/home' partition in the anaconda disk partitioning tool.
* Created Storage->auto_grow_pv() that does the above.
* Added the missing hidden method name _create_rsync_wrapper in the Storage module POD.
* Added a call to Storage->auto_grow_pv() in anvil-configure-host and anvil-version-changes for nodes and DR.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Server->connect_to_libvirt() to check that the target URI's
SSH fingerprint is recorded before connecting. Also added an alarm
wrapper around the Sys::Virt->new() call.
* Continued work on anvil-manage-server-system, working on the boot
order section now.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Server->locate() to take the new 'anvil' parameter to speed up
searches.
* Updated Server->update_definition() to use Server->locate() to find
where updates are needed. It now also defines the server with the new
config.
Signed-off-by: digimer <mkelly@alteeve.ca>
* If the call to Remote->call() set a target that was actually the
local short hostname, it would fail to make the call at all. Now if
the 'target' is local, the shell call is passed to System->call()
instead.
* Cleaned up logging.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Get->host_name() to accept the new 'refresh' parameter. This
forces a reread of the hostname, instead of using the cached value.
* Updated System->host_name() so that, when it's updating the hostname,
it updates the database and cached variables.
* Updated Words->center_text() to avoid undefined parameter issues.
* Updated anvil-join-anvil to ensure the 'sys::host_name' variable is set.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Striker->generate_manifest() to add pod and make the prefix,
sequence and domain parameters required.
* Created the check_for_broken_manifests() function for anvil-daemon to
detect/remove broken manifests.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added a call to Database->_check_for_duplicates to Database->resync_databases
* Added 'check_for_resync => 1' to anvil-configure-host.
Signed-off-by: digimer <mkelly@alteeve.ca>
Moved the logic to a new private method, and call it now from the active
Striker in the once per minute loop. The duplicate variable issue seems
to be not entirely uncommon.
Signed-off-by: digimer <mkelly@alteeve.ca>
With this new system, a 'primary_db' is chosen (first connected DB UUID when sorted) and only it does resyncs. Further, resyncs have been pulled from all tools except anvil-daemon. So with this new system, the chances of duplicate, simultaneous resyncs should be removed (hopefully for real this time).
* Database->check_agent_data() no longer calls a resync after loading a
schema.
* Removed the Database->connect() 'all' parameter
* The database used to read from is now always the same as the primary,
even if there is a local DB.
* Database->connect() 'check_for_resync' parameter can now be set to
'2', which means "check for resync _if_ I am primary", where '1' still
checks for resync no matter what.
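The selection and the 'check_for_resync' values described above can be sketched as follows. This is Python for illustration only, not the Perl implementation. The function names are hypothetical, a plain string sort of the UUIDs is assumed, and "I am primary" is assumed to mean the caller's own UUID matches the chosen primary:

```python
from typing import Iterable, Optional


def pick_primary_db(connected_uuids: Iterable[str]) -> Optional[str]:
    """Return the first connected DB UUID in sorted order, or None if none are connected."""
    ordered = sorted(connected_uuids)
    return ordered[0] if ordered else None


def should_check_for_resync(check_for_resync: int, my_uuid: str,
                            primary_uuid: Optional[str]) -> bool:
    """Interpret 'check_for_resync': 1 = always check, 2 = check only if primary."""
    if check_for_resync == 1:
        return True
    if check_for_resync == 2:
        return primary_uuid is not None and my_uuid == primary_uuid
    # Any other value (such as 0) means no resync check.
    return False
```

Because every host sorts the same list of connected UUIDs, each one arrives at the same primary without extra coordination.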
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Database->get_jobs() to take 'job_host_uuid = all' to allow
loading jobs from all cluster machines. Also updated it to record the
'job_host_uuid' and the unix timestamp version of 'modified_date'.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Wrote the man page for striker-boot-machine, changing --host-name to
--host, and adding the '--host all' support.
* Updated anvil-manage-host to support checking/enabling/disabling
network mapping mode.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-delete-server to use the new Server->locate method. This
was done as the old Server->locate() was failing to find the server
running on the peer when anvil-delete-server was running on the backup
subnode.
* Updated Server->locate() to search hosts for XML definition and DRBD
configs so that it can record where the server is recorded to run,
even if the server isn't running or defined at the time the locate ran.
Signed-off-by: digimer <mkelly@alteeve.ca>
This updates the /opt/alteeve/screenshot directories and the screenshots
in them to be readable by the WebUI.
Signed-off-by: digimer <mkelly@alteeve.ca>
* This takes a server and new definition XML and updates the database and any available hosts. Does not yet update defined or running servers.
Signed-off-by: digimer <mkelly@alteeve.ca>
* This adds the new 'networks' parameter to allow restricting/ordering
matched networks, and the new 'test_access' parameter to validate
that the link is working.
* Continued work on anvil-manage-server-system
Signed-off-by: digimer <mkelly@alteeve.ca>
* On subnodes and DR hosts, a check is made now in Storage->check_files() for files not linked in file_locations. Any found are added, along with a check of whether the file already exists locally and, if so, whether its md5sum is correct (to set whether the file is ready for use).
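The "exists locally and the md5sum matches" test could be sketched like this in Python. The function name and arguments are hypothetical, and the real logic is the Perl in Storage->check_files():

```python
import hashlib
import os


def file_ready(path: str, expected_md5: str) -> bool:
    """Return True only if the file exists and its MD5 digest matches expected_md5."""
    if not os.path.isfile(path):
        return False
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files (such as ISOs) are not loaded whole.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```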
Signed-off-by: digimer <mkelly@alteeve.ca>
* This is a test to see if the job waiting for the uptime to be 300s,
leaving the job_progress as 0, was causing the job to be repeatedly
called.
* This is related to issue #479
Signed-off-by: digimer <mkelly@alteeve.ca>
* This adds a check where anvil-join-anvil waits until both subnodes are
marked as configured and not in maintenance mode.
* Should address issue #479 (maybe, this shouldn't trigger reboots, but
it was certainly a race condition found while investigating).
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed handling --eject and --insert to work without a device target specified when only one exists, or to find the file path when only the file name is given.
* Updated anvil-manage-server-storage to show files when processing an optical device without a file being passed.
Signed-off-by: digimer <mkelly@alteeve.ca>
The old check evaluates the expression before determining whether the
resulting value is defined. However, when the expression refers to a
subroutine, it gets executed; if the subroutine doesn't protect against
missing parameters, it'll cause executions with bad input, i.e., the
Striker->generate_manifest subroutine without parameters.
The new check uses can(), which correctly determines whether the key
"exists" on the blessed object. Strictly speaking, can() doesn't mean
"exists", but it does the job.
* Fixed the RC in ocf:alteeve:server to exit with 0 on notify calls, resolves issue #392.
* Fixed typo references in issue #390.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated the pcs wrapper to flock anything but status calls.
* Updated scan-apc-pdu to purge regardless of the host it's called on.
* Fixed a bug in striker-purge-target that wouldn't purge anvil nodes in various cases.
Signed-off-by: digimer <mkelly@alteeve.ca>
avoid storm of virsh list that overloads libvirtd API causing
unnecessary timeouts during pcmk monitoring operations.
Resolves: https://github.com/ClusterLabs/anvil/issues/395
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
* Updated striker-update-cluster to take '--timeout' and a number of seconds, or 'Xm' or 'Xh' for minutes or hours, respectively. Also updated to show the remaining time while waiting, and added a waiting timeout to the rest of the while loops that previously had no time limit. This addresses issue #383 and issue #382.
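One way the '--timeout' value could be interpreted, as a Python sketch. The function name is hypothetical and the tool's actual Perl parsing may differ, for example in how it treats upper-case suffixes or invalid input:

```python
import re


def parse_timeout(value: str) -> int:
    """Convert '90', '5m' or '2h' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([mh]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized timeout value: {value!r}")
    number = int(match.group(1))
    # No suffix means seconds; 'm' is minutes and 'h' is hours.
    multiplier = {"": 1, "m": 60, "h": 3600}[match.group(2)]
    return number * multiplier
```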
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created System->wait_on_dnf() which was plucked from anvil-daemon, and now also called in scancore and anvil-safe-start.
* Updated scancore and anvil-safe-start to check on start that DRBD's kernel module is available (and build if not).
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-manage-server-storage, striker-collect-debug, and striker-update-cluster to be able to find a connection on an interface when none were found on preferred networks.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in System->reboot_needed() where the cache file path had a typo in the hash key.
* Updated anvil-daemon to use the full path to dnf when determining if a dnf process was running.
Signed-off-by: digimer <mkelly@alteeve.ca>
Note that work has started on reworking anvil-update-system, but it is incomplete (and broken) in this commit.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added the anvil-report-usage.8 man page
* Updated anvil-update-system to enable scancore when the OS update is complete.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Server->shutdown_virsh() to work without a database connection.
* Updated System->reboot_needed() to store/read from a cache file when the database is not available.
* Updated anvil-safe-start to remove the old --enable/disable/status switches, now that we use the anvil-safe-start.service systemd unit.
* Reworked anvil-safe-stop to work without a database connection, and to work on DR hosts.
* Updated anvil-special-operations to add new tasks, but it's likely these new tasks aren't needed and will be removed very shortly.
* Added/updated multiple man pages.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated striker-update-cluster and anvil-update-system to take '--reboot' to request a reboot if any packages are updated.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-update-system to reboot a target whose kernel updated using an anvil-manage-power job.
* Started making striker-update-cluster run as a job (not at all complete). Fixed a bug where the wrong IP was being used when finding access to a target.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in anvil-safe-stop where it wouldn't trigger a migration when the peer is online.
* Updated anvil-update-system to set job_data to 'failed' and exit with rc 4 if the OS update failed.
* Got striker-update-cluster to error out and exit if a called 'anvil-update-system' job failed.
Signed-off-by: digimer <mkelly@alteeve.ca>