* Hit a bug where a server's definition file was written to disk while not being valid. Added logging in case it happens again, and additional safe-guards to help avoid it from recurring.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Added a check to cleanup size input to Convert->human_readable_to_bytes() when passed pre-processed strings.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a small logging bug in DRBD->allow_two_primaries().
* Updated Database->get_jobs() to record jobs sorted by modified_date so that jobs can be run in the order they were recorded.
* Updated anvil-daemon to track which commands need to be run, and when two or more of the same command need to be run, they're run serially, with each subsequent run starting after the previous one completes.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->manage_resource() to set allow-two-primaries=no when up'ing a resource (as no migration can be in progress during an up command).
* Updated scan-drbd to look for StandAlone resources and call DRBD->manage_resource({task = 'up'}) if a connection to a peer node is StandAlone or if the local disk state is detached.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->delete_resource() to call scan-drbd and scan-lvm to ensure that the database is updated with the newly freed resources.
* Updated anvil-delete-server and anvil-provision-server to call select scan agents to ensure freed resources are immediately recorded.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Cluster->recover_server() to set the desired recovery state before calling the crm_resource refresh.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->get_status() to attempt to recompile the drbd kernel module if the drbdsetup status fails. If it continues to fail, it exits gracefully now.
* Updated ocf:alteeve:server to test access over a given IP before calling Server->find to avoid timeouts when the peer is down. Also updated it to set the constraints to keep the server on the new host when the old host returns to the cluster.
* Fixed a bug in scan-cluster where a server that is FAILED but not running is now properly recovered.
Signed-off-by: digimer <mkelly@alteeve.ca>
* In DRBD->get_next_resource(), implemented a "hold" system where the DRBD minor and TCP port(s) returned are marked as being held for one minute. So subsequent calls won't use the same numbers.
* In anvil-daemon, added a check in run_jobs() where only one instance of a given job command will be started per 2-second loop. This should help reduce the chance of simultaneous race confitions in general.
* Removed from anvil-provision-server and most other tools the call to Job->get_job_uuid(). If the program is called without the job_uuid, don't try to find it. This allows a human (or script) to make repeated calls to a program without one of those calls running a pending job instead.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug in Get->available_resources() where DELETED servers were being counted in the used resources math.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Adds support for 'anvil_resources:🐏:reserved' that can be set to a number of MiB to override the default 8192.
* Adds support for 'anvil::<anvil_uuid>::resources:🐏:reserved' to allow for per-Anvil! node override on the reserved RAM default, and over the 'anvil_resources:🐏:reserved' option.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated (not not yet completed) scan-cluster's check_resources() function to check if a FAILED server is ready to try to recover.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Fixed a bug where servers protected by DR hosts aren't deleted when the server itself is deleted.
* Updated DRBD->delete_resource() to remove the server's XML file if the host is a DR host.
* Updated anvil-version-change and anvil.sql to enable update_audits and the audits table.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Got more work done on anvil-manage-server-storage; Now shows DRBD resource size, backing LV and size, and calculates/displayes metadata size.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated DRBD->parse_resource() to add references to a resource name and volume for a given backing disk.
* Comtinued work on anvil-manage-server-storage.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created the new Database->get_lvm_data to compile LVM data from scan-lvm
* Updated DRBD->parse_resource to call Database->get_lvm_data if needed, and to track backing devices to Storage Groups.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated Database->get_file_locations() to record files available on Anvil! nodes by tracking hosts in Anvil! systems (needed after reworking how DR hosts are linked).
* Updated Get->available_resources() to call Database->get_files() and ->get_file_locations() to restore tracking files available on Anvil! nodes.
* Fixed a couple display bugs in anvil-provision-server when called with --ci-test --options.
* Continued work on anvil-manage-server-storage.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created Database->track_files() as a dedicated method as trying to verify the existence of file_locations during Database->load_anvils() was fragile and prone to recursive loops.
* Updated Database->insert_or_update_file_locations() to take an anvil_uuid and recursively call for each host, to maintain compatibility with the old ways, and make it simpler to add an entry for both sub-nodes in an Anvil!.
* Created Storage->push_file() that takes a file and rsync's it to all other machines, or creates a job for the file to be pulled if the target can't be accessed.
* Updated anvil-manage-files and anvil-sync-shared to use the new Storage->push_files and Database->track_files methods.
Signed-off-by: digimer <mkelly@alteeve.ca>
* To update file handling for the new DR host linking mechanism, file_locations -> file_location_anvil_uuid was changed to file_location_host_uuid.
This required a fair number of changes elsewhere to handle this, with a particular noted change to Database->get_anvils() to look at host_uuid's for the subnodes in an Anvil! and, if either is marked as needing a file, make sure the peer is as well. Similarly, any linked DRs are set to have the file as well.
* Created a new Network->find_access that simply takes a target host name or UUID, and it returns a list of networks and IPs that the target can be accessed by.
* Updated Network->load_ips() to find the network interface being used for traffic so that things like the interface speed can be recorded, even when an IP is on a bridge or bond.
Unrelated, but in this commit, is a restoration of calling scan agents with a timeout now that the virsh hang issue has been resolved.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Created DRBD->parse_resource() to pass a specific DRBD resource's XML data.
* Fixed a bug in Get->available_resources() so that if the threads is lower than CPU cores, the cores are used as the total available to VMs.
* Fixed bugs in Get->server_from_switch() where it just wasn't working properly.
* Updated scan_drbd to not reset a resource's size to 0-bytes when a resource goes offline.
Signed-off-by: digimer <mkelly@alteeve.ca>