* Created Cluster->check_stonith_config() that checks and, if needed, reconfigures a cluster's fencing (stonith) config.
* Updated scan-cluster to call Cluster->check_stonith_config() at the end of each call.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created DRBD->check_if_syncsource() and ->check_if_synctarget() that return '1' if the target host is currently SyncSource or SyncTarget for any resource, respectively.
* Updated DRBD->update_global_common() to return the unified-format diff if any changes were made to global-common.conf.
* Created ScanCore->check_health() that returns the health score for a host. Created ->count_servers() that returns the number of servers on a host, how much RAM is used by those servers and, if available, the estimated migration time of the servers. Updated ->check_temperature() to set/clear/return the time that a host has been in a warning or critical temperature state.
* Finished ScanCore->post_scan_analysis_node()!!! It certainly has bugs, and much testing is needed, but the logic is all in place! Oh what a slog that was... It should be far more intelligent than M2 though, once flushed out and tested.
* Created Server->active_migrations() that returns '1' if any servers are in a migration on an Anvil! system. Updated ->migrate_virsh() to record how long a migration took in the "server::migration_duration" variable, which is averaged by ScanCore->count_servers() to estimate migration times.
* Updated scan-drbd to check/update the global-common.conf file's config at the end of a scan.
* Updated ScanCore itself to not scan when in maintenance mode. Also updated it to call 'anvil-safe-start' when ScanCore starts, so long as it is within ten minutes of the host booting.
Signed-off-by: Digimer <digimer@alteeve.ca>
* The get_mmeory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.
To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, in there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib colume.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Update DRBD->gather_data() to get the size data from /sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disabled ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked scan-ipimitool so that on nodes and dr hosts, it only scans itself. On strikers, it scans all hosts found in active Anvil! systems with a host_ipmi entry. `
* For all agents, reduced log verbosity to not push too much noise into anvil.log while scancore is running in the background.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Alert->register() to take message variables using the 'variables' parameter.
* Added a 'cache' parameter to Database->insert_or_update_health() and ->insert_or_update_temperature(). When set, the SQL UPDATE/INSERT calls and pushed into the array reference set in 'cache'. This is to allow performance improvements when processing a large amount of sensor/device data.
* Updated Log->variables() to take a 'prefix' parameter that, when set, will prefix the string to each variable line.
* Updated scan-ipmitool to use Database->insert_or_update_health() and ->insert_or_update_temperature().
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug if Get->free_memory() where host_type was still being called from the old System->host_type method.
* Added global support for '--log-secure' and '--log-db' switches to enable logging of secure data and DB transactions, respectively.
* Created Database->get_tables_from_schema() that parses a SQL schema file and returns an array reference of tables found, in the order they were found.
* Updated ScanCore->agent_startup() to no longer require manually defining database tables, using Database->get_tables_from_schema() when not manually set.. Updated all existing agents to use this.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created (but not finished) scan-apc-pdu
* Added support to tracking maintenance-mode for nodes in Cluster->parse_cib
* Created Remote->read_snmp_oid().
* Created Server->get_definition.
* Updated Server->get_status() to write-out server XML files on-demand.
* Finished scan-cluster.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Server->migrate_virsh() to set 'servers' -> 'server_state' to 'migrating' and clear it again once the migation completes. Also added support for cold (frozen) versus live migrations.
* Updated Cluster->parse_cib() to check if a server with the server_state set to 'migrating' isn't actually migrating anymore and, if not, to clear that state. This is needed as scan-server will blindly ignore/skip any migrating server, and if a migration call is interrupted, the state could get stuck.
* Updated the 'servers' database table (and associated Database methods) to add columns for;
** server_ram_in_use - tracking RAM used by a running server
** server_configured_ram - RAM allocated to a running server (used with the above to alert a user and track _currently_ available RAM)
** server_updated_by_user - To be set by Striker tools to indicate when the user made a change that needs to push out to nodes / running server.
** server_boot_time - Tracks the unixtime when the server booted (to track uptime even if the server migrates across nodes).
* Created Get->anvil_name_from_uuid() to easily convert an Anvil! UUID into a name. Also created ->host_uuid_from_name() to translate a host name into a host UUID.
* Created Server->get_runtime() that translates a server name into a process ID and then uses that to determine how long (in seconds) it has been running. This is used when a server transitions from 'shut off' to 'running' to determine exactly when the server booted (current time - runtime).
* Renamed all 'Server->parse_definition' calls that used 'from_memory' to 'from_virsh' to clarify the data source.
* Made scan-hardware smarter about RAM change alerts.
* Updated scancore to load agent strings on startup so that processing pending alerts works properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed the 'defitintions' table to 'server_definitions' to clarify the purpose, and made all the 'server' columns have then 'not null' constraint.
* Created Database->insert_or_update_servers(), ->get_servers(), ->insert_or_update_server_definitions() and ->get_server_definitions().
* Updated scancore, anvil-daemon, and scan agents to not run unless they're run with root privs.
* Got scan-server to update the servers / server_definition tables and the on-disk file when needed.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added support in anvil.conf to disable scan agents with 'scancore::<agent_name>::disable', and added handling this to agents. Also allowed for '--force' to override this setting.
* Updated ScanCore->agent_startup() to allow for empty scan agent table lists.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Did more work on parsing server data out of the CIB. There is still an issue with determining which node currently hosts a resource, however.
* Renamed Server->boot to ->boot_virsh, ->shutdown to ->shutdown_virsh and ->migrate to ->migrate_virsh to clarify that these methods work on the raw virsh calls, outside of pacemaker (indeed, they are what the pacemaker RA uses to do what pacemaker asks).
* Got more work done on the scan-cluster SA.
* Created the empty files for the pending scan-server SA.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created 'Cluster->which_node' that returns 'node1' or 'node2' to indicate which node a host is.
* Continued working on scan_cluster; decided to make it not host-dependent.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Started work on the new scan-cluster scan agent that will parse out and store data from the pacemaker CIB.
Signed-off-by: Digimer <digimer@alteeve.ca>