* Switched all calls to virsh to use Sys::Virt to deal with contention of simultaneous virsh calls.
* Removed collecting screenshots from scan-server.
* Fixed a bad variable substitution in an alert.
* Fixed a bug where a server's boot time wasn't being recorded properly.
* Reworked how we determine which server definition was most recently updated and propogated.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Hit a bug where a server's definition file was written to disk while not being valid. Added logging in case it happens again, and additional safe-guards to help avoid it from recurring.
Signed-off-by: digimer <mkelly@alteeve.ca>
* Updated anvil-manage-server-storage to the point where it can now insert and eject optical disks!
* Updated System->call to log parameters if 'shell_call' isn't set.
* Fixed a bug in anvil-manage-server process_interactive where an $anvil->data reference was being scoped.
Signed-off-by: digimer <mkelly@alteeve.ca>
* In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing.
* Updated scan-server to check for / add DRBD fence rules as needed.
Scancore APC agent bugs;
* For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents.
* Fixed a bug to set/clear alerts related to phases disappearing to deal with concurrent logins from different hosts triggering false phase loss alerts.
* Fixed missing variables not being passed to alerts/log entries.
Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added multiple new private methods to Network that help in managing the firewall.
* Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update it's firewall as soon after the server appears as possible.
* Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly.
* Updated scan-server to check that the firewall is managed when a server state has changed.
* Updated anvil-daemon to run Network->manage_firewall on startup.
* Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host.)
* Removed firewall management from striker-prep-database.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Network->load_ips() to load extra information about the interfaces.
* Updated ocf:alteeve:server to not check libvirtd daemon state on server start.
* Updated scan-hardware to check for duplicate entries and purge if found.
* Updated scan-network to check for the 'default' virbr0 interface by checking if the config file exists instead of calling virsh.
* Updated scan-server to have better logging.
* Created the new (and incomplete) anvil-test-alerts tool
* Updated scancore to support --purge to pass to all agents and then exit.
* Updated ScanCore->call_scan_agents() to no longer use 'timeout' as it was causing issues with virsh calls.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->get_anvils() to make it possible to translate a file name to a file UUID.
* Updated System->test_ipmi() to quote passwords properly. Also dropped the timeouts to 2 seconds.
* Updated anvil-provision-server to support pure CLI switch server provisioning using the --ci-test (and optional --options {--machine}) to allow CI tests.
* Continued work of anvil-manage-server.
* Fixed a bug in striker-prep-database to fix a bug in writing the pg_hba.conf file.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created ScanCore->agent_shutdown() that writes out the time the scan agent last ran, and how many databases were available when it last ran.
* Updated ScanCore->agent_startup() to read the the last run data created above.
* Updated Database->connect() to set 'sys::database::last_db_count' to the scan agent's recorded last DB count.
* Updated all agents to call ScanCore->agent_shutdown() at the end of their run.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-daemon to have a new function called "handle_special_cases" called during startup that does any weird bug mitigation required. For now, this is used to mitigate against rhbz#1961562, though certainly it will be used for other reasons later.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil-safe-start to start DRBD resources when the peer's DRBD resourcs is 'Connecting',
* Updated fence_pacemaker to more intelligently check the list of host names related to an IP address when looking for the peer host name
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked scan-ipimitool so that on nodes and dr hosts, it only scans itself. On strikers, it scans all hosts found in active Anvil! systems with a host_ipmi entry. `
* For all agents, reduced log verbosity to not push too much noise into anvil.log while scancore is running in the background.
Signed-off-by: Digimer <digimer@alteeve.ca>
Note: These changes below shouldn't have been in this branch... *sigh*
* Fixed an issue with tools/anvil-provision-server where a VM would be created but didn't boot. When this happens, an explicit boot is sent via virsh. Also bumped up the time it waits for a new server to start up.
* Added an explicit call to scan-drbd after a new resource is created to ensure that if any calls come after looking for the next free DRBD minor or port, they don't use the ones just used.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Removed the exit-if-no-DB check in ocf:alteeve:server so that (hopefully, needs testing), running servers won't be impacted if the nodes lost contact with both/all strikers.
* Updated scan-server to make an explicit check for missing XML definition files on startup and write them if needed.
* Very beginning work on anvil-delete-server has been started.
* Updated anvil-provision-server to wait when it's running in peer mode until the new XML definition is in the DB and then write it out to disk before exiting. Also updated it to add the new server to pacemaker before exiting.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Alert->register() to take message variables using the 'variables' parameter.
* Added a 'cache' parameter to Database->insert_or_update_health() and ->insert_or_update_temperature(). When set, the SQL UPDATE/INSERT calls and pushed into the array reference set in 'cache'. This is to allow performance improvements when processing a large amount of sensor/device data.
* Updated Log->variables() to take a 'prefix' parameter that, when set, will prefix the string to each variable line.
* Updated scan-ipmitool to use Database->insert_or_update_health() and ->insert_or_update_temperature().
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug if Get->free_memory() where host_type was still being called from the old System->host_type method.
* Added global support for '--log-secure' and '--log-db' switches to enable logging of secure data and DB transactions, respectively.
* Created Database->get_tables_from_schema() that parses a SQL schema file and returns an array reference of tables found, in the order they were found.
* Updated ScanCore->agent_startup() to no longer require manually defining database tables, using Database->get_tables_from_schema() when not manually set.. Updated all existing agents to use this.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created (but not finished) scan-apc-pdu
* Added support to tracking maintenance-mode for nodes in Cluster->parse_cib
* Created Remote->read_snmp_oid().
* Created Server->get_definition.
* Updated Server->get_status() to write-out server XML files on-demand.
* Finished scan-cluster.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the servers table to remove the 'not null' constraints on the server_start_after_server_uuid, server_pre_migration_file_uuid and server_post_migration_file_uuid columns.
* Updated ScanCore->agent_startup() to connect to the database(s) when there isn't a table list.
* Updated Server->migrate_virsh() to test for DB access before making DB calls (to allow ocf:alteeve:server to function even if all ScanCore DBs are offline).
* Updated ocf:alteeve:server to connect to the databases (though work without it), and changed '$FILE_NAME' to be 'ocf:alteeve:server' (to make logging more legible)
* Created the skeleton for 'scan-storage'.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed servers -> 'server_clean_stop' to 'server_user_stop' to make it clearer what the column represents.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Server->migrate_virsh() to set 'servers' -> 'server_state' to 'migrating' and clear it again once the migation completes. Also added support for cold (frozen) versus live migrations.
* Updated Cluster->parse_cib() to check if a server with the server_state set to 'migrating' isn't actually migrating anymore and, if not, to clear that state. This is needed as scan-server will blindly ignore/skip any migrating server, and if a migration call is interrupted, the state could get stuck.
* Updated the 'servers' database table (and associated Database methods) to add columns for;
** server_ram_in_use - tracking RAM used by a running server
** server_configured_ram - RAM allocated to a running server (used with the above to alert a user and track _currently_ available RAM)
** server_updated_by_user - To be set by Striker tools to indicate when the user made a change that needs to push out to nodes / running server.
** server_boot_time - Tracks the unixtime when the server booted (to track uptime even if the server migrates across nodes).
* Created Get->anvil_name_from_uuid() to easily convert an Anvil! UUID into a name. Also created ->host_uuid_from_name() to translate a host name into a host UUID.
* Created Server->get_runtime() that translates a server name into a process ID and then uses that to determine how long (in seconds) it has been running. This is used when a server transitions from 'shut off' to 'running' to determine exactly when the server booted (current time - runtime).
* Renamed all 'Server->parse_definition' calls that used 'from_memory' to 'from_virsh' to clarify the data source.
* Made scan-hardware smarter about RAM change alerts.
* Updated scancore to load agent strings on startup so that processing pending alerts works properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed the 'defitintions' table to 'server_definitions' to clarify the purpose, and made all the 'server' columns have then 'not null' constraint.
* Created Database->insert_or_update_servers(), ->get_servers(), ->insert_or_update_server_definitions() and ->get_server_definitions().
* Updated scancore, anvil-daemon, and scan agents to not run unless they're run with root privs.
* Got scan-server to update the servers / server_definition tables and the on-disk file when needed.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Began (but haven't finished) Database->insert_or_update_servers().
* Created Storage->get_file_stats() to collect the (l)stat information for a file.
* Got more work done on scan-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added support in anvil.conf to disable scan agents with 'scancore::<agent_name>::disable', and added handling this to agents. Also allowed for '--force' to override this setting.
* Updated ScanCore->agent_startup() to allow for empty scan agent table lists.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Did more work on parsing server data out of the CIB. There is still an issue with determining which node currently hosts a resource, however.
* Renamed Server->boot to ->boot_virsh, ->shutdown to ->shutdown_virsh and ->migrate to ->migrate_virsh to clarify that these methods work on the raw virsh calls, outside of pacemaker (indeed, they are what the pacemaker RA uses to do what pacemaker asks).
* Got more work done on the scan-cluster SA.
* Created the empty files for the pending scan-server SA.
Signed-off-by: Digimer <digimer@alteeve.ca>