* Fixed a bug in Cluster->parse_cib() when a server that is off wasn't setting 'status'.
* Renamed 'server::location::<server>::host' to '...::host_name' in several places.
* Got more work done on anvil-delete-server, up to the point where it calls the new Cluster->delete_server() method.
* Updated fence_pacemaker to call 'drbdadm adjust all' to dampen an issue where in-memory fence configs seem to change, preventing reconnection of the peer after it reboots from the fence. More testing needed on this issue.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Removed the exit-if-no-DB check in ocf:alteeve:server so that (hopefully, needs testing), running servers won't be impacted if the nodes lost contact with both/all strikers.
* Updated scan-server to make an explicit check for missing XML definition files on startup and write them if needed.
* Very beginning work on anvil-delete-server has been started.
* Updated anvil-provision-server to wait when it's running in peer mode until the new XML definition is in the DB and then write it out to disk before exiting. Also updated it to add the new server to pacemaker before exiting.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Database->check_agent_data() where the list of tables wasn't passed in, and thus the table list wasn't then passed on to Database->_find_behind_databases().
* Started work on a new method called Storage->parse_lsblk().
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Database->insert_or_update_temperature() to accespt a 'delete' parameter (which, surprise, deletes a record).
* Updated (but not yet tested) the RPM .spec to require that 'core' and the other three packages are required to be the same version.
* Updated scan-ipmitool and scan-storcli scan agents to now delete temperature data belonging to lost sensors.
* Fixed tools/striker-manage-install-target to add multiple missing packages.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created (but not finished) scan-apc-pdu
* Added support to tracking maintenance-mode for nodes in Cluster->parse_cib
* Created Remote->read_snmp_oid().
* Created Server->get_definition.
* Updated Server->get_status() to write-out server XML files on-demand.
* Finished scan-cluster.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updates servers -> server_host_uuid to drop the 'NOT NULL' constraint.
* Created the new Get->server_uuid_from_name() that does what it says on the tin. Fixed a bug in ->host_uuid_from_name() where the host name was being returned instead of the UUID.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the servers table to remove the 'not null' constraints on the server_start_after_server_uuid, server_pre_migration_file_uuid and server_post_migration_file_uuid columns.
* Updated ScanCore->agent_startup() to connect to the database(s) when there isn't a table list.
* Updated Server->migrate_virsh() to test for DB access before making DB calls (to allow ocf:alteeve:server to function even if all ScanCore DBs are offline).
* Updated ocf:alteeve:server to connect to the databases (though work without it), and changed '$FILE_NAME' to be 'ocf:alteeve:server' (to make logging more legible)
* Created the skeleton for 'scan-storage'.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed servers -> 'server_clean_stop' to 'server_user_stop' to make it clearer what the column represents.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Server->migrate_virsh() to set 'servers' -> 'server_state' to 'migrating' and clear it again once the migation completes. Also added support for cold (frozen) versus live migrations.
* Updated Cluster->parse_cib() to check if a server with the server_state set to 'migrating' isn't actually migrating anymore and, if not, to clear that state. This is needed as scan-server will blindly ignore/skip any migrating server, and if a migration call is interrupted, the state could get stuck.
* Updated the 'servers' database table (and associated Database methods) to add columns for;
** server_ram_in_use - tracking RAM used by a running server
** server_configured_ram - RAM allocated to a running server (used with the above to alert a user and track _currently_ available RAM)
** server_updated_by_user - To be set by Striker tools to indicate when the user made a change that needs to push out to nodes / running server.
** server_boot_time - Tracks the unixtime when the server booted (to track uptime even if the server migrates across nodes).
* Created Get->anvil_name_from_uuid() to easily convert an Anvil! UUID into a name. Also created ->host_uuid_from_name() to translate a host name into a host UUID.
* Created Server->get_runtime() that translates a server name into a process ID and then uses that to determine how long (in seconds) it has been running. This is used when a server transitions from 'shut off' to 'running' to determine exactly when the server booted (current time - runtime).
* Renamed all 'Server->parse_definition' calls that used 'from_memory' to 'from_virsh' to clarify the data source.
* Made scan-hardware smarter about RAM change alerts.
* Updated scancore to load agent strings on startup so that processing pending alerts works properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed the 'defitintions' table to 'server_definitions' to clarify the purpose, and made all the 'server' columns have then 'not null' constraint.
* Created Database->insert_or_update_servers(), ->get_servers(), ->insert_or_update_server_definitions() and ->get_server_definitions().
* Updated scancore, anvil-daemon, and scan agents to not run unless they're run with root privs.
* Got scan-server to update the servers / server_definition tables and the on-disk file when needed.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Alert->check_alert_sent() and the 'alert_sent' table to remove 'alert_name' as it really isn't helpful given record_locator.
* Created 'Database->purge_data' that takes an array of tables and, in reverse, purges them from the database(s). This also disables archiving and resync functions now.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Alert->register to take a hash reference for message variables to simplify when a caller plans to log and register an alert at the same time.
* Updated Convert->bytes_to_human_readable() to name the 'size' variable used internally for 'bytes' to actually be 'bytes' for better consistency.
* Created multiple new Database methods;
** ->check_condition_age() is meant to be used by scan agents to see how long a given condition has been in play (ie: how long ago power was lost to a UPS or a sensor became unreadable).
** ->insert_or_update_health() handles recording data to the new 'health' table, used for determining ideal hosts for servers between nodes.
** ->insert_or_update_power() handles recording data to the new 'power' table, used for determining how power events are handled.
** ->insert_or_update_temperature() handles recording temperature data to the new 'temperature' table, used to determine how thermal events are handled.
* Got a lot more done on the scan-hardware scan agent. Only part left now is post-scan health processing.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Renamed the scan-network skeleton scan agent to scan-hardware and started work on it based on the M2 version.
* Updated Database->get_recipients() to take the 'include_deleted' parameter, and changed the default behaviour to only return active records.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Dropped support for supporting imperial measurements in generated emails.
* Created Database->get_alerts() to read in alert data and ->get_recipients() to get the list of alert recipients.
* SQL Schema changes;
** Added 'alert_processed' to 'alerts' to track what alerts have been processed.
** Changed 'recipient_new_level' to 'recipient_level' now that we're only using 'notifications' as a per-host override for user/hosts alert levels.
** Removed 'recipient_units' as we're no longer supporting non-metric values.
* Updated Alert->register() to take strings for the alert level (which gets translated to integers).
* Created Email->get_current_server() to returned the mail_server_uuid of the active mail server (if any). Created ->send_alerts() to process unprocessed alerts and send emails to recipients.
* Updated Words->parse_banged_string() to take the 'language' parameter.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Tools->_set_defaults where the order the tables were sync'ed it caused primary/foreign keys would trigger DB errors when resync'ing in some cases.
* Created Database->log_connections to make it easier to log which databases are actively in use and other data about the connections.
* Fixed bugs in striker-manage-peers that (partly because of the above bugs) failed to connect to new peers properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Got email server configuration under way. A mail server can now be configured via Email->_configure_for_server(), but more work is needed on when to switch between configs.
* Fixed some logging of passwords that wasn't being checked to see if secure logging was enabled or not.
* Fixed a bug in Striker where the back arrow in email config sub-sections weren't going back to the main email menu.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Server->get_status() where the call to Storage->rsync's returned output checked for '!!errer!!' instead of '!!error!!'.
* Fixed a bug in Storage->rsync where, when no port was passed in, it would try to specify an empty port and fail.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added the anvil.conf option 'sys::privacy::strong' that controls if the Anvil! ever "calls home". Initially, this controls DRBD's usage flag.
* Updated DRBD->get_devices() to track resources by their 'by-res' names as well and by the normal '/dev/drbdX' devices.
* To mitigate https://bugzilla.redhat.com/show_bug.cgi?id=1868467, updated Get->bridges() to parse the normal (non-JSON) data if we get invalid JSON output.
* Updated anvil-join-anvil to not disable, and in fact enable, libvirtd on boot. With DRBD 9, the original fear of a user accidentally booting a VM that's running on the peer no longer is an issue. By enabling it and leaving it on, Striker dashboard users won't lose their virtual machine manager link unless the node powers off. Also enabled actually updating the job progress, completing this tool!
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Cluster->parse_cib() to take a CIB as a parameter.
* Fixed a bug in Database->get_hosts() where loading the host_ipmi value was filtered through Log->is_secure.
* Updated Striker->get_fence_data() to parse the switches to make it easier to map a fence agent's command line switches to STDIN arguments.
* Created System->parse_arguments() that converts a series of command line switches and their values into a hash. It's similar to Get->switches(), but works on any string.
* Continued work on anvil-join-anvil's fence configuration logic.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Got anvil-join-anvil to the point where is initializes and starts the cluster.
* Deleted the old ssh key handling logic in anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Get->trusted_hosts() that finds the dashboards the host uses and, if the host is in an Anvil!, the peers in the same anvil.
* Created (but not finished yet) System->update_hosts() that will add and edit entries for all IPs to trusted hosts.
* Fixed a logging bug in Striker->load_manifest().
* Fixed a bug in System->call where, the the output from the shell call didn't end in a new-line, it would not parse the return code and lease the return code string appended to the shell output.
* Fixed a big in System->change_shell_user_password() where a new-line (\n) meant for the shell call wasn't escaped properly. There was also a duplicate 'return_code' variable preventing the actual return code from being read.
* Got more work done on anvil-join-anvil to update the hacluster password (when needed).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated 'variables' -> 'variable_source_uuid' to type 'uuid' and removed the 'not null' constraint.
* Updated Database->insert_or_update_variables() to check/update 'variables_source_table' and 'variables_source_uuid'.
* Created the 'trusts' database table which will, when done, tell anvil-daemon which users@machines to trust (setup passwordkess SSH).
* Created (but not finished) System->manage_authorized_keys() and moved the logic over to it from anvil-daemon.
* Changed the host types "dashboard" to "striker".
* Moved the following methods from 'System' to 'Get';
** System->get_host_type to Get->host_type
** System->get_bridges to Get->bridges
** System->get_free_memory to Get->free_memory
** System->get_os_type to Get->os_type
** System->get_uptime to Get->uptime
* Updated striker to include the host_uuid for the 'node1', 'node2' and (if chosen) 'dr1' when running a job manifest.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added filters Striker->get_fence_data() for parameters. Manually change 'action' entries from 'string' to 'select' and use the data in the 'actions' element to populate it, with actions that don't make sense filtered out.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug (well, made a work-around for an issue without a known reproducer) where, on some occassion, a record will end up in the public table without being copied into the history schema. When this happens, the next resync would crash out because the resynd reads in the history table only. Now, when about to INSERT a record into the public schema during a resync, an explicit check is made to see if the record alread
y exists. If it does, the INSERT is instead redirected to the history schema.
* Cleaned up the fence agent metadata when displaying to a user, converting the shell codes to underline a string with square brackets instead. We also now replace newlines with <br /> tags. Lastly, to help fence_azure_arm's metadata description to display cleanly, a check is made to format the table correctly.
* Began work on the Striker menu for handling fence device management
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Striker->get_fence_data() that reads/parses the unified fence metadata file created by tools/striker-parse-fence-agents.
* Created the new 'fences' database table and Database->insert_or_update_fences() to handle it.
* Added hosts -> host_ipmi that will, later, store information on how to access the host's IPMI interface, when available.
* Sketched out how the new Install Manifests are going to work.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Accounts->read_cookies where, when a user's hash had expired, the logged error message didn't show the user's name.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Tools->refresh to reload anvil.conf in one call.
* Created Anvil::Tools::Network to hold network-related tasks.
** Created Network->is_remote() that tests to see if a string (containing a target) refers to the remote machine (versus a local machine). Updated all previous checks to use this new method.
** Moved Get->network_details() and Get->network() to the new Network module. Renamed Get->network() to Network->get_network().
** Made Network->get_ips() work locally and remotely.
** Created Network->find_matches() that compares two scanned machines IPs (via two previous calls to Network->get_ips())
* Created Database->manage_anvil_conf() that will add, update or remove a given database connection in a local or remote anvil.conf file.
* Fixed bugs in Storage->backup() where the bash calls were quite broken. I'm not sure how it ever worked before... x_x
* Updated anvil-daemon to not initialize a database unless it's running on dashboard. Also added a check at the startup of anvil-daemon where it will go into a loop waiting for a database to become available, re-reading anvil.conf each loop.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Remote->call where a shell call that ended in a newline would work, but throw an error and not get the return code.
* Created Database->get_job_details() which takes a job_uuid and returns the job details, if found.
* Fixed a bug in Jobs->update_progress() where 'clear' wasn't removing the old job_progress data.
* Added the parameters 'no_files' to skip stat'ing/recording non-directories, and 'search_for' which will set the parent directory in 'scan::searched' and stop scanning if found. This allows this method to act as a directory tree scanner and as a search engine.
* Created Striker->get_local_repo() that builds a repo file body suitable for adding to peers, nodes and DR hosts.
* Fixed bugs in the Striker WebUI related to initializing a target node / DR host.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created tools/anvil-check-memory to report how much RAM is used by a given program.
* Added documentation for some previously undocumented methods.
* Updated Database->archive_database() to take the 'tables' parameter.
* Updated Storage->scan_directory() to record a directory's mode and type, even when recursive isn't used.
* Finished System->check_memory().
* Updated ocf:alteeve:server to now NOT stop a DRBD resource unless 'stop_drbd_resources'.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created DRBD->allow_two_primaries() and ->reload_defaults() that enables (and resets/disables) dual-primary operation (allow-two-primaries=yes), used to enable live migration.
* Created Remote->test_access() that simply verifies that a remote target can be accessed (as a given user).
* Created Server->migrate() that actually migrates a server. It can push already, and pull will be added next.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Started working on ocf:alteeve:server to make it smarter (and more patient) when bringing a DRBD resource up so we don't get false failures when we hit a race.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Moved firewall.txt out of the templates directory and into the tools directory so that it is accessible on nodes and DR hosts (which don't get the apache files).
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in System->get_ips() where DHCP-assigned IPs were not being parsed properly to get the default gateway.
* Added the alteeve-el8-repo to the kickstart files install package list.
* Updated anvil-daemon to sleep 2 seconds between loops, instead of 1. Added a check to 'check_firewall' to not run until after the system has been configured.
* Quieted a lot of logging.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated package list to fix changed dependencies from RHEL 8 beta to final.
* Changed anvil_daemon to only check DHCP once per minute instead of every loop.
Signed-off-by: Digimer <digimer@alteeve.ca>