* In Cluster->parse_cib(), added parsers for node attributes and resource rules. Also stored the existence of and details of each under the server resources for easier referencing.
* Updated scan-server to check for / add DRBD fence rules as needed.
Scancore APC agent bugs;
* For clarity, converted all '#!no_value!#' and '#!no_connection!#' to use '!!' instead in APC scan agents.
* Fixed how alerts for disappearing phases are set and cleared, so that concurrent logins from different hosts no longer trigger false phase loss alerts.
* Fixed missing variables not being passed to alerts/log entries.
Started more work on anvil-manage-server, but on hold again while the DRBD fencing work is completed.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Reworked Network->bridge_info() to use 'ip' to get the list of bridges, and 'bridge' to find interfaces connected to the bridge.
* Added 'test' messages to Words->string().
* Fixed a bug in scan-lvm where mdadm based PVs didn't read the sector size properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Remote->test_access() to not use cached SSH access.
* Updated anvil-configure-host to abort if the host is in a cluster.
* Updated anvil-join-anvil to clean up some variable checks to help avoid uninitialized variable messages.
* Updated striker-initialize-host to check if an anvil RPM is installed and, if so, not install the Anvil! repo.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Remote->call() to return ('!!error!!', '!!error!!', 9999) when an error hits. Made Remote->test_access() explicitly check for '1' to be returned in order to confirm access, fixing a bug where a bad target value caused false positives. Updated ->_check_known_hosts_for_target() to no longer explicitly check for 'ssh-rsa' so that machine keys using different ciphers are detected as being in known_hosts properly.
* Updated striker-auto-initialize-all to initialize nodes and DR hosts networks before trying to form them into an Anvil!. Fixed several other bugs as well. More testing is needed, but it works now.
* Updated striker-initialize-host to check for the alteeve repo and, if not found, check for access to alteeve.com. If there is access, it will install our repo now.
Signed-off-by: Digimer <digimer@alteeve.ca>
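The explicit check for '1' described in the entry above can be sketched in Python. This is only an illustration of why a strict comparison is safer than a truthiness test; the `access_confirmed` helper is hypothetical, and how the original false positives actually arose is an assumption here.

```python
# Error sentinel as described in the changelog entry above.
ERROR_RESULT = ("!!error!!", "!!error!!", 9999)


def access_confirmed(call_result):
    """Return True only if the remote call printed exactly '1'.

    `call_result` is an (output, error, return_code) tuple. A plain
    truthiness test on the output would accept the non-empty '!!error!!'
    sentinel, which is one way a false positive could slip through.
    """
    output, _error, _return_code = call_result
    return output.strip() == "1"
```

With this shape, `access_confirmed(ERROR_RESULT)` is False even though the sentinel string itself is truthy.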
* Fixed a bug in Remote->call() where the output of the call not ending in a newline wasn't having the return code parsed off properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
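The trailing-newline case fixed above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a marker of the form `return_code:<n>` is echoed after the remote command; the marker name and the `split_return_code` helper are hypothetical and are not taken from the Remote->call() code.

```python
import re

# Hypothetical marker: the caller appends "echo return_code:$?" to the shell
# call, so the marker is always the last thing in the captured output.
_RC_PATTERN = re.compile(r"return_code:(\d+)\s*$")


def split_return_code(raw_output):
    """Split raw command output into (output, return_code).

    Works whether or not the command's own output ended in a newline. When
    it did not, the marker is glued to the end of the last output line.
    The return code is None if no marker is found.
    """
    match = _RC_PATTERN.search(raw_output)
    if match is None:
        return raw_output, None
    output = raw_output[:match.start()]
    # Drop the single newline that separated the output from the marker.
    if output.endswith("\n"):
        output = output[:-1]
    return output, int(match.group(1))
```

Anchoring the pattern at the end of the string, rather than at the start of a line, is what lets the glued case parse correctly.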
* Finished initial Striker setup in tools/striker-auto-initialize-all. Started working on peering.
* Cleaned up the handling of converting UIDs to user names in Remote->add_target_to_known_hosts() and ->_call_ssh_keyscan().
* Did a bunch of white-space/alignment cleanup.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Made the error reported by Remote->call() more verbose when called without 'target' being set.
* Updated anvil-daemon to not call jobs more than once per minute.
* Started work on striker-auto-initialize-all, still very far from complete.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Cluster->add_server() to now set failure timeouts to actual numbers instead of INFINITY after discovering that INFINITY doesn't work in those cases.
* Updated Database->get_hosts to now check if other entries have the same host name, and if so, to set their host_key to 'DELETED'. This should make it easier to handle when a hardware machine is replaced by new hardware but uses the same host_name.
* Updated Email->check_queue() to start and enable postfix.service if it's found to not be running.
* Updated Get->available_resources() to return '!!no_data!!' when a given host hasn't got any data in scan_lvm_vgs. Now use this in anvil-provision-server to exit if a node or dr host hasn't run scancore yet.
* Fixed a bug in scan-lvm where the pvs_uuid wasn't being loaded properly, preventing lost PVs, VGs and LVs from being flagged as deleted.
* Started work on anvil-migrate-server, though it's far from complete.
Signed-off-by: Digimer <digimer@alteeve.ca>
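The duplicate host name handling described in the entry above could look roughly like the following Python sketch. The list of dicts is a hypothetical in-memory stand-in for the hosts table, and `mark_replaced_hosts` is an illustrative name, not the Database->get_hosts implementation.

```python
def mark_replaced_hosts(hosts, current_host_uuid):
    """Flag other entries that share the current host's name.

    `hosts` is a list of dicts with 'host_uuid', 'host_name' and 'host_key'
    keys. Returns the host UUIDs whose host_key was changed to 'DELETED'.
    """
    current = next((h for h in hosts if h["host_uuid"] == current_host_uuid), None)
    if current is None:
        raise ValueError("unknown host_uuid: %s" % current_host_uuid)

    changed = []
    for host in hosts:
        if host is current:
            continue
        same_name = host["host_name"] == current["host_name"]
        if same_name and host["host_key"] != "DELETED":
            host["host_key"] = "DELETED"
            changed.append(host["host_uuid"])
    return changed
```

Running it a second time changes nothing, so it would be safe to call on every pass.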
* Removed the exit-if-no-DB check in ocf:alteeve:server so that (hopefully; needs testing) running servers won't be impacted if the nodes lose contact with both/all strikers.
* Updated scan-server to make an explicit check for missing XML definition files on startup and write them if needed.
* Started very early work on anvil-delete-server.
* Updated anvil-provision-server to wait when it's running in peer mode until the new XML definition is in the DB and then write it out to disk before exiting. Also updated it to add the new server to pacemaker before exiting.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created (but not finished) scan-apc-pdu
* Added support for tracking maintenance-mode for nodes in Cluster->parse_cib
* Created Remote->read_snmp_oid().
* Created Server->get_definition.
* Updated Server->get_status() to write-out server XML files on-demand.
* Finished scan-cluster.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Server->migrate_virsh() to set 'servers' -> 'server_state' to 'migrating' and clear it again once the migration completes. Also added support for cold (frozen) versus live migrations.
* Updated Cluster->parse_cib() to check whether a server with server_state set to 'migrating' is still actually migrating and, if it is not, to clear that state. This is needed as scan-server will blindly ignore/skip any migrating server, and if a migration call is interrupted, the state could get stuck.
* Updated the 'servers' database table (and associated Database methods) to add columns for;
** server_ram_in_use - tracking RAM used by a running server
** server_configured_ram - RAM allocated to a running server (used with the above to alert a user and track _currently_ available RAM)
** server_updated_by_user - To be set by Striker tools to indicate when the user made a change that needs to push out to nodes / running server.
** server_boot_time - Tracks the unixtime when the server booted (to track uptime even if the server migrates across nodes).
* Created Get->anvil_name_from_uuid() to easily convert an Anvil! UUID into a name. Also created ->host_uuid_from_name() to translate a host name into a host UUID.
* Created Server->get_runtime() that translates a server name into a process ID and then uses that to determine how long (in seconds) it has been running. This is used when a server transitions from 'shut off' to 'running' to determine exactly when the server booted (current time - runtime).
* Renamed all 'Server->parse_definition' calls that used 'from_memory' to 'from_virsh' to clarify the data source.
* Made scan-hardware smarter about RAM change alerts.
* Updated scancore to load agent strings on startup so that processing pending alerts works properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
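The boot time calculation mentioned for Server->get_runtime() above (current time - runtime) is simple enough to show directly. This Python sketch only illustrates the arithmetic; the function name is hypothetical, and how the runtime is actually read from the process is left out.

```python
import time


def boot_time_from_runtime(runtime_seconds, now=None):
    """Return the unixtime at which a server booted.

    `runtime_seconds` is how long the server's process has been running.
    `now` defaults to the current unixtime and exists mainly for testing.
    """
    if now is None:
        now = int(time.time())
    return now - int(runtime_seconds)
```

Storing the resulting unixtime, rather than the runtime itself, means the value stays valid after the server moves to another node.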
* Created 'Cluster->which_node' that returns 'node1' or 'node2' to indicate which node a host is.
* Continued working on scan_cluster; decided to make it not host-dependent.
Signed-off-by: Digimer <digimer@alteeve.ca>
* More work done on Email->send_email() to, well, actually send email (which it isn't doing yet, but it's close).
* Updated Words->key() to include the bad key name when no entry for the requested key exists in the words.xml file.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Server->get_status() where the call to Storage->rsync's returned output checked for '!!errer!!' instead of '!!error!!'.
* Fixed a bug in Storage->rsync where, when no port was passed in, it would try to specify an empty port and fail.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed Database->get_ip_addresses() to clear stale IP addresses.
* Finished (for now, more testing needed) System->configure_ipmi! Also created System->test_ipmi() that handles trying lanplus and various password lengths, updating hosts -> host_ipmi on successful check.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created Cluster->get_peers() that figures out who the peer node (and DR host, if applicable) are.
* Updated Cluster->parse_cib() to dig out more information.
* Created Cluster->start_cluster() to start pacemaker (via pcsd) locally or on all (both) nodes.
* Started working on ocf:alteeve:server to start/stop the libvirtd/drbd daemons as needed, instead of having pacemaker do it.
* Got more work done on anvil-join-anvil. Node 2 now waits for the cluster to start, and node 1 will do setup as needed, then wait for the cluster to start as well.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added the fix from the last commit for System->call to handle returned data without an ending newline to Remote->call.
* Got more work done on System->update_hosts(). It's able to add new hosts, but misses the short and FQDN host names. Need to fix that and then verify that existing / manual entries aren't molested.
Signed-off-by: digimer <digimer@pulsar.alteeve.com>
* Updated the striker manifest run to add an entry into the 'anvils' table, and to pass the anvil_uuid to the jobs rather than the various host_uuids.
* Fixed a bug in the 'anvils' SQL procedure that copied data into the history schema (a few columns were missing).
* Updated anvil-configure-host to reboot when finished to be certain network changes have taken effect. Also updated the handling of virsh bridges to delete the autostart symlinks if libvirtd daemon isn't running.
* Added some logic to anvil-daemon to call 'anvil-update-states' with the -v{1,3} flag depending on the active debug level.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Database->insert_or_update_variables() where, if 'update_value_only' was set but no variable_uuid was passed or could be found, an (incomplete) INSERT would be attempted.
* Added support for generating module metadata when setting up local repos on Striker.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug where '$target' being preset to 'local' was causing bad calls to 'Remote->call'.
* Updated Storage->change_mode and ->change_owner to work locally and on remote hosts.
* Barely started work on striker->process_anvil_menu().
Signed-off-by: Digimer <digimer@alteeve.ca>
* Finished the detection of and handling of initialization of a host when the host has no Internet access.
* Disabled (for now) anvil-daemon's check_ssh_keys function.
* Fixed a couple small bugs elsewhere.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Moved System->is_local to Network->is_local, and System->ping to Network->ping.
* Added a check to tools/striker-get-peer-data that will report if the target has Internet access or not.
* Cleaned up the form that prompts the user to enter their Red Hat credentials.
* Updated tools/anvil-manage-keys (and related code) to no longer distinguish by user. If a target is flagged as changed, it is removed from the root and all users' known_hosts files.
* Updated Storage->write_file() and ->update_file() to accept the 'backup' parameter to control if a file that exists is backed up before being updated/replaced.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated System->host_name to work locally and on remote targets.
* Renamed all 'hostname' instances to 'host_name' to standardize on a spelling throughout the program.
* Removed use of and dependency on 'hostname'.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the pxe.txt file to now write a caller for anvil-update-issue in /etc/NetworkManager/dispatcher.d/ifup-local so that the /etc/issue file is updated as soon as the network is brought up, before the GDM login prompt is shown.
* Fixed a couple bugs in tools/anvil-manage-keys, including to ensure that the permissions are retained when a file is updated.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added a check to Remote->call() where, when a connect attempt fails because of a changed/bad key, it is reported as such to the user/logs and an entry is recorded in the state file.
* Started adding a Striker menu function showing users a list of bad keys in known_hosts files and the ability to remove old keys.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Remote->call() to detect when a connection fails because the target's known_hosts entry has changed. Still need to add the function to report this to the user.
* Fixed a bug in Words->parse_banged_string() where new-lines in a double-banged word string's variable value would cause an infinite loop.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Remote->call where a shell call that ended in a newline would work, but throw an error and not get the return code.
* Created Database->get_job_details() which takes a job_uuid and returns the job details, if found.
* Fixed a bug in Jobs->update_progress() where 'clear' wasn't removing the old job_progress data.
* Added the parameters 'no_files' to skip stat'ing/recording non-directories, and 'search_for' which will set the parent directory in 'scan::searched' and stop scanning if found. This allows this method to act as a directory tree scanner and as a search engine.
* Created Striker->get_local_repo() that builds a repo file body suitable for adding to peers, nodes and DR hosts.
* Fixed bugs in the Striker WebUI related to initializing a target node / DR host.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug in Storage->read_file() where a remote read, where the remote user wasn't specified, would cause the call to hang.
* Cleaned up striker->add_sync_peer() to use more clear variable names.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Cleaned up the striker->add_sync_peer() function to more clearly differentiate the ssh port from the pgsql port.
* Improved the HTML form to not have the browser treat host login fields as credentials to autofill or save.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created DRBD->allow_two_primaries() and ->reload_defaults(), which enable (and reset/disable) dual-primary operation (allow-two-primaries=yes), used to enable live migration.
* Created Remote->test_access() that simply verifies that a remote target can be accessed (as a given user).
* Created Server->migrate() that actually migrates a server. It can push already, and pull will be added next.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Created System->active_lv() that, surprise, activates an inactive logical volume. Also created ->check_storage() that parses out the LVM data.
* Fixed a bug in tools/fence_pacemaker that was preventing it from compiling and running.
* Updated ocf:alteeve:server to validate the target server's storage.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug where Get->host_uuid() wasn't reading from the host.uuid file.
* Updated Remote->call() to record a target's fingerprint when needed.
* The ocf:alteeve:server resource agent now properly stops a server and the corresponding DRBD resource.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Added a 'timeout' parameter to Remote->call() to limit the time that a command on a remote host can run, with a default of '10' (seconds).
Signed-off-by: Digimer <digimer@alteeve.ca>
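The idea of a per-call 'timeout' with a default of 10 seconds, as described in the entry above, can be sketched in Python. This runs a local command rather than one over SSH, so it is an analogy, not the Remote->call() implementation; `run_with_timeout` and its return shape on timeout are assumptions made for illustration.

```python
import subprocess

DEFAULT_TIMEOUT = 10  # seconds, matching the default named in the entry above


def run_with_timeout(command, timeout=DEFAULT_TIMEOUT):
    """Run `command` (a list of arguments) with a time limit.

    Returns (output, error, return_code). If the command runs longer than
    `timeout` seconds it is killed, and the return code is None.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "", "timed out after %s seconds" % timeout, None
    return result.stdout, result.stderr, result.returncode
```

A caller can then treat a return code of None as "the command did not finish in time" and decide whether to retry with a longer limit.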
* Started work on "Files" (replacement for the media library), including database tables, planned sync flow and web UI.
* Added a check for the /mnt/shared directories and create them as needed in the periodic anvil-daemon checks.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated anvil.sql to add the new tables needed for alert mail delivery.
* Updated anvil.sql and Database->initialize to now default the user to 'admin' and swap that out if needed, instead of using the '#!variable!user!#' replacement variable.
* Started updating anvil.spec for EL8.
* Added support for 'striker::repo::extra-packages' which users can use to add additional packages to the Striker repositories.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated the Database module to not sort or reorder the 'core_tables' array, and reordered them in the hash they're declared in.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug where "form::error_massage" defaulted to ' ', which caused simple checks to think there was an error when all was fine.
* Got the "add new sync peer" form confirmation box displaying and cleaned up the CSS a bit.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Updated Get->users_home() to default to returning the home directory for the user running the program.
* Updated Remote->call() to start working on handling timeouts.
* Updated Storage->change_owner(), ->make_directory() and ->write_file() to default to the user and group running the program.
* Fixed a bug in how the MAC address of NICs is reported when confirming configuration of Striker. Also changed showing the domain to the hostname.
* Got more work done on sync peers.
* Updated the RPM spec file to install on Fedora 28.
Signed-off-by: Digimer <digimer@alteeve.ca>
* Changed 'database::x::...' so that 'x' is now the database host's UUID instead of a simple integer. This will simplify sync'ing configs. Also removed default entries, and made it so that anvil-prep-database injects the local config during first setup. Renamed Database->get_local_id to get_local_uuid and changed the 'id' parameter to 'uuid'. Changed Database->initialize's 'id' parameter to 'host_uuid'. The Database->query, Database->write, Database->_mark_database_as_behind and Database->_find_behind_databases methods had their 'id' parameter changed to 'uuid'.
* Added the 'remote_user' parameter to Get->anvil_version, System->ping and System->change_shell_user_password for connecting to remote targets.
* Added the 'remote_user' parameter to all internal Remote->call uses.
* Updated Storage->backup, Storage->copy_file, Storage->make_directory,
Signed-off-by: Digimer <digimer@alteeve.ca>
* Fixed a bug with handling ssh fingerprints (and removed comments going to the known_hosts file).
* Added more nested debug parameter passing when methods call other methods (though more work is needed to catch up)
Signed-off-by: Digimer <digimer@alteeve.ca>