Commit Graph

104 Commits

Author SHA1 Message Date
Digimer
fb0836f912 * THe get_cpu endpoint was completed.
* The get_mmeory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.

To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, in there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib colume.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Update DRBD->gather_data() to get the size data from /sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disabled ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-09 20:51:29 -04:00
Digimer
fa3c861a97 * Started work again on get_shared_storage
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-02 18:31:54 -04:00
Digimer
ebfecf39c3 * Updated Database->manage_anvil_conf() to not manually create a backup, as Storage->write_file() creates a backup anyway (so we were getting two backups per one change).
* Updated Storage->write_file() to add a short UUID suffix to the temp file before rsync'ing to the target to help avoid source temp file name collisions in parallel running copies to different targets.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-23 06:01:45 -04:00
Digimer
3ed857bacd * Bumped logging for striker-auto-initialize-all debugging.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-22 15:22:46 -04:00
Digimer
0fb191c00f * Made more progress on tools/striker-auto-initialize-all, now to the point where it loads the variables needed to initialize Striker dashboard.
* Cleaned up / added some logging in various locations.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-02 01:18:18 -05:00
Digimer
1b65f53faa * Remove host-health from the 'hosts' table as it wasn't needed, given the 'health' table. Bumped the SQL version to 0.0.2
* Updated Get->os_type() to use 'cat' instead of Storage->read_file() because 'rsync' may not be available when it is called during striker-initialize-host calls.
* Updated Database methods to skip 'oui' and 'state' during resync.
* Updatedb striker-initialize-host to detect when it's initializing a CentOS Stream Node / DR Host and enable the HA repo.
* Created the tools/striker-auto-initialize-all tool, which is very much incomplete, that will allow for the rapid creation of a full Anvil! from freshly installed machines autonomously.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-22 19:22:47 -05:00
Digimer
e8efbab343 * The work on PXE / UEFI support is broken, and will be set aside for the time being. The commit here is working to getting things fixed, but it's taking too much time away from more pressing issues.
* This commit includes two unrelated test files for UI work, cgi-bin/get_anvil_status and cgi-bin/get_anvils.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-16 17:32:43 -05:00
Digimer
89dec8e1f9 * Finished anvil-delete-server! (More testing needed though)
* Fixed a bug in Cluster->shutdown_server() where the wrong variable was being evaluated when checking the server state.
* Created DRBD->delete_resource() that deletes a resource's backing device and configuration. Note that this wipes the DRBD MD and and FS signatures before removing the LV. Updated DRBD->gather_data() to record the backing devices for volumes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-26 01:45:17 -05:00
Digimer
549dbad635 * Created Cluster->delete_server(), which deletes a server resource from pacemaker (stopping it first, if needed).
* Fixed a bug in Cluster->parse_cib() when a server that is off wasn't setting 'status'.
* Renamed 'server::location::<server>::host' to '...::host_name' in several places.
* Got more work done on anvil-delete-server, up to the point where it calls the new Cluster->delete_server() method.
* Updated fence_pacemaker to call 'drbdadm adjust all' to dampen an issue where in-memory fence configs seem to change, preventing reconnection of the peer after it reboots from the fence. More testing needed on this issue.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-25 01:00:55 -05:00
Digimer
05b1fccdb3 * Created Cluster->add_server() which, well, adds a server to a pacemaker cluster, including sorting out location constraints to favour the node the server is running on, if it's running.
* Removed the exit-if-no-DB check in ocf:alteeve:server so that (hopefully, needs testing), running servers won't be impacted if the nodes lost contact with both/all strikers.
* Updated scan-server to make an explicit check for missing XML definition files on startup and write them if needed.
* Very beginning work on anvil-delete-server has been started.
* Updated anvil-provision-server to wait when it's running in peer mode until the new XML definition is in the DB and then write it out to disk before exiting. Also updated it to add the new server to pacemaker before exiting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-18 00:38:06 -05:00
Digimer
20b75310b7 * Created DRBD->resource_uuid() which reads (or adds) the 'scan_drbd_uuid' to the resource configuration file, which will allow us to track resources when they're renamed.
* Got scan-drbd's read_last_scan() and process_drbd() functions done (but untested so far).

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-12-01 23:59:54 -05:00
Digimer
4d5ec72026 * Started work on the scan-drbd scan agent. Got it to the point that it is gathering needed data.
* Fixed a bug in Database->check_agent_data() where the list of tables wasn't passed in, and thus the table list wasn't then passed on to Database->_find_behind_databases().
* Started work on a new method called Storage->parse_lsblk().

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-11-30 04:18:16 -05:00
Digimer
cda51e562d * Finished porting scan-hpacucli, the last M2 scan agent!
* Updated Database->insert_or_update_temperature() to accespt a 'delete' parameter (which, surprise, deletes a record).
* Updated (but not yet tested) the RPM .spec to require that 'core' and the other three packages are required to be the same version.
* Updated scan-ipmitool and scan-storcli scan agents to now delete temperature data belonging to lost sensors.
* Fixed tools/striker-manage-install-target to add multiple missing packages.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-11-25 19:06:21 -05:00
Digimer
d677d19ca0 * Moved Database->check_condition_age to Alert.
* Created (but not finished) scan-apc-pdu
* Added support to tracking maintenance-mode for nodes in Cluster->parse_cib
* Created Remote->read_snmp_oid().
* Created Server->get_definition.
* Updated Server->get_status() to write-out server XML files on-demand.
* Finished scan-cluster.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-23 01:28:21 -04:00
Digimer
aaa330dd41 * Finished (though known bugs exist) scan-lvm.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-09 02:13:12 -04:00
Digimer
33101f969a * Fixed several bugs related to tracking server boots, migrations and shut downs in the anvil database. The 'ocf:alteeve:server' now has (mostly?) safe integration with the Anvil! database. This was mostly done by updating Servers->boot_virsh(), ->shutdown_virsh() and ->migrate_server().
* Updates servers -> server_host_uuid to drop the 'NOT NULL' constraint.
* Created the new Get->server_uuid_from_name() that does what it says on the tin. Fixed a bug in ->host_uuid_from_name() where the host name was being returned instead of the UUID.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-07 02:40:31 -04:00
Digimer
be88be6d30 * Did a bunch of testing / bugfixes for scan-server.
* Updated the servers table to remove the 'not null' constraints on the server_start_after_server_uuid, server_pre_migration_file_uuid and server_post_migration_file_uuid columns.
* Updated ScanCore->agent_startup() to connect to the database(s) when there isn't a table list.
* Updated Server->migrate_virsh() to test for DB access before making DB calls (to allow ocf:alteeve:server to function even if all ScanCore DBs are offline).
* Updated ocf:alteeve:server to connect to the databases (though work without it), and changed '$FILE_NAME' to be 'ocf:alteeve:server' (to make logging more legible)
* Created the skeleton for 'scan-storage'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-05 23:47:56 -04:00
Digimer
262cbccb35 * Finished scan-server, though lots of testing needed.
* Renamed servers -> 'server_clean_stop' to 'server_user_stop' to make it clearer what the column represents.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-05 00:15:44 -04:00
Digimer
46f1a05789 * Got the code in scan-server to the point where it _should_ now gracefully and automatically detect changes to a server's definition originatin from the database (via Striker), directly editing the on-disk definition file, or editing via libvirt tools (like virt-manager). Still needs to be tested though.
* Updated Server->migrate_virsh() to set 'servers' -> 'server_state' to 'migrating' and clear it again once the migation completes. Also added support for cold (frozen) versus live migrations.
* Updated Cluster->parse_cib() to check if a server with the server_state set to 'migrating' isn't actually migrating anymore and, if not, to clear that state. This is needed as scan-server will blindly ignore/skip any migrating server, and if a migration call is interrupted, the state could get stuck.
* Updated the 'servers' database table (and associated Database methods) to add columns for;
** server_ram_in_use      - tracking RAM used by a running server
** server_configured_ram  - RAM allocated to a running server (used with the above to alert a user and track _currently_ available RAM)
** server_updated_by_user - To be set by Striker tools to indicate when the user made a change that needs to push out to nodes / running server.
** server_boot_time       - Tracks the unixtime when the server booted (to track uptime even if the server migrates across nodes).
* Created Get->anvil_name_from_uuid() to easily convert an Anvil! UUID into a name. Also created ->host_uuid_from_name() to translate a host name into a host UUID.
* Created Server->get_runtime() that translates a server name into a process ID and then uses that to determine how long (in seconds) it has been running. This is used when a server transitions from 'shut off' to 'running' to determine exactly when the server booted (current time - runtime).
* Renamed all 'Server->parse_definition' calls that used 'from_memory' to 'from_virsh' to clarify the data source.
* Made scan-hardware smarter about RAM change alerts.
* Updated scancore to load agent strings on startup so that processing pending alerts works properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-02 02:13:34 -04:00
Digimer
1a1fa7ce88 * Created Cluster->get_anvil_uuid() that returns the 'anvil_uuid' of a given 'host_uuid'.
* Renamed the 'defitintions' table to 'server_definitions' to clarify the purpose, and made all the 'server' columns have then 'not null' constraint.
* Created Database->insert_or_update_servers(), ->get_servers(), ->insert_or_update_server_definitions() and ->get_server_definitions().
* Updated scancore, anvil-daemon, and scan agents to not run unless they're run with root privs.
* Got scan-server to update the servers / server_definition tables and the on-disk file when needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-28 00:20:13 -04:00
Digimer
4be943ebf3 * Finished (initial) testing of scan-hardware. The first M3 scan agent is done!
* Updated Alert->check_alert_sent() and the 'alert_sent' table to remove 'alert_name' as it really isn't helpful given record_locator.
* Created 'Database->purge_data' that takes an array of tables and, in reverse, purges them from the database(s). This also disables archiving and resync functions now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-16 01:19:21 -04:00
Digimer
0a1dc809a2 * Created the ScanCore.pm module with the first 'agent_startup' method which generalized scan agent start up.
* Updated Alert->register to take a hash reference for message variables to simplify when a caller plans to log and register an alert at the same time.
* Updated Convert->bytes_to_human_readable() to name the 'size' variable used internally for 'bytes' to actually be 'bytes' for better consistency.
* Created multiple new Database methods;
** ->check_condition_age() is meant to be used by scan agents to see how long a given condition has been in play (ie: how long ago power was lost to a UPS or a sensor became unreadable).
** ->insert_or_update_health() handles recording data to the new 'health' table, used for determining ideal hosts for servers between nodes.
** ->insert_or_update_power() handles recording data to the new 'power' table, used for determining how power events are handled.
** ->insert_or_update_temperature() handles recording temperature data to the new 'temperature' table, used to determine how thermal events are handled.
* Got a lot more done on the scan-hardware scan agent. Only part left now is post-scan health processing.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-15 01:18:36 -04:00
Digimer
925664762a * Created Database->check_for_schema() (not finished) that will check/add a schema for a scan agent.
* Renamed the scan-network skeleton scan agent to scan-hardware and started work on it based on the M2 version.
* Updated Database->get_recipients() to take the 'include_deleted' parameter, and changed the default behaviour to only return active records.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-08 01:07:23 -04:00
Digimer
911523dfce * Got a lot of work done in generating emails. Doesn't work yet, but the code to generate emails for recipients using their preferred language and alert level is done (though limited testing so far).
* Dropped support for supporting imperial measurements in generated emails.
* Created Database->get_alerts() to read in alert data and ->get_recipients() to get the list of alert recipients.
* SQL Schema changes;
** Added 'alert_processed' to 'alerts' to track what alerts have been processed.
** Changed 'recipient_new_level' to 'recipient_level' now that we're only using 'notifications' as a per-host override for user/hosts alert levels.
** Removed 'recipient_units' as we're no longer supporting non-metric values.
* Updated Alert->register() to take strings for the alert level (which gets translated to integers).
* Created Email->get_current_server() to returned the mail_server_uuid of the active mail server (if any). Created ->send_alerts() to process unprocessed alerts and send emails to recipients.
* Updated Words->parse_banged_string() to take the 'language' parameter.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-05 01:18:42 -04:00
Digimer
49682a01d7 * Fixed a bug in Database->disconnect() where the database idenitification number wasn't being removed, so connecting again triggered the duplicate DB connection check.
* Fixed a bug in Tools->_set_defaults where the order the tables were sync'ed it caused primary/foreign keys would trigger DB errors when resync'ing in some cases.
* Created Database->log_connections to make it easier to log which databases are actively in use and other data about the connections.
* Fixed bugs in striker-manage-peers that (partly because of the above bugs) failed to connect to new peers properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-03 01:36:11 -04:00
Digimer
aad68b8ed0 * Got email reconfiguration when a mail queue is over ten minutes old (needs testing).
* Updated Storage->write_file to remove the cache for a written file if it exists.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-28 02:32:27 -04:00
Digimer
767148b538 * Updated Database->get_mail_servers() to clear old stored data, and to pull out the list of when a mail server was last used.
* Got email server configuration under way. A mail server can now be configured via Email->_configure_for_server(), but more work is needed on when to switch between configs.
* Fixed some logging of passwords that wasn't being checked to see if secure logging was enabled or not.
* Fixed a bug in Striker where the back arrow in email config sub-sections weren't going back to the main email menu.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-27 02:09:21 -04:00
Digimer
b2c7fd95fb * Renamed the ScanCore unit file to scancore.
* Added support to parsing location contraints to Cluster->parse_cib

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-24 12:54:31 -04:00
Digimer
14bf323627 * Fixed an issue with ocf:alteeve:server where, after a migration, the target host would invoke the RA as if it was trying to migrate, instead of verifying the server (resource) was OK post migration.
* Fixed a bug in Server->get_status() where the call to Storage->rsync's returned output checked for '!!errer!!' instead of '!!error!!'.
* Fixed a bug in Storage->rsync where, when no port was passed in, it would try to specify an empty port and fail.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-20 06:32:19 -04:00
Digimer
6eeb4e48c7 * Added 'timeout' and 'wfc-timeout' to drbd'd global-common.conf in DRBD->update_global_common().
* Fixed a minor bug in ocf:alteeve:server found in testing the RA.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-14 00:24:02 -04:00
Digimer
e35800c413 * Fixed up (though more testing/work needed) to ocf:alteeve:server to get it working with DRBD resources referenced using '/dev/drbd/by-res/...'.
* Added the anvil.conf option 'sys::privacy::strong' that controls if the Anvil! ever "calls home". Initially, this controls DRBD's usage flag.
* Updated DRBD->get_devices() to track resources by their 'by-res' names as well and by the normal '/dev/drbdX' devices.
* To mitigate https://bugzilla.redhat.com/show_bug.cgi?id=1868467, updated Get->bridges() to parse the normal (non-JSON) data if we get invalid JSON output.
* Updated anvil-join-anvil to not disable, and in fact enable, libvirtd on boot. With DRBD 9, the original fear of a user accidentally booting a VM that's running on the peer no longer is an issue. By enabling it and leaving it on, Striker dashboard users won't lose their virtual machine manager link unless the node powers off. Also enabled actually updating the job progress, completing this tool!

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-13 00:12:20 -04:00
Digimer
ef208fd3fb * Finished the logic for adding stonith devices and levels to pacemaker! More testing is needed though, bugs expected, but it adds them.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-06 17:45:43 -04:00
Digimer
3c2f25a860 * Added 'fence_delay' fence agent to handle the corner cases where an IPMI BMC had crashed until a power cycle, and PDU fencing was effected, but failed to report as such.
* Updated Cluster->parse_cib() to take a CIB as a parameter.
* Fixed a bug in Database->get_hosts() where loading the host_ipmi value was filtered through Log->is_secure.
* Updated Striker->get_fence_data() to parse the switches to make it easier to map a fence agent's command line switches to STDIN arguments.
* Created System->parse_arguments() that converts a series of command line switches and their values into a hash. It's similar to Get->switches(), but works on any string.
* Continued work on anvil-join-anvil's fence configuration logic.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-30 00:23:47 -04:00
Digimer
62d0a2aa39 * Created Cluster->parse_cib() that parses pacemaker's CIB (cluster information base) XML. This also switches to the XML::LibXML, starting the replacement of XML::Simple. It's far from finished, but parses out basic node data and fence data.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-11 23:26:19 -04:00
Digimer
597d9413a5 * Created the skeleton Cluster.pm.
* Got anvil-join-anvil to the point where is initializes and starts the cluster.
* Deleted the old ssh key handling logic in anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-10 00:49:30 -04:00
Digimer
76b6550ac6 * Created Database->get_ip_addresses() that pulls the IPs out and stores them in a hash that allows for easy referencing to associated interfaces and networks.
* Created Get->trusted_hosts() that finds the dashboards the host uses and, if the host is in an Anvil!, the peers in the same anvil.
* Created (but not finished yet) System->update_hosts() that will add and edit entries for all IPs to trusted hosts.
* Fixed a logging bug in Striker->load_manifest().
* Fixed a bug in System->call where, the the output from the shell call didn't end in a new-line, it would not parse the return code and lease the return code string appended to the shell output.
* Fixed a big in System->change_shell_user_password() where a new-line (\n) meant for the shell call wasn't escaped properly. There was also a duplicate 'return_code' variable preventing the actual return code from being read.
* Got more work done on anvil-join-anvil to update the hacluster password (when needed).

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-03 18:11:56 -04:00
Digimer
726a4374d1 * Renamed the database table 'host_keys' to 'ssh_keys' to better represent what it stores.
* Updated 'variables' -> 'variable_source_uuid' to type 'uuid' and removed the 'not null' constraint.
* Updated Database->insert_or_update_variables() to check/update 'variables_source_table' and 'variables_source_uuid'.
* Created the 'trusts' database table which will, when done, tell anvil-daemon which users@machines to trust (setup passwordkess SSH).
* Created (but not finished) System->manage_authorized_keys() and moved the logic over to it from anvil-daemon.
* Changed the host types "dashboard" to "striker".
* Moved the following methods from 'System' to 'Get';
** System->get_host_type to Get->host_type
** System->get_bridges to Get->bridges
** System->get_free_memory to Get->free_memory
** System->get_os_type to Get->os_type
** System->get_uptime to Get->uptime
* Updated striker to include the host_uuid for the 'node1', 'node2' and (if chosen) 'dr1' when running a job manifest.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-10 18:26:50 -04:00
Digimer
7edfd0cb2d * Added gateway support to install manigest creation step 2.
* Finished sanity checking install manifest step2.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-05 23:11:47 -04:00
Digimer
d3bb350668 * Filtered out 'delay' and 'plug' options from Striker->get_fence_data()'s options parsing.
* Cleaned up striker's display of fence agent data.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-02-01 19:05:39 -05:00
Digimer
b8c0577b54 * Fixed several issues with the fence configuration menu in striker.
* Added filters Striker->get_fence_data() for parameters. Manually change 'action' entries from 'string' to 'select' and use the data in the 'actions' element to populate it, with actions that don't make sense filtered out.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-22 15:21:35 -05:00
Digimer
818ef23634 * Moved the fences_unified_metadata file from /tmp, which apache can not read, to /var/www/html/.
* Fixed a bug (well, made a work-around for an issue without a known reproducer) where, on some occassion, a record will end up in the public table without being copied into the history schema. When this happens, the next resync would crash out because the resynd reads in the history table only. Now, when about to INSERT a record into the public schema during a resync, an explicit check is made to see if the record alread
y exists. If it does, the INSERT is instead redirected to the history schema.
* Cleaned up the fence agent metadata when displaying to a user, converting the shell codes to underline a string with square brackets instead. We also now replace newlines with <br /> tags. Lastly, to help fence_azure_arm's metadata description to display cleanly, a check is made to format the table correctly.
* Began work on the Striker menu for handling fence device management

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-20 23:41:01 -05:00
Digimer
f636e399d7 * Created tools/striker-parse-fence-agents which finds all the available fence agents on a system and gathers their metadata into a common XML file.
* Created Striker->get_fence_data() that reads/parses the unified fence metadata file created by tools/striker-parse-fence-agents.
* Created the new 'fences' database table and Database->insert_or_update_fences() to handle it.
* Added hosts -> host_ipmi that will, later, store information on how to access the host's IPMI interface, when available.
* Sketched out how the new Install Manifests are going to work.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-15 17:04:18 -05:00
Digimer
c3869a2ff6 * Started adding in front-end support for managing email servers and alert recipients. Added the new 'Email' module to (later) habdle all email-related tasks.
* Fixed a bug in Accounts->read_cookies where, when a user's hash had expired, the logged error message didn't show the user's name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-23 00:54:09 -05:00
Digimer
2f81275551 * Updated the kickstart template/script to now add the local striker to the installed system's dnf repo list.
* Fixed a bug in the tftp templates where the os_type was statically set to 'rhel8'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-20 00:54:07 -05:00
Digimer
b97f15b25e * Finished the sanity checks and user confirmation table when preparing to configure the network on a node or dr host.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-10 02:38:55 -05:00
Digimer
b9a0cc4d56 * Finished the initial tools/striker-initialize-host!
* Created Tools->refresh to reload anvil.conf in one call.
* Created Anvil::Tools::Network to hold network-related tasks.
** Created Network->is_remote() that tests to see if a string (containing a target) refers to the remote machine (versus a local machine). Updated all previous checks to use this new method.
** Moved Get->network_details() and Get->network() to the new Network module. Renamed Get->network() to Network->get_network().
** Made Network->get_ips() work locally and remotely.
** Created Network->find_matches() that compares two scanned machines IPs (via two previous calls to Network->get_ips())
* Created Database->manage_anvil_conf() that will add, update or remove a given database connection in a local or remote anvil.conf file.
* Fixed bugs in Storage->backup() where the bash calls were quite broken. I'm not sure how it ever worked before... x_x
* Updated anvil-daemon to not initialize a database unless it's running on dashboard. Also added a check at the startup of anvil-daemon where it will go into a loop waiting for a database to become available, re-reading anvil.conf each loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-22 23:36:59 -04:00
Digimer
a54e0d4e22 * Made a fair bit more progress on tools/striker-manage-install-target. It now registers a system with Red Hat and attaches appropriate subscriptions, updates the OS and (tries) to install anvil-{node,dr}.
* Fixed a bug in Remote->call where a shell call that ended in a newline would work, but throw an error and not get the return code.
* Created Database->get_job_details() which takes a job_uuid and returns the job details, if found.
* Fixed a bug in Jobs->update_progress() where 'clear' wasn't removing the old job_progress data.
* Added the parameters 'no_files' to skip stat'ing/recording non-directories, and 'search_for' which will set the parent directory in 'scan::searched' and stop scanning if found. This allows this method to act as a directory tree scanner and as a search engine.
* Created Striker->get_local_repo() that builds a repo file body suitable for adding to peers, nodes and DR hosts.
* Fixed bugs in the Striker WebUI related to initializing a target node / DR host.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-18 18:48:09 -04:00
Digimer
bc341809ca * Finished (for now) ocf:alteeve:server! It can boot, migrate and stop a server cleanly. It still checks to see if DRBD needs to be started and does so when needed, but it won't stop it anymore.
* Fixed a couple typos in tools/anvil-check-memory that prevented it from running.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-04 18:58:43 -04:00
Digimer
8a2c86d088 * Renamed striker-configure-host (back) to anvil-configure-host, and started updating it to work on any machine type.
* Created tools/anvil-check-memory to report how much RAM is used by a given program.
* Added documentation for some previously undocumented methods.
* Updated Database->archive_database() to take the 'tables' parameter.
* Updated Storage->scan_directory() to record a directory's mode and type, even when recursive isn't used.
* Finished System->check_memory().
* Updated ocf:alteeve:server to now NOT stop a DRBD resource unless 'stop_drbd_resources'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-03 14:07:39 -04:00
Digimer
c0dd34334e * Fixed another bug in making ocf:alteeve:server work in pacemaker.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-21 01:14:10 -04:00