Commit Graph

178 Commits

Author SHA1 Message Date
Digimer
73267a8ea9 * WIP - Slowly working on anvil-manage-server
* Updated the scancore interval to 60 seconds.
* Updated Database->insert_or_update_health() so that 'delete' can find the health_uuid.
* Updated Convert->time() to return silently when passed '-1'.
* Fixed a bug scan-hardware to call Convert->round(). Also fixed it so it didn't set health scores of 0 for mismatch RAM when the RAM was not mismatched.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-02 14:08:55 -04:00
Digimer
4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the reponce time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machne would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped.
* Bumped scancore's scan delay from 30 seconds to 60.
* Shorted the age-out time to 24 hours and again boosted the archive thresholds. As we get a feel for the amount of data collected on multi-Anvil! systems over time, we may continue to tune this.l
* Moved Database->archive_database() to be called daily by anvil-daemon, instead of during '->connect' calls.
* Added locking to Database->_age_out_data to avoid resyncs mid-purge. Also moved the power, temperature and ip_address columns into the same 'to_clean' hash as it was duplicate logic.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-31 13:34:49 -04:00
Digimer
6abe06f125 The theme of these commits is improving DB responsiveness.
* Created Database->_age_out_data() to delete records from the database that are old enough to no longer be useful. This is designed to significantly reduce the size of the database, allowing a better focus on performance.
* Changed Database->connect() to default to NOT check for resync, reworking the old 'no_resync' to 'check_for_resync', so that resync checks happen on demard, instead of by default.
* Updated get_tables_from_schema() to now allow 'schema_file' to be set to 'all', which then loads the schema files of all scan agents as well as the core anvil schema file. Fixed a bug where commented out tables were being counted.
* Re-enabled triggering resyncs on 'last_updated' differences.
* Fixed a bug in scan-ipmitool where the history_id column in history.scan_ipmitool_value was incorrect.
* Created a new tool called striker-show-db-counts that shows the number of records in all public and history schema tables for all databases.
* Updated anvil-update-states to detect when a libvirtd NAT'ed bridge exists and to delete it when found.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-29 23:34:22 -04:00
Digimer
bbad058b33 * Created a new tool, anvil-watch-bonds, which is a live monitor of bonds and interfaces designed to be run from the command line on a given host.
* Created Words->center_text that takes a string (or string key) and centers it to a given string length, padding white spaces on either side of the string as needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-26 20:24:05 -04:00
Digimer
42ffc200bc * Updated remainder pointers to the old repos to the new repos. Added support for the new alteeve-repo-setup.
* Removed the checks for resync that limited resyncs on jobs and variables tables. That approach to minimize unnecessary resyncshas proven faulty, will find another way later.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 14:34:15 -04:00
Digimer
41cd1e0319 * Several bugs fixed and enhancements;
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 00:09:32 -04:00
Digimer
fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources.
* Changed the alteeve repo RPM to the new cimmunity/enterprise repo
* Fixed a bug where 'fence_data::updated' was causing the fences web page to break.
* Fixed a bug in Database->insert_or_update_network_interfaces() where certain interfaces were being repeatedly added to the database.
* Fixed a bug in Database->_find_behind_databases() was marking DBs as behind even though they had less than 10 columns off.
* Fixed a bug in Get->host_name() where, if the host name was changed on disk but the environment variable was still the old name, it would cause the hostname to waffle back and forth and cause constant updated to /etc/hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-20 00:16:09 -04:00
Digimer
ad4a1ecc78 * Increaded the scancore agent run timeout to 60 seconds.
* Updated anvil-safe-start to start DRBD resources when the peer's DRBD resourcs is 'Connecting',
* Updated fence_pacemaker to more intelligently check the list of host names related to an IP address when looking for the peer host name

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-15 00:12:43 -04:00
Digimer
ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :)
* Created DRBD->check_if_syncsource() and ->check_if_synctarget() that return '1' if the target host is currently SyncSource or SyncTarget for any resource, respectively.
* Updated DRBD->update_global_common() to return the unified-format diff if any changes were made to global-common.conf.
* Created ScanCore->check_health() that returns the health score for a host. Created ->count_servers() that returns the number of servers on a host, how much RAM is used by those servers and, if available, the estimated migration time of the servers. Updated ->check_temperature() to set/clear/return the time that a host has been in a warning or critical temperature state.
* Finished ScanCore->post_scan_analysis_node()!!! It certainly has bugs, and much testing is needed, but the logic is all in place! Oh what a slog that was... It should be far more intelligent than M2 though, once flushed out and tested.
* Created Server->active_migrations() that returns '1' if any servers are in a migration on an Anvil! system. Updated ->migrate_virsh() to record how long a migration took in the "server::migration_duration" variable, which is averaged by ScanCore->count_servers() to estimate migration times.
* Updated scan-drbd to check/update the global-common.conf file's config at the end of a scan.
* Updated ScanCore itself to not scan when in maintenance mode. Also updated it to call 'anvil-safe-start' when ScanCore starts, so long as it is within ten minutes of the host booting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-30 22:58:01 -04:00
Digimer
27259d1d53 * Finished anvil-rename-server!
* Created Storage->delete_file() that, well, deletes files (locally or on a peer).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-22 13:29:50 -04:00
Digimer
711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there.
* Fixed a bug in Cluster->migrate_server() where waiting for the server to migate would never exit.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-19 00:32:13 -04:00
Digimer
eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server.
* Fixed a bug where, in rare cases, $anvil->hostname() would call 'hostnamectl' and get a dbus error during shutdown, which would then cause the hostname to be changed to the error in the database.
* Fixed a bug in Cluster->boot_server() where it would never verify that a server has started successfully.
* Updated Database->get_ip_addresses() to store the IPs we manage in 'ip_addresses::<ip_address_address>::X'.
* Updated ocf:alteeve:server to work from command line calls, though more testing is still needed.
* Started work on 'anvil-rename-server', but haven't gotten far with it yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-18 19:54:58 -04:00
Digimer
e036515df3 * Got anvil-safe-start to the point where is starts the cluster stack. Need to create the 'anvil-boot-server' and 'anvil-shutdown-server' before it can be completed, so those files have been added.
* Created Cluster->parse_quorum() to check if a node is quorate as 'have-quorum' in the pacemaker CIB doesn't appear to be super accurate during startup.
* Fixed a bug in striker-manage-install-target where if a node didn't have any registered IPs, it would break before generating the repo data.
* Fixed a bug in anvil-join-anvil where if the database had to be reconnected, the job data was lost.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-14 00:26:06 -04:00
Digimer
fb0836f912 * THe get_cpu endpoint was completed.
* The get_mmeory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.

To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, in there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib colume.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Update DRBD->gather_data() to get the size data from /sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disabled ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-09 20:51:29 -04:00
Digimer
70dc0598f2 * Created Storage->manage_lvm_conf() that checks / updates lvm.conf to add a filter to avoid seeing DRBD devices as LVM components. This is now called from striker-initialize-host and scan-drbd.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-31 23:59:19 -04:00
Digimer
9fa24750d6 * Fixed a bug in Convert-round() where the requested number of digits after the decimal place was coming back one too long. Also added logging that should have been there for a while now.
* Finished scan-filesystems!
* Realized that filesystem UUIDs are not always actual UUIDs, and so created an additonal column in filesystems -> filesystem_internal_uuid and created a normal filesystem_uuid table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-26 23:37:01 -04:00
Digimer
296556328b * Fixed a bug in Convert->bytes_to_human_readable() to handle being passed in bytes (with the size units of 'b' ot 'bytes').
* Fixed a bug in Database->_find_behind_databases() _find_behind_databases() where the logic to figure out which column was the host_uuid reference was too liberal, causing the wrong column to be selected in some cases. Also added a check to not look for host_uuid columns on specific tables where a match would be made, but the column is allowed to be null (like server_host_uuid that indicates the host of the server).
* Started work on the scan-filesystems scan agent, which is needed to record the data that the file system UI endpoints will need.
* Removed the 'not null' constraint from 'servers' -> 'server_host_uuid'.
* Fixed a bug in 'anvil-provision-server' where the driver ISO being 'none' caused the provision script to use the driver ISO switch.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-18 00:37:17 -04:00
Digimer
3733220b50 * Updated Log->entry() to prefix log lines with the short 'job-uuid', when the log entry is coming from a program running as a job. This is meant to make it easier to break up what log lines belong to what jobs, if multiple jobs are running at the same time (ie: when initializing multiple nodes / dr hosts in parallel).
* Updated Remote->call() to return ('!!error!!', '!!error!!', 9999) when an error hits. Made Remote->test_access() explicitely check for '1' to be returned in order to confirm access, fixing a bug where bad target value caused false positives. Updated ->_check_known_hosts_for_target() to no longer explicitely check for 'ssh-rsa' so that machine keys using different cyphers are detected as being in known_hosts properly.
* Updated striker-auto-initialize-all to initialize nodes and DR hosts networks before trying to form them into an Anvil!. Fixed several other bugs as well. More testing is needed, but it works now.
* Updated striker-initialize-host to check for the alteeve repo and, it not found, check for accress to alteeve.com. If access, it will install our repo now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-10 02:26:09 -05:00
Digimer
15fd0e5ce8 * Updated anvil-daemon (and Database->insert_or_update_jobs) to now recognize jobs with the job_status of 'scancore_startup' to run only when ScanCore starts.
* Finished initial Striker setup in tools/striker-auto-initialize-all. Started working on peering.
* Cleaned up the handling of converting UIDs to user names in Remote->add_target_to_known_hosts() and ->_call_ssh_keyscan().
* Did a bunch of white-space/alignment cleanup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-04 01:41:33 -05:00
Digimer
1b65f53faa * Remove host-health from the 'hosts' table as it wasn't needed, given the 'health' table. Bumped the SQL version to 0.0.2
* Updated Get->os_type() to use 'cat' instead of Storage->read_file() because 'rsync' may not be available when it is called during striker-initialize-host calls.
* Updated Database methods to skip 'oui' and 'state' during resync.
* Updatedb striker-initialize-host to detect when it's initializing a CentOS Stream Node / DR Host and enable the HA repo.
* Created the tools/striker-auto-initialize-all tool, which is very much incomplete, that will allow for the rapid creation of a full Anvil! from freshly installed machines autonomously.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-22 19:22:47 -05:00
Digimer
2937afad26 * Got UEFI booting working up to the grub menu, though files formerly provided by anvil-striker-extra still need to be added to the main anvil-striker to work properly.
* Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-10 23:49:11 -05:00
Digimer
8d0f873912 * Updated scan-storcli to check if a MegaRAID controlled exists and neither storcli64 or perccli64 exist. If a controller is found but no RPM is installed, it checks to see if the host is Dell and then decides to try and install perccli or storcli.
* Reworked scan-ipimitool so that on nodes and dr hosts, it only scans itself. On strikers, it scans all hosts found in active Anvil! systems with a host_ipmi entry. `
* For all agents, reduced log verbosity to not push too much noise into anvil.log while scancore is running in the background.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-05 23:34:51 -05:00
Digimer
218934bec8 * Fixed a bug with the path to anvil-provision-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-30 19:06:22 -05:00
Digimer
413a4f73c2 * Updated Tools->_anvil_version() and Get->anvil_version() to now pick up a SchemaVersion from anvil.sql. This will change only when the schema changes and is used when Database->connect() is checking compatibility with other anvil database hosts. This will make it only break connection when there is a reason to do so. The anvil_version still remains as an informational version that will help when supporting users later.
* Updated Cluster->add_server() to now set failure timeouts to actual numbers instead of INFINITY after discovering that INFINITY doesn't work in those cases.
* Updated Databsae->get_hosts to now check if other entries have the same host name, and if so, to set their host_key to 'DELETED'. This should make it easier to handle when a hardware machine is replaced by new hardware but uses the same host_name.
* Updated Email->check_queue() to start and enable postfix.service if it's found to not be running.
* Updated Get->available_resources() to return '!!no_data!!' when a given host hasn't got any data in scan_lvm_vgs. Now use this in anvil-provision-server to exit if a node or dr host hasn't run scancore yet.
* Fixed a bug in scan-lvm where the pvs_uuid wasn't being loaded properly, preventing lost PVs, VGs and LVs from being flagged as deleted.
* Started work on anvil-migate-server, though it's far from complete.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-30 14:03:13 -05:00
Digimer
89dec8e1f9 * Finished anvil-delete-server! (More testing needed though)
* Fixed a bug in Cluster->shutdown_server() where the wrong variable was being evaluated when checking the server state.
* Created DRBD->delete_resource() that deletes a resource's backing device and configuration. Note that this wipes the DRBD MD and and FS signatures before removing the LV. Updated DRBD->gather_data() to record the backing devices for volumes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-26 01:45:17 -05:00
Digimer
549dbad635 * Created Cluster->delete_server(), which deletes a server resource from pacemaker (stopping it first, if needed).
* Fixed a bug in Cluster->parse_cib() when a server that is off wasn't setting 'status'.
* Renamed 'server::location::<server>::host' to '...::host_name' in several places.
* Got more work done on anvil-delete-server, up to the point where it calls the new Cluster->delete_server() method.
* Updated fence_pacemaker to call 'drbdadm adjust all' to dampen an issue where in-memory fence configs seem to change, preventing reconnection of the peer after it reboots from the fence. More testing needed on this issue.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-25 01:00:55 -05:00
Digimer
d9d347ce63 * Updated .spec for the new source location.
* Created a log disable flag to avoid deep recursion when logging at level 3.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-22 00:37:30 -05:00
Digimer
e0ceb5c65f * Provisioning servers is done!! This commit handles the virt-install call and adding the server to the database.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-17 02:54:41 -05:00
Digimer
8127c70237 * Support for OS type selection was missing in tools/anvil-provision-server, this commit adds support for this, as well as some infrastructure to support it. This includes a new 'sys::servers::os_short_list' variable that contains a CSV of main OSes to show in a "short" list (the full list is massive). This variable can be set by the user in anvil.conf. Also added job progress calls that were missing through the storage config.
* Created a new tools/striker-parse-os-list tool that parses 'osinfo-query os' and prints out entries for words.xml for any new OSes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-16 03:30:51 -05:00
Digimer
056f8edf48 * Moved the scan-drbd function 'gather_drbd_date' and moved it into DRBD->gather_date().
* Finished DRBD->get_next_resource() that returns the next available minor and the next free TCP port (with two free ports available after it).
* Created Storage->get_storage_group_details() that pulls together the LVM, storage group members and storage groups into one block of data.
* Made more progress on tools/anvil-provision-server. It now gets up to the point of creating LVs, creating DRBD resource files, loading them, creating metadata and up'ing the resource. It doesn't yet (successfully) force a new resource to primary.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-15 03:10:16 -05:00
Digimer
a7f0676a0f * Got the 'anvil-provision-server' script to the point where it actually saves the new server job.
* Created the new method Cluster->get_primary_host_uuid() that returns the 'host_uuid' of the primary node in the given cluster. This is useful for external programs to figure out which node is primary. Example is provisioning a new server being assigned to the active node. Also created ->is_primary() that is a similar test to see if the active node is the primary node or not.
* Updated Cluster->parse_cib() and ->parse_crm_mon() to work on remote hosts.
* Updated Database->get_hosts() to store the short host names.
* Created Get->host_from_ip_address() that translates an IP address to a host_uuid and host_name, if it's an IP assigned currently to a known host.
* Created Network->find_target_ip() that simplifies finding which IP address to use when the caller wants to connect to a target host.
* Reworked the anvil-join-anvil to parse fence_arguments in a way that handles passwords with spaces in them.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-12 21:29:01 -05:00
Digimer
7d3c4371c7 * Renamed tools/striker-sync-shared to tools/anvil-sync-shared, as it's now designed to run on all machines. Got it to the point where it can be run on Anvil! members to pull down freshly uploaded files. It does so, when two or more strikers are available with the target file, load balancing such that one node downloads from one striker while another node downloads from the other striker. If there is three nodes, and if there is a DR host, the DR host will download from the third striker. If there are 1 or 2 strikers, the DR host will wait to download after both nodes have finished downloading.
* Cleaned up upload.pl now that it isn't responsible for loading the file details into the database. It only sets a job for the local Striker to process the file and move it into /mnt/shared/files, copy it to peer dashboards, then load jobs for Anvil! members to sync the new file.
* Created Database->get_files() and ->get_file_locations() to load the respective data.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-02 02:23:33 -05:00
Digimer
1a36f37065 * Got file uploads working!
* Got tools/striker-sync-shared to pick up 'upload::move_incoming' jobs, move the uploaded file to /mnt/shared/files/, copies it to peer dashboards, adds it to the 'files' table and adds it to 'file_locations'.
* Reworked the 'file_locations' table to now map files to Anvil! systems, not hosts. It simply tracks if a given file should be on Anvil! members or not. Later, striker-sync-shared on the Anvil! members will pull the file down.
* Updated Storage->get_file_stats() to record the file's mimetype.
* Fixed up a few issues in cgi-bin/upload.pl.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-01 04:09:17 -05:00
Digimer
8f823d3b86 * Switched out the static list of core table to use the array generated by Database->get_tables_from_schema().
* Fixed bugs around creating and filtering storage groups.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-12-28 21:59:32 -05:00
Digimer
4d5ec72026 * Started work on the scan-drbd scan agent. Got it to the point that it is gathering needed data.
* Fixed a bug in Database->check_agent_data() where the list of tables wasn't passed in, and thus the table list wasn't then passed on to Database->_find_behind_databases().
* Started work on a new method called Storage->parse_lsblk().

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-11-30 04:18:16 -05:00
Digimer
cda51e562d * Finished porting scan-hpacucli, the last M2 scan agent!
* Updated Database->insert_or_update_temperature() to accespt a 'delete' parameter (which, surprise, deletes a record).
* Updated (but not yet tested) the RPM .spec to require that 'core' and the other three packages are required to be the same version.
* Updated scan-ipmitool and scan-storcli scan agents to now delete temperature data belonging to lost sensors.
* Fixed tools/striker-manage-install-target to add multiple missing packages.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-11-25 19:06:21 -05:00
Digimer
1c00060d6e * Finished porting scan-storcli! This was the largest scan agent to migrate to M3.
* Updated Alert->register() to take message variables using the 'variables' parameter.
* Added a 'cache' parameter to Database->insert_or_update_health() and ->insert_or_update_temperature(). When set, the SQL UPDATE/INSERT calls and pushed into the array reference set in 'cache'. This is to allow performance improvements when processing a large amount of sensor/device data.
* Updated Log->variables() to take a 'prefix' parameter that, when set, will prefix the string to each variable line.
* Updated scan-ipmitool to use Database->insert_or_update_health() and ->insert_or_update_temperature().

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-11-23 11:18:00 -05:00
Digimer
d677d19ca0 * Moved Database->check_condition_age to Alert.
* Created (but not finished) scan-apc-pdu
* Added support to tracking maintenance-mode for nodes in Cluster->parse_cib
* Created Remote->read_snmp_oid().
* Created Server->get_definition.
* Updated Server->get_status() to write-out server XML files on-demand.
* Finished scan-cluster.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-23 01:28:21 -04:00
Digimer
2f4a06f2e0 * Updated System->call() to take the 'timeout' parameter which, when set, prepends the call with 'timeout X <shell_call>' to make it easier to deal with calls that could potentially hang.
* Renamed scan-storage to scan-lvm as we only really care about LVM data in this agent. A dedicated scan-drbd will be created later. Got the agent to parse the pvs/vgs/lvs data.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-08 01:38:41 -04:00
Digimer
e240a32a19 * Created Cluster->parse_crm_mon and updated Cluster->parse_cib() to determine what state a server is in and which host has a server.
* Added support in anvil.conf to disable scan agents with 'scancore::<agent_name>::disable', and added handling this to agents. Also allowed for '--force' to override this setting.
* Updated ScanCore->agent_startup() to allow for empty scan agent table lists.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-24 18:16:05 -04:00
Digimer
0f7267eae1 * Moved the '_host_name', '_short_host_name', and '_domain_name' private methods in Tools.pm over to Get.pm (removing the leading '_' in the method names).
* Created 'Cluster->which_node' that returns 'node1' or 'node2' to indicate which node a host is.
* Continued working on scan_cluster; decided to make it not host-dependent.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-20 00:27:36 -04:00
Digimer
0a1dc809a2 * Created the ScanCore.pm module with the first 'agent_startup' method which generalized scan agent start up.
* Updated Alert->register to take a hash reference for message variables to simplify when a caller plans to log and register an alert at the same time.
* Updated Convert->bytes_to_human_readable() to name the 'size' variable used internally for 'bytes' to actually be 'bytes' for better consistency.
* Created multiple new Database methods;
** ->check_condition_age() is meant to be used by scan agents to see how long a given condition has been in play (ie: how long ago power was lost to a UPS or a sensor became unreadable).
** ->insert_or_update_health() handles recording data to the new 'health' table, used for determining ideal hosts for servers between nodes.
** ->insert_or_update_power() handles recording data to the new 'power' table, used for determining how power events are handled.
** ->insert_or_update_temperature() handles recording temperature data to the new 'temperature' table, used to determine how thermal events are handled.
* Got a lot more done on the scan-hardware scan agent. Only part left now is post-scan health processing.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-15 01:18:36 -04:00
Digimer
fe7cdb18fb * Updated all methods to add (or fix) logging the method entry.
* More work done on Email->send_email() to, well, actually send email (which it isn't doing yet, but it's close).
* Updated Words->key() to include the bad key name when no entry for the requested key exists in the words.xml file.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-06 01:52:03 -04:00
Digimer
82acb4e104 * Fixed a resync bug where bridges needed to sync before bonds
* Re-enabled user-selected BCN subnet ranges (needs more testing).

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-03 22:52:38 -04:00
Digimer
49682a01d7 * Fixed a bug in Database->disconnect() where the database idenitification number wasn't being removed, so connecting again triggered the duplicate DB connection check.
* Fixed a bug in Tools->_set_defaults where the order the tables were sync'ed it caused primary/foreign keys would trigger DB errors when resync'ing in some cases.
* Created Database->log_connections to make it easier to log which databases are actively in use and other data about the connections.
* Fixed bugs in striker-manage-peers that (partly because of the above bugs) failed to connect to new peers properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-03 01:36:11 -04:00
Digimer
aad68b8ed0 * Got email reconfiguration when a mail queue is over ten minutes old (needs testing).
* Updated Storage->write_file to remove the cache for a written file if it exists.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-28 02:32:27 -04:00
Digimer
767148b538 * Updated Database->get_mail_servers() to clear old stored data, and to pull out the list of when a mail server was last used.
* Got email server configuration under way. A mail server can now be configured via Email->_configure_for_server(), but more work is needed on when to switch between configs.
* Fixed some logging of passwords that wasn't being checked to see if secure logging was enabled or not.
* Fixed a bug in Striker where the back arrow in email config sub-sections weren't going back to the main email menu.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-27 02:09:21 -04:00
Digimer
e35800c413 * Fixed up (though more testing/work needed) to ocf:alteeve:server to get it working with DRBD resources referenced using '/dev/drbd/by-res/...'.
* Added the anvil.conf option 'sys::privacy::strong' that controls if the Anvil! ever "calls home". Initially, this controls DRBD's usage flag.
* Updated DRBD->get_devices() to track resources by their 'by-res' names as well and by the normal '/dev/drbdX' devices.
* To mitigate https://bugzilla.redhat.com/show_bug.cgi?id=1868467, updated Get->bridges() to parse the normal (non-JSON) data if we get invalid JSON output.
* Updated anvil-join-anvil to not disable, and in fact enable, libvirtd on boot. With DRBD 9, the original fear of a user accidentally booting a VM that's running on the peer no longer is an issue. By enabling it and leaving it on, Striker dashboard users won't lose their virtual machine manager link unless the node powers off. Also enabled actually updating the job progress, completing this tool!

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-13 00:12:20 -04:00
Digimer
d647014ad1 * Created (finished but not yet tested) DRBD->update_global_common() to update DRBD's global_common.conf file.
* Cleaned up a lot of log levels.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-07 23:12:11 -04:00
Digimer
c27cc7507f * Renamed striker-parse-fence-agents to anvil-parse-fence-agents and changed anvil-daemon to run it on all machines.
* Cleaned up a lot of logging.
* Updated Cluster->parse_cib() to track if a stonith device has 'delay' set.
* Got a lot more work done on anvil-join-anvil's stonith processing, but it still isn't complete. Updated it to change shell user passwords as well.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-06 01:20:53 -04:00