Commit Graph

84 Commits

Author SHA1 Message Date
Digimer
4be943ebf3 * Finished (initial) testing of scan-hardware. The first M3 scan agent is done!
* Updated Alert->check_alert_sent() and the 'alert_sent' table to remove 'alert_name' as it really isn't helpful given record_locator.
* Created 'Database->purge_data' that takes an array of tables and, in reverse, purges them from the database(s). This also disables archiving and resync functions now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-16 01:19:21 -04:00
Digimer
0a1dc809a2 * Created the ScanCore.pm module with the first 'agent_startup' method which generalized scan agent start up.
* Updated Alert->register to take a hash reference for message variables to simplify when a caller plans to log and register an alert at the same time.
* Updated Convert->bytes_to_human_readable() to name the 'size' variable used internally for 'bytes' to actually be 'bytes' for better consistency.
* Created multiple new Database methods;
** ->check_condition_age() is meant to be used by scan agents to see how long a given condition has been in play (ie: how long ago power was lost to a UPS or a sensor became unreadable).
** ->insert_or_update_health() handles recording data to the new 'health' table, used for determining ideal hosts for servers between nodes.
** ->insert_or_update_power() handles recording data to the new 'power' table, used for determining how power events are handled.
** ->insert_or_update_temperature() handles recording temperature data to the new 'temperature' table, used to determine how thermal events are handled.
* Got a lot more done on the scan-hardware scan agent. Only part left now is post-scan health processing.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-15 01:18:36 -04:00
Digimer
925664762a * Created Database->check_for_schema() (not finished) that will check/add a schema for a scan agent.
* Renamed the scan-network skeleton scan agent to scan-hardware and started work on it based on the M2 version.
* Updated Database->get_recipients() to take the 'include_deleted' parameter, and changed the default behaviour to only return active records.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-08 01:07:23 -04:00
Digimer
911523dfce * Got a lot of work done in generating emails. Doesn't work yet, but the code to generate emails for recipients using their preferred language and alert level is done (though limited testing so far).
* Dropped support for supporting imperial measurements in generated emails.
* Created Database->get_alerts() to read in alert data and ->get_recipients() to get the list of alert recipients.
* SQL Schema changes;
** Added 'alert_processed' to 'alerts' to track what alerts have been processed.
** Changed 'recipient_new_level' to 'recipient_level' now that we're only using 'notifications' as a per-host override for user/hosts alert levels.
** Removed 'recipient_units' as we're no longer supporting non-metric values.
* Updated Alert->register() to take strings for the alert level (which gets translated to integers).
* Created Email->get_current_server() to returned the mail_server_uuid of the active mail server (if any). Created ->send_alerts() to process unprocessed alerts and send emails to recipients.
* Updated Words->parse_banged_string() to take the 'language' parameter.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-05 01:18:42 -04:00
Digimer
49682a01d7 * Fixed a bug in Database->disconnect() where the database idenitification number wasn't being removed, so connecting again triggered the duplicate DB connection check.
* Fixed a bug in Tools->_set_defaults where the order the tables were sync'ed it caused primary/foreign keys would trigger DB errors when resync'ing in some cases.
* Created Database->log_connections to make it easier to log which databases are actively in use and other data about the connections.
* Fixed bugs in striker-manage-peers that (partly because of the above bugs) failed to connect to new peers properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-03 01:36:11 -04:00
Digimer
aad68b8ed0 * Got email reconfiguration when a mail queue is over ten minutes old (needs testing).
* Updated Storage->write_file to remove the cache for a written file if it exists.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-28 02:32:27 -04:00
Digimer
767148b538 * Updated Database->get_mail_servers() to clear old stored data, and to pull out the list of when a mail server was last used.
* Got email server configuration under way. A mail server can now be configured via Email->_configure_for_server(), but more work is needed on when to switch between configs.
* Fixed some logging of passwords that wasn't being checked to see if secure logging was enabled or not.
* Fixed a bug in Striker where the back arrow in email config sub-sections weren't going back to the main email menu.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-27 02:09:21 -04:00
Digimer
b2c7fd95fb * Renamed the ScanCore unit file to scancore.
* Added support to parsing location contraints to Cluster->parse_cib

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-24 12:54:31 -04:00
Digimer
14bf323627 * Fixed an issue with ocf:alteeve:server where, after a migration, the target host would invoke the RA as if it was trying to migrate, instead of verifying the server (resource) was OK post migration.
* Fixed a bug in Server->get_status() where the call to Storage->rsync's returned output checked for '!!errer!!' instead of '!!error!!'.
* Fixed a bug in Storage->rsync where, when no port was passed in, it would try to specify an empty port and fail.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-20 06:32:19 -04:00
Digimer
6eeb4e48c7 * Added 'timeout' and 'wfc-timeout' to drbd'd global-common.conf in DRBD->update_global_common().
* Fixed a minor bug in ocf:alteeve:server found in testing the RA.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-14 00:24:02 -04:00
Digimer
e35800c413 * Fixed up (though more testing/work needed) to ocf:alteeve:server to get it working with DRBD resources referenced using '/dev/drbd/by-res/...'.
* Added the anvil.conf option 'sys::privacy::strong' that controls if the Anvil! ever "calls home". Initially, this controls DRBD's usage flag.
* Updated DRBD->get_devices() to track resources by their 'by-res' names as well and by the normal '/dev/drbdX' devices.
* To mitigate https://bugzilla.redhat.com/show_bug.cgi?id=1868467, updated Get->bridges() to parse the normal (non-JSON) data if we get invalid JSON output.
* Updated anvil-join-anvil to not disable, and in fact enable, libvirtd on boot. With DRBD 9, the original fear of a user accidentally booting a VM that's running on the peer no longer is an issue. By enabling it and leaving it on, Striker dashboard users won't lose their virtual machine manager link unless the node powers off. Also enabled actually updating the job progress, completing this tool!

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-13 00:12:20 -04:00
Digimer
ef208fd3fb * Finished the logic for adding stonith devices and levels to pacemaker! More testing is needed though, bugs expected, but it adds them.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-06 17:45:43 -04:00
Digimer
3c2f25a860 * Added 'fence_delay' fence agent to handle the corner cases where an IPMI BMC had crashed until a power cycle, and PDU fencing was effected, but failed to report as such.
* Updated Cluster->parse_cib() to take a CIB as a parameter.
* Fixed a bug in Database->get_hosts() where loading the host_ipmi value was filtered through Log->is_secure.
* Updated Striker->get_fence_data() to parse the switches to make it easier to map a fence agent's command line switches to STDIN arguments.
* Created System->parse_arguments() that converts a series of command line switches and their values into a hash. It's similar to Get->switches(), but works on any string.
* Continued work on anvil-join-anvil's fence configuration logic.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-30 00:23:47 -04:00
Digimer
62d0a2aa39 * Created Cluster->parse_cib() that parses pacemaker's CIB (cluster information base) XML. This also switches to the XML::LibXML, starting the replacement of XML::Simple. It's far from finished, but parses out basic node data and fence data.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-11 23:26:19 -04:00
Digimer
597d9413a5 * Created the skeleton Cluster.pm.
* Got anvil-join-anvil to the point where is initializes and starts the cluster.
* Deleted the old ssh key handling logic in anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-10 00:49:30 -04:00
Digimer
76b6550ac6 * Created Database->get_ip_addresses() that pulls the IPs out and stores them in a hash that allows for easy referencing to associated interfaces and networks.
* Created Get->trusted_hosts() that finds the dashboards the host uses and, if the host is in an Anvil!, the peers in the same anvil.
* Created (but not finished yet) System->update_hosts() that will add and edit entries for all IPs to trusted hosts.
* Fixed a logging bug in Striker->load_manifest().
* Fixed a bug in System->call where, the the output from the shell call didn't end in a new-line, it would not parse the return code and lease the return code string appended to the shell output.
* Fixed a big in System->change_shell_user_password() where a new-line (\n) meant for the shell call wasn't escaped properly. There was also a duplicate 'return_code' variable preventing the actual return code from being read.
* Got more work done on anvil-join-anvil to update the hacluster password (when needed).

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-03 18:11:56 -04:00
Digimer
726a4374d1 * Renamed the database table 'host_keys' to 'ssh_keys' to better represent what it stores.
* Updated 'variables' -> 'variable_source_uuid' to type 'uuid' and removed the 'not null' constraint.
* Updated Database->insert_or_update_variables() to check/update 'variables_source_table' and 'variables_source_uuid'.
* Created the 'trusts' database table which will, when done, tell anvil-daemon which users@machines to trust (setup passwordkess SSH).
* Created (but not finished) System->manage_authorized_keys() and moved the logic over to it from anvil-daemon.
* Changed the host types "dashboard" to "striker".
* Moved the following methods from 'System' to 'Get';
** System->get_host_type to Get->host_type
** System->get_bridges to Get->bridges
** System->get_free_memory to Get->free_memory
** System->get_os_type to Get->os_type
** System->get_uptime to Get->uptime
* Updated striker to include the host_uuid for the 'node1', 'node2' and (if chosen) 'dr1' when running a job manifest.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-10 18:26:50 -04:00
Digimer
7edfd0cb2d * Added gateway support to install manigest creation step 2.
* Finished sanity checking install manifest step2.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-05 23:11:47 -04:00
Digimer
d3bb350668 * Filtered out 'delay' and 'plug' options from Striker->get_fence_data()'s options parsing.
* Cleaned up striker's display of fence agent data.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-02-01 19:05:39 -05:00
Digimer
b8c0577b54 * Fixed several issues with the fence configuration menu in striker.
* Added filters Striker->get_fence_data() for parameters. Manually change 'action' entries from 'string' to 'select' and use the data in the 'actions' element to populate it, with actions that don't make sense filtered out.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-22 15:21:35 -05:00
Digimer
818ef23634 * Moved the fences_unified_metadata file from /tmp, which apache can not read, to /var/www/html/.
* Fixed a bug (well, made a work-around for an issue without a known reproducer) where, on some occassion, a record will end up in the public table without being copied into the history schema. When this happens, the next resync would crash out because the resynd reads in the history table only. Now, when about to INSERT a record into the public schema during a resync, an explicit check is made to see if the record alread
y exists. If it does, the INSERT is instead redirected to the history schema.
* Cleaned up the fence agent metadata when displaying to a user, converting the shell codes to underline a string with square brackets instead. We also now replace newlines with <br /> tags. Lastly, to help fence_azure_arm's metadata description to display cleanly, a check is made to format the table correctly.
* Began work on the Striker menu for handling fence device management

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-20 23:41:01 -05:00
Digimer
f636e399d7 * Created tools/striker-parse-fence-agents which finds all the available fence agents on a system and gathers their metadata into a common XML file.
* Created Striker->get_fence_data() that reads/parses the unified fence metadata file created by tools/striker-parse-fence-agents.
* Created the new 'fences' database table and Database->insert_or_update_fences() to handle it.
* Added hosts -> host_ipmi that will, later, store information on how to access the host's IPMI interface, when available.
* Sketched out how the new Install Manifests are going to work.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-15 17:04:18 -05:00
Digimer
c3869a2ff6 * Started adding in front-end support for managing email servers and alert recipients. Added the new 'Email' module to (later) habdle all email-related tasks.
* Fixed a bug in Accounts->read_cookies where, when a user's hash had expired, the logged error message didn't show the user's name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-23 00:54:09 -05:00
Digimer
2f81275551 * Updated the kickstart template/script to now add the local striker to the installed system's dnf repo list.
* Fixed a bug in the tftp templates where the os_type was statically set to 'rhel8'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-20 00:54:07 -05:00
Digimer
b97f15b25e * Finished the sanity checks and user confirmation table when preparing to configure the network on a node or dr host.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-10 02:38:55 -05:00
Digimer
b9a0cc4d56 * Finished the initial tools/striker-initialize-host!
* Created Tools->refresh to reload anvil.conf in one call.
* Created Anvil::Tools::Network to hold network-related tasks.
** Created Network->is_remote() that tests to see if a string (containing a target) refers to the remote machine (versus a local machine). Updated all previous checks to use this new method.
** Moved Get->network_details() and Get->network() to the new Network module. Renamed Get->network() to Network->get_network().
** Made Network->get_ips() work locally and remotely.
** Created Network->find_matches() that compares two scanned machines IPs (via two previous calls to Network->get_ips())
* Created Database->manage_anvil_conf() that will add, update or remove a given database connection in a local or remote anvil.conf file.
* Fixed bugs in Storage->backup() where the bash calls were quite broken. I'm not sure how it ever worked before... x_x
* Updated anvil-daemon to not initialize a database unless it's running on dashboard. Also added a check at the startup of anvil-daemon where it will go into a loop waiting for a database to become available, re-reading anvil.conf each loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-22 23:36:59 -04:00
Digimer
a54e0d4e22 * Made a fair bit more progress on tools/striker-manage-install-target. It now registers a system with Red Hat and attaches appropriate subscriptions, updates the OS and (tries) to install anvil-{node,dr}.
* Fixed a bug in Remote->call where a shell call that ended in a newline would work, but throw an error and not get the return code.
* Created Database->get_job_details() which takes a job_uuid and returns the job details, if found.
* Fixed a bug in Jobs->update_progress() where 'clear' wasn't removing the old job_progress data.
* Added the parameters 'no_files' to skip stat'ing/recording non-directories, and 'search_for' which will set the parent directory in 'scan::searched' and stop scanning if found. This allows this method to act as a directory tree scanner and as a search engine.
* Created Striker->get_local_repo() that builds a repo file body suitable for adding to peers, nodes and DR hosts.
* Fixed bugs in the Striker WebUI related to initializing a target node / DR host.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-18 18:48:09 -04:00
Digimer
bc341809ca * Finished (for now) ocf:alteeve:server! It can boot, migrate and stop a server cleanly. It still checks to see if DRBD needs to be started and does so when needed, but it won't stop it anymore.
* Fixed a couple typos in tools/anvil-check-memory that prevented it from running.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-04 18:58:43 -04:00
Digimer
8a2c86d088 * Renamed striker-configure-host (back) to anvil-configure-host, and started updating it to work on any machine type.
* Created tools/anvil-check-memory to report how much RAM is used by a given program.
* Added documentation for some previously undocumented methods.
* Updated Database->archive_database() to take the 'tables' parameter.
* Updated Storage->scan_directory() to record a directory's mode and type, even when recursive isn't used.
* Finished System->check_memory().
* Updated ocf:alteeve:server to now NOT stop a DRBD resource unless 'stop_drbd_resources'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-03 14:07:39 -04:00
Digimer
c0dd34334e * Fixed another bug in making ocf:alteeve:server work in pacemaker.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-21 01:14:10 -04:00
Digimer
ed2e83a1a4 * Fixed a few more bugs in 'ocf:alteeve:anvil', but it's still failing when invoked by pacemaker.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-20 23:46:59 -04:00
Digimer
113a44ecc6 * Got 'migrate_to' working in ocf:alteeve:server. 'migrate_from' still needs work.
* Created DRBD->allow_two_primaries() and ->reload_defaults() that enables (and resets/disables) dual-primary operation (allow-two-primaries=yes), used to enable live migration.
* Created Remote->test_access() that simply verifies that a remote target can be accessed (as a given user).
* Created Server->migrate() that actually migrates a server. It can push already, and pull will be added next.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-16 01:41:47 -04:00
Digimer
0f873c45b5 * Created the new Anvil::Tools::DRBD moduke to hold all DRBD related stuff. Started working of ->get_status, still very much a work in progress.
* Started working on ocf:alteeve:server to make it smarter (and more patient) when bringing a DRBD resource up so we don't get false failures when we hit a race.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-23 02:37:14 -04:00
Digimer
f294d48777 * Updated notes with working pcs config example (enable, disable and migrate all work!)
* Minor fix to ocf/alteeve/server to not print anything except the XML meta information when requested

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-20 01:36:02 -04:00
Digimer
9c0f6b8f79 * Added automatic 'echo return_code:$?' to System->call and Remote->call which is parsed out and returned automatically on all calls.
* Started porting ocf:alteeve:server to use the Anvil::Tools module and updating it for RHEL 8.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-13 04:16:03 -04:00
Digimer
302a8aade9 * Fixed some bugs in tools/anvil-manage-firewall, it's working again (though new features are pending).
* Moved firewall.txt out of the templates directory and into the tools directory so that it is accessible on nodes and DR hosts (which don't get the apache files).

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-09 02:01:05 -04:00
Digimer
e55594f58f * Notes updated with working network config for RHEL 8 proper. Two notes; Creating a BCN bridge by default, and switch the DR third octet to 12 (13 for IPMI) and fourth octet to sequence number.
* Fixed a bug in System->get_ips() where DHCP-assigned IPs were not being parsed properly to get the default gateway.
* Added the alteeve-el8-repo to the kickstart files install package list.
* Updated anvil-daemon to sleep 2 seconds between loops, instead of 1. Added a check to 'check_firewall' to not run until after the system has been configured.
* Quieted a lot of logging.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-05 01:36:21 -04:00
Digimer
188ecdbbd7 * Improved handling of missing RPMs when downloading RPMs for tools/striker-manage-install-target's repo.
* Updated package list to fix changed dependencies from RHEL 8 beta to final.
* Changed anvil_daemon to only check DHCP once per minute instead of every loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-06-28 00:37:08 -04:00
Digimer
ae095ab85a * Updated the RPM build order and list for RHEL 8 final.
* Replaced 'screen' with 'tmux' in the spec file.
* Fixed a couple typos in the SQL schema that prevented it from loading.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-06-24 17:54:52 -04:00
Digimer
d92a6225c7 * a small note for myself.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-05-23 21:05:08 -04:00
Digimer
559dbafb39 * Finished Convert->time (needs testing).
* Started reworking Remote->call to use Net::OpenSSH instead of Net::SSH2.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-03-22 00:57:53 -04:00
Digimer
d3b2f2fd35 * Started Storage->scan_directory() that searches a directory (optionally recursively), recording details about files it finds, including mimetypes./
* Added perl-File-MimeInfo to anvil-core (and built a pile of dependencies for the repos).

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-02-12 04:12:45 -05:00
Digimer
3e2a1faf06 * Updated System->get_os_type() to add comments for RHEL8 and remove Fedora support.
* Updated the pxe.txt template to use RHEL8 and remove Fedora support.
* Updated striker-manage-install-target to support RHEL8 and remove Fedora support.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-25 15:04:57 -05:00
Digimer
b7b4e79e95 * Updated System->check_firewall to use firewall-cmd and the contents of iptables-save is not available/reliable under RHEL8.
* Disabled deletion of unneeded zones.
* Updated the default BCN/SN IPs generated in striker to follow the new schema.

Firewall work continues.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-23 19:54:01 -05:00
Digimer
d2c812ee03 This commit starts the move to RHEL8 from Fedora 28.
* The 'notes' file has a lot of RHEL8 migration notes added, including RPM build orders.
* The anvil.spec file has switched the source from 'master.tar.gz' to 'anvil-3.0b.tar.gz' and moved the source to our webserver. Updated the dependencies as well.
* Updated anvil.sql to add the 'anvils' table and fixed some SQL schema problems.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-17 16:02:57 -05:00
Digimer
02c4fe1fa1 * Updated all perl module modes to remove the executable bit.
* Updated anvil.sql to add the new tables needed for alert mail delivery.
* Update anvil.sql and Database->initialize to now default the user to 'admin' and swap that out if needed, instead of using the #!variable!user!#' replacement variable.
* Started updating anvil.spec for EL8.
* Added support for 'striker::repo::extra-packages' which users can use to add additional packages to the Striker repositories.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-05 18:57:44 -07:00
Digimer
946fce018a * Renamed the 'ScanCore' executable to just 'scancore', moved it into the standard 'tools' directory and changed the agents directory to '/usr/sbin/scancore-agents'.
* Got scancore scanning the agents directory, and properly holding on startup until at least one database is available (instead of exiting), and holding on startup until the local system is configured.
* Created the skeleton of the first scan agent; scan-network.
* Fixed a bug in Storage->check_md5sums() where dynamically loaded modules, loaded after the initial md5sum calcs, would cause the calling daemon to exit (possibly on every invocation).
* Created the scancore.README that will eventually be the main scan agent guide / API document.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-28 18:17:59 -04:00
Digimer
f5ae90c941 * Started work on M3 ScanCore!
* Started expanding Alert->register_alert() to actually implement it.
* Improved handling errors in Words->key().
* Started work on Striker's "Anvil!" menu section. Also cleaned up the power handling.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-18 15:20:49 -08:00
Digimer
5f77ff5885 * Finished (for now) anvil-manage-firewall. It's been added to anvil-daemon as well.
* Updated Log->entry() to accept 'print => [0|1]' to send a log message to STDOUT (minus prefix) to avoid tools that were repeatedly calling print and Log->entry back to back.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-11 02:27:55 -05:00
Digimer
67e8f26bdb * Fixed a bug where anvil-manage-install-target was always refreshing the local RPM repository instead on doing so only once per day.
* Finished getting proper storage management and software RAID support added to plan_partition.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-11-29 19:55:23 -05:00