Commit Graph

59 Commits

Author SHA1 Message Date
digimer
4f6fa4b6ed Working on a bug where broken manifests are saved.
* Updated Striker->generate_manifest() to add pod and make the prefix,
  sequence and domain parameters required.
* Created the check_for_broken_manifests() function for anvil-daemon to
  detect/remove broken manifests.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-10-24 13:36:30 -04:00
digimer
f23768ac32 * Updated Striker->generate_manifest() to remove integral DR support and add migration-network support. This helps address issue #440.
Signed-off-by: digimer <mkelly@alteeve.ca>
2023-08-17 18:58:12 -04:00
Tsu-ba-me
226c423af0 fix: allow param override in generate_manifest in Striker.pm 2023-06-19 15:15:32 -04:00
Digimer
aa7d9bdf14 * Fixed a bug where resync'ing the database was missing tables.
* Updated Network->find_matches() to take 'source' and 'line' parameters to help identify the source of issues with missing hashes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Tsu-ba-me
18e8b21214 fix(Anvil): config Apache to forward requests to Striker UI API 2022-03-18 22:50:41 -04:00
Digimer
04cb116c1b Updated anvil-parse-fence-agents to validate each fence agent's metadata is valid before adding it to the unified XML.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-19 00:58:26 -04:00
Tsu-ba-me
92335b29cc fix(Anvil): add clean up logic when failed to validate Apache conf after modify 2021-07-28 19:29:24 -04:00
Tsu-ba-me
d2d7a5380c fix(Anvil): search all args for Access-Control value 2021-07-28 19:17:25 -04:00
Tsu-ba-me
18ec7b1c1a fix(Anvil): abort when no new Apache conf created 2021-07-28 19:17:23 -04:00
Tsu-ba-me
3de9912f51 fix(Anvil): use augeas to modify Apache conf 2021-07-28 18:41:28 -04:00
Digimer
24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancilliary checks are done, minimizing connect time.
* Fixed a problem with Database->insert_or_update_variables() where variable_source_uuid being set to an empty string wasn't converted to NULL.
* Fixed Database->locking() where the way the lock variable was set was rather broken.
* Created Striker->check_httpd_conf() which configured apache to handle the integration of the new WebUI for Anvil! management with the existing WebUI.
* Updated System->update_hosts() to specifically set the 127.0.0.1 and ::1 lines to handle how cloud-init overrides /etc/hosts and breaks CI/CD tests.
* Removed the old index.html as it's now used for the new WebUI.
* Began work on removing DB connection requirements from ocf:alteeve:server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-03 22:25:36 -04:00
Digimer
fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources.
* Changed the alteeve repo RPM to the new cimmunity/enterprise repo
* Fixed a bug where 'fence_data::updated' was causing the fences web page to break.
* Fixed a bug in Database->insert_or_update_network_interfaces() where certain interfaces were being repeatedly added to the database.
* Fixed a bug in Database->_find_behind_databases() was marking DBs as behind even though they had less than 10 columns off.
* Fixed a bug in Get->host_name() where, if the host name was changed on disk but the environment variable was still the old name, it would cause the hostname to waffle back and forth and cause constant updated to /etc/hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-20 00:16:09 -04:00
Digimer
e036515df3 * Got anvil-safe-start to the point where is starts the cluster stack. Need to create the 'anvil-boot-server' and 'anvil-shutdown-server' before it can be completed, so those files have been added.
* Created Cluster->parse_quorum() to check if a node is quorate as 'have-quorum' in the pacemaker CIB doesn't appear to be super accurate during startup.
* Fixed a bug in striker-manage-install-target where if a node didn't have any registered IPs, it would break before generating the repo data.
* Fixed a bug in anvil-join-anvil where if the database had to be reconnected, the job data was lost.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-14 00:26:06 -04:00
Digimer
fb0836f912 * THe get_cpu endpoint was completed.
* The get_mmeory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.

To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, in there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib colume.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Update DRBD->gather_data() to get the size data from /sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disabled ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-09 20:51:29 -04:00
Digimer
426b5f58f7 * Finished (but not yet tested) tools/striker-auto-initialize-all. Expect many bugs if used.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-09 00:49:09 -05:00
Digimer
6f8f97b184 * Updated Get->os_type() to support detection of CentOS Stream separate from CentOS.
* Updated pxe.txt to start support for UEFI boot target.
* Updated update_install_source to be smarter about moving html and tftp directory names to better reflect the host OS, and to make it support converting OSes on the fly. Also added support to the package list for CentOS Stream.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-08 22:37:27 -05:00
Digimer
1d03a386d3 * Created Database->get_bridges() that, surprise, loads data from the 'bridges' table.
* Started work on Get->available_resources() that will take an 'anvil_uuid' and figure out what resources are still available for use by new servers or that can be added to existing servers.
* Fixed a bug in ScanCore->agent_startup() where tables weren't being generated properly from the agent's SQL file.
* Made Storage->change_mode() return silently if it's called without a mode being passed. This happens frequently and is harmless so it's not worth filling the logs with errors.
* Renamed the 'start_time' key to 'at_start' when recording files' MD5 sums in Storage->record_md5sums and ->check_md5sums.
* When we moved the directory scan logic out of the 'scancore' daemon and into 'Storage->scan_directory', the logic to record scan agent names in 'scancore::agent::<file>' was removed. This broke a few things and, so, it was restored when it was found that a file starts with 'scan-' and the directory matches the scancore agent directory.
* Moved the 'scancore' daemon's 'load_agent_strings' to 'Words'
* Updated Words->parse_banged_string() to look for variables in the format 'value=X:units=Y' and translate it properly.
* Fixed a bug in scan-ipmitool where discovered sensor INSERT SQL queries were queued, but not committed.
* Fixed a bug in scan-storcli where a while loop was broken, preventing execution.
* Fixed a bug in the 'scancore' daemon where it wouldn't exit if sums changed. Fixed a bug where alerts weren't being sent between loops. Fixed a bug where command-line log level wasn't surviving inside the main loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-12-23 01:10:23 -05:00
Digimer
0f7267eae1 * Moved the '_host_name', '_short_host_name', and '_domain_name' private methods in Tools.pm over to Get.pm (removing the leading '_' in the method names).
* Created 'Cluster->which_node' that returns 'node1' or 'node2' to indicate which node a host is.
* Continued working on scan_cluster; decided to make it not host-dependent.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-20 00:27:36 -04:00
Digimer
f8a466a963 * Fixed a bug in Striker->load_manifest where when a single fence device or UPS was defined, the XML would fail to parse.
* Added a null entry when editing a manifest when no UPSes have been defined.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-04 00:41:25 -04:00
Digimer
1498e1b53c * Got server migration working using ocf:alteeve:server in a test environment!
* Converted most 'eval { }' calls to localize $@ and test the output of the eval, instead of checking to see if $@ was set.
* Converted all 'local' hash references to instead use the short host name of the local machine as a new standard.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-19 18:54:09 -04:00
Madison Kelly
30f2b3fa8e * Switched all hash 'local' keys to be the host's short user name. Untested, likely bugs to be fixed in the next commit.
Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
2020-08-18 19:34:08 -04:00
Digimer
c27cc7507f * Renamed striker-parse-fence-agents to anvil-parse-fence-agents and changed anvil-daemon to run it on all machines.
* Cleaned up a lot of logging.
* Updated Cluster->parse_cib() to track if a stonith device has 'delay' set.
* Got a lot more work done on anvil-join-anvil's stonith processing, but it still isn't complete. Updated it to change shell user passwords as well.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-06 01:20:53 -04:00
Digimer
3c2f25a860 * Added 'fence_delay' fence agent to handle the corner cases where an IPMI BMC had crashed until a power cycle, and PDU fencing was effected, but failed to report as such.
* Updated Cluster->parse_cib() to take a CIB as a parameter.
* Fixed a bug in Database->get_hosts() where loading the host_ipmi value was filtered through Log->is_secure.
* Updated Striker->get_fence_data() to parse the switches to make it easier to map a fence agent's command line switches to STDIN arguments.
* Created System->parse_arguments() that converts a series of command line switches and their values into a hash. It's similar to Get->switches(), but works on any string.
* Continued work on anvil-join-anvil's fence configuration logic.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-30 00:23:47 -04:00
Digimer
d2d5d7b460 * Fixed a bug in Striker->load_manifest() where fences were parsed twice, the second time missing a hash reference.
* Updated striker to now only offer gateway for IFN networks. EL8 seems to ignore 'GATEWAY="x"' in interface configs which caused anvil-join-anvil to always think an interface needs to be updated. Updated as well to remove DNS entries set in interfaces that are not the default gateway.
* Fixed a bug where DNS entries were being missed, causing entries to be repeatedly added to the interface that was the gateway interface.
* In anvil-update-states, added Get->switches() so that verbosity switches are used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-28 00:59:02 -04:00
Digimer
de43ea3ac1 * Renamed all Validate->is_X to Validate->X. Also created Validate->ipv6() to validate IPv6 addresses using Data::Validate::IP (and added it as a requirement to the .spec base RPM).
* Added the fix from the last commit for System->call to handle returned data without an ending newline to Remote->call.
* Got more work done on System->update_hosts(). It's able to add new hosts, but misses the short and FQDN host names. Need to fix that and the verify existing / manual entries aren't molested.

Signed-off-by: digimer <digimer@pulsar.alteeve.com>
2020-07-07 01:18:38 -04:00
Digimer
76b6550ac6 * Created Database->get_ip_addresses() that pulls the IPs out and stores them in a hash that allows for easy referencing to associated interfaces and networks.
* Created Get->trusted_hosts() that finds the dashboards the host uses and, if the host is in an Anvil!, the peers in the same anvil.
* Created (but not finished yet) System->update_hosts() that will add and edit entries for all IPs to trusted hosts.
* Fixed a logging bug in Striker->load_manifest().
* Fixed a bug in System->call where, the the output from the shell call didn't end in a new-line, it would not parse the return code and lease the return code string appended to the shell output.
* Fixed a big in System->change_shell_user_password() where a new-line (\n) meant for the shell call wasn't escaped properly. There was also a duplicate 'return_code' variable preventing the actual return code from being read.
* Got more work done on anvil-join-anvil to update the hacluster password (when needed).

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-03 18:11:56 -04:00
Digimer
726a4374d1 * Renamed the database table 'host_keys' to 'ssh_keys' to better represent what it stores.
* Updated 'variables' -> 'variable_source_uuid' to type 'uuid' and removed the 'not null' constraint.
* Updated Database->insert_or_update_variables() to check/update 'variables_source_table' and 'variables_source_uuid'.
* Created the 'trusts' database table which will, when done, tell anvil-daemon which users@machines to trust (setup passwordkess SSH).
* Created (but not finished) System->manage_authorized_keys() and moved the logic over to it from anvil-daemon.
* Changed the host types "dashboard" to "striker".
* Moved the following methods from 'System' to 'Get';
** System->get_host_type to Get->host_type
** System->get_bridges to Get->bridges
** System->get_free_memory to Get->free_memory
** System->get_os_type to Get->os_type
** System->get_uptime to Get->uptime
* Updated striker to include the host_uuid for the 'node1', 'node2' and (if chosen) 'dr1' when running a job manifest.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-10 18:26:50 -04:00
Digimer
22e7e8b03e * Finished the "run manifest" form / page. Actually saving / pushing the job to nodes is not yet done though.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-26 21:37:02 -04:00
Digimer
e99bfac9be * Created Database->get_anvils() and ->insert_or_update_anvils().
* Updated the anvils database table to have columns for traching which hosts are used as node 1, node 2 and the DR host.
* Updated Database->get_hosts() to check which, if any, Anvil! a given hosts is attached to.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-25 00:02:52 -04:00
Digimer
c5c75f1ddf * Created Database->get_manifests() that loads manifests into a hash but UUID and name.
* Got deleting of manifests done.
* Fixed a couple bugs from the _network to _ip variable rename.
* Fixed a bug where fence variables with double-quotes in them weren't being handled properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-23 23:54:36 -04:00
Digimer
0dbb07dfb7 * Fixed a bug where fence option values with double-quotes where not being stored correctly.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-22 13:45:44 -04:00
Digimer
1e8982704e * Finished the ability to load an install manifest to edit it.
* Renamed to clarify the IP address field from ..._network to _ip.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-18 23:20:33 -04:00
Digimer
f0f949bcf0 * Created Striker->load_manifest() to load (and parse) install manifests.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-17 15:23:12 -04:00
Digimer
099bc1401a * Finished the menus to save a new Install Manifest and got the create page showing the existing manifests.
* Updated Database->insert_or_update_manifests() so that not passing in 'manifest_last_ran' is OK.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-14 17:52:14 -04:00
Digimer
97bcab86d6 * Got the new Striker->generate_manifest() generating the manifest XML.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-13 23:02:05 -04:00
Digimer
23c17dbb98 * Renamed Network->match_gateway() to ->is_ip_in_network(), and changed the parameters 'gateway -> ip' and 'ip_address -> network'. This is to clarify it's use as a general purpose "does this IP fall in the given network range?".
* Created the shell of Striker->generate_manifest() that will pull out CGI data to generate the install manifest itself and save/update it.
* Got step 3 menu finished along with the corresponding sanity checks.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-12 18:50:11 -04:00
Digimer
e66bc32693 * Added the ability to store, edit and delete UPSes
** Created Database->get_upses() and ->insert_or_update_upses().
** Created Striker->get_ups_data(). This parses the special 'ups_XXXX' strings.
* Updated Validate->is_domain() and added ->is_host_name() to use the Data::Validate::Domain module (which is now required in the core RPM).
* Started work on manifest handling.
* Sorted the language keys alphabetically.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-05-02 23:05:58 -04:00
Digimer
1edf723ea5 * Finished adding support for deleting fence agents.
* Added the 'include_deleted' parameter to Database->get_fences().

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-03-30 23:14:10 -04:00
Digimer
f49f3cd890 * Created Database->get_fences() to load existing fence devices into memory.
* Removed fences -> fence_type from the database schema, wasn't needed.
* Updated Striker->get_fence_data() to store data in the 'fence_data' parent hash so to not conflict with 'fences' used in Database->get_fences().
* Got the saving and loading of fence devices working.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-03-19 01:33:07 -04:00
Madison Kelly
3838babf57 * OK, this time CentOS is actually supported. For reals.
** Needed to add a couple more packages to CentOS's package list.
** Changed the PXE kickstart template to create a dedicated '/boot' partition (raw partition or on RAID 1). This seems to be required now on 8.1.
** Added PXE's UEFI support to the template system (untested, but it's at least generated now).
* Filtered out 'debug' and 'verbose' options when configuring fence devices.
* Added an internet test to tools/striker-manage-install-target and skipped attempting to download packages when there's no internet. Also made loading the host OS info into a small function.
* Started creating the man pages.

Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
2020-02-03 02:10:00 -05:00
Digimer
d3bb350668 * Filtered out 'delay' and 'plug' options from Striker->get_fence_data()'s options parsing.
* Cleaned up striker's display of fence agent data.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-02-01 19:05:39 -05:00
Digimer
00b9bce669 * Cleaned up a couple things in the fence config menu.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-22 23:25:10 -05:00
Digimer
b8c0577b54 * Fixed several issues with the fence configuration menu in striker.
* Added filters Striker->get_fence_data() for parameters. Manually change 'action' entries from 'string' to 'select' and use the data in the 'actions' element to populate it, with actions that don't make sense filtered out.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-22 15:21:35 -05:00
Digimer
818ef23634 * Moved the fences_unified_metadata file from /tmp, which apache can not read, to /var/www/html/.
* Fixed a bug (well, made a work-around for an issue without a known reproducer) where, on some occassion, a record will end up in the public table without being copied into the history schema. When this happens, the next resync would crash out because the resynd reads in the history table only. Now, when about to INSERT a record into the public schema during a resync, an explicit check is made to see if the record alread
y exists. If it does, the INSERT is instead redirected to the history schema.
* Cleaned up the fence agent metadata when displaying to a user, converting the shell codes to underline a string with square brackets instead. We also now replace newlines with <br /> tags. Lastly, to help fence_azure_arm's metadata description to display cleanly, a check is made to format the table correctly.
* Began work on the Striker menu for handling fence device management

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-20 23:41:01 -05:00
Digimer
f636e399d7 * Created tools/striker-parse-fence-agents which finds all the available fence agents on a system and gathers their metadata into a common XML file.
* Created Striker->get_fence_data() that reads/parses the unified fence metadata file created by tools/striker-parse-fence-agents.
* Created the new 'fences' database table and Database->insert_or_update_fences() to handle it.
* Added hosts -> host_ipmi that will, later, store information on how to access the host's IPMI interface, when available.
* Sketched out how the new Install Manifests are going to work.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-15 17:04:18 -05:00
Digimer
76e9352717 * Added a flag that tells anvil-daemon when a node is having it's network mapped. When this happens, open ssh connections are closed each loop and only tasks related to mapping the network run. This improves responsiveness in Striker when reporting which network links have come up or gone down.
* Fixed a bug in Database->insert_or_update_variables() where, if 'update_value_only' was set but not variable_uuid was passed or could be found, an (incomplete) INSERT would be attempted.
* Added support for generating module metadata when setting up local repos on Striker.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-09 01:52:04 -05:00
Digimer
0dfd2ddbf0 * Renamed 'ip' to 'ip_address' in Striker->parse_all_status_json() and related functions.
* Got the menu for mapping a host's network displaying (much work still to be done).
* Updated the anvil.js funtion to run dependent on the page being shown. For the main menu, the json is now properly reread and display updated as json content changes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-11-29 23:37:11 -05:00
Digimer
0b2439edc3 * Finished the jquery/css stuff for showing the list of hosts that can have their network configured.
* Fixed unclear error logging in Network->find_matches().
* Updated System->generate_state_json() Striker->parse_all_status_json() to determine if another machine can be reached from the local dashboard. If it can be reached, the first matching interface and IP are recorded.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-11-27 23:26:21 -05:00
Digimer
c608ebb232 * Added bonds -> bond_bridge_uuid to track which bridge, if any, that a bond is connected to. Updated Network->load_interfces() to record this.
* Added missing foreign key references to the SQL schema.
* Added support to tools/anvil-update-states to connect bonds to bridges, as appropriate.
* Finished the logic in test.pl to pull the network data (with connections between bridges, bonds and interfaces) needed for the WebUI.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-11-26 00:15:37 -05:00
Digimer
7aaef3d6d5 * Fixed bugs in Striker->parse_all_status_json() and System->generate_state_json() where the connected interfaces to bridges and bonds wasn't be stored properly.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-11-21 19:23:16 -05:00