Commit Graph

361 Commits

Author SHA1 Message Date
Tsu-ba-me
1ec32bfbaf fix(cgi-bin): filled in logic to power on/off target host(s) 2021-06-01 14:13:29 -04:00
Tsu-ba-me
869a5ec807 fix(share): add error template 0304 for request body parse failure 2021-06-01 14:13:29 -04:00
Digimer
4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the reponce time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machne would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped.
* Bumped scancore's scan delay from 30 seconds to 60.
* Shorted the age-out time to 24 hours and again boosted the archive thresholds. As we get a feel for the amount of data collected on multi-Anvil! systems over time, we may continue to tune this.l
* Moved Database->archive_database() to be called daily by anvil-daemon, instead of during '->connect' calls.
* Added locking to Database->_age_out_data to avoid resyncs mid-purge. Also moved the power, temperature and ip_address columns into the same 'to_clean' hash as it was duplicate logic.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-31 13:34:49 -04:00
Digimer
8807915bb7 The theme of this commit is database cleanup and fixes.
* Updated Database->_age_out_data() to check for certain scan agent tables and, for those found, purge out old records. This should go a long way to keeping the database data responsive.
* Fixed a bug in Jobs->update_progress() where the 'job_picked_up_by' column was being set to '0' instead of '$$' when clearing the job.
* Fixed a bug in System->update_hosts() where '127.0.0.1' would be used in hosts for the actual host name.
* Updated the default trigger, count and division values in anvil.conf to 100,000, 50,000 and 75,000 respectively. In combination with the aging of data, this should go a long way to minimizing database sizes and overheads.
* Updated anvil-daemon to call $anvil->Database->_age_out_data(); in it's daily tasks.
* Updated various striker-X tools to specifically request a DB resync on Database->connect calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-30 15:16:25 -04:00
Digimer
6abe06f125 The theme of these commits is improving DB responsiveness.
* Created Database->_age_out_data() to delete records from the database that are old enough to no longer be useful. This is designed to significantly reduce the size of the database, allowing a better focus on performance.
* Changed Database->connect() to default to NOT check for resync, reworking the old 'no_resync' to 'check_for_resync', so that resync checks happen on demard, instead of by default.
* Updated get_tables_from_schema() to now allow 'schema_file' to be set to 'all', which then loads the schema files of all scan agents as well as the core anvil schema file. Fixed a bug where commented out tables were being counted.
* Re-enabled triggering resyncs on 'last_updated' differences.
* Fixed a bug in scan-ipmitool where the history_id column in history.scan_ipmitool_value was incorrect.
* Created a new tool called striker-show-db-counts that shows the number of records in all public and history schema tables for all databases.
* Updated anvil-update-states to detect when a libvirtd NAT'ed bridge exists and to delete it when found.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-29 23:34:22 -04:00
Digimer
bbad058b33 * Created a new tool, anvil-watch-bonds, which is a live monitor of bonds and interfaces designed to be run from the command line on a given host.
* Created Words->center_text that takes a string (or string key) and centers it to a given string length, padding white spaces on either side of the string as needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-26 20:24:05 -04:00
Digimer
ff65712fd9 * Created the function check_daemons() in anvil-daemon to check that needed daemons are running when it starts. This was specifically added to address a periodic issue with machines booting without NetworkManager running.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 15:27:10 -04:00
Digimer
41cd1e0319 * Several bugs fixed and enhancements;
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 00:09:32 -04:00
Digimer
fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources.
* Changed the alteeve repo RPM to the new cimmunity/enterprise repo
* Fixed a bug where 'fence_data::updated' was causing the fences web page to break.
* Fixed a bug in Database->insert_or_update_network_interfaces() where certain interfaces were being repeatedly added to the database.
* Fixed a bug in Database->_find_behind_databases() was marking DBs as behind even though they had less than 10 columns off.
* Fixed a bug in Get->host_name() where, if the host name was changed on disk but the environment variable was still the old name, it would cause the hostname to waffle back and forth and cause constant updated to /etc/hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-20 00:16:09 -04:00
Digimer
44864ce321 * Updated Database->resync_databases() to set a default schema of 'public'. Also fixed a bug where, when the difference in record numbers between two line was > 999, it would not trigger a resync.
* Updated the scan agent timeout to 60 seconds. Also made the scan agent exit code log entries more helpful.
* Updated System->collect_ipmi_data() to now better handle duplicate sensor names. Now, instead of simply appending an integer, we find the hex address and use that in the sensor name when duplicates exist. This solves the problem of the sensor names not being consistently shown in order.
* Fixed message bugs (bad variable insertions) in scan-apc-pdu and scan-apc-ups.
* Fixed schema procedure bugs in the 'temperature' and 'ip_address' tables where the columns were in bad order, causing constanty updates.

Incomplete work;
* Create the shell of 'anvil-manage-storage', but virtually no logic exists in it yet.
* Started work on anvil-safe-start to deal with an issue where DRBD resources don't start when a server is running on a peer.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-13 23:27:38 -04:00
Digimer
7abbc938af * Renamed tools/striker-purge-host to tools/striker-purge-target and moved the code from test.pl over to it. No longer provides interactive selection, but now does work with Anvil! systems as well as hosts.
* Fixed a bug in Database->get_tables_from_schema where history.X and X tables were being stored in the table list.
* Updated ocf:alteeve:server to no do resyncs on DB connect.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-11 14:14:00 -04:00
Digimer
f833c311ba * To address issues with scancore debugging, we needed a tool to purge old anvils and hosts from the database. The 'test.pl' in this commit contains the new logic that will be merged into tools/striker-purge-host shortly.
* Created Database->find_host_uuid_columns() and ->_find_column() to create a list of tables and column names in the proper order to allow deletion of foreign keys to that deeply nested primary keys can be deleted. Specifically, this was meant for hosts -> host_uuid and anvils -> anvil_uuid, though it should work for other tables.
* Updated html/jquery-ui-1.12.1/package.json to address CVE-2020-7729
* Fixed a bug in the temperature table's history procedure where temperature_weight wasn't being copied.
* Updated anvil-provision-server to support '--anvil' that can take either the anvil-uuid or anvil-name.
* Updated anvil-safe-stop to default the stop-reason to 'user'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-08 02:02:46 -04:00
Digimer
4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches.
* Created Cluster->check_stonith_config() that checks and, if needed, reconfigures a cluster's fencing (stonith) config.
* Updated scan-cluster to call Cluster->check_stonith_config() at the end of each call.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-02 21:40:48 -04:00
Digimer
416f51323a * Created tools/striker-boot-machine to, well, boot machines. It uses host_ipmi or, failing that, other fence methods when available to boot a node.
* Created Cluster->get_fence_methods() that parses all fence methods out of a recorded CIB and stores the in a hash for a given host_uuid.
* Fixed a bug in ScanCore->post_scan_analysis_striker() where the short_host_name was not being stored correctly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-01 19:49:27 -04:00
Digimer
ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :)
* Created DRBD->check_if_syncsource() and ->check_if_synctarget() that return '1' if the target host is currently SyncSource or SyncTarget for any resource, respectively.
* Updated DRBD->update_global_common() to return the unified-format diff if any changes were made to global-common.conf.
* Created ScanCore->check_health() that returns the health score for a host. Created ->count_servers() that returns the number of servers on a host, how much RAM is used by those servers and, if available, the estimated migration time of the servers. Updated ->check_temperature() to set/clear/return the time that a host has been in a warning or critical temperature state.
* Finished ScanCore->post_scan_analysis_node()!!! It certainly has bugs, and much testing is needed, but the logic is all in place! Oh what a slog that was... It should be far more intelligent than M2 though, once flushed out and tested.
* Created Server->active_migrations() that returns '1' if any servers are in a migration on an Anvil! system. Updated ->migrate_virsh() to record how long a migration took in the "server::migration_duration" variable, which is averaged by ScanCore->count_servers() to estimate migration times.
* Updated scan-drbd to check/update the global-common.conf file's config at the end of a scan.
* Updated ScanCore itself to not scan when in maintenance mode. Also updated it to call 'anvil-safe-start' when ScanCore starts, so long as it is within ten minutes of the host booting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-30 22:58:01 -04:00
Digimer
15dab8aab7 * Started working on the node post-scan login in ScanCore. Created ScanCore->check_temperature() to get a thermal score against a node.
* Update ScanCore->check_power() to not require the parameter values.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-26 21:42:57 -04:00
Digimer
f202187c34 * anvil-safe-stop is complete! Testing still needed, of course.
* Updated DRBD->manage_resource() to call 'drbdadm adjust <res>' when starting a resource to help deal with a periodic issue where the 'allow-two-primary' option on the peer doesn't match the local setting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 11:56:11 -04:00
Digimer
3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed).
* Updated Server->shutdown_virsh() to change the parameter 'wait' to 'wait_time' to clarify it's use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 00:04:20 -04:00
Digimer
27259d1d53 * Finished anvil-rename-server!
* Created Storage->delete_file() that, well, deletes files (locally or on a peer).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-22 13:29:50 -04:00
Digimer
2e37691116 * Updated DRBD->gather_data() to store data on peers so that the peer's LV path and backing disk is recorded. Also fixed a bug in ->get_status() where the return code for local calls was stored as a host name.
* Added the scan-hpacucli scan agent. It's been done for a while and should have been added ages ago.
* Updated anvil-rename-server to get to the point where it will take down the DRBD resources on all machines, but waits if there is a sync under way. It also verifies that the server is off on all systems from virsh's perspective.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 22:46:51 -04:00
Digimer
711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there.
* Fixed a bug in Cluster->migrate_server() where waiting for the server to migate would never exit.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-19 00:32:13 -04:00
Digimer
eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server.
* Fixed a bug where, in rare cases, $anvil->hostname() would call 'hostnamectl' and get a dbus error during shutdown, which would then cause the hostname to be changed to the error in the database.
* Fixed a bug in Cluster->boot_server() where it would never verify that a server has started successfully.
* Updated Database->get_ip_addresses() to store the IPs we manage in 'ip_addresses::<ip_address_address>::X'.
* Updated ocf:alteeve:server to work from command line calls, though more testing is still needed.
* Started work on 'anvil-rename-server', but haven't gotten far with it yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-18 19:54:58 -04:00
Digimer
a480357049 * Fixed a bug in Cluster->assemble_storage_groups() where, if a group is created during an anvil-provision-server run, the group would get created multiple times.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-15 18:51:10 -04:00
Digimer
b36093671b * Updated Database queries that were passing 'debug => $debug' to not do that, as it was causing far too much (useless) noise in the logs.
* Turned on print to console for logging in anvil-provision-server. Also updated it to check if the cluster is running and hold until it is.
* Cleaned up some code in Get->available_resources() that proved hard to debug.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-15 02:35:58 -04:00
Digimer
798518ba5e * While working on the boot/shutdown server tools, ran into and fixed a bug where files uploaded before an Anvil! was added could not have those files sync'ed. This was fixed though the new Database->check_file_locations() method.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-14 22:56:18 -04:00
Digimer
e036515df3 * Got anvil-safe-start to the point where is starts the cluster stack. Need to create the 'anvil-boot-server' and 'anvil-shutdown-server' before it can be completed, so those files have been added.
* Created Cluster->parse_quorum() to check if a node is quorate as 'have-quorum' in the pacemaker CIB doesn't appear to be super accurate during startup.
* Fixed a bug in striker-manage-install-target where if a node didn't have any registered IPs, it would break before generating the repo data.
* Fixed a bug in anvil-join-anvil where if the database had to be reconnected, the job data was lost.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-14 00:26:06 -04:00
Digimer
faf1399440 * Continued work on anvil-safe-start. Got it to the point where it detects shared networks with its peer node and waits for all networks to be up.
* Fixed a bug in scan-drbd where the volume_uuid wasn't being stored in the proper hash, breaking insertions into scan_drbd_peers in some cases.
* Updated System->pids() to work with remote targets (will be used later to check for parallel runs of anvil-safe-start).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-12 20:46:30 -04:00
Digimer
15e71768a1 * Started work on anvil-safe-start. The enable/disable logic and how it runs automatically is controlled by the database and the tool can be used to control anvil-safe-start on both the local and peer node. It will be started by ScanCore, if scancore starts within 10 minutes of the node booting. It will always be able to run manually.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-12 00:28:24 -04:00
Digimer
942e0f66bf * Finished the 'get_X' enpoints so far defined. Added get_servers and completed get_status
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-11 16:26:42 -04:00
Digimer
fb0836f912 * THe get_cpu endpoint was completed.
* The get_mmeory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.

To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, in there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib colume.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Update DRBD->gather_data() to get the size data from /sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disabled ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-09 20:51:29 -04:00
Digimer
70dc0598f2 * Created Storage->manage_lvm_conf() that checks / updates lvm.conf to add a filter to avoid seeing DRBD devices as LVM components. This is now called from striker-initialize-host and scan-drbd.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-31 23:59:19 -04:00
Digimer
59b867cc25 * Updated DRBD->gather_data() to check if drbdadm exists before trying to call it to avoid scary errors in the logs. Also moved some strings that pulled from the scan-drbd agent into the main words file.
* Fixed a bug in ScanCore->agent_startup() where a (thankfully broken) check to append tables to the 'sys::database::check_tables' would cause an infinite loop as both were pointers to the same anonymous array.
* Fixed a bug in scan-ipmitool where the scan_ipmitool_variables table didn't use a host_uuid reference, causing resyncs of that table to sync for all hosts and cause DB errors when the scan_ipmitool record from another host wasn't sync'ed yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-31 01:36:11 -04:00
Digimer
ebfecf39c3 * Updated Database->manage_anvil_conf() to not manually create a backup, as Storage->write_file() creates a backup anyway (so we were getting two backups per one change).
* Updated Storage->write_file() to add a short UUID suffix to the temp file before rsync'ing to the target to help avoid source temp file name collisions in parallel running copies to different targets.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-23 06:01:45 -04:00
Digimer
53d654fd9d * Got scan-filesystem to the point where it now collects the data it needs. No processing done yet though. Also reworked the SQL schema for the agent to record more data.
* Created Storage->parse_df that, shock!, parses 'df' output. Finished the long-ago started ->parse_lsblk as well.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-19 18:19:29 -04:00
Digimer
296556328b * Fixed a bug in Convert->bytes_to_human_readable() to handle being passed in bytes (with the size units of 'b' ot 'bytes').
* Fixed a bug in Database->_find_behind_databases() _find_behind_databases() where the logic to figure out which column was the host_uuid reference was too liberal, causing the wrong column to be selected in some cases. Also added a check to not look for host_uuid columns on specific tables where a match would be made, but the column is allowed to be null (like server_host_uuid that indicates the host of the server).
* Started work on the scan-filesystems scan agent, which is needed to record the data that the file system UI endpoints will need.
* Removed the 'not null' constraint from 'servers' -> 'server_host_uuid'.
* Fixed a bug in 'anvil-provision-server' where the driver ISO being 'none' caused the provision script to use the driver ISO switch.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-18 00:37:17 -04:00
Digimer
54496cbeb0 * Added a check to Database->get_ip_addresses() to check is a hash is set before using it, to help avoid unitialized variable messages.
* Updated Remote->test_access() to not used cached SSH access.
* Updated anvil-configure-host to abort if the host is in a cluster.
* Updated anvil-join-anvil to clean up some variable checks to help avoid unitialized variable messages.
* Updated striker-initialize-host to check if an anvil RPM is installed and, if so, not install the Anvil! repo.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-10 20:35:05 -05:00
Digimer
5db09f565d * Updated anvil-join-anvil to actively call a cluster start once per minute while waiting for initial startup.
* Added a check to striker-initialize-host the see if anvil-X RPM is already installed. If so, it will not install the Alteeve repo, even if it's not found.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-10 14:14:04 -05:00
Digimer
3733220b50 * Updated Log->entry() to prefix log lines with the short 'job-uuid', when the log entry is coming from a program running as a job. This is meant to make it easier to break up what log lines belong to what jobs, if multiple jobs are running at the same time (ie: when initializing multiple nodes / dr hosts in parallel).
* Updated Remote->call() to return ('!!error!!', '!!error!!', 9999) when an error hits. Made Remote->test_access() explicitely check for '1' to be returned in order to confirm access, fixing a bug where bad target value caused false positives. Updated ->_check_known_hosts_for_target() to no longer explicitely check for 'ssh-rsa' so that machine keys using different cyphers are detected as being in known_hosts properly.
* Updated striker-auto-initialize-all to initialize nodes and DR hosts networks before trying to form them into an Anvil!. Fixed several other bugs as well. More testing is needed, but it works now.
* Updated striker-initialize-host to check for the alteeve repo and, it not found, check for accress to alteeve.com. If access, it will install our repo now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-10 02:26:09 -05:00
Digimer
426b5f58f7 * Finished (but not yet tested) tools/striker-auto-initialize-all. Expect many bugs if used.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-09 00:49:09 -05:00
Digimer
e71c7d4966 * Finished getting tools/striker-auto-initialize-all to merge the built Strikers.
* Fixed a bug in Remote->call() where the output of the call not ending in a newline wasn't having the return code parsed off properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-04 23:28:39 -05:00
Digimer
15fd0e5ce8 * Updated anvil-daemon (and Database->insert_or_update_jobs) to now recognize jobs with the job_status of 'scancore_startup' to run only when ScanCore starts.
* Finished initial Striker setup in tools/striker-auto-initialize-all. Started working on peering.
* Cleaned up the handling of converting UIDs to user names in Remote->add_target_to_known_hosts() and ->_call_ssh_keyscan().
* Did a bunch of white-space/alignment cleanup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-04 01:41:33 -05:00
Digimer
0fb191c00f * Made more progress on tools/striker-auto-initialize-all, now to the point where it loads the variables needed to initialize Striker dashboard.
* Cleaned up / added some logging in various locations.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-02 01:18:18 -05:00
Digimer
45a9cb04b0 * Fixed a bug introduced in the last commit that made Get->os_type() fail when called locally.
* Made the error reported by Remote->call() more verbose when called without 'target' being set.
* Updated anvil-daemon to not call jobs more that once per minute.
* Started work on striker-auto-initialize-all, still very far from complete.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-23 01:56:12 -05:00
Digimer
1b65f53faa * Remove host-health from the 'hosts' table as it wasn't needed, given the 'health' table. Bumped the SQL version to 0.0.2
* Updated Get->os_type() to use 'cat' instead of Storage->read_file() because 'rsync' may not be available when it is called during striker-initialize-host calls.
* Updated Database methods to skip 'oui' and 'state' during resync.
* Updatedb striker-initialize-host to detect when it's initializing a CentOS Stream Node / DR Host and enable the HA repo.
* Created the tools/striker-auto-initialize-all tool, which is very much incomplete, that will allow for the rapid creation of a full Anvil! from freshly installed machines autonomously.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-22 19:22:47 -05:00
Digimer
e8efbab343 * The work on PXE / UEFI support is broken, and will be set aside for the time being. The commit here is working to getting things fixed, but it's taking too much time away from more pressing issues.
* This commit includes two unrelated test files for UI work, cgi-bin/get_anvil_status and cgi-bin/get_anvils.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-16 17:32:43 -05:00
Digimer
2937afad26 * Got UEFI booting working up to the grub menu, though files formerly provided by anvil-striker-extra still need to be added to the main anvil-striker to work properly.
* Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-10 23:49:11 -05:00
Digimer
26f268a4aa * Updated the 'hosts' table and relevant Database methods to add columns for 'host_status' and 'host_health'. These will simplify tracking a node's (power) status and overall health.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-09 21:35:58 -05:00
Digimer
6f8f97b184 * Updated Get->os_type() to support detection of CentOS Stream separate from CentOS.
* Updated pxe.txt to start support for UEFI boot target.
* Updated update_install_source to be smarter about moving html and tftp directory names to better reflect the host OS, and to make it support converting OSes on the fly. Also added support to the package list for CentOS Stream.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-08 22:37:27 -05:00
Digimer
6009590352 * Fixed a bug in scan-apc-ups where changes in the transfer reason were not being recorded.
* Cleaned up a log of logging to reduce the amount of log entries when running at log level 1.
* Bumped the scan-ipmitool default 'jump' range to 10c.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-07 18:11:29 -05:00
Digimer
0ec1bf6b6a * Updated DRBD->delete_resource() to return a success if asked to delete a non-existent resource (as can happen when partial anvil-delete-server runs are re-run).
* Reworked DRBD->get_next_resource() to pull from the database, and to no longer do that increments-of-three nonsense. Avoidable complexity. Also added a call to Cluster->get_anvil_uuid() if the 'anvil_uuid' parameter wasn't passed.
* Updated Database->get_host_from_uuid() and ->get_hosts() to now take 'include_deleted' parameter and default to not returning deleted hosts. This fixed issues where anvil-{delete,provision}-server calls could assign jobs to now-deleted hosts with reused host names.
* Updated anvil-delete-server to print log entries to STDOUT. Also updated it to not wait of shutdown of a server in pacemaker to complete, and instead to destroy it after calling pacemaker's resource stop. Updated to also check to see if the server being deleted is already out of pacemaker and, if so, skip that step and directly try to destroy the server, if it's running.
* Updated anvil-provision-server to force 'peer_mode' runs to pull their TCP Port and DRBD minor numbers from the job. This fixes a bug where the same resource on two machines could use different TCP ports.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-05 23:41:48 -05:00