Commit Graph

28 Commits

Author SHA1 Message Date
Digimer
9eec6c4977 * Created ScanCore->check_temperature_direct() based around that start logic from ScanCore->post_scan_analysis_striker() temperature check, and updated the later to use the former.
* Updated the logic of when to boot a node or DR host that was found to be off for unknown reasons to require both poewr and temperature to be OK, and checks against the new 'feature::scancore::disable::boot-unknown-stop' config variable.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-29 22:43:23 -05:00
Digimer
e60a1b46b3 Fixed bugs related to automatic database startup and conditional backup loading.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-19 14:06:18 -04:00
Digimer
221f468b6c * Fixed a bug where duplicate IPMI sensor names of type 'Volts' wasn't being processed properly, causing sensor data to not be recorded.
* Added a comment about disabling /etc/hosts management in anvil.conf.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-31 14:42:30 -04:00
Digimer
8abb5b46e0 * Added support for setting per-agent log-level and log secure values in amvil.conf.
* Moved the check for an agent being disabled into ScanCore->agent_startup()

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-18 23:07:15 -04:00
Digimer
4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the reponce time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machne would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped.
* Bumped scancore's scan delay from 30 seconds to 60.
* Shorted the age-out time to 24 hours and again boosted the archive thresholds. As we get a feel for the amount of data collected on multi-Anvil! systems over time, we may continue to tune this.l
* Moved Database->archive_database() to be called daily by anvil-daemon, instead of during '->connect' calls.
* Added locking to Database->_age_out_data to avoid resyncs mid-purge. Also moved the power, temperature and ip_address columns into the same 'to_clean' hash as it was duplicate logic.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-31 13:34:49 -04:00
Digimer
8807915bb7 The theme of this commit is database cleanup and fixes.
* Updated Database->_age_out_data() to check for certain scan agent tables and, for those found, purge out old records. This should go a long way to keeping the database data responsive.
* Fixed a bug in Jobs->update_progress() where the 'job_picked_up_by' column was being set to '0' instead of '$$' when clearing the job.
* Fixed a bug in System->update_hosts() where '127.0.0.1' would be used in hosts for the actual host name.
* Updated the default trigger, count and division values in anvil.conf to 100,000, 50,000 and 75,000 respectively. In combination with the aging of data, this should go a long way to minimizing database sizes and overheads.
* Updated anvil-daemon to call $anvil->Database->_age_out_data(); in it's daily tasks.
* Updated various striker-X tools to specifically request a DB resync on Database->connect calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-30 15:16:25 -04:00
Digimer
41cd1e0319 * Several bugs fixed and enhancements;
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 00:09:32 -04:00
Digimer
41d528418d * Increased the trigger point for database archiving. The current values were too low and cuasing frequent archive -> resync cycles.
* Fixed a bug in that archiving defaulting to not store on disk was not working properly. Now acts as described in anvil.conf.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-22 11:57:29 -04:00
Digimer
ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :)
* Created DRBD->check_if_syncsource() and ->check_if_synctarget() that return '1' if the target host is currently SyncSource or SyncTarget for any resource, respectively.
* Updated DRBD->update_global_common() to return the unified-format diff if any changes were made to global-common.conf.
* Created ScanCore->check_health() that returns the health score for a host. Created ->count_servers() that returns the number of servers on a host, how much RAM is used by those servers and, if available, the estimated migration time of the servers. Updated ->check_temperature() to set/clear/return the time that a host has been in a warning or critical temperature state.
* Finished ScanCore->post_scan_analysis_node()!!! It certainly has bugs, and much testing is needed, but the logic is all in place! Oh what a slog that was... It should be far more intelligent than M2 though, once flushed out and tested.
* Created Server->active_migrations() that returns '1' if any servers are in a migration on an Anvil! system. Updated ->migrate_virsh() to record how long a migration took in the "server::migration_duration" variable, which is averaged by ScanCore->count_servers() to estimate migration times.
* Updated scan-drbd to check/update the global-common.conf file's config at the end of a scan.
* Updated ScanCore itself to not scan when in maintenance mode. Also updated it to call 'anvil-safe-start' when ScanCore starts, so long as it is within ten minutes of the host booting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-30 22:58:01 -04:00
Digimer
8127c70237 * Support for OS type selection was missing in tools/anvil-provision-server, this commit adds support for this, as well as some infrastructure to support it. This includes a new 'sys::servers::os_short_list' variable that contains a CSV of main OSes to show in a "short" list (the full list is massive). This variable can be set by the user in anvil.conf. Also added job progress calls that were missing through the storage config.
* Created a new tools/striker-parse-os-list tool that parses 'osinfo-query os' and prints out entries for words.xml for any new OSes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-16 03:30:51 -05:00
Digimer
e35800c413 * Fixed up (though more testing/work needed) to ocf:alteeve:server to get it working with DRBD resources referenced using '/dev/drbd/by-res/...'.
* Added the anvil.conf option 'sys::privacy::strong' that controls if the Anvil! ever "calls home". Initially, this controls DRBD's usage flag.
* Updated DRBD->get_devices() to track resources by their 'by-res' names as well and by the normal '/dev/drbdX' devices.
* To mitigate https://bugzilla.redhat.com/show_bug.cgi?id=1868467, updated Get->bridges() to parse the normal (non-JSON) data if we get invalid JSON output.
* Updated anvil-join-anvil to not disable, and in fact enable, libvirtd on boot. With DRBD 9, the original fear of a user accidentally booting a VM that's running on the peer no longer is an issue. By enabling it and leaving it on, Striker dashboard users won't lose their virtual machine manager link unless the node powers off. Also enabled actually updating the job progress, completing this tool!

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-13 00:12:20 -04:00
Digimer
b436500a54 * Started working on getting database archiving working. It's close to done, but needs testing/debugging and other finery.
* Created Database->get_host_from_uuid() that takes a host UUID and returns the host's name.
* Reworked Network->find_matches() to return both the match's IP and subnet.
* Finished getting tools/striker-initialize-host to add all known peers to the target, using IPs on the target's available subnet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-26 01:53:52 -04:00
Digimer
02c4fe1fa1 * Updated all perl module modes to remove the executable bit.
* Updated anvil.sql to add the new tables needed for alert mail delivery.
* Update anvil.sql and Database->initialize to now default the user to 'admin' and swap that out if needed, instead of using the #!variable!user!#' replacement variable.
* Started updating anvil.spec for EL8.
* Added support for 'striker::repo::extra-packages' which users can use to add additional packages to the Striker repositories.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-05 18:57:44 -07:00
Digimer
2fa4048780 * Updated anvil.conf to default-enable various defaults. Also dropped the archive thresholds.
* Fixed a bug in the PXE default config path to install.img.
* Added tftp to the BCN firewall template.
* Fixed a bug in anvil-daemon / striker-manage-install-target where config files weren't being updated regularly (only when repo updates happened).
* Removed an RPM from striker-manage-install-target that is no longer available on F28.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-12 00:10:14 -05:00
Digimer
5f77ff5885 * Finished (for now) anvil-manage-firewall. It's been added to anvil-daemon as well.
* Updated Log->entry() to accept 'print => [0|1]' to send a log message to STDOUT (minus prefix) to avoid tools that were repeatedly calling print and Log->entry back to back.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-11 02:27:55 -05:00
Digimer
8468215831 * Added logic to tools/anvil-manage-install-target to only update the local RPM repo on a periodic basis.
* Updated tools/anvil-configure-striker to be consistent about naming firewall zones using upper-case.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-11-13 02:47:43 -05:00
Digimer
ea65fa08aa * Added generation of kickstart files from a single template and made them more configurable by the user via anvil.conf.
* Made the PXE spash image come from the active skin.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-11-03 02:55:35 -04:00
Digimer
07c3b405ad * Starting work on adding "Install Target" function (will likely rename this, but basic same function as IT in m2).
* Added 'sys::database::failed_connection_log_level' to allow silencing of log messages when a Striker peer database is not available.
* Started updating the .spec for the new release to add supported packages needed for PXE/dhcp/tftpboot.
* Added to repo tftpboot files as pulling them out of the packages and moving them into the right place relative to the modest size of adding them directly to our source wasn't justified.
* Created the still very very early 'tools/anvil-manage-firewall' tool.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-10-10 19:43:23 -04:00
Digimer
0fa3c42f2f * Fixed a bug where setting the debug level to 3 caused a deep recursion and a system hang.
* Update Anvil::Tools->new() to access the parameters 'log_level', 'log_secure' and 'debug', streamlining the frequent calls to $anvil->Log->level and ->secure in program startup, and allowing the values to take effect during the ->new constructor.
* Passed 'debug' to child method calls in more places (still more to do though).
* Fixed a bug where 'test_table' wasn't set in the right place, causing the database to try to initialize repeatedly.
* Made Database->archive_database only run if called with root access.
* Now the number of database connections are stored in 'sys::db_connections' instead of checking the returned number, and that is cleared on disconnect.
* Started working more on 'anvil-daemon', including adding support for System->call being taking 'background', 'stderr_file' and 'stdout_file' paramters which, when set, used Proc::Simple to background the process.
* Did some more work on database archiving, though still far from done.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-01 02:06:16 -04:00
Digimer
7e63cfbcc4 * Fixed the schema path.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-07-13 18:29:28 -04:00
Digimer
0272ba8b80 * Moved the network default values into the main defaults hash.
* Fixed a bug in Database->insert_or_update_network_interfaces where independent interfaces (not under a bridge or a bond) were not being saved.
* Continued working on improving Striker's network config jquery/form functions.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-06-15 02:17:20 -04:00
Digimer
de333704b5 * Created System->change_apache_password() to update (and enable) Striker's apache user. For now, it simply enables it in httpd.conf, it doesn't actually set/update the password.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-05-10 01:41:45 -04:00
Digimer
a89fb24adf * Changed the Storage->copy() 'target' parameter to 'target_file' to avoid confusion with the often-used 'target' parameter for connecting to remote machines.
* Changed 'database::...' so that 'x' is now the database host's UUID instead of a simple integer. This will simplify sync'ing configs. Also removed default entries, and made it so that anvil-prep-database injects the local config during first setup. Renamed Database->get_local_id to get_local_uuid and changed the 'id' parameter to 'uuid'. Changed Database->initialize's 'id' parameter to 'host_uuid'. The Database->query, Database->write, Database->_mark_database_as_behind and Database->_find_behind_databases methods had their 'id' parameter changed to 'uuid'.
* Added the 'remote_user' parameter to Get->anvil_version, System->ping and System->change_shell_user_password for conencting to remote targets.
* Added the 'remote_user' parameter to all internal Remote->call uses.
* Updated Storage->backup, Storage->copy_file, Storage->make_directory,

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-04-30 11:43:30 -04:00
Digimer
a294c6c4fa * Updated the database components to use the name 'anvil' and the user 'admin'. The 'database::user' and 'database::name' variables are still supported, but now hidden.
* Fixed a bug where some '$anvil->{}' variables should have been '$anvil->data->{}'.
* Started merging message keys on 'error_xxxx', 'warning_xxxx', etc.
* The anvil-configure-network now configures the network. Commented out, the tool can reconfigure the entire network without a reboot, but a current issue with the post-configured system refusing to use the allocated interface as the default gateway is to be reviewed at a future time. For now, a closing reboot will be issued.
* Started creating 'anvil-change-password' that will update passwords, including apache (and configure .htpasswd when needed).

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-04-13 19:55:34 -04:00
Digimer
6f4df4ed22 * Changed 'database::X::ping_before_connect' to 'database::X::ping' and made the value be the actual timeout to wait for pings when trying to connect to a database.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-02-23 00:58:47 -07:00
Digimer
2170c00add * Added the 'debug' parameter to System->check_alert_sent. Also updated it to use 'alert_sent_uuid'.
* Added the 'debug' parameter to System->enable_daemon.
* Fixed a bug where the old 'Tools.sql' files was being referenced instead of the new 'anvil.sql'.
* Added the 'debug' parameter to Database->initialize and Database->write. Also made it enable the postgresql daemon when initializing the DB.
* Added the 'debug' parameter to Get->host_uuid.
* Fixed the old anvil.conf variable from defaults::log::db_transactions to sys::database::log_transactions.
* Fixed a bad replacement variable name in anvil.sql.

Signed-off-by: Digimer <digimer@alteeve.ca>
2017-12-27 13:01:58 -04:00
Digimer
2b9c6c26dc * Fixed a couple remaining issues from the recent merger. Specifically, '$$anvil' was fixed from a bad regex and the path/names of our tools were fixed.
Signed-off-by: Digimer <digimer@alteeve.ca>
2017-10-20 11:13:00 -04:00
Digimer
1cb42080c3 ** Major Changes **
We've decided to give up on trying to keep ScanCore, AN::Tools and Striker as three separate things. We had originally hoped to make ScanCore easily separatable from the Anvil!, but this was adding increasing complexity to the project and complexity is the enemy of reliability.

In this release, AN::Tools becomes Anvil::Tools, all configuration files move to /etc/anvil and all programs and data files move to /usr/sbin/anvil. Words files are now merged, as are SQL schemas (ScanCore agents will still maintain their own, later). The journald tag has changed from 'an-tools' to 'anvil'.

Other changes;
* Tools.t has been updated to handle existing tests. New methods and parameters still need to have tests added though.
* Added a simple test.pl script used for testing things outside the main program. It will be removed before final release.
* Added the simple 'watch_logs' bash script to more easily tail output.

Signed-off-by: Digimer <digimer@alteeve.ca>
2017-10-20 00:19:32 -04:00