Commit Graph

236 Commits

Author SHA1 Message Date
Digimer
4e9882812d * Fixed a bug where the periodic database dumps on the primary database Striker were not sync'ing to peers. Also fixed a bug where these periodic dumps weren't running at all.
* Updated anvil-daemon->prep_database() to only run if the database dump file doesn't exist. (If it does, it's clearly configured).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 23:18:06 -04:00
Digimer
72b17ff1f9 * Reworked how databases are stopped, now being handled in anvil-daemon. This way, initial starts will still do traditional resyncs, then shut down. This should allow the best of both worlds, where data is not lost on striker start/stop loss/recovery, but operate normally otherwise without delays.
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 22:33:31 -04:00
Madison Kelly
922899ea78 * WIP: Working on a new method of failing over between which Striker is the active database, instead of running N-number of databases all the time.
* Created Database->backup_database() that creates a pg_dump of the active database.
* Created Database->load_database() that loads the database from a flat file, optionally creating a backup before doing so, and using iptables to block access during the process.
* Updated Database->configure_pgsql() to not start the postgresql daemon unless it just initialized the DB.
* Much work, not yet complete, to Database->connect() to stop after the first successful connection. Added logic that, if no connection was established and the host is a Striker, to load a peer's backup, if it exists, and then start the local daemon.
* Updated anvil-daemon to now have a section to run tasks on a ten minute cycle, which will later be used for the primary Striker to dump / copy its database to peer(s).

Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
2021-09-16 23:10:55 -07:00
Digimer
a697011b08 * Disabled debug logging in anvil-daemon.
* WIP - working on new scan-network scan agent.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-30 02:36:06 -04:00
Digimer
6777104398 * Fixed a bug in anvil-daemon where, when an anvil-manage-power reboot run had triggered a reboot, anvil-daemon didn't set the job_progress to '100', causing constant reboots. Also fixed a bug where the log level was hard-set to '1' instead of '2' needed during debugging.
* Updated Jobs->get_job_uuid() to accept the new 'incomplete' parameter that, when set, will look for jobs whose progress is > 1 and < 100.
* Updated ScanCore->agent_startup() to take the new 'no_db_ok' parameter which returns with '0' if no DB is available and that parameter is set to '1'.
* Fixed a logging bug in 'anvil-join-anvil'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-28 20:04:11 -04:00
Digimer
0c475d2a2e * Fixed a couple logging bugs.
* Updated scan-cluster to get the CIB from pcs instead of reading the CIB from disk.
* Updated anvil-daemon to always call striker-prep-database at log level 2 while trying to find the cause of rare postgres config failures. Also updated striker-prep-database to use the new method of initializing the DB.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-23 18:22:55 -04:00
Digimer
d3052c0229 * Finished Cluster->check_server_constraints() and added it to scan-cluster. This now makes sure servers don't roll back to their old host after it has been fenced and recovers.
* Completely disabled Network->check_network(), it's causing more problems than it solves.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-23 14:19:58 -04:00
Digimer
e7a06fce72 * Disabling the periodic network health check in anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-23 11:01:33 -04:00
Digimer
30f478267a * Forced anvil-daemon to log-level 2 and to enable secure logging to continue debugging setup issues.
* Fixed an undefined variable warning.
* Removed a debugging die from Database->resync_databases().

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-22 19:41:00 -04:00
Digimer
47fa126a3c * Fixed a typo that blocked anvil-daemon from starting.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-22 19:00:26 -04:00
Digimer
023f43eda9 * In the never-ending attempt to resolve the build consistency issues, this commit enables extra debugging logging and, hopefully, implements a fix in anvil-daemon where a job could be started repeatedly.
* Renamed the special job status 'scancore_startup' to 'anvil_startup', given it's handled by anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-22 16:12:12 -04:00
Digimer
bd24c1c5bb * I _might_ have fixed the network configuration issue in anvil-configure-host... Updated it so that if 'nmcli' doesn't report a valid device name, it looks for it in the ifcfg-X file, and uses 'X' if not found there.
* Added the 'print' parameter to Log->variables() to allow printing to STDOUT when set.
* Renamed Network->check_bonds() to Network->check_networks() in anticipation of adding bridge monitoring / repair to it later.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-18 19:37:37 -04:00
Digimer
c7c6c8dee5 * Reworked the attempt to repair the network in anvil-daemon to not touch the network until the machine has been running for at least two minutes.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-15 12:04:27 -04:00
Digimer
1e7847d4dd * Added a call to Network->check_bonds() to be called while non-Striker machines wait to connect to a database.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-13 14:14:37 -04:00
Digimer
3f32a56d0c * Created Network->check_bonds() that checks to see if any bonds are down, or if any interfaces configured to be in a bond are not actually in it. It accepts a 'heal' parameter that, by default, will bring up a bond with no active links, but leaves degraded bonds alone. It can also take 'all' and will try to bring up any missing interfaces. This distinction exists so that if a link is flaky and someone takes it down manually until it can be repaired, it doesn't get turned back on.
* Updated anvil-daemon to call Network->check_bonds() with 'all' on startup, then with 'down_only' once per minute to try to heal down'ed bonds.
* Updated anvil-watch-bonds to take a 'run-once' switch and exit after one report, if set.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-13 13:33:51 -04:00
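The 'heal' policy described in the commit above can be sketched roughly as follows. This is only an illustration in Python, not the project's code: the function name `interfaces_to_bring_up` and the `bond` dictionary layout are hypothetical, and only the 'down_only' and 'all' values come from the commit message.

```python
def interfaces_to_bring_up(bond, heal="down_only"):
    """Return the interface names that should be brought up for one bond.

    `bond` is a hypothetical dict with two keys:
      'configured': interface names configured to be in the bond
      'active':     interface names currently active in the bond
    """
    active = set(bond["active"])
    missing = [name for name in bond["configured"] if name not in active]

    if heal == "all":
        # Try to bring up every missing interface, even in a degraded bond.
        return missing
    if heal == "down_only":
        # Only act when the bond has no active links at all. A degraded bond
        # is left alone, since a link may have been taken down on purpose.
        return missing if not active else []
    # Any other value: heal nothing.
    return []
```

With this policy, a bond whose only link was manually disabled stays degraded during the per-minute 'down_only' pass, while a bond with no links at all gets every configured interface brought back.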
Digimer
19c41c9171 * Added more logging while chasing a function test bug.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-08 23:56:48 -04:00
Digimer
daca6c887b * This contains a fairly major change to how time stamps are handled. All INSERT and UPDATE calls now generate a new timestamp via Database->refresh_timestamp, instead of using 'sys::database::timestamp'. This was done in response to finding a bug where tables in a database differed in both counts of public and private schemas (ip_addresses table, specifically) that failed to resync because the timestamps were re-used too often.
* WIP - Continuing work on the new anvil-manage-server tool.
* Updated Database->get_anvils() to load information on the files available on each Anvil! system.
* Updated Database->insert_or_update_network_interfaces() to no longer take the 'timestamp' parameter.
* Removed all logging from Database->refresh_timestamp() to speed it up, given how often it will be called now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-08 15:23:15 -04:00
Digimer
96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged.
* Updated anvil-daemon to have a new function called "handle_special_cases" called during startup that does any weird bug mitigation required. For now, this is used to mitigate against rhbz#1961562, though certainly it will be used for other reasons later.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-06 00:01:11 -04:00
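As an illustration of the hand-off file described in the commit above, the sketch below splits a file name of the form migration-duration.<server_name>.<unix_time> back into its parts. It is written in Python rather than the project's own language, the helper name `parse_migration_file_name` is hypothetical, and it assumes the unix time is the final dot-separated field, so that server names containing dots still parse.

```python
import re

# Assumed layout, taken from the commit message:
#   migration-duration.<server_name>.<unix_time>
MIGRATION_FILE = re.compile(
    r"^migration-duration\.(?P<server>.+)\.(?P<unix_time>\d+)$"
)


def parse_migration_file_name(file_name):
    """Return (server_name, unix_time) or None if the name doesn't match."""
    match = MIGRATION_FILE.match(file_name)
    if match is None:
        return None
    return match.group("server"), int(match.group("unix_time"))
```

A caller would list the directory, parse each matching name, read the recorded duration from the file body, and then delete the file, as the commit describes.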
Digimer
24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancillary checks are done, minimizing connect time.
* Fixed a problem with Database->insert_or_update_variables() where variable_source_uuid being set to an empty string wasn't converted to NULL.
* Fixed Database->locking() where the way the lock variable was set was rather broken.
* Created Striker->check_httpd_conf() which configured apache to handle the integration of the new WebUI for Anvil! management with the existing WebUI.
* Updated System->update_hosts() to specifically set the 127.0.0.1 and ::1 lines to handle how cloud-init overrides /etc/hosts and breaks CI/CD tests.
* Removed the old index.html as it's now used for the new WebUI.
* Began work on removing DB connection requirements from ocf:alteeve:server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-03 22:25:36 -04:00
Digimer
4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the response time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machine would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped.
* Bumped scancore's scan delay from 30 seconds to 60.
* Shortened the age-out time to 24 hours and again boosted the archive thresholds. As we get a feel for the amount of data collected on multi-Anvil! systems over time, we may continue to tune this.
* Moved Database->archive_database() to be called daily by anvil-daemon, instead of during '->connect' calls.
* Added locking to Database->_age_out_data to avoid resyncs mid-purge. Also moved the power, temperature and ip_address columns into the same 'to_clean' hash as it was duplicate logic.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-31 13:34:49 -04:00
Digimer
8807915bb7 The theme of this commit is database cleanup and fixes.
* Updated Database->_age_out_data() to check for certain scan agent tables and, for those found, purge out old records. This should go a long way to keeping the database data responsive.
* Fixed a bug in Jobs->update_progress() where the 'job_picked_up_by' column was being set to '0' instead of '$$' when clearing the job.
* Fixed a bug in System->update_hosts() where '127.0.0.1' would be used in hosts for the actual host name.
* Updated the default trigger, count and division values in anvil.conf to 100,000, 50,000 and 75,000 respectively. In combination with the aging of data, this should go a long way to minimizing database sizes and overheads.
* Updated anvil-daemon to call $anvil->Database->_age_out_data(); in its daily tasks.
* Updated various striker-X tools to specifically request a DB resync on Database->connect calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-30 15:16:25 -04:00
Digimer
6abe06f125 The theme of these commits is improving DB responsiveness.
* Created Database->_age_out_data() to delete records from the database that are old enough to no longer be useful. This is designed to significantly reduce the size of the database, allowing a better focus on performance.
* Changed Database->connect() to default to NOT check for resync, reworking the old 'no_resync' to 'check_for_resync', so that resync checks happen on demand, instead of by default.
* Updated get_tables_from_schema() to now allow 'schema_file' to be set to 'all', which then loads the schema files of all scan agents as well as the core anvil schema file. Fixed a bug where commented out tables were being counted.
* Re-enabled triggering resyncs on 'last_updated' differences.
* Fixed a bug in scan-ipmitool where the history_id column in history.scan_ipmitool_value was incorrect.
* Created a new tool called striker-show-db-counts that shows the number of records in all public and history schema tables for all databases.
* Updated anvil-update-states to detect when a libvirtd NAT'ed bridge exists and to delete it when found.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-29 23:34:22 -04:00
Digimer
ff65712fd9 * Created the function check_daemons() in anvil-daemon to check that needed daemons are running when it starts. This was specifically added to address a periodic issue with machines booting without NetworkManager running.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 15:27:10 -04:00
Digimer
41cd1e0319 * Several bugs fixed and enhancements;
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returns the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks if anvil.conf exists and creates it (using defaults), if not. It also checks to see if the 'admin' group and user exist and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 00:09:32 -04:00
Digimer
a846f9ecbc * Fix to the database resync logic. The previous change to only resync if 10+ lines differed broke striker-manage-peers as the difference in host counts is what triggered the pairing of strikers.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-22 12:25:29 -04:00
Digimer
fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources.
* Changed the alteeve repo RPM to the new community/enterprise repo.
* Fixed a bug where 'fence_data::updated' was causing the fences web page to break.
* Fixed a bug in Database->insert_or_update_network_interfaces() where certain interfaces were being repeatedly added to the database.
* Fixed a bug where Database->_find_behind_databases() was marking DBs as behind even though they had less than 10 columns off.
* Fixed a bug in Get->host_name() where, if the host name was changed on disk but the environment variable was still the old name, it would cause the hostname to waffle back and forth and cause constant updates to /etc/hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-20 00:16:09 -04:00
Digimer
3fb81c1a0a * Updated Convert->time() to silently return if the given time was '--'.
* Added a new parameter to Database->connect() called 'no_resync' that, if set, prevents a resync check being performed. Updated ->resync_databases() to find a uuid_column where the table name ends in 'ies' and the UUID column is 'y_uuid'. Updated ->resync_databases() to not fire on updated table age anymore, and to trigger only if the number of rows differ in a given table by more than 10.
* Updated Log->entry() to prefix a tool's name, when the new 'log::scan_agent' value is set. Also set this value in ScanCore->agent_startup(), to help differentiate log entries.
* Fixed a bug in scancore's main loop where it logged the sleep message at the start of the run.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-04 12:33:31 -04:00
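The two resync rules in the commit above (the 'ies' to 'y_uuid' column naming, and the "more than 10 rows" trigger) can be illustrated with a short Python sketch. The function names, the `counts_by_db` layout and the example table name are hypothetical; only the naming rule and the threshold of 10 come from the commit message.

```python
def uuid_column_for_ies_table(table):
    """Map a table name ending in 'ies' to a '<singular>y_uuid' column.

    Returns None for other names; how those are handled is not covered here.
    """
    if table.endswith("ies"):
        return table[:-3] + "y_uuid"
    return None


def tables_needing_resync(counts_by_db, threshold=10):
    """Return tables whose row counts differ between databases by more than
    `threshold`.

    `counts_by_db` is a hypothetical mapping {db_name: {table: row_count}}.
    A table missing from a database is treated as having zero rows.
    """
    tables = set()
    for counts in counts_by_db.values():
        tables.update(counts)

    behind = []
    for table in sorted(tables):
        counts = [db.get(table, 0) for db in counts_by_db.values()]
        if max(counts) - min(counts) > threshold:
            behind.append(table)
    return behind
```

A later commit in this log (a846f9ecbc) notes that applying the threshold to every table broke peer pairing, so a real implementation would likely exempt some tables from it.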
Digimer
ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :)
* Created DRBD->check_if_syncsource() and ->check_if_synctarget() that return '1' if the target host is currently SyncSource or SyncTarget for any resource, respectively.
* Updated DRBD->update_global_common() to return the unified-format diff if any changes were made to global-common.conf.
* Created ScanCore->check_health() that returns the health score for a host. Created ->count_servers() that returns the number of servers on a host, how much RAM is used by those servers and, if available, the estimated migration time of the servers. Updated ->check_temperature() to set/clear/return the time that a host has been in a warning or critical temperature state.
* Finished ScanCore->post_scan_analysis_node()!!! It certainly has bugs, and much testing is needed, but the logic is all in place! Oh what a slog that was... It should be far more intelligent than M2 though, once fleshed out and tested.
* Created Server->active_migrations() that returns '1' if any servers are in a migration on an Anvil! system. Updated ->migrate_virsh() to record how long a migration took in the "server::migration_duration" variable, which is averaged by ScanCore->count_servers() to estimate migration times.
* Updated scan-drbd to check/update the global-common.conf file's config at the end of a scan.
* Updated ScanCore itself to not scan when in maintenance mode. Also updated it to call 'anvil-safe-start' when ScanCore starts, so long as it is within ten minutes of the host booting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-30 22:58:01 -04:00
Digimer
5f0b7740e2 * Fixed a typo that broke compiling anvil-daemon in the last commit. Yay for CI/CD!
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-10 01:28:12 -04:00
Digimer
fb0836f912 * The get_cpu endpoint was completed.
* The get_memory endpoint was completed.
* The get_replicated_storage endpoint was completed, though it requires testing and likely has issues.

To prepare for the get_status endpoint work, I needed to update ScanCore and modules to track the host_status. This commit contains the work needed for this.
* Updated ScanCore->post_scan_analysis_striker() to use configured fence devices (except PDUs) to check if a target host is off or on, if there is no host_ipmi interface. In all cases, if a machine can be confirmed on or off, the host_status is now updated.
* To support the above fence based power checks, updated scan-cluster to store the on-disk CIB in the new scan_cluster -> scan_cluster_cib column.
* Updated ScanCore->parse_cib() to map stonith primitive IDs to fence agents. Updated ->parse_crm_mon() to not call if the executable doesn't exist to avoid unhelpful error messages in the logs when called from a Striker.
* Updated DRBD->gather_data() to get the size data from '/sys/block/drbd<minor>/size' x '/sys/block/drbd<minor>/queue/logical_block_size' so it works when a device is Secondary (and can't be promoted).
* Updated Database->get_hosts_info() to record the short host name as well as the stored host name. Created ->update_host_status() as a wrapper to ->insert_or_update_hosts() that only updates the host status.
* Updated anvil-join-anvil to disable ksm and ksmtuned daemons.
* Updated scancore and anvil-daemon to set the host_status to 'online' on startup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-09 20:51:29 -04:00
Digimer
15fd0e5ce8 * Updated anvil-daemon (and Database->insert_or_update_jobs) to now recognize jobs with the job_status of 'scancore_startup' to run only when ScanCore starts.
* Finished initial Striker setup in tools/striker-auto-initialize-all. Started working on peering.
* Cleaned up the handling of converting UIDs to user names in Remote->add_target_to_known_hosts() and ->_call_ssh_keyscan().
* Did a bunch of white-space/alignment cleanup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-03-04 01:41:33 -05:00
Digimer
45a9cb04b0 * Fixed a bug introduced in the last commit that made Get->os_type() fail when called locally.
* Made the error reported by Remote->call() more verbose when called without 'target' being set.
* Updated anvil-daemon to not call jobs more than once per minute.
* Started work on striker-auto-initialize-all, still very far from complete.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-23 01:56:12 -05:00
Digimer
1b65f53faa * Remove host-health from the 'hosts' table as it wasn't needed, given the 'health' table. Bumped the SQL version to 0.0.2
* Updated Get->os_type() to use 'cat' instead of Storage->read_file() because 'rsync' may not be available when it is called during striker-initialize-host calls.
* Updated Database methods to skip 'oui' and 'state' during resync.
* Updated striker-initialize-host to detect when it's initializing a CentOS Stream Node / DR Host and enable the HA repo.
* Created the tools/striker-auto-initialize-all tool, which is very much incomplete, that will allow for the rapid creation of a full Anvil! from freshly installed machines autonomously.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-22 19:22:47 -05:00
Digimer
1a520b03d5 * Cleaned up a lot of logging in anvil-daemon and tools it calls.
* Deleted anvil-jobs as it never ended up being used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-02-08 13:39:34 -05:00
Digimer
d9d347ce63 * Updated .spec for the new source location.
* Created a log disable flag to avoid deep recursion when logging at level 3.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-22 00:37:30 -05:00
Digimer
cda51e562d * Finished porting scan-hpacucli, the last M2 scan agent!
* Updated Database->insert_or_update_temperature() to accept a 'delete' parameter (which, surprise, deletes a record).
* Updated (but not yet tested) the RPM .spec to require that 'core' and the other three packages are required to be the same version.
* Updated scan-ipmitool and scan-storcli scan agents to now delete temperature data belonging to lost sensors.
* Fixed tools/striker-manage-install-target to add multiple missing packages.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-11-25 19:06:21 -05:00
Digimer
d677d19ca0 * Moved Database->check_condition_age to Alert.
* Created (but not finished) scan-apc-pdu
* Added support for tracking maintenance-mode for nodes in Cluster->parse_cib
* Created Remote->read_snmp_oid().
* Created Server->get_definition.
* Updated Server->get_status() to write-out server XML files on-demand.
* Finished scan-cluster.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-23 01:28:21 -04:00
Digimer
1a1fa7ce88 * Created Cluster->get_anvil_uuid() that returns the 'anvil_uuid' of a given 'host_uuid'.
* Renamed the 'definitions' table to 'server_definitions' to clarify the purpose, and made all the 'server' columns have the 'not null' constraint.
* Created Database->insert_or_update_servers(), ->get_servers(), ->insert_or_update_server_definitions() and ->get_server_definitions().
* Updated scancore, anvil-daemon, and scan agents to not run unless they're run with root privs.
* Got scan-server to update the servers / server_definition tables and the on-disk file when needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-28 00:20:13 -04:00
Digimer
dc5ec9c264 * Added checking the email server config to anvil-daemon. Email works now!
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-07 00:39:43 -04:00
Digimer
82acb4e104 * Fixed a resync bug where bridges needed to sync before bonds
* Re-enabled user-selected BCN subnet ranges (needs more testing).

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-03 22:52:38 -04:00
Digimer
49682a01d7 * Fixed a bug in Database->disconnect() where the database identification number wasn't being removed, so connecting again triggered the duplicate DB connection check.
* Fixed a bug in Tools->_set_defaults where the order in which the tables were sync'ed caused primary/foreign keys to trigger DB errors when resync'ing in some cases.
* Created Database->log_connections to make it easier to log which databases are actively in use and other data about the connections.
* Fixed bugs in striker-manage-peers that (partly because of the above bugs) failed to connect to new peers properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-03 01:36:11 -04:00
Digimer
b2c7fd95fb * Renamed the ScanCore unit file to scancore.
* Added support for parsing location constraints to Cluster->parse_cib

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-24 12:54:31 -04:00
Digimer
c27cc7507f * Renamed striker-parse-fence-agents to anvil-parse-fence-agents and changed anvil-daemon to run it on all machines.
* Cleaned up a lot of logging.
* Updated Cluster->parse_cib() to track if a stonith device has 'delay' set.
* Got a lot more work done on anvil-join-anvil's stonith processing, but it still isn't complete. Updated it to change shell user passwords as well.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-06 01:20:53 -04:00
Digimer
1bf71f8428 * Updated Database->get_hosts() to run host_ipmi through Log->is_secure if the string contains 'passw'.
* Fixed Database->get_ip_addresses() to clear stale IP addresses.
* Finished (for now, more testing needed) System->configure_ipmi! Also created System->test_ipmi() that handles trying lanplus and various password lengths, updating hosts -> host_ipmi on successful check.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-24 16:22:27 -04:00
Digimer
597d9413a5 * Created the skeleton Cluster.pm.
* Got anvil-join-anvil to the point where it initializes and starts the cluster.
* Deleted the old ssh key handling logic in anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-10 00:49:30 -04:00
Digimer
453f5c6223 * Fixed a bug where $anvil->nice_exit() was being passed 'exit' instead of 'exit_code' as a parameter.
* Update striker manifest run to add an entry into the 'anvils' table, and pass the anvil_uuid to the jobs rather than the various host_uuid's.
* Fixed a bug in the 'anvils' SQL procedure that copied data into the history schema (a few columns were missing).
* Updated anvil-configure-host to reboot when finished to be certain network changes have taken effect. Also updated the handling of virsh bridges to delete the autostart symlinks if libvirtd daemon isn't running.
* Added some logic to anvil-daemon to call 'anvil-update-states' with the -v{1,3} flag depending on the active debug level.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-24 00:39:56 -04:00
Digimer
4489111a65 * Fixed a bug in Job->clear() where it was not doing its one job right.
* Updated System->generate_state_json() where when the full host name was short, it wouldn't set the short host name properly.
* Fixed a bug in 'tools/anvil-manage-power' where the node wouldn't mark the reboot as complete. Resolves issue #11.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-11 16:08:06 -04:00
Digimer
726a4374d1 * Renamed the database table 'host_keys' to 'ssh_keys' to better represent what it stores.
* Updated 'variables' -> 'variable_source_uuid' to type 'uuid' and removed the 'not null' constraint.
* Updated Database->insert_or_update_variables() to check/update 'variables_source_table' and 'variables_source_uuid'.
* Created the 'trusts' database table which will, when done, tell anvil-daemon which users@machines to trust (setup passwordless SSH).
* Created (but not finished) System->manage_authorized_keys() and moved the logic over to it from anvil-daemon.
* Changed the host types "dashboard" to "striker".
* Moved the following methods from 'System' to 'Get';
** System->get_host_type to Get->host_type
** System->get_bridges to Get->bridges
** System->get_free_memory to Get->free_memory
** System->get_os_type to Get->os_type
** System->get_uptime to Get->uptime
* Updated striker to include the host_uuid for the 'node1', 'node2' and (if chosen) 'dr1' when running a job manifest.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-10 18:26:50 -04:00
Digimer
f71c16484a * Got the fence config confirmation screen working.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-03-12 00:25:17 -04:00
Digimer
818ef23634 * Moved the fences_unified_metadata file from /tmp, which apache can not read, to /var/www/html/.
* Fixed a bug (well, made a work-around for an issue without a known reproducer) where, on some occasions, a record will end up in the public table without being copied into the history schema. When this happens, the next resync would crash out because the resync reads in the history table only. Now, when about to INSERT a record into the public schema during a resync, an explicit check is made to see if the record already exists. If it does, the INSERT is instead redirected to the history schema.
* Cleaned up the fence agent metadata when displaying to a user, converting the shell codes to underline a string with square brackets instead. We also now replace newlines with <br /> tags. Lastly, to help fence_azure_arm's metadata description to display cleanly, a check is made to format the table correctly.
* Began work on the Striker menu for handling fence device management

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-20 23:41:01 -05:00
Digimer
7df405afcb * Created the manifest database table and Database->insert_or_update_manifests().
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-10 15:57:11 -05:00
Digimer
76e9352717 * Added a flag that tells anvil-daemon when a node is having its network mapped. When this happens, open ssh connections are closed each loop and only tasks related to mapping the network run. This improves responsiveness in Striker when reporting which network links have come up or gone down.
* Fixed a bug in Database->insert_or_update_variables() where, if 'update_value_only' was set but no variable_uuid was passed or could be found, an (incomplete) INSERT would be attempted.
* Added support for generating module metadata when setting up local repos on Striker.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-01-09 01:52:04 -05:00
Digimer
6d81e03fb2 * Created Network->match_gateway() to check if a gateway applies to a given network.
* Got more work done on confirming the user's request to setup the network of a node or DR host.
* Reworked network select boxes to sort by the network name instead of the MAC address.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-08 00:19:13 -05:00
Digimer
86af67ecda * Created a caching mechanism for anvil-update-states so that it can record network interface link state changes when it loses contact with all databases, as can happen when cycling NICs to map a newly built DR host or node.
* Updated Database->insert_or_update_network_interfaces() to take the new 'link_only' and 'timestamp' parameters to support flushing out the cache file above.
* Updated anvil-daemon to run anvil-update-states when the database connection is lost. Also moved the 'handle_periodic_tasks()' function call to be conditional on there being a database connection.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-05 01:59:41 -05:00
Digimer
530d379f59 * Started work on caching network state change in tools/anvil-update-states.
* Fixed a bug where ip_addresses could break resync when 2+ machines had the same IP (ie: 192.168.122.1).
* Updated logging of DB transactions to show the DB host's IP instead of the UUID.
* Updated Get->date_and_time to take a 'use_utc' parameter to return the time using GMT time instead of the host's TZ.
* Updated anvil-daemon to periodically call tools/anvil-update-states. Also updated anvil-daemon to delay daily jobs by 2 hours except for the dashboard with the highest sorted UUID to minimize dual runs of tasks that only need to run once per day per cluster.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-04 00:02:19 -05:00
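The daily-job staggering described in the commit above can be sketched as below. This is a Python illustration under stated assumptions, not the anvil-daemon code: the function name `daily_job_delay_seconds` is hypothetical, and "highest sorted UUID" is assumed to mean the last entry after a plain string sort of the dashboard host UUIDs.

```python
TWO_HOURS = 2 * 60 * 60  # delay named in the commit message, in seconds


def daily_job_delay_seconds(local_uuid, dashboard_uuids, delay=TWO_HOURS):
    """Return how long this host should wait before running daily jobs.

    The dashboard whose UUID sorts highest runs immediately; every other
    host waits `delay` seconds, so once-per-day tasks are less likely to
    run on two machines at the same time.
    """
    if not dashboard_uuids:
        # No known dashboards to compare against; assume no delay is needed.
        return 0
    first_to_run = sorted(dashboard_uuids)[-1]
    return 0 if local_uuid == first_to_run else delay
```

Because every host sorts the same list, each one reaches the same answer about who runs first without any coordination between them.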
Digimer
e8d15112da * Fixed a bug in anvil.js where the state of a link always said 'up', even when it was down.
* Fixed a couple logging bugs in System->call().
* Fixed a bug in anvil-daemon where it was trying to setup setuid-C wrappers on non-dashboards.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-03 01:23:31 -05:00
Digimer
90f5bf49d5 * Updated Network->load_interfces() to only assign a 'changed_order' to real interfaces.
* Fixed a bug in System->generate_state_json() where interfaces connected to a bridge were constantly having their 'network_interface_bond_uuid' cleared and reset.
* Finished (for now) the jquery code to update the network interface list when preparing the network interface configuration of a new node or DR host.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-12-03 00:30:26 -05:00
Digimer
e3a8c1a01d * Created System->generate_state_json() that reads, parses and writes out the network status of all known machines on a given Striker database.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-11-14 23:58:39 -05:00
Digimer
387c03aa7d * Added more text for the pending JSON state file generation.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-11-13 21:29:51 -05:00
Digimer
628f7faa45 * Updated the RPM spec file to generate '/etc/anvil/type.X' files to directly indicate the machine type. Updated System->get_host_type() to check for these files directly, falling back to parsing the host name if they don't exist.
* Created Database->get_hosts_info() (though it's not at all finished) that will write out a unified JSON file containing all data known about all hosts/Anvil! systems. This will be later used to create the WebUI parts.
* Also created, but also not finished, Network->load_interfces() that will work sort of like ->load_ups, but include all interfaces regardless of if they have an IP or not.
* Fixed a bug where the new bridge_interface_note parameter didn't exist in the Database->insert_or_update_bridge_interfaces() method.
* Updated anvil-update-states() to only write out the JSON/XML files if it's running on a dashboard. For nodes and DR hosts, it just needs to update the database.
* Created a new hook in anvil-daemon that will call tasks on a machine that is configured.
* As per RHEL 8.1 release notes, changed the package 'dnf-utils' to 'yum-utils' in the packages to load for install target repos.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-11-06 11:26:21 -05:00
Digimer
077977ad9c * Fixed tools/anvil-update-states so that it properly removes ip_addresses no longer assigned to a host. Also merged 'network::interfaces::by_name::${interface}' with 'network::local::interface::${interface}' for storing discovered interfaces.
* Added 'ip_address_note' to the 'ip_addresses' table as there was no column convenient for flagging as DELETEd.
* Added 'uuid' to Database->insert_or_update_file_locations() and ->insert_or_update_files(), and actually used it in all ->insert_or_update_X() methods.
* Added 'delete' as a parameter to Database->insert_or_update_ip_addresses() to allow simple deletion of a referenced IP address.
* Addressed a few 'undefined variable' errors.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-11-02 00:02:36 -04:00
Digimer
af6e2c076d * Fixed a tricky deep recursion bug in Network->is_local when the passed in host was an empty string. Also created a cache system where a host name that has been checked before is immediately returned, without needing to run through the logic in 'is_local', which gets called quite frequently.
* Updated the loop detection logic in Log->entry where processing large strings was triggering it when it shouldn't.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-20 22:55:49 -04:00
Digimer
32bcdbe6d3 * Removed Network->is_remote, standardized on Network->is_local, and flipped calls to it to be more sensible (is_local -> local call -> else remote call). Also fixed a deep recursion issue with ->is_local where, given that it logs (which calls Storage methods which have local/remote invocations), would loop.
* Fixed a bug where '$target' being preset to 'local' was causing bad calls to 'Remote->call'.
* Updated Storage->change_mode and ->change_owner to work locally and on remote hosts.
* Barely started work on striker->process_anvil_menu().

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-19 00:57:33 -04:00
Digimer
a7f93c59ea * Reworked striker-parse-oui and striker-scan-network to always lower-case the MAC address.
* Updated striker-scan-network to only run once per day unless --force or a given --network is used. This avoids repeated scans when the anvil-daemon restarts frequently for whatever reason.
* Fixed (for real this time) Convert->time's handling of the 'long' parameter.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-18 11:40:38 -04:00
Digimer
4d0a02ce74 * Fixed a bug in Database->get_local_uuid() where ->is_local() was being called incorrectly.
* Added job parsing to tools/striker-parse-oui and tools/striker-scan-network, and enabled them in anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-17 13:08:11 -04:00
Digimer
1d13e669a7 * Created the new tools/striker-scan-network tool that ping scans a network range and records the discovered hosts in the new 'mac_to_ip' table. Also created Database->insert_or_update_mac_to_ip() to handle the new table.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-16 23:23:02 -04:00
Digimer
7e960f1632 * Created the 'oui' database table that stores the parsed OUI data, as processed by the new 'tools/striker-parse-oui' tool. Also created Database->insert_or_update_oui() to handle inserts and updates.
* Fixed a bug in Convert->time() where the suffix was long when it should have been short, and vice-versa.
* Updated Network->download() to check if the target file exists and, if so, to abort unless 'overwrite' is given or the existing file is 0-bytes long. Also updated it to not exit on immediate error after the wget call and instead check to see if a zero-byte file exists and remove it, if so.
* Created Validate->is_hex() to check hexadecimal strings.
* Updated Words->clean_spaces() to remove MS-DOS-style ^M cr/lf characters.
* Updated anvil-daemon to have a section for periodic tasks that run daily, and added striker-parse-oui as well as moved striker-manage-install-target refresh to that check. Also made those tools run on dashboards only.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-16 01:57:59 -04:00
Digimer
183d2d9cce * Updated Storage->backup to append a short UUID to the timestamp to prevent issues if the same file is backed up twice in the same clock second. Also fixed a bug with remote_user parameter not having a default.
* Finished the detection of and handling of initialization of a host when the host has no Internet access.
* Disabled (for now) anvil-daemon's check_ssh_keys function.
* Fixed a couple small bugs elsewhere.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-04 22:36:39 -04:00
Digimer
3a86bed694 * Fixed tools/striker-initialize-host so that it set the hostname on the target, not locally.
* Updated System->host_name to work locally and on remote targets.
* Renamed all 'hostname' instances to 'host_name' to standardize on a spelling throughout the program.
* Removed use of and dependency on 'hostname'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-02 15:39:21 -04:00
Digimer
37f36fe99c * Updated kickstart to write the basic tools/anvil-update-issue to a freshly installed machine and run it from cron.
* Updated Remote->call() to detect when a connection fails because the target's known_hosts entry has changed. Still need to add the function to report this to the user.
* Fixed a bug in Words->parse_banged_string() where new-lines in a double-banged word string's variable value would cause an infinite loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-24 23:35:31 -04:00
Digimer
b9a0cc4d56 * Finished the initial tools/striker-initialize-host!
* Created Tools->refresh to reload anvil.conf in one call.
* Created Anvil::Tools::Network to hold network-related tasks.
** Created Network->is_remote() that tests to see if a string (containing a target) refers to the remote machine (versus a local machine). Updated all previous checks to use this new method.
** Moved Get->network_details() and Get->network() to the new Network module. Renamed Get->network() to Network->get_network().
** Made Network->get_ips() work locally and remotely.
** Created Network->find_matches() that compares two scanned machines IPs (via two previous calls to Network->get_ips())
* Created Database->manage_anvil_conf() that will add, update or remove a given database connection in a local or remote anvil.conf file.
* Fixed bugs in Storage->backup() where the bash calls were quite broken. I'm not sure how it ever worked before... x_x
* Updated anvil-daemon to not initialize a database unless it's running on dashboard. Also added a check at the startup of anvil-daemon where it will go into a loop waiting for a database to become available, re-reading anvil.conf each loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-22 23:36:59 -04:00
Digimer
badaa39b7a * Got the node/dr host initialization form to the point where it can test access and decide if it should show the Red Hat account form. Decided that for M3, node/dr host setup will now be a four-stage process; initial install (over PXE), initialization (install the proper anvil-{node,dr} RPM and connect to the database), setup/map the network, and then add to an Anvil! pair.
* Updated striker to no longer try to SSH to a remote machine. To enable this, we'd have to give apache a shell and an SSH key, which is dumb and dangerous when considered.
* Created tools/striker-get-peer-data which is meant to be invoked as the 'admin' user (via a setuid c-wrapper). It collects basic data about a target machine and reports what it finds on STDOUT. It gets the password for the target via the database.
* Updated anvil-daemon to check/create/update setuid c-wrapper(s), which for now is limited to call_striker-initialize-host.
* Created Anvil/Tools/Striker.pm to store Striker web-specific methods, including get_peer_data() which calls tools/striker-initialize-host via the setuid admin call_striker-initialize-host c-wrapper.
* In order to allow striker via apache to read a peer's anvil.version, which it can no longer do over SSH, any connection to a peer where the anvil.version is read is cached as /etc/anvil/anvil.<peer>.version. When Get->anvil_version is called as 'apache', this file is read instead.
* Updated Database->resync_databases() and ->_find_behind_databases() to ignore the 'states' table.
* Created tools/striker-initialize-host which will be called as a job to initialize a node/dr host.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-16 00:17:02 -04:00
Digimer
6f74ca376b * Created anvil-daemon->check_setuid_wrappers() function that will dynamically create setuid c-wrappers on daemon startup, when needed.
* Updated variable names to clarify their purpose in striker.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-11 12:58:57 -04:00
Digimer
bc341809ca * Finished (for now) ocf:alteeve:server! It can boot, migrate and stop a server cleanly. It still checks to see if DRBD needs to be started and does so when needed, but it won't stop it anymore.
* Fixed a couple typos in tools/anvil-check-memory that prevented it from running.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-04 18:58:43 -04:00
Digimer
9c0f6b8f79 * Added automatic 'echo return_code:$?' to System->call and Remote->call which is parsed out and returned automatically on all calls.
* Started porting ocf:alteeve:server to use the Anvil::Tools module and updating it for RHEL 8.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-13 04:16:03 -04:00
Digimer
e55594f58f * Notes updated with working network config for RHEL 8 proper. Two notes; Creating a BCN bridge by default, and switch the DR third octet to 12 (13 for IPMI) and fourth octet to sequence number.
* Fixed a bug in System->get_ips() where DHCP-assigned IPs were not being parsed properly to get the default gateway.
* Added the alteeve-el8-repo to the kickstart files install package list.
* Updated anvil-daemon to sleep 2 seconds between loops, instead of 1. Added a check to 'check_firewall' to not run until after the system has been configured.
* Quieted a lot of logging.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-05 01:36:21 -04:00
Digimer
188ecdbbd7 * Improved handling of missing RPMs when downloading RPMs for tools/striker-manage-install-target's repo.
* Updated package list to fix changed dependencies from RHEL 8 beta to final.
* Changed anvil_daemon to only check DHCP once per minute instead of every loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-06-28 00:37:08 -04:00
Digimer
27ba3dcbb9 * Created Database->read() to store and return the handle to whichever database is used for read operations. Also created Database->quote that uses ->read to access the DBI 'quote' method more cleanly. Updated all calls to use these new methods.
* anvil-manage-files now identifies peers on the same subnet(s) and stores them in a sortable hash.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-03-06 01:49:59 -05:00
Digimer
040f189ea6 * Finished (barring bugs) the SSH handling in anvil-daemon. Now, keys added to the database (machine and user's) will auto-propagate out to any other machines in the cluster (all machine types).
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-03-05 01:26:21 -05:00
Digimer
ff5ef43940 * Continued work on the ssh configuration system in anvil-daemon.
* Created Database->insert_or_update_host_keys().

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-03-01 01:42:12 -05:00
Digimer
80e7bc5ce0 * Started work on a system to provide inter-machine ssh communication without needing to track or record passwords in the database or config files (outside the database access passwords). Added 'host_key' to the 'hosts' table that stores the host public key. Also now create ssh public/private key pairs for the 'root' and 'admin' users.
* Fixed some bugs in anvil-update-states so that bonds are recorded properly in the database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-02-27 00:23:03 -05:00
Digimer
0979402ecf * Improved handling of failed connection attempts to remote machines in Remote->call();
* Started work on "Files" (replacement for the media library), including database tables, planned sync flow and web UI.
* Added a check for the /mnt/shared directories and create them as needed in the periodic anvil-daemon checks.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-02-07 03:26:12 -05:00
Digimer
d240f3ae2e * Created files and prep-host icons.
* Renamed a couple Striker-only tools to use the 'striker' prefix instead of 'anvil'.
* Updated the core_tables list.
* Renamed 'sys::log::main' to 'sys::log::file'.
* Fixed some "Back" and "Refresh" links.
* Started planning out the file sync system.
* Started work on the Anvil! setup / host prep system.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-02-06 02:36:19 -05:00
Digimer
3c980a5c6d * Created Job->running() to return '1' when one or more jobs are in progress on the host.
* Started work on a "Jobs" button on the Striker UI to be able to see the progress of jobs that are running in the background.
* Updated the Help icon and added the jobs (tasks) icons.
* Made logging around dhcpd more verbose to help figure out why it's auto-running after initial configuration.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-31 23:26:55 -05:00
Digimer
2c23c6beba * Improved infinite loop handling in Log->entry, but broke the Striker UI in the process. To be fixed next...
* Added a 'test' parameter to Log->entry, Storage->make_directory and Words->key to help debug in places that Log->x may not be usable.
* Converted many $anvil->Log->x calls to print if $test to help prevent recursive loops, but not all fixed yet.
* Added the new 'host_keys' database table to the schema for a possible new feature of removing passwords in favour of machines adding peers' public keys to their authorized_hosts file.
* Cleaned up the opening calls to $anvil->Tools->new() in most tools.
* Cleaned up some variables in tools/anvil-update-states after reading their values from files (clean trailing newlines).

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-18 03:19:36 -05:00
Digimer
d2c812ee03 This commit starts the move to RHEL8 from Fedora 28.
* The 'notes' file has a lot of RHEL8 migration notes added, including RPM build orders.
* The anvil.spec file has switched the source from 'master.tar.gz' to 'anvil-3.0b.tar.gz' and moved the source to our webserver. Updated the dependencies as well.
* Updated anvil.sql to add the 'anvils' table and fixed some SQL schema problems.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-01-17 16:02:57 -05:00
Digimer
8fad67fc5a * Updated Words->read to default to 'path::words::words.xml' when the 'file' parameter is not passed. Also updated it to check to see if the words file was read before and, if so, clear the data from the previous read before re-reading it.
* Updated anvil-daemon to re-read the main words file on each loop.
* Updated scancore to read and purge each scan agent's words file between invocations.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-28 19:06:26 -04:00
Digimer
946fce018a * Renamed the 'ScanCore' executable to just 'scancore', moved it into the standard 'tools' directory and changed the agents directory to '/usr/sbin/scancore-agents'.
* Got scancore scanning the agents directory, and properly holding on startup until at least one database is available (instead of exiting), and holding on startup until the local system is configured.
* Created the skeleton of the first scan agent; scan-network.
* Fixed a bug in Storage->check_md5sums() where dynamically loaded modules, loaded after the initial md5sum calcs, would cause the calling daemon to exit (possibly on every invocation).
* Created the scancore.README that will eventually be the main scan agent guide / API document.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-28 18:17:59 -04:00
Digimer
f5ae90c941 * Started work on M3 ScanCore!
* Started expanding Alert->register_alert() to actually implement it.
* Improved handling errors in Words->key().
* Started work on Striker's "Anvil!" menu section. Also cleaned up the power handling.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-18 15:20:49 -08:00
Digimer
2fa4048780 * Updated anvil.conf to default-enable various defaults. Also dropped the archive thresholds.
* Fixed a bug in the PXE default config path to install.img.
* Added tftp to the BCN firewall template.
* Fixed a bug in anvil-daemon / striker-manage-install-target where config files weren't being updated regularly (only when repo updates happened).
* Removed an RPM from striker-manage-install-target that is no longer available on F28.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-12 00:10:14 -05:00
Digimer
5f77ff5885 * Finished (for now) anvil-manage-firewall. It's been added to anvil-daemon as well.
* Updated Log->entry() to accept 'print => [0|1]' to send a log message to STDOUT (minus prefix) to avoid tools that were repeatedly calling print and Log->entry back to back.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-11 02:27:55 -05:00
Digimer
b5dc83d39c * Renamed anvil-configure-striker -> striker-configure-host and anvil-manage-install-target -> striker-manage-install-target as they're both Striker-specific.
* Fixed a bug in Words->parse_banged_string where some variable strings were not being cleared, causing infinite loops.
* Added job progress reporting in striker-manage-install-target, and made it only refresh the RPM repo when '--refresh' is specified (with --force now forcing the issue). This was done to allow adding it into anvil-daemon in such a way that it would only update the RPM repo once a day.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-06 04:14:58 -05:00
Digimer
4b66379aaa * Added enabling/disabling 'Install Target' feature to Striker's WebUI.
* Fixed a bug in Get->anvil_version where the version of local systems and remote systems differed in closing new lines.
* Fixed a bug in Database->insert_or_update_variables() where the 'debug' parameter wasn't working.
* Renamed System->determine_host_type -> System->get_host_type.
* Fixed a bug in System->get_uptime where there was a newline after the uptime integer.
* Updated anvil-daemon to track and record the state of the Install Target feature on Striker dashboards.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-12-04 04:16:38 -05:00
Digimer
07c3b405ad * Starting work on adding "Install Target" function (will likely rename this, but basic same function as IT in m2).
* Added 'sys::database::failed_connection_log_level' to allow silencing of log messages when a Striker peer database is not available.
* Started updating the .spec for the new release to add supported packages needed for PXE/dhcp/tftpboot.
* Added to repo tftpboot files as pulling them out of the packages and moving them into the right place relative to the modest size of adding them directly to our source wasn't justified.
* Created the still very very early 'tools/anvil-manage-firewall' tool.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-10-10 19:43:23 -04:00
Digimer
6f0bc0d86f * Fixed a major bug where anvil-daemon would reset the job_progress to 0 when clearing the 'reboot_needed' flag, causing anvil-daemon to pick the job up and again reboot, repeatedly.
* Updated Jobs->update_progress to take 'picked_up_by' as an optional parameter, defaulting to '$$' (the caller's PID).
* Created System->get_uptime() to return the current uptime in seconds.
* Added a delay to anvil-manage-power to not proceed with a reboot if the uptime is less than 600 seconds. This way, if any future bug causes an infinite reboot, there will be more time to determine what's wrong and debug the system between reboots.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-10-06 03:16:08 -04:00
Digimer
46916b658b * Fixed the spec now that anvil.sql is in the right place and quieted anvil-daemon system check logging.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-10-04 21:38:04 -04:00
Digimer
510321d634 * It looks like adding and removing Striker peers (and all the sync'ing stuff behind it) is finally sorted out. Obviously, time will tell for sure, but currently things look good.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-10-04 20:54:00 -04:00
Digimer
43035ba038 * Fixed a subtle and annoying autovivification bug in Database->write().
* Cleaned up some logging.
* Made the "Reload" buttons work more sensibly and cleaned up some webui display stuff.
* Got deleting peers mostly working (well, it works, but then it goes into a loop thinking it needs to resync the now-gone database until the daemon restarts).
* Fixed a race condition bug where if a job exited between the time that anvil-daemon got a list of PIDs and when it checked to see if that specific pid was alive, a job that actually completed could be restarted.
* Added a loop check to anvil-manage-striker-peers where it would hold until a database connection to the newly added peer was available, preventing a condition where re-adding a peer (and so the host_uuid is in hosts) caused the job belonging to the peer to be recorded locally and then never synced to the peer.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-10-04 03:37:43 -04:00
Digimer
e79e7fd4f4 * Added 'check_if_configured' to Database->connect(), disabled off, that triggers the check to see if the system is configured or not. Updated anvil-daemon to invoke this at the same time that the md5sums are calculated to see if a reload is needed. This reduces the background system load a fair bit.
* Got more work done on deleting peers from Striker (technically done, but untested so far).

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-10-02 03:31:42 -04:00
Digimer
facefeaccc * Fixed a bug in anvil-daemon where completed jobs could be immediately cleared, causing them to re-run (repeatedly).
* Added 'sys::log_date' which controls if the date and time is pre-pended to log entries.
* Created Get->host_name() which takes a host UUID and returns the 'host_name' from the 'hosts' table, if found.
* Cleaned up some HTML templates and logging.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-28 01:50:38 -04:00
Digimer
e15fd19ee4 * Fixed a bug in anvil-daemon where a stray 'die' from earlier work was left in.
* Fixed a bug where a double-equal in anvil-configure-striker was causing it not to compile and run.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-27 01:54:30 -04:00
Digimer
42c4cd01f9 * Made logging a bit more verbose for job processing in anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-26 00:23:07 -04:00
Digimer
9bd5dd9a18 Revert to bfc2204.
This reverts commit bfc2204352.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-25 02:05:07 -04:00
Digimer
a8369170b4 This is the start of a major change!
The resync of the databases was originally designed (on m2) with the expectation that any given column would have only one change per 'modified_date' time. That was never a great approach, but it worked in m2 and just bit me on m3. With job processing, for an example, the job_progress will change repeatedly in one pass, all with the same 'modified_date'. So only one record per run would resync. To fix this, the plan is to drop 'history_id' (and the procedure/trigger in pgsql to copy INSERT and UPDATEs to the history schema). The new plan is to use 'change_uuid' with a per-transaction UUID created in Database so that the per-DB 'history_id' is replaced with a per-update/insert UUID in 'change_uuid'. This will become the unique record used to sync databases, instead of 'modified_date'. To keep things consistent, 'modified_date' was renamed to 'change_date' to match 'change_uuid'. This work is very much "in progress" and not finished.

This commit also changes Get->uuid to use UUID::Tiny to create v4 UUIDs instead of making a system call to 'uuidgen'. This sped up UUID generation by almost 100x.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-23 16:16:08 -04:00
Digimer
bfc2204352 * Added a row-count check when deciding if a DB resync is needed.
* Updated the Database module to not sort or reorder the 'core_tables' array, and reordered them in the hash they're declared in.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-22 00:54:37 -04:00
Digimer
e67828b6c6 * Deleted stray exit used in debugging anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-21 13:51:34 -04:00
Digimer
40aac1d5f6 * Finished adding the 'sessions' database table and associated code.
* Added a check to all 'Database->insert_or_update_*' methods to check if the passed-in reference UUID was found and return an empty string if not.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-21 13:49:28 -04:00
Digimer
7bd65f65e5 * Finished the first round of updates to anvil-manage-striker-peers, but the initial resync is failing because of unrelated schema issues.
* Updated Database->insert_or_update_jobs() to also use the job_command when looking for an existing job (when a specific job_uuid was not included).
* Fixed a bug with a missing ? in striker->add_sync_peer function. Also updated it to not try to record the peer's job as it is unlikely the peer will be in hosts. Instead, the job_command to add the peer is appended to the local job's job_data and the updated anvil-manage-striker-peers looks for that at the end of the add and sync, and records the job once the peer's UUID is in 'hosts'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-13 02:22:35 -04:00
Digimer
4bf048054b * Updated anvil-daemon to not use Time::HiRes for now, and added a timer so that the md5sums of files used by the daemon are checked only once per minute. This significantly reduced the load caused by the daemon running.
* Bumped the RPM spec file to 15, though haven't actually rolled the new RPM yet. Also added 'htop' as an anvil-core dependency.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-10 02:17:02 -04:00
Digimer
39c94009e6 * Created System->check_if_configured and then used that to only have anvil-daemon call update_state_file() when the system is unconfigured (to reduce the load when it's usually not needed).
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-07 22:55:05 -04:00
Digimer
94d8a9c495 * Fixed a bug where finished jobs with a '0' picked-up time would be written to jobs.json.
* Updated anvil-configure-striker to use Job methods and reboot using anvil-manage-power. Also updated it to set/clear maintenance mode and mark a reboot required at the end of its run just prior to reboot.
* Lots of log cleanup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-07 01:29:43 -04:00
Digimer
00565b123c * Updated Tools->nice_exit to add the caller name to the exit status.
* Created Job->clear() to clear the job_picked_up_by column. Created Job->get_job_uuid() to return the job_uuid of an unfinished job matching a given job_command string (if any found).
* Updated striker->process_power to log the user out after confirming a poweroff or reboot action.
* Added anvil-daemon --startup-only to not enter the main loop and exit.
* Finished getting poweroff and reboot working (though more testing needed).

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-06 01:37:08 -04:00
Digimer
b7e4ba9123 * Made the detection of whether a system has been rebooted a lot smarter, thanks to an idea from Lisa Seelye (@thedoh).
* Got the webui portion of requesting a poweroff and reboot done, but still working on finishing anvil-manage-power (work on which led to the above improvement).

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-05 03:39:13 -04:00
Digimer
545f9a9bb5 * Renamed tools/anvil-reboot-needed to tools/anvil-manage-power and started adding support for rebooting and powering off to it.
* Created the Anvil::Tools::Jobs module to handle general job processing tasks. Moved 'update_progress' from tools/anvil-update-system to it and generalized it.
* Added some missing CDATA wrappers to the words XML file strings with '>' in them.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-04 18:57:09 -04:00
Digimer
962ff89fc5 * Fixed a bug in Words->parse_banged_string() where values with commas were breaking the processing of the string of variable/value pairs.
* Added '--refresh-json' to anvil-daemon that auto-selects '--run-once', '--main-loop-only' and '--no-start'.
* Updated anvil-update-system to not go more than a second between updates to the progress (save for when we're holding on data from 'dnf').

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-03 02:35:25 -04:00
Digimer
b656af88c7 * Added a TODO
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-02 22:13:41 -04:00
Digimer
6210524780 * Fixed a bug where long-finished jobs were being displayed in the Striker maintenance mode display.
* Made it so that anvil-daemon won't restart when on-disk version has changed while jobs are still running.
* Made it so that anvil-update-system reloads systemctl after the update finishes to pickup changes in updated system daemons.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-09-01 00:59:14 -04:00
Digimer
12073ffa08 * Added --no-start to tools/anvil-daemon to allow for updating just the jobs.json file. Fixed a bug where a comma was missing when 2+ jobs were written to the JSON file as well.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-31 16:52:53 -04:00
Digimer
423fda2ad6 * Fixed a bug in anvil-daemon where rebooting was clearing the reboot-needed flag.
* Finished (for now) adding support for monitoring jobs while a node is in maintenance mode!
* Cleaned up the display of job data and redid how buttons (real and classed links) are displayed to be consistent.
* Fixed a bug in anvil-daemon where a disconnect wasn't being called between loops, causing DB connections to pile up.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-31 02:40:49 -04:00
Digimer
eecef192b3 * The job progress display in Striker while in maintenance mode is coming along and mostly working now.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-29 19:52:05 -04:00
Digimer
eaca4c885f * Updated tools/anvil-daemon to use the JSON module to build the JSON strings going into jobs.json, instead of doing it manually.
Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-29 00:33:33 -04:00
Digimer
831ff14d93 * Created Words->parse_banged_string to process strings in the format '<key>[,!!var1!value1!!,!!var2!value2!!,...,!!varN!valueN!!]'. Still testing this.
* Made anvil-update-system look for a job_uuid when none is passed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-28 22:42:46 -04:00
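The string format named in the commit above can be illustrated with a minimal sketch. This is Python written for illustration only, not the project's Perl implementation of Words->parse_banged_string; the function below merely borrows the name, and the sample key and variable names are assumptions. It treats everything before the first ',!!' as the key and reads each '!!name!value!!' pair with a regular expression, so a comma inside a value does not split the pair.

```python
import re

def parse_banged_string(text):
    """Split '<key>,!!name!value!!,...' into (key, {name: value}).

    Hypothetical helper for illustration; not the project's Perl code.
    """
    # Everything before the first ',!!' is the key; the rest holds the pairs.
    key, _, rest = text.partition(",!!")
    variables = {}
    if rest:
        # partition() consumed the leading '!!', so restore it before matching.
        pairs = re.findall(r"!!([^!]+)!(.*?)!!", "!!" + rest, flags=re.S)
        for name, value in pairs:
            variables[name] = value
    return key, variables

if __name__ == "__main__":
    print(parse_banged_string("job_0001,!!host!node1!!,!!note!a, b!!"))
```

A string with no variable pairs comes back as the key with an empty dictionary.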
Digimer
60584b8cee * Created Database->get_jobs() to be a more general way to retrieve pending and recently finished jobs.
* Started adding the display of running and recently finished jobs to Striker when in maintenance mode. Still lots to do.
* Started working on the logic for what will soon be Words->decypher_string in anvil-daemon to process strings stored as '<key>,!!<name1>!<value1>!!,...,!!<nameN>!<valueN>!!'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-28 01:54:49 -04:00
Digimer
bd862b2e5e * Made more progress on anvil-daemon's invocation of jobs.
* Got anvil-update-system clearing job data when (re)starting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-27 02:51:53 -04:00
Digimer
633da25d07 * Fixed a bug in Database->connect where an empty 'database::<key>' would cause an error. Updated Database->disconnect to delete the 'database' hash key as part of the same fix.
* Renamed 'database::locking' to 'sys::database::locking' to avoid collisions with 'database' keys.
* Fixed a problem with System->call where redirects were missing the Proc::Simple method name.
* Updated anvil-daemon to check if there are no database connections on start-up, run prep-database if not, and try connecting again. If it still fails, exit. Also updated the main loop to reconnect to the database(s) and skip if none are available. Did more work on the keep_running() function.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-24 00:52:56 -04:00
Digimer
d147e10552 * Changed the call to reboot after OS update to instead set a reboot required flag.
* Created tools/anvil-clear-reboot to clear the "reboot needed" flag. Also created, but not yet using (and may not use) units/anvil-boot-time.service.
* Started work on having jobs show their data via JSON / jquery.
* Updated anvil-update-system to record messages indicating the progress so far.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-22 02:16:56 -04:00
Digimer
6aa74d3d96 * Updated Database->_test_access() to use the DBD 'ping' method, and attempt a reconnect on failure.
* Updated Database->connect to take a specific UUID to attempt a connection to.
* Renamed some old 'sys::x' variables related to the database to 'sys::database::x' to conform better to coding standards.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-15 16:57:57 -04:00
Digimer
dd88051d9b * Finished the initial version of tools/anvil-update-system.
* Updated the RPM to .13 to disable postun's disabling of postgres, which breaks Anvil! software using the database during RPM updates.
* Fixed a logging bug where the number of DB connections was not inserting the number properly.
* Fixed exits in tools/anvil-prep-database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-14 03:45:36 -04:00
Digimer
0fa3c42f2f * Fixed a bug where setting the debug level to 3 caused a deep recursion and a system hang.
* Updated Anvil::Tools->new() to accept the parameters 'log_level', 'log_secure' and 'debug', streamlining the frequent calls to $anvil->Log->level and ->secure in program startup, and allowing the values to take effect during the ->new constructor.
* Passed 'debug' to child method calls in more places (still more to do though).
* Fixed a bug where 'test_table' wasn't set in the right place, causing the database to try to initialize repeatedly.
* Made Database->archive_database only run if called with root access.
* Now the number of database connections is stored in 'sys::db_connections' instead of checking the returned number, and that is cleared on disconnect.
* Started working more on 'anvil-daemon', including adding support for System->call taking 'background', 'stderr_file' and 'stdout_file' parameters which, when set, use Proc::Simple to background the process.
* Did some more work on database archiving, though still far from done.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-08-01 02:06:16 -04:00
Digimer
a294c6c4fa * Updated the database components to use the name 'anvil' and the user 'admin'. The 'database::user' and 'database::name' variables are still supported, but now hidden.
* Fixed a bug where some '$anvil->{}' variables should have been '$anvil->data->{}'.
* Started merging message keys on 'error_xxxx', 'warning_xxxx', etc.
* The anvil-configure-network now configures the network. Commented out, the tool can reconfigure the entire network without a reboot, but a current issue with the post-configured system refusing to use the allocated interface as the default gateway is to be reviewed at a future time. For now, a closing reboot will be issued.
* Started creating 'anvil-change-password' that will update passwords, including apache (and configure .htpasswd when needed).

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-04-13 19:55:34 -04:00
Digimer
d6846841a2 * Added the 'job_status' column to the 'jobs' table where progress to be shown to users is stored.
* Updated anvil-configure-network to use Database->insert_or_update_jobs().

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-04-02 01:03:28 -04:00
Digimer
c21b326f1a * Changed all methods to take a 'debug' argument for setting log level on calls.
* Fixed a bug with resync, but others remain as resync is incomplete (at least for network_interfaces).
* Currently, tools/anvil-update-states is broken while working on the above issue.
* Reworked the jobs table and removed the units/anvil-jobs.service unit. Jobs will be invoked and backgrounded in all calls.
* Started adding missing hidden form fields.
* Updated the 'server' OCF resource agent version and metadata.

Signed-off-by: Digimer <digimer@alteeve.ca>
2018-03-07 03:11:55 -05:00
Digimer
9648e8ba43 * Create tools/anvil-jobs and units/anvil-jobs.service, which is a new daemon that will handle jobs that can take some time to finish.
* Created Storage->record_md5sums() and Storage->check_md5sums for use in daemons. These will record the md5sums of the program itself, all perl modules and the words file. When check_md5sums is called, it returns '1' if any sums have changed, which daemons can trigger on to exit (and systemd will restart them). Removed the basic md5sum check from anvil-daemon and switched to this.
* Fixed how 'fatalstobrowsers' is invoked so that it only applies to programs running in a browser.

Signed-off-by: Digimer <digimer@alteeve.ca>
2017-12-08 17:04:36 -05:00
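The record-then-check approach described in the commit above can be sketched as follows. This is an illustrative Python sketch, not the project's Perl code; the function names only echo Storage->record_md5sums() and Storage->check_md5sums, and their arguments and return shapes here are assumptions. As in the commit message, the check returns 1 if any recorded sum has changed, which a daemon could use as its cue to exit.

```python
import hashlib

def md5sum(path):
    """Return the hex MD5 digest of the file at 'path'."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_md5sums(paths):
    """Record the current digest of each file (hypothetical signature)."""
    return {str(path): md5sum(path) for path in paths}

def check_md5sums(recorded):
    """Return 1 if any recorded file's digest has changed, otherwise 0."""
    for path, old_sum in recorded.items():
        if md5sum(path) != old_sum:
            return 1
    return 0
```

This sketch does not handle a recorded file that has since been deleted; md5sum() would raise an exception in that case.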
Digimer
bb48c090a7 * Created Get->md5sum() to return the md5sum of the specified file.
* Updated anvil-daemon to exit if the md5sum on disk changes.
* Quieted a lot of logging.

Signed-off-by: Digimer <digimer@alteeve.ca>
2017-12-07 18:42:48 -05:00
Digimer
2b9c6c26dc * Fixed a couple remaining issues from the recent merger. Specifically, '$$anvil' was fixed from a bad regex and the path/names of our tools were fixed.
Signed-off-by: Digimer <digimer@alteeve.ca>
2017-10-20 11:13:00 -04:00
Digimer
1cb42080c3 ** Major Changes **
We've decided to give up on trying to keep ScanCore, AN::Tools and Striker as three separate things. We had originally hoped to make ScanCore easily separable from the Anvil!, but this was adding increasing complexity to the project and complexity is the enemy of reliability.

In this release, AN::Tools becomes Anvil::Tools, all configuration files move to /etc/anvil and all programs and data files move to /usr/sbin/anvil. Words files are now merged, as are SQL schemas (ScanCore agents will still maintain their own, later). The journald tag has changed from 'an-tools' to 'anvil'.

Other changes:
* Tools.t has been updated to handle existing tests. New methods and parameters still need to have tests added though.
* Added a simple test.pl script used for testing things outside the main program. It will be removed before final release.
* Added the simple 'watch_logs' bash script to more easily tail output.

Signed-off-by: Digimer <digimer@alteeve.ca>
2017-10-20 00:19:32 -04:00