Commit Graph

704 Commits

Author SHA1 Message Date
Tsu-ba-me
c268d345fc fix(tools): allow striker-access-database to process and execute DB subs 2022-03-18 22:50:40 -04:00
Tsu-ba-me
3e8baf4b3e chore: add dev tool to generate dispatch table from perl modules 2022-03-18 22:50:40 -04:00
Tsu-ba-me
e909a715a3 feat(tools): add cli tool for accessing database with SQL query 2022-03-18 22:50:40 -04:00
Digimer
8fbf594002 Updated striker-prep-database to stop -> start postgres post-configure, and to connect -> disconnect to run the schema load logic.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-16 13:59:21 -04:00
Digimer
edf51adaec * Changed 'anvil-manage-power' to no longer set the job progress to 50 prior to calling a reboot. It now sets to 100 immediately. Also reduced the uptime timer to five minutes from ten.
* Updated striker-auto-initialize-all() to reconnect to DBs during waits to better detect when a DB is marked as offline.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-16 00:35:26 -04:00
Digimer
7b090e1623 * Updated Database->shutdown() to disconnect, stop the postgresql daemon, then reconnect.
* Updated anvil-daemon to not stop a database until both/all DB hosts are in both/all DB's hosts table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-15 22:33:42 -04:00
Fabio M. Di Nitto
0ebe589c93 Ship striker-db-status
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2022-03-15 05:50:32 +01:00
Digimer
513ce3b74e Created 'striker-db-status' that reports the status of the databases to external tools. It's basic, but it works.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-14 16:41:37 -04:00
Digimer
3fd0db15bf * This rather heavily reworks how database shutdowns works. It adds much more intelligent shutdown, tracking who is using the database, being able to mark a database as "offline" and waiting for users of the database to disconnect before it shuts down.
* Also removed the variables for the database name and DB user name, setting them statically now.
* Created Database->shutdown() to more kindly stop a local database server.
* Added 'check_db_in_use_states()' to anvil-daemon to clean any stale entries marking a database as in use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-14 16:41:37 -04:00
Digimer
b234b79544 Updated anvil-daemon to check if anvil-sync-shared is running if the reported RAM use is too high. If so, it doesn't exit. This fixes an issue where anvil-sync-shared would loop forever as it would constantly be killed when downloading large files.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-27 21:29:30 -05:00
Digimer
68b1d12545 Updated anvil-daemon to not shutdown a striker DB until the striker host has been running for at least an hour.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-24 22:25:37 -05:00
Digimer
763821a21d Fixed a variable substitution bug.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-24 22:25:37 -05:00
Digimer
87a2454a09 Moved anvil-configure-host reboot logging to use log_0687 to help grep for reboot causes.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-21 19:18:28 -05:00
Digimer
dc989f0950 Added more logging to track when and how reboots happen in systems.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-16 21:55:33 -05:00
Digimer
f77f486775 Fixed a typo in scan-network
Fixed a missing 'next' to prevent the first DB from disconnecting when down'ing excess DBs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-09 15:52:21 -05:00
Digimer
d70b9a4956 Updated scancore and anvil-daemon to check their RAM use at the end of each loop and, if it's using more than 1 GiB of RAM, it sends an alert and exits.
* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from it's status output.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-05 22:08:06 -05:00
Digimer
a633ab7f63 Added a periodic check to ensure all users can ping. This fixes a bug where a local striker dashboard whose DB was stopped wouldn't work.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-21 14:27:58 -05:00
Digimer
e37f487704 Fixed a bug in System->check_ssh_keys where the 'admin' user's RSA keys were owned by root.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-20 14:13:27 -05:00
Digimer
892a475881 * Fixed a bug in Convert->format_mmddyy_to_yymmdd() where being passed '--' didn't return the same.
* Fixed a divide-by-zero bug in anvil-boot-server when no servers exist yet.
* Fixed a bug in anvil-daemon where the local databsae engine was being started when it shouldn't.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-18 02:38:50 -05:00
Digimer
796814531e Fixed a bug in Alert->check_condition_age() where, when the 'clear' parameter was set and the value was already 'clear', it would flip to 'set' erroniously.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-13 21:07:25 -05:00
Digimer
0a9f81d852 * Created the new striker-db-report that shows how many records are in each database table, and if passed a table name, report how many times each record has been recorded in the history schema.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-13 20:00:37 -05:00
Digimer
652f87ec74 * Updated scan-network to also clean up the media type.
* Updated anvil-daemon to check for files in /mnt/shared/incoming on striker dashboards and add them to the media library if needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-12 23:27:44 -05:00
Digimer
72038e8358 * Fixed a bug where ethtool's Media type contained tab characters that broke JSON when configuring the netowrk interfaces.
* Updated the copywrite date to 2022.
* Updated the database resync to not run on machines host VMs to help reduce the chance of oom-killer terminating a VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-03 17:06:56 -05:00
Digimer
3346d31194 * Created Get->kernel_release() that returns the current kernel release (version) in use on the host or on a remote system.
* Created DRBD->_initialize_drbd() to makes sure the DRBD kernel module can load and tries to build the module, if necessary. This is meant to provide support for clients that can't access needed internet resource (or the internet at all).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-12-07 20:03:39 -05:00
Digimer
9cfd7b9b94 Created the new (and still in development) striker-file-manager to manage files from a Striker dashboard's command line. So far. it will add files only.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-12-01 18:43:50 -05:00
Digimer
65dfc22a38 Added an eval{} call around Database->query()'s ->prepare() DBI call to better handle lost database handle.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-30 17:58:29 -05:00
Digimer
521633f3eb
Merge branch 'master' into anvil-tools-dev 2021-11-25 01:50:20 -05:00
Digimer
75a4c8d709 * Moved the logic to add the local database to a Striker's anvil.conf from striker-prep-database to Database->_add_to_local_config().
* Updated striker-prep-database to always set the user's password, independent of whether the database user was created.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-25 01:47:55 -05:00
Fabio M. Di Nitto
36bcaa587f [build] fix FASEXECPREFIX handling and ship fence_ in -core rpm
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2021-11-24 21:58:16 +01:00
Digimer
958267e38f * Enabled scancore in the .spec file. Disabled calling striker-prep-database and anvil-update-state in the same.
* Updated striker-prep-database to check / wait until postgresql-server is installed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-24 01:22:33 -05:00
Digimer
034c38fdeb Disabled calling striker-prep-database from the spec file, and enabled scancore.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-24 00:55:02 -05:00
Digimer
6225ce1943 Updated striker-prep-database to not configure the firewall if firewalld isn't running.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 23:32:07 -05:00
Digimer
8e41814ca2 * Updated anvil-daemon->prep_database() to start the postgresql daemon if it's not running and no databases are available.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 21:49:24 -05:00
Digimer
b517117bc1 * Did more work on trying to figure out why iniital setup of the database was failing. I believe it was because, in anvil-daemon, after calling 'prep_database' we called ->connect() _without_ 'check_if_configured' set. Next round of function testing should help confirm is this was the case.
* Added 'configure_firewall()' to 'striker-prep-database' to explicitely open the postgresql service for all active zones.
* Did some general logging changes and cleanup around the same.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 20:41:29 -05:00
Digimer
090c59a873 Updated striker-prep-database to enable extra logging to help diagnose a function test build failure problem.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-16 03:21:33 -05:00
Digimer
257a998743 * Updated Database->configure_pgsql() to use 'postgresql-setup --initdb --unit postgresql' instead of the deprecaded 'initdb' switch.
* Updated Database->insert_or_update_states() to switch to an active UUID if the passed in UUID is not an active handle.
* Updated Database->query() to swutch to 'sys::database::read_uuid' if the passed in 'uuid' is not an active handle.
* Updated Database->_test_access() to return immediately if the passed in uuid is not an active handle.
* Started working on a Storage->get_storage_group_from_path() bug where the storage group isn't being returned.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-28 12:07:36 -04:00
Digimer
32d47f70f1 * Fixed bugs around ScanCore->check_power() so that it now returns time on batteries and highest charge are returned properly.
* Created Network->is_our_interface() which returns '1' if an interface is one managed by an Anvil!. Also updated scan-network to use this to determine when an interface alert should be a warning or notice level alert.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-11 21:57:35 -04:00
Digimer
226d1de6b5 Updated anvil-update-states to use the permanent MAC addresses, as done in scan-network. Updated Network->get_ips() to do the same.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-11 02:57:45 -04:00
Digimer
3445d008d2 Removed a stray debug die.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-10 22:31:23 -04:00
Digimer
63c45430bb * Updated scan-network to clear duplicate IP addresses.
* Fixed a bug in anvil-daemon where striker-prep-database was always being called, when it shouldn't in some cases.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-10 22:27:54 -04:00
Digimer
47c832bc0e * Updated Network->get_ips() to check for 'permaddr' when processing 'ip addr list' to ensure the partmanent MAC is used.
* Updated scan-filesystems to set swap usage alerts to notice level only.
* Updated scan-network to pull the permanent MAC address from an 'ethtool -P <iface>' call to deal with the fact that wireless interfaces don't have their real MAC in the sysfs address file.
* Updated anvil-provision-server to set the rtc_tickpolicy to catchup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-07 17:10:25 -04:00
Digimer
8e436ffec7 WIP: Started work on a new Storage->copy_device() method that will do 'dd' calls.
Fixed a bug in System->update_hosts() that was causing hosts to be constantly rewritten. (Well, I hope fixed, this has been a notoriously buggy part of the program...)

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-25 17:58:20 -04:00
Digimer
0fc394b294 Updated ocf:akteeve:server to see in the target for a migration has a '<shortname>.mn1' host name, and if so, and if the target can be reached on that address, it will be used for the live migration. This is to allow for inexpensive 10 Gbps live migration speeds.
Removed the stub Server->provision method that was never used.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-25 10:01:03 -04:00
Digimer
ea368a942b Finished the '--update' switch function in anvil-manage-dr.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-21 23:51:41 -04:00
Digimer
5ee7b2ccaf Got the '--connect' and '--disconnect' functions working in anvil-manage-dr.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-21 15:14:46 -04:00
Digimer
a034583213 * Updated DRBD->gather_data() to record TCP/IP data between connections of two hosts.
* Updated anvil-manage-dr to use the TCP ports already configured for a resource when re-configuring a DR resource that has been previously configured.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-20 22:34:36 -04:00
Digimer
0fcde483be
Merge branch 'master' into anvil-tools-dev 2021-09-20 11:44:30 -04:00
Digimer
fbe9adc306 Resolve a words key conflict.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-20 11:42:47 -04:00
Digimer
5c07179aa6 * Resolved a words.xml conflict.
* Reworked where and how Database->configure_pgsql() is called, and boosted logging around it (trying to debug a build test issues).
* Updated Database->configure_pgsql() to only check if the Anvil! user and DB exists if another step of the config happened.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-20 11:32:45 -04:00
Digimer
e60a1b46b3 Fixed bugs related to automatic database startup and conditional backup loading.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-19 14:06:18 -04:00
Digimer
4e9882812d * Fixed a bug where the periodic database dumps on the primary database Striker were not sync'ing to peers. Also fixed a bug where these periodic dumps weren't running at all.
* Updated anvil-daemon->prep_database() to only run if the database dump file doesn't exist. (If it does, it's clearly configured).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 23:18:06 -04:00
Digimer
72b17ff1f9 * Reworked how databases are stopped, now being handled in anvil-daemon. This way, initial starts will still do traditional resyncs, then shut down. This should allow the best of both worlds, where data is not lost on striker start/stop loss/recovery, but operate normally otherwise without delays.
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 22:33:31 -04:00
Madison Kelly
922899ea78 * WIP: Working on a new method of failing over between which Striker is the active database, instead of running N-number of databases all the time.
* Created Database->backup_database() that creates a pg_dump of the active database.
* Created Database->load_database() that loads the database from a flat file, optionally creating a backup before doing so, and using iptables to block access during the process.
* Updated Database->configure_pgsql() to not start the postgresql daemon unless it just initialized the DB.
* Much work, not yet complete, to Database->connect() to stop after the first successful connection. Added logic that, if not connection was established and the host is a Striker, to load a peer's backup, if it exists, and then start the local daemon.
* Updated anvil-daemon to now have a section to run tasks on a ten minute cycle, which will later be used for the primary Striker to dump / copy its database to peer(s).

Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
2021-09-16 23:10:55 -07:00
Digimer
6664c5b77f * Fixed a bug where scan-drbd, with DR configured, was not recording TCP ports assigned to connections properly.
* More bugs fixed in anvil-manage-dr, tested repeatedly as a job and so far, so good. Other functionality still to come.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-12 23:34:25 -04:00
Digimer
da9dc03d04 Updated anvil-manage-dr to update the job progress and convert prints into strings.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-12 15:14:03 -04:00
Digimer
ffd15406e0 * anvil-manage-dr can now protect a server! Still lots to do though.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-11 19:42:55 -04:00
Digimer
20a784baa2 * Continuing work on anvil-manage-dr. Got it to the point where it should (but doesn't yet) create the new DRBD config and the LV(s) on DR.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-11 16:41:23 -04:00
Digimer
5b35204af4 * Updated DRBD->get_next_resource() to take the new 'dr_tcp_ports' ports which, if set, returns two free TCP ports.
* Got anvil-manage-dr to the point where it writes the updated resource configuration to enable DR support. (untexted)

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-09 23:07:03 -04:00
Digimer
9edf698c37 Updated Database->get_storage_group_data() to determine when a node or DR host needs to be removed from a Storage group, or when a member of an Anvil! needs to be added to a storage group.
Created Storage->get_vg_name() to assist with anvil-manage-dr, which is still a WIP.
Continued work on anvil-manage-dr (which exposed the issue that required the update to Database->get_storage_group_data().

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-08 00:50:45 -04:00
Tsu-ba-me
f61527edf7 fix(tools): save screenshots to states table 2021-09-03 16:21:55 -04:00
Tsu-ba-me
c1859bc8d8 fix(tools): use netpbm tools instead of imagemagick 2021-09-03 16:21:55 -04:00
Tsu-ba-me
65613f501b fix(tools): add option to resize server screenshot 2021-09-03 16:21:55 -04:00
Tsu-ba-me
7467036054 build(tools): add anvil-get-server-screenshot script to build 2021-09-03 16:21:55 -04:00
Tsu-ba-me
da6b4d39c6 fix(tools): disable line wrap in image Base64 output 2021-09-03 16:21:55 -04:00
Tsu-ba-me
4ef231b567 fix(tools): prevent too frequent inserts of server VM screenshots 2021-09-03 16:21:55 -04:00
Tsu-ba-me
1014299d38 fix(tools): enable anvil-get-server-screenshot to be a job 2021-09-03 16:21:55 -04:00
Tsu-ba-me
f97a820b48 feat(tools): add script to take screenshot of server VM 2021-09-03 16:21:55 -04:00
Digimer
2f8b1fb72e Updated anvil-provision-server so that when the OS type is 'win7', set the disk to sata and the NIC to e1000e. Also updated it to store the virt-install call in the 'variables' table and write it out to /mnt/shared/provision.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-31 00:32:15 -04:00
Digimer
4427fe9f0d * Found the source of the vnet constantly cycling back to 'up' bug. The anvil-update-state tool was marking the vnet device operational state back to 'unknown' and scan-network was marking it back up.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-30 17:59:39 -04:00
Digimer
e40d0e2444 Fixed a bug where if a database is pingable but the pgsql database is down, and it's the first database tested (or local), then the DB handle used to read / quote fails.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-26 23:26:03 -04:00
Digimer
4c7bb45ab9 Fixed a race condition where configuring the IPMI BMC would appear to fail because the BMC wouldn't report the user list after a cold reset.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-25 21:02:00 -04:00
Digimer
6cbdc388d4 Fixed a bug where corosync's configuration of a backup ring was broken.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-24 15:52:44 -04:00
Digimer
04cb116c1b Updated anvil-parse-fence-agents to validate each fence agent's metadata is valid before adding it to the unified XML.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-19 00:58:26 -04:00
Digimer
8abb5b46e0 * Added support for setting per-agent log-level and log secure values in amvil.conf.
* Moved the check for an agent being disabled into ScanCore->agent_startup()

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-18 23:07:15 -04:00
Digimer
3674a47179 WIP - Working a tool to manually load updated server definition files.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-16 11:39:16 -04:00
Digimer
aec22bb79c Added a check in scan-network that finds/removes duplicate network interface names.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-11 12:17:01 -04:00
Digimer
4800f7181f * Updated ScanCore to boot a node that is off without a stop reason.
* Fixed a bug where anvil-safe-stop was not recording the stop-reason. Also made '--poweroff' an alias for '--power-off'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-07 14:01:14 -04:00
Digimer
acaacd9a86 * Created Storage->get_size_of_block_device() that takes a block device path and returns the size of the path, if it's found in the database.
* More work on the storage management of anvil-manage-server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-06 22:46:02 -04:00
Digimer
7a504467ef
Merge branch 'master' into anvil-tools-dev 2021-08-06 14:51:44 -04:00
Digimer
606bd8f1f0 Continuing work on anvil-manage-server.
Created Storage->get_storage_group_from_path() that takes a block device path and tried to find the Storage Group it belongs to.
Updated Storage->get_storage_group_data() to make it possible to look up a storage group UUID using the SG's name.
Updated DRBD->gather_data() to take a pre-generated XML via the new 'xml' parameter.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-08-06 14:21:11 -04:00
Tsu-ba-me
063840ecb6 fix(tools): correct message_* string keys in striker-manage-vnc-pipes 2021-08-04 13:53:44 -04:00
Tsu-ba-me
8da318c933 fix(tools): patch failure to fix 2nd pipe after server migration 2021-08-04 13:40:02 -04:00
Tsu-ba-me
0f1c3d2435 chore(tools): remove unused function from striker-manage-vnc-pipes 2021-08-04 13:40:02 -04:00
Tsu-ba-me
cdb66019d3 fix(tools): avoid port conflict 2021-08-04 13:40:02 -04:00
Tsu-ba-me
7e447000b4 fix(cgi-bin): use unspecified instead of loopback address in SSH tunnel 2021-08-04 13:40:02 -04:00
Tsu-ba-me
b3b6da8259 chore(cgi-bin): remove debug log level from manage_vnc_pipes and its support scripts 2021-08-04 13:40:02 -04:00
Tsu-ba-me
549758b2f2 build(tools): include support scripts for manager_vnc_pipes endpoint into makefile 2021-08-04 13:40:02 -04:00
Tsu-ba-me
e50bfc7308 fix(tools): correct typo in passing server_uuid to get_vnc_info() 2021-08-04 13:40:02 -04:00
Tsu-ba-me
3a8f4c339b fix(tools): use VNC port in variables table if available 2021-08-04 13:40:02 -04:00
Tsu-ba-me
e4436be17b fix(tools): do checks and kills as root 2021-08-04 13:40:02 -04:00
Tsu-ba-me
bb155a5786 fix(tools): update job progress in catch-all case 2021-08-04 13:40:00 -04:00
Tsu-ba-me
ffc1fb096a fix(tools): correct switch name typo in striker-manage-vnc-pipes 2021-08-04 13:38:28 -04:00
Tsu-ba-me
1fec288ad0 fix(tools): make striker-manage-vnc-pipes executable 2021-08-04 13:38:28 -04:00
Tsu-ba-me
7d9013a60b fix(tools): allow striker-manage-vnc-pipes to be executed as a job 2021-08-04 13:38:26 -04:00
Tsu-ba-me
0935b9a990 feat(tools): move manage_vnc_pipes endpoint core logic to separate script 2021-08-04 13:34:58 -04:00
Tsu-ba-me
5459e610aa fix(tools): auto-end tunnel script when connection breaks 2021-08-04 13:34:58 -04:00
Tsu-ba-me
d5724c1457 chore(tools): rename striker-start-ssh-tunnel->striker-open-ssh-tunnel 2021-08-04 13:34:58 -04:00
Tsu-ba-me
23d818cfff fix(cgi-bin): avoid direct SSH calls 2021-08-04 13:34:58 -04:00
Digimer
e3d65d654c * Continuing work on anvil-manage-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-29 17:15:02 -04:00
Digimer
3f1c2dd38f * Couple of small cleanups for fence_delay.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-26 15:47:26 -04:00
Digimer
8d2e454d69 * Updated fence_delay to set the ownership of the log file to 'hacluster:haclient'. This should address https://github.com/digimer/fence_delay/issues/1
* WIP - COntinuing work on anvil-manage-server, far from done yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-26 15:35:04 -04:00
Digimer
bc8b9274cb WIP; Reworked anvil-manage-server to have a more interactive menu system (for the sections done so far).
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-26 15:35:04 -04:00
Digimer
28865780f8 * Updated Database->get_server_definitions() to take a specific server UUID, allowing just the one definition to be loaded. Also had it clear previous loads.
* Updated Server->parse_definition() to call DRBD->get_devices() so that referenced LVs can be loaded properly.
* Continued WIP in anvil-manage-server

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-20 23:19:29 -04:00
Digimer
623dbb0863 WIP; Restarted work on anvil-manage-server.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-18 16:21:00 -04:00
Digimer
548c52701a Updates Jobs->update_progress() to take a 'variables' hash reference, and to support logging as well.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-16 15:07:07 -04:00
Digimer
1e159f548e Added a couple notes for later dev.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-16 11:13:48 -04:00
Digimer
39236e9b3f Switched default graphics for new servers to 'vnc' instead of spice.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-16 00:08:02 -04:00
Digimer
cebae28716 * WIP - Fixing a bug in scan-network where vnet devices aren't being recorded against their bridge.
* Updated scan-server to record the VNC port it is using in the database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-15 00:42:47 -04:00
Digimer
7e7b91b286 * Updates anvil-join-anvil to update corosync.conf to use the BCN1 link as the main knet network with the SN1 link as the backup link.
* Fixed a bug in Cluster->parse_cib() where the local machine's ready state was being set to the node name.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-07-14 12:17:19 -04:00
Digimer
d7d418ee1b * Fixed a bug in DRBD->gather_data() where the peer node's data was being recorded where the local node's data should have been saved.
* Fixed a bug in anvil-delete-server where, if a server was off already, the server would not be removed from pacemaker.
* WIP - continuing on scan-network

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-30 14:58:36 -04:00
Digimer
a697011b08 * Disabled debug logging in anvil-daemon.
* WIP - working on new scan-network scan agent.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-30 02:36:06 -04:00
Digimer
f7b8a053b0
Merge branch 'master' into scancore-debugging 2021-06-28 20:22:37 -04:00
Digimer
6777104398 * Fixed a bug in anvil-daemon where, when an anvil-manage-power reboot run had triggered a reboot, anvil-daemon didn't set the job_progress to '100', causing constant reboots. Also fixed a bug where the log level was hard-set to '1' instead of '2' needed during debugging.
* Updated Jobs->get_job_uuid() to accept the new 'incomplete' parameter that, when set, will look for jobs whose progress is > 1 and < 100.
* Updated ScanCore-agent_startup() to take the new 'no_db_ok' parameter which returns with '0' if no DB is available and that parameter is set to '1'.
* Fixed a logging bug in 'anvil-join-anvil'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-28 20:04:11 -04:00
Fabio M. Di Nitto
7aea5e1b11 Switch to kmod-drdb
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2021-06-27 07:36:36 +02:00
Digimer
04f7571097 * Fixed a typo causing anvil-manage-power to not compile.
* Updated anvil-configure-host to register a reboot job when needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-25 21:40:55 -04:00
Digimer
0c475d2a2e * Fixed a couple logging bugs.
* Updated scan-cluster to get the CIB from pcs instead of reading the CIB from disk.
* Updated anvil-daemon to always call striker-prep-database at log level 2 while trying to find the cause of rare postgres config failures. Also updated striker-prep-database to use the new method of initializing the DB.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-23 18:22:55 -04:00
Digimer
d3052c0229 * Finished Cluster->check_server_constraints() and added it to scan-cluster. This now makes sure servers don't roll back to their old host after it has been fenced and recovers.
* Completely disabled Network->check_network(), it's causing more problems than it solves.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-23 14:19:58 -04:00
Digimer
e7a06fce72 * Disabling the periodic network health check in anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-23 11:01:33 -04:00
Digimer
30f478267a * Forced anvil-daemon to log-level 2 and to enable secure logging to continue debugging setup issues.
* Fixed a undefined variable warning.
* Removed a debugging die from Database->resync_databases().

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-22 19:41:00 -04:00
Digimer
47fa126a3c * Fixed a typo that blocked anvil-daemon from starting.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-22 19:00:26 -04:00
Digimer
023f43eda9 * In the never-ending attempt to resolve the build consistency issues, this commit enables extra debugging logging and, hopefully, implements a fix in anvil-daemon where a job could be started repeatedly.
* Renamed the special job status 'scancore_startup' to 'anvil_startup', given it's handled by anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-22 16:12:12 -04:00
Digimer
5a343d6d75 * WIP; Started work on Cluster->check_server_constraints() that will track when a server's location constraint needs to be updated when the old preferred node is lost.
* Removed (for now) setting MTU in the ifcfg-X files during anvil-configure-host runs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-21 23:22:48 -04:00
Digimer
76689aa245 * I've decided that live reconfiguring of NetworkManager interfaces is too unreliable. This commit disables all attempts to reconfigure the network while it's up, and simply reboots on changes.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-20 12:03:35 -04:00
Digimer
629c2b8e8c * Moved up when the reboot happens, when it's needed, avoiding a network reload when a reboot is going to happen anyway.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-19 14:56:28 -04:00
Digimer
bbee77d265 * Re-enabled reboot
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-18 23:18:53 -04:00
Digimer
08a958ec60 * Finished updating Network->check_network() to check/heal bridges.
* Updated anvil-configure-host to not reboot on network chane (will verify when this commit is function tested).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-18 22:42:10 -04:00
Digimer
6a8a192cfd * Added an explicit delete call when network changes.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-18 21:06:24 -04:00
Digimer
bd24c1c5bb * I _might_ have fixed the network configuration issue in anvil-configure-host... Updated it so that if 'nmcli' doesn't report a valid device name, it looks for it in the ifcfg-X file, and uses 'X' if not found there.
* Added the 'print' parameter to Log->variables() to allow printing to STDOUT when set.
* Renamed Network->check_bonds() to Network->check_networks() in anticipation of adding bridge monitoring / repair to it later.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-18 19:37:37 -04:00
Digimer
c7c6c8dee5 * Reworked the attempt to repair the network in anvil-daemon to not touch the network until the machine has been running for at least two minutes.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-15 12:04:27 -04:00
Digimer
11b1900e1b Note: Continuing to resolve the build issues with network startup. Expect breakage.
* Upped the aging of jobs and alerts data from 2 to 24 hours. Also added a check to prevent deleting a job of any age that is incomplete.
* Major update to anvil-configure-host to not touch the network unless something has actually changed. Not yet tested on a fresh system, will verify nothing broke in the CI tests this commit will trigger. Also changed it so that, if after reconfiguring the network it times out trying to reconnect to a database, it calls a reboot instead of simply exiting. Further, a reboot is now not called on exit unless something changed to require it.
* Updated Network->check_bonds() to return '1' if anything was done to heal a bond.
* Updated anvil-update-states to be more careful about clearing virsh bridges. Specifically, it checks to see if virsh is running and that the returned bridges aren't actually error codes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-14 01:58:25 -04:00
Digimer
a1b06e4355 * Continuing to try to get the network to reliably start during configuration...
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-13 14:59:39 -04:00
Digimer
1e7847d4dd * Added a call to Network->check_bonds() to be called while non-Striker machines wait to connect to a database.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-13 14:14:37 -04:00
Digimer
3f32a56d0c * Created Network->check_bonds() that checks to see if any bonds are down, or if any interfaces configured to be in a bond are not actually in it. It accepts a 'heal' parameter that, by default, will bring up a bond with no active links, but leaves degraded bonds alone. It call also take 'all' and will try to bring up any missing interfaces. This distinction exists so that if a link is flaky and someone takes it down manually until it can be repaired, it doesn't get turned back on.
* Updated anvil-daemon to call Network->check_bonds() with 'all' on startup, then woth 'down_only' once per minute to try to heal down'ed bonds.
* Updated anvil-watch-bonds to take a 'run-once' switch and exit after one report, if set.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-13 13:33:51 -04:00
Digimer
0dd92a08c5 * Small change to variable name to help make logs clearer.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-11 21:47:10 -04:00
Digimer
0b6a9e37fa * Added scan_lvm_pv_sector_size to the scan_lvm_pvs table in the scan-lvm. This will be used later for growing a requested disk size for the DRBD metadata.
* Added a 1 minute delay to anvil-configure-host before calling a reboot.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-11 19:57:30 -04:00
Digimer
80bdac8e34 * Updated the pacemaker server config to drop the stop timeout to 5 minutes and the migration timeout to 10 minutes. This will avoid blocking the entire cluster when a stop or migrate operation times out. Will update scan-server to clean these up when they happen.
* Updated Database->archive_table() and ->_find_behind_databases() to loop through connected databases, instead of configured databases.
* Updated Network->get_ips() to only record the real MAC addresses on network interfaces (not bonds or bridges) in the "network::${host}::interface::${in_iface}::mac_address" hash. This should help avoid reboot loops caused by anvil-configure-host thinking the network needs to be reconfigured when it doesn't actually need to be.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-11 03:17:07 -04:00
Digimer
19c41c9171 * Added more logging while chasing a function test bug.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-08 23:56:48 -04:00
Digimer
0f43961568 * This commit lowers the logging levels of some debug log entries. It's to help diagnose occassional function test failures with an unknown source.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-08 21:57:51 -04:00
Digimer
daca6c887b * This contains a fairly major change to how time stamps are handled. All INSERT and UPDATE calls now generate a new timestamp via Database->refresh_timestamp, instead of using 'sys::database::timestamp'. This was done in responce to finding a bug where tables in a database differed in both counts of public and private schemas (ip_addresses table, specifically) that failed to resync because the timestamps were re-used too often.
* WIP - Continuing work on the new anvil-manage-server tool.
* Updated Database->get_anvils() to load information on the files available on each Anvil! system.
* Updated Database->insert_or_update_network_interfaces() to no longer take the 'timestamp' parameter.
* Removed all logging from Database->refresh_timestamp() to speed it up, given how often it will be called now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-08 15:23:15 -04:00
Digimer
5b4bfa747c * Reworked the anvil-join-anvil job parsing to help diagnose occassional faults. Also changed a fatal parse error to one that allows the run to be retried.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-06 01:54:28 -04:00
Digimer
96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged.
* Updated anvil-daemon to have a new function called "handle_special_cases" called during startup that does any weird bug mitigation required. For now, this is used to mitigate against rhbz#1961562, though certainly it will be used for other reasons later.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-06 00:01:11 -04:00
Digimer
e15c1651ed * Fixed a bug with deleting bad keys where jobs to delete keys on non-dashboard machine wasn't being assigned to the proper target machine.
* Fixed a bug with anvil-manage-keys where a state_uuid entry recorded on one database may not be read from a machine reading from another database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-05 19:07:25 -04:00
Digimer
16c20ae69c * Updated Tools->catch_sig() to use return code 0 instead of 255 so that systemd doesn't think our daemons failed on stop.
* Updated Cluster->parse_cib() to not require a database connection (part of the work to make ocf:alteeve:server run without a DB)
* WIP: Continuing work on the ocf:alteeve:server RA to run without database connections.
* Updated the scancore daemon to explcitely check that all scan agent schemas are loaded in all databases on startup. This is to resolve resync issues on rebuilt strikers that may not yet have some schemas loaded when a DB resync runs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-05 14:32:26 -04:00
Digimer
24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancilliary checks are done, minimizing connect time.
* Fixed a problem with Database->insert_or_update_variables() where variable_source_uuid being set to an empty string wasn't converted to NULL.
* Fixed Database->locking() where the way the lock variable was set was rather broken.
* Created Striker->check_httpd_conf() which configured apache to handle the integration of the new WebUI for Anvil! management with the existing WebUI.
* Updated System->update_hosts() to specifically set the 127.0.0.1 and ::1 lines to handle how cloud-init overrides /etc/hosts and breaks CI/CD tests.
* Removed the old index.html as it's now used for the new WebUI.
* Began work on removing DB connection requirements from ocf:alteeve:server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-03 22:25:36 -04:00
Digimer
73267a8ea9 * WIP - Slowly working on anvil-manage-server
* Updated the scancore interval to 60 seconds.
* Updated Database->insert_or_update_health() so that 'delete' can find the health_uuid.
* Updated Convert->time() to return silently when passed '-1'.
* Fixed a bug scan-hardware to call Convert->round(). Also fixed it so it didn't set health scores of 0 for mismatch RAM when the RAM was not mismatched.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-02 14:08:55 -04:00
Digimer
4dcd505753 * Biggest change in this commit; scan-apc-pdu and scan-apc-ups now only run on Striker dashboards! This was because we found that if two machines ran their agents at the same time, the reponce time from SNMP read requests grew a lot. This meant it was likely a third, fourth and so on machne would also then have their scan agent runs while the existing runs were still trying to process, causing the SNMP reads to get slower still until timeouts popped.
* Bumped scancore's scan delay from 30 seconds to 60.
* Shorted the age-out time to 24 hours and again boosted the archive thresholds. As we get a feel for the amount of data collected on multi-Anvil! systems over time, we may continue to tune this.l
* Moved Database->archive_database() to be called daily by anvil-daemon, instead of during '->connect' calls.
* Added locking to Database->_age_out_data to avoid resyncs mid-purge. Also moved the power, temperature and ip_address columns into the same 'to_clean' hash as it was duplicate logic.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-31 13:34:49 -04:00
Digimer
8807915bb7 The theme of this commit is database cleanup and fixes.
* Updated Database->_age_out_data() to check for certain scan agent tables and, for those found, purge out old records. This should go a long way to keeping the database data responsive.
* Fixed a bug in Jobs->update_progress() where the 'job_picked_up_by' column was being set to '0' instead of '$$' when clearing the job.
* Fixed a bug in System->update_hosts() where '127.0.0.1' would be used in hosts for the actual host name.
* Updated the default trigger, count and division values in anvil.conf to 100,000, 50,000 and 75,000 respectively. In combination with the aging of data, this should go a long way to minimizing database sizes and overheads.
* Updated anvil-daemon to call $anvil->Database->_age_out_data(); in it's daily tasks.
* Updated various striker-X tools to specifically request a DB resync on Database->connect calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-30 15:16:25 -04:00
Digimer
6abe06f125 The theme of these commits is improving DB responsiveness.
* Created Database->_age_out_data() to delete records from the database that are old enough to no longer be useful. This is designed to significantly reduce the size of the database, allowing a better focus on performance.
* Changed Database->connect() to default to NOT check for resync, reworking the old 'no_resync' to 'check_for_resync', so that resync checks happen on demard, instead of by default.
* Updated get_tables_from_schema() to now allow 'schema_file' to be set to 'all', which then loads the schema files of all scan agents as well as the core anvil schema file. Fixed a bug where commented out tables were being counted.
* Re-enabled triggering resyncs on 'last_updated' differences.
* Fixed a bug in scan-ipmitool where the history_id column in history.scan_ipmitool_value was incorrect.
* Created a new tool called striker-show-db-counts that shows the number of records in all public and history schema tables for all databases.
* Updated anvil-update-states to detect when a libvirtd NAT'ed bridge exists and to delete it when found.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-29 23:34:22 -04:00
Digimer
49a700d68f * Fixed a bug in anvil-join-anvil where the desired DNS servers were not matching existing list of used DNS servers, even when they are the same already.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-27 17:36:12 -04:00
Digimer
bbad058b33 * Created a new tool, anvil-watch-bonds, which is a live monitor of bonds and interfaces designed to be run from the command line on a given host.
* Created Words->center_text that takes a string (or string key) and centers it to a given string length, padding white spaces on either side of the string as needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-26 20:24:05 -04:00