Commit Graph

183 Commits

Author SHA1 Message Date
digimer
7710d9d109 * Created the new anvil-manage-server-storage tool which will specifically handle managing a server's disks.
* Created DRBD->parse_resource() to pass a specific DRBD resource's XML data.
* Fixed a bug in Get->available_resources() so that if the threads is lower than CPU cores, the cores are used as the total available to VMs.
* Fixed bugs in Get->server_from_switch() where it just wasn't working properly.
* Updated scan_drbd to not reset a resource's size to 0-bytes when a resource goes offline.

Signed-off-by: digimer <mkelly@alteeve.ca>
2023-02-03 22:05:34 -05:00
digimer
a3988cc3e5 * Added System->configure_logind() to ensure that nodes are configured to ignore ACPI power button events so that IPMI-based fences work immediately.
* Added call to System->configure_logind() to anvil-join-anvil and anvil-version-changes.
* Updated fence_pacemaker to add '--reboot' to the 'stonith_admin' call to ensure DRBD-triggered fence requests reboot instead of just turning nodes off.
This commit address issue #279.

Signed-off-by: digimer <digimer@gravitar.alteeve.com>
2023-01-13 21:42:10 -05:00
Digimer
6d59399c73 * Updated the short OS list.
* Created Get->virsh_list_net() and Get->virsh_list_os() that call and parse osinfo-query directly to create lists of supported network interfaces and OS optimization options used when provisioning VMs. The later of which is used to replace the old language list of OSes, which was clunky and prone to missing valid options.
* Updated Get->available_resources() to remove the old anvil_dr1_host_uuid mechanism of finding and referencing DR resources.
* Started adding --network support to anvil-provision-server to allow users to specify a specific network bridge, MAC address and model to use for a new VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-24 10:08:06 -05:00
Digimer
9194eb3d09 * Updated System->check_if_configured() to record that a host is configured in /etc/anvil to make the system auto-mark as configured if the host is removed from the DB (or, more specifically, variables -> system::configured is lost).
* Updated Database->get_anvils() to record dr_links to reference DR hosts to Anvil! systems.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-15 19:28:00 -05:00
Digimer
f9ca6fb170 * This adds the new anvil-version-change tool which anvil-daemon will call on startup to handle checks for changes made over releases/updates.
* Added the new 'dr_link_note" column to the dr_links tables so that links can be marked as DELETED.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-12-13 17:36:43 -05:00
Digimer
622fb84652 * Renamed the 'notifications' table to 'alert-override', better reflecting what it does.
* Got anvil-manage-alerts managing alert overrides.
* Created, but for now commented out, the new 'audit' table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-11-17 00:34:52 -05:00
Digimer
a4ef93404c * Fixed a bug in DRBD->gather_data() to remove trailing commas for existing TCP ports.
* Added the missing 'clear-mapping' switch to Get->switches in anvil-daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-10-05 20:15:32 -04:00
Digimer
ef3ac86162 * Fixed a bug where setting the db_in_use flag without a valid $ENV{_}.
* Added a nice_exit call to tools/striker-access-database

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 15:45:10 -04:00
Digimer
21738ab0d4 Added a bit more logging to the Database->mark_active method.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 01:02:59 -04:00
Digimer
a81478f2bc * Updated 'db_in_use' state to add the caller's name to the state name. This is pulled out when logging stale locks that are being reaped, to help debug where stale locks are coming from.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 00:29:03 -04:00
Digimer
e7cf8ac789 * Got more work done on anvil-manage-files. It now picks up new files on nodes/dr hosts in an Anvil! and downloads them if needed.
* Updated anvil-daemon to call anvil-manage-files on a per-minute basis to handle files added outside of the WebUI.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-09 00:08:19 -04:00
Digimer
5fea8ff46a * Adds the anvil-boot-server man page.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 19:09:57 -04:00
Digimer
b3b185a43c * Added the alteeve-repo-setup man page and updated it to show that when called with '-h'.
* Updated scancore to use the new Get->switches() list parameter.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-08-02 14:31:46 -04:00
Digimer
d9910fc951 Finished the man page for anvil-daemon.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-29 17:43:53 -04:00
Digimer
be612ff878 * Updated Get->switches() to take 'list' and 'man' parameters. With list, the passed in switches can be checked to ensure they're valid. With 'man', if set to the name of a man page (usually $THIS_FILE) will be displayed if --help, -h or -? are used.
* Disabled striker-parse-oui until it can be reworked to store the the OUI data in a flat file instead of in the database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-29 16:56:40 -04:00
Digimer
cd220e97dc Disabled striker-prep-databas and set Database->configure_pgsql() calls to use debug => 2.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-20 20:32:18 -04:00
Digimer
7fd6185445 * Disabled firewalling for now. There appears to be an issue starting up with DRBD.
* Updated Convert->time() to return whatever was passed in instead of '#!error!#'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-09 19:46:38 -04:00
Digimer
bce9e2caaf This is the first attempt at enabling firewalld completely. There is a decent chance that problems exist, so it won't be a surprise if a few more commits are needed to this branch before things work.
* Added multiple new private methods to Network that help in managing the firewall.
* Updated Server->boot_server to manage the firewall after the server boots. Updated ->migrate_server to create a job, if a database connection exists, for the migration target to update it's firewall as soon after the server appears as possible.
* Updated ocf:server:alteeve to manage the firewall when called post-migration, in case there was no DB connection and the job above didn't run. Fixed a bug where the disk state wasn't being evaluated properly.
* Updated scan-server to check that the firewall is managed when a server state has changed.
* Updated anvil-daemon to run Network->manage_firewall on startup.
* Heavily reworked 'anvil-manage-server' to either just run 'Network->manage_firewall', or if passed '--server X', to wait for the server to appear for up to 1 minute, then to check that the firewall is managed (to capture servers being migrated to the host.)
* Removed firewall management from striker-prep-database.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-07-02 17:06:04 -04:00
Digimer
f2d06fa9b1 * Updated striker-parse-oui to only run if/when the system has been running for at least one hour.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-23 21:21:38 -04:00
Digimer
ab9b00a2f7 * Updated anvil-daemon, in its daily checks, to disable ksm and ksmtuned daemons.
* Updated scan-drbd to purge peer records that no longer have corresponding LVM data.
* Updated System->{en,dis}able-service to take the 'now' paramter which, when passed, causes the action to take immediate effect.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-21 22:25:07 -04:00
Digimer
911f7cfb6a This is another big commit with a lot of DB work. Getting closer to sorting out the frequent resyncs.
* Changes Database->connect to always use the first DB connected to, not the local one if that applies. This treats the first DB (sorted by UUID) as "primary" and the second (or third...) as more of a backup.
* Moved db_in_use and lock_request to use the 'states' table instead of the variables table. These are set and removed so often that it was messing up things with resync's when the data is transient anyway. Fixed multiple bugs with both to better set and clear properly.
* Created Database->read_state() to assist with the above changes.
* Updated Database->refresh_timestamp() to specifically check that the returned time stamp differs from the previously used one, looping until they differ if needed.
* Disabled striker-manage-install-target when called to update the repos, as the Install Target function doesn't work at this point.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-06-16 20:10:43 -04:00
Digimer
e6dcff1cf1 * Added a missing modified_date to ip_addresses in Database->get_ip_addresses().
* Updated scan-network to purge old historical ip_addresses when clearing duplicates now.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-21 15:52:25 -04:00
Digimer
1b70b49cf8 * Updated Network->find_matches() to try to populate the first and second parameters if they're not passed in.
* Updated Network->load_ips() to load extra information about the interfaces.
* Updated ocf:alteeve:server to not check libvirtd daemon state on server start.
* Updated scan-hardware to check for duplicate entries and purge if found.
* Updated scan-network to check for the 'default' virbr0 interface by checking if the config file exists instead of calling virsh.
* Updated scan-server to have better logging.
* Created the new (and incomplete) anvil-test-alerts tool
* Updated scancore to support --purge to pass to all agents and then exit.
* Updated ScanCore->call_scan_agents() to no longer use 'timeout' as it was causing issues with virsh calls.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-05-20 10:28:21 -04:00
Digimer
142be7674e * Fixed a bug in striker-scan-network where the scan wasn't running properly when no network was specifically given.
* Updated DRBD->get_devices() to store information about the nodes for each resource.
* Got more work done on anvil-report-usage.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-04-08 18:54:55 -04:00
Digimer
0b41029db2 Reworked Database->_find_behind_databases to loop through tables, then databases when evaluating for resync. This is still racy but should be less racy as the time between counts of columns for a given table should be a lot shorter. Also re-enabled triggering resyncs based on the age of the most recent record.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-31 21:19:32 -04:00
Digimer
7212ea1c2f Fixed a bug where reaping db_in_use states wasn't restricted to the caller's host_uuid.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Digimer
74b7719cf5 * Created the new anvil-manage-host that can check/set if a host is configured. On Strikers, it can age out data, resync data, and check/set if the local database is active.
* Updated striker-prep-database to again enable the postgresql service.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-29 17:12:26 -04:00
Digimer
edf51adaec * Changed 'anvil-manage-power' to no longer set the job progress to 50 prior to calling a reboot. It now sets to 100 immediately. Also reduced the uptime timer to five minutes from ten.
* Updated striker-auto-initialize-all() to reconnect to DBs during waits to better detect when a DB is marked as offline.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-16 00:35:26 -04:00
Digimer
7b090e1623 * Updated Database->shutdown() to disconnect, stop the postgresql daemon, then reconnect.
* Updated anvil-daemon to not stop a database until both/all DB hosts are in both/all DB's hosts table.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-15 22:33:42 -04:00
Digimer
3fd0db15bf * This rather heavily reworks how database shutdowns works. It adds much more intelligent shutdown, tracking who is using the database, being able to mark a database as "offline" and waiting for users of the database to disconnect before it shuts down.
* Also removed the variables for the database name and DB user name, setting them statically now.
* Created Database->shutdown() to more kindly stop a local database server.
* Added 'check_db_in_use_states()' to anvil-daemon to clean any stale entries marking a database as in use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-03-14 16:41:37 -04:00
Digimer
b234b79544 Updated anvil-daemon to check if anvil-sync-shared is running if the reported RAM use is too high. If so, it doesn't exit. This fixes an issue where anvil-sync-shared would loop forever as it would constantly be killed when downloading large files.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-27 21:29:30 -05:00
Digimer
68b1d12545 Updated anvil-daemon to not shutdown a striker DB until the striker host has been running for at least an hour.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-24 22:25:37 -05:00
Digimer
f77f486775 Fixed a typo in scan-network
Fixed a missing 'next' to prevent the first DB from disconnecting when down'ing excess DBs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-09 15:52:21 -05:00
Digimer
d70b9a4956 Updated scancore and anvil-daemon to check their RAM use at the end of each loop and, if it's using more than 1 GiB of RAM, it sends an alert and exits.
* Updated Database->resync_databases() to never run on non-striker machines. On Strikers, before a resync, _age_out_data() is called to clear old data in long-off databases.
* Created System->check_memory() that is loosely based on anvil-check-memory, but checks to see if it's being controlled by a systemctl started daemon and, if so, reads the RAM in use from it's status output.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-02-05 22:08:06 -05:00
Digimer
a633ab7f63 Added a periodic check to ensure all users can ping. This fixes a bug where a local striker dashboard whose DB was stopped wouldn't work.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-21 14:27:58 -05:00
Digimer
e37f487704 Fixed a bug in System->check_ssh_keys where the 'admin' user's RSA keys were owned by root.
Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-20 14:13:27 -05:00
Digimer
892a475881 * Fixed a bug in Convert->format_mmddyy_to_yymmdd() where being passed '--' didn't return the same.
* Fixed a divide-by-zero bug in anvil-boot-server when no servers exist yet.
* Fixed a bug in anvil-daemon where the local databsae engine was being started when it shouldn't.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-18 02:38:50 -05:00
Digimer
652f87ec74 * Updated scan-network to also clean up the media type.
* Updated anvil-daemon to check for files in /mnt/shared/incoming on striker dashboards and add them to the media library if needed.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-12 23:27:44 -05:00
Digimer
72038e8358 * Fixed a bug where ethtool's Media type contained tab characters that broke JSON when configuring the netowrk interfaces.
* Updated the copywrite date to 2022.
* Updated the database resync to not run on machines host VMs to help reduce the chance of oom-killer terminating a VM.

Signed-off-by: Digimer <digimer@alteeve.ca>
2022-01-03 17:06:56 -05:00
Digimer
3346d31194 * Created Get->kernel_release() that returns the current kernel release (version) in use on the host or on a remote system.
* Created DRBD->_initialize_drbd() to makes sure the DRBD kernel module can load and tries to build the module, if necessary. This is meant to provide support for clients that can't access needed internet resource (or the internet at all).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-12-07 20:03:39 -05:00
Digimer
65dfc22a38 Added an eval{} call around Database->query()'s ->prepare() DBI call to better handle lost database handle.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-30 17:58:29 -05:00
Digimer
034c38fdeb Disabled calling striker-prep-database from the spec file, and enabled scancore.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-24 00:55:02 -05:00
Digimer
8e41814ca2 * Updated anvil-daemon->prep_database() to start the postgresql daemon if it's not running and no databases are available.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 21:49:24 -05:00
Digimer
b517117bc1 * Did more work on trying to figure out why iniital setup of the database was failing. I believe it was because, in anvil-daemon, after calling 'prep_database' we called ->connect() _without_ 'check_if_configured' set. Next round of function testing should help confirm is this was the case.
* Added 'configure_firewall()' to 'striker-prep-database' to explicitely open the postgresql service for all active zones.
* Did some general logging changes and cleanup around the same.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-11-23 20:41:29 -05:00
Digimer
3445d008d2 Removed a stray debug die.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-10 22:31:23 -04:00
Digimer
63c45430bb * Updated scan-network to clear duplicate IP addresses.
* Fixed a bug in anvil-daemon where striker-prep-database was always being called, when it shouldn't in some cases.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-10-10 22:27:54 -04:00
Digimer
e60a1b46b3 Fixed bugs related to automatic database startup and conditional backup loading.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-19 14:06:18 -04:00
Digimer
4e9882812d * Fixed a bug where the periodic database dumps on the primary database Striker were not sync'ing to peers. Also fixed a bug where these periodic dumps weren't running at all.
* Updated anvil-daemon->prep_database() to only run if the database dump file doesn't exist. (If it does, it's clearly configured).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 23:18:06 -04:00
Digimer
72b17ff1f9 * Reworked how databases are stopped, now being handled in anvil-daemon. This way, initial starts will still do traditional resyncs, then shut down. This should allow the best of both worlds, where data is not lost on striker start/stop loss/recovery, but operate normally otherwise without delays.
* Updated Database->archive_database() to return the full path to the dump file.
* Disabled enabling the postgresql daemon.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-09-18 22:33:31 -04:00
Madison Kelly
922899ea78 * WIP: Working on a new method of failing over between which Striker is the active database, instead of running N-number of databases all the time.
* Created Database->backup_database() that creates a pg_dump of the active database.
* Created Database->load_database() that loads the database from a flat file, optionally creating a backup before doing so, and using iptables to block access during the process.
* Updated Database->configure_pgsql() to not start the postgresql daemon unless it just initialized the DB.
* Much work, not yet complete, to Database->connect() to stop after the first successful connection. Added logic that, if not connection was established and the host is a Striker, to load a peer's backup, if it exists, and then start the local daemon.
* Updated anvil-daemon to now have a section to run tasks on a ten minute cycle, which will later be used for the primary Striker to dump / copy its database to peer(s).

Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
2021-09-16 23:10:55 -07:00