Commit Graph

2588 Commits

Author SHA1 Message Date
Digimer
41cd1e0319 * Several bugs fixed and enhancements;
* DRBD is now configured to a ping-timeout of 3 seconds.
* Created Log->switches() that returnes the command line switches used by Anvil! tool command line calls based on the active log levels / secure logging. Appended this to all invocations of our tools.
* Updated Database->resync_databases() to now only skip 'jobs' and 'variables' tables with less than 10 record differences. All other differences will trigger a resync.
* Created System->_check_anvil_conf() that, as you might guess, checks in anvil.conf exists and created it (using defaults), if not. It also checks to see if the 'admin' group and user exists and creates them, if not.
* Updated anvil-daemon to check anvil.conf on start up and in each loop. Created the function check_journald() that checks (and sets, if needed) that journald logging is persistent.
* Made striker-manage-peers to check_if_configured on the Database->connect() when updating anvil.conf and the target UUID is the local machine. Also created a loop to make the reconnection a lot more robust.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-24 00:09:32 -04:00
digimer-bot
a90782bfdd
Merge pull request #92 from ClusterLabs/anvil-tools-dev
* Fixed (another) bug in Database->_archive_table() that was preventi…
2021-05-22 13:33:06 -04:00
Digimer
2f8becbb11 * Fixed (another) bug in Database->_archive_table() that was preventing Database.pm from compiling.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-22 13:30:16 -04:00
digimer-bot
7cea567817
Merge pull request #91 from ClusterLabs/anvil-tools-dev
* Fixed a missing semi-colon that broke Database.pm.
2021-05-22 13:00:48 -04:00
Digimer
49890762b9 * Fixed a missing semi-colon that broke Database.pm.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-22 12:53:49 -04:00
digimer-bot
67c73cdef5
Merge pull request #90 from ClusterLabs/anvil-tools-dev
Anvil tools dev
2021-05-22 12:34:16 -04:00
Digimer
995edd321f
Merge branch 'master' into anvil-tools-dev 2021-05-22 12:29:58 -04:00
Digimer
a846f9ecbc * Fix to the database resync logic. The previous change to only resync if 10+ lines differed broke striker-manage-peers as the difference in host counts is what triggered the pairing of strikers.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-22 12:25:29 -04:00
Digimer
41d528418d * Increased the trigger point for database archiving. The current values were too low and cuasing frequent archive -> resync cycles.
* Fixed a bug in that archiving defaulting to not store on disk was not working properly. Now acts as described in anvil.conf.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-22 11:57:29 -04:00
digimer-bot
ce8fd02228
Merge pull request #89 from ClusterLabs/anvil-tools-dev
Anvil tools dev
2021-05-20 00:30:44 -04:00
Digimer
48956d94fb * Fixed anvil-manage-system file name in Makefile.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-20 00:27:34 -04:00
Digimer
fc0954d0c8 * Started work on, but not at all finished, anvil-manage-server which will allow manipulation of a server's resources.
* Changed the alteeve repo RPM to the new cimmunity/enterprise repo
* Fixed a bug where 'fence_data::updated' was causing the fences web page to break.
* Fixed a bug in Database->insert_or_update_network_interfaces() where certain interfaces were being repeatedly added to the database.
* Fixed a bug in Database->_find_behind_databases() was marking DBs as behind even though they had less than 10 columns off.
* Fixed a bug in Get->host_name() where, if the host name was changed on disk but the environment variable was still the old name, it would cause the hostname to waffle back and forth and cause constant updated to /etc/hosts.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-20 00:16:09 -04:00
Digimer
ad4a1ecc78 * Increaded the scancore agent run timeout to 60 seconds.
* Updated anvil-safe-start to start DRBD resources when the peer's DRBD resourcs is 'Connecting',
* Updated fence_pacemaker to more intelligently check the list of host names related to an IP address when looking for the peer host name

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-15 00:12:43 -04:00
digimer-bot
c2b57ca3c8
Merge pull request #87 from ClusterLabs/anvil-tools-dev
Anvil tools dev
2021-05-13 23:46:37 -04:00
Digimer
473c728117 * Updated scan-ipmitool to use 'jump' thresholds for a common sensor name where duplicates with the hex address appended may exist.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-13 23:40:50 -04:00
Digimer
44864ce321 * Updated Database->resync_databases() to set a default schema of 'public'. Also fixed a bug where, when the difference in record numbers between two line was > 999, it would not trigger a resync.
* Updated the scan agent timeout to 60 seconds. Also made the scan agent exit code log entries more helpful.
* Updated System->collect_ipmi_data() to now better handle duplicate sensor names. Now, instead of simply appending an integer, we find the hex address and use that in the sensor name when duplicates exist. This solves the problem of the sensor names not being consistently shown in order.
* Fixed message bugs (bad variable insertions) in scan-apc-pdu and scan-apc-ups.
* Fixed schema procedure bugs in the 'temperature' and 'ip_address' tables where the columns were in bad order, causing constanty updates.

Incomplete work;
* Create the shell of 'anvil-manage-storage', but virtually no logic exists in it yet.
* Started work on anvil-safe-start to deal with an issue where DRBD resources don't start when a server is running on a peer.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-13 23:27:38 -04:00
digimer-bot
4382dc8370
Merge pull request #86 from ClusterLabs/scancore-debugging
* Renamed tools/striker-purge-host to tools/striker-purge-target and …
2021-05-11 14:28:34 -04:00
Digimer
309aa13684 * Updated the name of striker-purge-target in the makefile.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-11 14:25:27 -04:00
Digimer
7abbc938af * Renamed tools/striker-purge-host to tools/striker-purge-target and moved the code from test.pl over to it. No longer provides interactive selection, but now does work with Anvil! systems as well as hosts.
* Fixed a bug in Database->get_tables_from_schema where history.X and X tables were being stored in the table list.
* Updated ocf:alteeve:server to no do resyncs on DB connect.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-11 14:14:00 -04:00
digimer-bot
95159d4492
Merge pull request #85 from ClusterLabs/scancore-debugging
* Fixed a bug where the log message for a changed CIB wasn't useful.
2021-05-08 13:47:05 -04:00
Digimer
f833c311ba * To address issues with scancore debugging, we needed a tool to purge old anvils and hosts from the database. The 'test.pl' in this commit contains the new logic that will be merged into tools/striker-purge-host shortly.
* Created Database->find_host_uuid_columns() and ->_find_column() to create a list of tables and column names in the proper order to allow deletion of foreign keys to that deeply nested primary keys can be deleted. Specifically, this was meant for hosts -> host_uuid and anvils -> anvil_uuid, though it should work for other tables.
* Updated html/jquery-ui-1.12.1/package.json to address CVE-2020-7729
* Fixed a bug in the temperature table's history procedure where temperature_weight wasn't being copied.
* Updated anvil-provision-server to support '--anvil' that can take either the anvil-uuid or anvil-name.
* Updated anvil-safe-stop to default the stop-reason to 'user'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-08 02:02:46 -04:00
Digimer
3fb81c1a0a * Updated Convert->time() to silently return if the given time was '--'.
* Added a new parameter to Database->connect() called 'no_resync' that, if set, prevents a resync check being performed. Updated ->resync_databases() to find a uuid_column where the table name ends in 'ies' and the UUID column is 'y_uuid'. Updated ->resync_databases() to not fire on updated table age anymore, and to trigger only if the number of rows differ in a given table by more than 10.
* Updated Log->entry() to prefix a tool's name, when the new 'log::scan_agent' value is set. Also set this value in ScanCore->agent_startup(), to help differentiate log entries.
* Fixed a bug in scancore's main loop where it logged the sleep message at the start of the run.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-04 12:33:31 -04:00
Digimer
a74be60469 * Fixed a bug where the log message for a changed CIB wasn't useful.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-02 23:35:37 -04:00
digimer-bot
17112c419d
Merge pull request #84 from ClusterLabs/webui_anvil_page
* This commit started with work on webui endpoint set_power, but then…
2021-05-02 21:48:39 -04:00
Digimer
4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches.
* Created Cluster->check_stonith_config() that checks and, if needed, reconfigures a cluster's fencing (stonith) config.
* Updated scan-cluster to call Cluster->check_stonith_config() at the end of each call.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-02 21:40:48 -04:00
digimer-bot
46c9035e4b
Merge pull request #83 from ClusterLabs/webui_anvil_page
* Created tools/striker-boot-machine to, well, boot machines. It uses…
2021-05-01 20:16:05 -04:00
Digimer
416f51323a * Created tools/striker-boot-machine to, well, boot machines. It uses host_ipmi or, failing that, other fence methods when available to boot a node.
* Created Cluster->get_fence_methods() that parses all fence methods out of a recorded CIB and stores the in a hash for a given host_uuid.
* Fixed a bug in ScanCore->post_scan_analysis_striker() where the short_host_name was not being stored correctly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-01 19:49:27 -04:00
digimer-bot
159c4a1612
Merge pull request #82 from ClusterLabs/anvil-tools-dev
Anvil tools dev
2021-04-30 23:09:18 -04:00
Digimer
35e926c52b
Merge branch 'master' into anvil-tools-dev 2021-04-30 23:02:49 -04:00
Digimer
ca7052dd53 The core logic is done!!!! Still need to finish end-points for the WebUI to hook into, but the core of M3 is complete! Many, many bugs are expected, of course. :)
* Created DRBD->check_if_syncsource() and ->check_if_synctarget() that return '1' if the target host is currently SyncSource or SyncTarget for any resource, respectively.
* Updated DRBD->update_global_common() to return the unified-format diff if any changes were made to global-common.conf.
* Created ScanCore->check_health() that returns the health score for a host. Created ->count_servers() that returns the number of servers on a host, how much RAM is used by those servers and, if available, the estimated migration time of the servers. Updated ->check_temperature() to set/clear/return the time that a host has been in a warning or critical temperature state.
* Finished ScanCore->post_scan_analysis_node()!!! It certainly has bugs, and much testing is needed, but the logic is all in place! Oh what a slog that was... It should be far more intelligent than M2 though, once flushed out and tested.
* Created Server->active_migrations() that returns '1' if any servers are in a migration on an Anvil! system. Updated ->migrate_virsh() to record how long a migration took in the "server::migration_duration" variable, which is averaged by ScanCore->count_servers() to estimate migration times.
* Updated scan-drbd to check/update the global-common.conf file's config at the end of a scan.
* Updated ScanCore itself to not scan when in maintenance mode. Also updated it to call 'anvil-safe-start' when ScanCore starts, so long as it is within ten minutes of the host booting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-30 22:58:01 -04:00
Fabio M. Di Nitto
cc41424bb8
Merge pull request #81 from ClusterLabs/drbd
Update to kmod-drbd91
2021-04-28 06:20:24 +02:00
Fabio M. Di Nitto
2214866156 Update to kmod-drbd91
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
2021-04-28 06:17:15 +02:00
Digimer
15dab8aab7 * Started working on the node post-scan login in ScanCore. Created ScanCore->check_temperature() to get a thermal score against a node.
* Update ScanCore->check_power() to not require the parameter values.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-26 21:42:57 -04:00
digimer-bot
51bf505d5f
Merge pull request #80 from ClusterLabs/anvil-tools-dev
* anvil-safe-stop is complete! Testing still needed, of course.
2021-04-23 18:12:56 -04:00
Digimer
f202187c34 * anvil-safe-stop is complete! Testing still needed, of course.
* Updated DRBD->manage_resource() to call 'drbdadm adjust <res>' when starting a resource to help deal with a periodic issue where the 'allow-two-primary' option on the peer doesn't match the local setting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 11:56:11 -04:00
digimer-bot
edf749ae78
Merge pull request #79 from ClusterLabs/anvil-tools-dev
* Made good progress on anvil-safe-stop. It will now stop or migrate …
2021-04-23 08:06:19 -04:00
Digimer
82929e28bf
Merge branch 'master' into anvil-tools-dev 2021-04-23 00:06:34 -04:00
Digimer
3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed).
* Updated Server->shutdown_virsh() to change the parameter 'wait' to 'wait_time' to clarify it's use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 00:04:20 -04:00
digimer-bot
4304fbe6dd
Merge pull request #78 from ClusterLabs/anvil-tools-dev
* Finished anvil-rename-server!
2021-04-22 13:34:31 -04:00
Digimer
27259d1d53 * Finished anvil-rename-server!
* Created Storage->delete_file() that, well, deletes files (locally or on a peer).

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-22 13:29:50 -04:00
digimer-bot
2f17a0d402
Merge pull request #77 from ClusterLabs/anvil-tools-dev
* Updated DRBD->gather_data() to store data on peers so that the peer…
2021-04-20 23:25:53 -04:00
Digimer
53cd0bdf3a * Now with 100% less typos.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 23:22:45 -04:00
Digimer
e3ba64cb83 * Fixed a type in the Makefile.am.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 23:00:03 -04:00
Digimer
2e37691116 * Updated DRBD->gather_data() to store data on peers so that the peer's LV path and backing disk is recorded. Also fixed a bug in ->get_status() where the return code for local calls was stored as a host name.
* Added the scan-hpacucli scan agent. It's been done for a while and should have been added ages ago.
* Updated anvil-rename-server to get to the point where it will take down the DRBD resources on all machines, but waits if there is a sync under way. It also verifies that the server is off on all systems from virsh's perspective.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 22:46:51 -04:00
digimer-bot
591b550085
Merge pull request #76 from ClusterLabs/anvil-tools-dev
* Finished anvil-migrate-server and anvil-safe-start! Lots of testing…
2021-04-19 00:36:40 -04:00
Digimer
711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there.
* Fixed a bug in Cluster->migrate_server() where waiting for the server to migate would never exit.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-19 00:32:13 -04:00
digimer-bot
a6a11abe01
Merge pull request #75 from ClusterLabs/anvil-tools-dev
Anvil tools dev
2021-04-18 20:02:50 -04:00
Digimer
eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server.
* Fixed a bug where, in rare cases, $anvil->hostname() would call 'hostnamectl' and get a dbus error during shutdown, which would then cause the hostname to be changed to the error in the database.
* Fixed a bug in Cluster->boot_server() where it would never verify that a server has started successfully.
* Updated Database->get_ip_addresses() to store the IPs we manage in 'ip_addresses::<ip_address_address>::X'.
* Updated ocf:alteeve:server to work from command line calls, though more testing is still needed.
* Started work on 'anvil-rename-server', but haven't gotten far with it yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-18 19:54:58 -04:00
Digimer
a480357049 * Fixed a bug in Cluster->assemble_storage_groups() where, if a group is created during an anvil-provision-server run, the group would get created multiple times.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-15 18:51:10 -04:00
digimer-bot
83b140511e
Merge pull request #74 from ClusterLabs/anvil-safe-start-work
Anvil safe start work
2021-04-15 02:43:51 -04:00