Commit Graph

76 Commits

Author SHA1 Message Date
Digimer
607c097fc8 * Fixed a bug where, once a DRBD resource was allowed to be dual-primary for migration, that wasn't properly disabled post-migration.
* Updated DRBD->allow_two_primaries() to take the 'set_to' parameter which can be 'yes' to all and 'no' to disallow dual-primary.
* Updated ocf:alteeve:server to call allow_two_primaries() with 'set_to' = 'no' instead of calling 'adjust' after a migration completes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-25 00:45:52 -04:00
Digimer
5b4bfa747c * Reworked the anvil-join-anvil job parsing to help diagnose occassional faults. Also changed a fatal parse error to one that allows the run to be retried.
Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-06 01:54:28 -04:00
Digimer
96fffb0b96 * Finished updating ocf:alteeve:server to no longer require a database connection. To do this, and still be able to track live migration times, the Server->migrate_virsh() method now writes out the server name and migration time to a /tmp/anvil/migration-duration.<server_name>.<unix_time> file. This file is checked for by the scan-server resource agent and, when found, is parsed and the migration duration is recorded, then the file is purged.
* Updated anvil-daemon to have a new function called "handle_special_cases" called during startup that does any weird bug mitigation required. For now, this is used to mitigate against rhbz#1961562, though certainly it will be used for other reasons later.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-06 00:01:11 -04:00
Digimer
16c20ae69c * Updated Tools->catch_sig() to use return code 0 instead of 255 so that systemd doesn't think our daemons failed on stop.
* Updated Cluster->parse_cib() to not require a database connection (part of the work to make ocf:alteeve:server run without a DB)
* WIP: Continuing work on the ocf:alteeve:server RA to run without database connections.
* Updated the scancore daemon to explcitely check that all scan agent schemas are loaded in all databases on startup. This is to resolve resync issues on rebuilt strikers that may not yet have some schemas loaded when a DB resync runs.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-05 14:32:26 -04:00
Digimer
24ec17f8f7 * Added a new parameter called 'sensitive' to Database->connect() that returns after connections before any ancilliary checks are done, minimizing connect time.
* Fixed a problem with Database->insert_or_update_variables() where variable_source_uuid being set to an empty string wasn't converted to NULL.
* Fixed Database->locking() where the way the lock variable was set was rather broken.
* Created Striker->check_httpd_conf() which configured apache to handle the integration of the new WebUI for Anvil! management with the existing WebUI.
* Updated System->update_hosts() to specifically set the 127.0.0.1 and ::1 lines to handle how cloud-init overrides /etc/hosts and breaks CI/CD tests.
* Removed the old index.html as it's now used for the new WebUI.
* Began work on removing DB connection requirements from ocf:alteeve:server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-06-03 22:25:36 -04:00
Digimer
7abbc938af * Renamed tools/striker-purge-host to tools/striker-purge-target and moved the code from test.pl over to it. No longer provides interactive selection, but now does work with Anvil! systems as well as hosts.
* Fixed a bug in Database->get_tables_from_schema where history.X and X tables were being stored in the table list.
* Updated ocf:alteeve:server to no do resyncs on DB connect.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-11 14:14:00 -04:00
Digimer
4a87ee71db * This commit started with work on webui endpoint set_power, but then switched to scancore debugging and I neglected to switch branches.
* Created Cluster->check_stonith_config() that checks and, if needed, reconfigures a cluster's fencing (stonith) config.
* Updated scan-cluster to call Cluster->check_stonith_config() at the end of each call.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-05-02 21:40:48 -04:00
Digimer
3a6902d899 * Made good progress on anvil-safe-stop. It will now stop or migrate servers (testing needed).
* Updated Server->shutdown_virsh() to change the parameter 'wait' to 'wait_time' to clarify it's use.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-23 00:04:20 -04:00
Digimer
2e37691116 * Updated DRBD->gather_data() to store data on peers so that the peer's LV path and backing disk is recorded. Also fixed a bug in ->get_status() where the return code for local calls was stored as a host name.
* Added the scan-hpacucli scan agent. It's been done for a while and should have been added ages ago.
* Updated anvil-rename-server to get to the point where it will take down the DRBD resources on all machines, but waits if there is a sync under way. It also verifies that the server is off on all systems from virsh's perspective.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-20 22:46:51 -04:00
Digimer
711a04999e * Finished anvil-migrate-server and anvil-safe-start! Lots of testing still needed for both though, and 'anvil-safe-start' does run as a job yet, but the logic is all there.
* Fixed a bug in Cluster->migrate_server() where waiting for the server to migate would never exit.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-19 00:32:13 -04:00
Digimer
eec14cb013 * Finished tools/anvil-boot-server and tools/anvil-shutdown-server.
* Fixed a bug where, in rare cases, $anvil->hostname() would call 'hostnamectl' and get a dbus error during shutdown, which would then cause the hostname to be changed to the error in the database.
* Fixed a bug in Cluster->boot_server() where it would never verify that a server has started successfully.
* Updated Database->get_ip_addresses() to store the IPs we manage in 'ip_addresses::<ip_address_address>::X'.
* Updated ocf:alteeve:server to work from command line calls, though more testing is still needed.
* Started work on 'anvil-rename-server', but haven't gotten far with it yet.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-04-18 19:54:58 -04:00
Digimer
413a4f73c2 * Updated Tools->_anvil_version() and Get->anvil_version() to now pick up a SchemaVersion from anvil.sql. This will change only when the schema changes and is used when Database->connect() is checking compatibility with other anvil database hosts. This will make it only break connection when there is a reason to do so. The anvil_version still remains as an informational version that will help when supporting users later.
* Updated Cluster->add_server() to now set failure timeouts to actual numbers instead of INFINITY after discovering that INFINITY doesn't work in those cases.
* Updated Databsae->get_hosts to now check if other entries have the same host name, and if so, to set their host_key to 'DELETED'. This should make it easier to handle when a hardware machine is replaced by new hardware but uses the same host_name.
* Updated Email->check_queue() to start and enable postfix.service if it's found to not be running.
* Updated Get->available_resources() to return '!!no_data!!' when a given host hasn't got any data in scan_lvm_vgs. Now use this in anvil-provision-server to exit if a node or dr host hasn't run scancore yet.
* Fixed a bug in scan-lvm where the pvs_uuid wasn't being loaded properly, preventing lost PVs, VGs and LVs from being flagged as deleted.
* Started work on anvil-migate-server, though it's far from complete.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-30 14:03:13 -05:00
Digimer
549dbad635 * Created Cluster->delete_server(), which deletes a server resource from pacemaker (stopping it first, if needed).
* Fixed a bug in Cluster->parse_cib() when a server that is off wasn't setting 'status'.
* Renamed 'server::location::<server>::host' to '...::host_name' in several places.
* Got more work done on anvil-delete-server, up to the point where it calls the new Cluster->delete_server() method.
* Updated fence_pacemaker to call 'drbdadm adjust all' to dampen an issue where in-memory fence configs seem to change, preventing reconnection of the peer after it reboots from the fence. More testing needed on this issue.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-25 01:00:55 -05:00
Digimer
05b1fccdb3 * Created Cluster->add_server() which, well, adds a server to a pacemaker cluster, including sorting out location constraints to favour the node the server is running on, if it's running.
* Removed the exit-if-no-DB check in ocf:alteeve:server so that (hopefully, needs testing), running servers won't be impacted if the nodes lost contact with both/all strikers.
* Updated scan-server to make an explicit check for missing XML definition files on startup and write them if needed.
* Very beginning work on anvil-delete-server has been started.
* Updated anvil-provision-server to wait when it's running in peer mode until the new XML definition is in the DB and then write it out to disk before exiting. Also updated it to add the new server to pacemaker before exiting.

Signed-off-by: Digimer <digimer@alteeve.ca>
2021-01-18 00:38:06 -05:00
Digimer
1d03a386d3 * Created Database->get_bridges() that, surprise, loads data from the 'bridges' table.
* Started work on Get->available_resources() that will take an 'anvil_uuid' and figure out what resources are still available for use by new servers or that can be added to existing servers.
* Fixed a bug in ScanCore->agent_startup() where tables weren't being generated properly from the agent's SQL file.
* Made Storage->change_mode() return silently if it's called without a mode being passed. This happens frequently and is harmless so it's not worth filling the logs with errors.
* Renamed the 'start_time' key to 'at_start' when recording files' MD5 sums in Storage->record_md5sums and ->check_md5sums.
* When we moved the directory scan logic out of the 'scancore' daemon and into 'Storage->scan_directory', the logic to record scan agent names in 'scancore::agent::<file>' was removed. This broke a few things and, so, it was restored when it was found that a file starts with 'scan-' and the directory matches the scancore agent directory.
* Moved the 'scancore' daemon's 'load_agent_strings' to 'Words'
* Updated Words->parse_banged_string() to look for variables in the format 'value=X:units=Y' and translate it properly.
* Fixed a bug in scan-ipmitool where discovered sensor INSERT SQL queries were queued, but not committed.
* Fixed a bug in scan-storcli where a while loop was broken, preventing execution.
* Fixed a bug in the 'scancore' daemon where it wouldn't exit if sums changed. Fixed a bug where alerts weren't being sent between loops. Fixed a bug where command-line log level wasn't surviving inside the main loop.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-12-23 01:10:23 -05:00
Digimer
d677d19ca0 * Moved Database->check_condition_age to Alert.
* Created (but not finished) scan-apc-pdu
* Added support to tracking maintenance-mode for nodes in Cluster->parse_cib
* Created Remote->read_snmp_oid().
* Created Server->get_definition.
* Updated Server->get_status() to write-out server XML files on-demand.
* Finished scan-cluster.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-23 01:28:21 -04:00
Digimer
33101f969a * Fixed several bugs related to tracking server boots, migrations and shut downs in the anvil database. The 'ocf:alteeve:server' now has (mostly?) safe integration with the Anvil! database. This was mostly done by updating Servers->boot_virsh(), ->shutdown_virsh() and ->migrate_server().
* Updates servers -> server_host_uuid to drop the 'NOT NULL' constraint.
* Created the new Get->server_uuid_from_name() that does what it says on the tin. Fixed a bug in ->host_uuid_from_name() where the host name was being returned instead of the UUID.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-07 02:40:31 -04:00
Digimer
be88be6d30 * Did a bunch of testing / bugfixes for scan-server.
* Updated the servers table to remove the 'not null' constraints on the server_start_after_server_uuid, server_pre_migration_file_uuid and server_post_migration_file_uuid columns.
* Updated ScanCore->agent_startup() to connect to the database(s) when there isn't a table list.
* Updated Server->migrate_virsh() to test for DB access before making DB calls (to allow ocf:alteeve:server to function even if all ScanCore DBs are offline).
* Updated ocf:alteeve:server to connect to the databases (though work without it), and changed '$FILE_NAME' to be 'ocf:alteeve:server' (to make logging more legible)
* Created the skeleton for 'scan-storage'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-05 23:47:56 -04:00
Digimer
46f1a05789 * Got the code in scan-server to the point where it _should_ now gracefully and automatically detect changes to a server's definition originatin from the database (via Striker), directly editing the on-disk definition file, or editing via libvirt tools (like virt-manager). Still needs to be tested though.
* Updated Server->migrate_virsh() to set 'servers' -> 'server_state' to 'migrating' and clear it again once the migation completes. Also added support for cold (frozen) versus live migrations.
* Updated Cluster->parse_cib() to check if a server with the server_state set to 'migrating' isn't actually migrating anymore and, if not, to clear that state. This is needed as scan-server will blindly ignore/skip any migrating server, and if a migration call is interrupted, the state could get stuck.
* Updated the 'servers' database table (and associated Database methods) to add columns for;
** server_ram_in_use      - tracking RAM used by a running server
** server_configured_ram  - RAM allocated to a running server (used with the above to alert a user and track _currently_ available RAM)
** server_updated_by_user - To be set by Striker tools to indicate when the user made a change that needs to push out to nodes / running server.
** server_boot_time       - Tracks the unixtime when the server booted (to track uptime even if the server migrates across nodes).
* Created Get->anvil_name_from_uuid() to easily convert an Anvil! UUID into a name. Also created ->host_uuid_from_name() to translate a host name into a host UUID.
* Created Server->get_runtime() that translates a server name into a process ID and then uses that to determine how long (in seconds) it has been running. This is used when a server transitions from 'shut off' to 'running' to determine exactly when the server booted (current time - runtime).
* Renamed all 'Server->parse_definition' calls that used 'from_memory' to 'from_virsh' to clarify the data source.
* Made scan-hardware smarter about RAM change alerts.
* Updated scancore to load agent strings on startup so that processing pending alerts works properly.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-10-02 02:13:34 -04:00
Digimer
4dfe0cb5a0 * Created Cluster->boot_server, ->shutdown_server and ->migrate_server methods that handle booting, migrating and shutting down servers. Also created the private method ->_set_server_constraint which is used by migrate and boot to set resource constraints to control where a server boots or migrates to.
* Did more work on parsing server data out of the CIB. There is still an issue with determining which node currently hosts a resource, however.
* Renamed Server->boot to ->boot_virsh, ->shutdown to ->shutdown_virsh and ->migrate to ->migrate_virsh to clarify that these methods work on the raw virsh calls, outside of pacemaker (indeed, they are what the pacemaker RA uses to do what pacemaker asks).
* Got more work done on the scan-cluster SA.
* Created the empty files for the pending scan-server SA.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-24 02:09:18 -04:00
Digimer
0f7267eae1 * Moved the '_host_name', '_short_host_name', and '_domain_name' private methods in Tools.pm over to Get.pm (removing the leading '_' in the method names).
* Created 'Cluster->which_node' that returns 'node1' or 'node2' to indicate which node a host is.
* Continued working on scan_cluster; decided to make it not host-dependent.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-09-20 00:27:36 -04:00
Digimer
b2c7fd95fb * Renamed the ScanCore unit file to scancore.
* Added support to parsing location contraints to Cluster->parse_cib

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-24 12:54:31 -04:00
Digimer
14bf323627 * Fixed an issue with ocf:alteeve:server where, after a migration, the target host would invoke the RA as if it was trying to migrate, instead of verifying the server (resource) was OK post migration.
* Fixed a bug in Server->get_status() where the call to Storage->rsync's returned output checked for '!!errer!!' instead of '!!error!!'.
* Fixed a bug in Storage->rsync where, when no port was passed in, it would try to specify an empty port and fail.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-20 06:32:19 -04:00
Digimer
1498e1b53c * Got server migration working using ocf:alteeve:server in a test environment!
* Converted most 'eval { }' calls to localize $@ and test the output of the eval, instead of checking to see if $@ was set.
* Converted all 'local' hash references to instead use the short host name of the local machine as a new standard.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-19 18:54:09 -04:00
Madison Kelly
30f2b3fa8e * Switched all hash 'local' keys to be the host's short user name. Untested, likely bugs to be fixed in the next commit.
Signed-off-by: Madison Kelly <mkelly@alteeve.ca>
2020-08-18 19:34:08 -04:00
Digimer
47203490a9 * Working on getting live migration to work with ocf:anvil:striker using the environment variables that pacemaker sets. Incomplete, but getting close.
* Added support to Cluster->parce_cib() to track if maintenance mode is set.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-17 11:55:49 -04:00
Digimer
cc1e0e2f77 * Updated ocf:alteeve:server to properly report a server's status when a monitor action is called.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-14 23:16:34 -04:00
Digimer
6eeb4e48c7 * Added 'timeout' and 'wfc-timeout' to drbd'd global-common.conf in DRBD->update_global_common().
* Fixed a minor bug in ocf:alteeve:server found in testing the RA.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-14 00:24:02 -04:00
Digimer
e35800c413 * Fixed up (though more testing/work needed) to ocf:alteeve:server to get it working with DRBD resources referenced using '/dev/drbd/by-res/...'.
* Added the anvil.conf option 'sys::privacy::strong' that controls if the Anvil! ever "calls home". Initially, this controls DRBD's usage flag.
* Updated DRBD->get_devices() to track resources by their 'by-res' names as well and by the normal '/dev/drbdX' devices.
* To mitigate https://bugzilla.redhat.com/show_bug.cgi?id=1868467, updated Get->bridges() to parse the normal (non-JSON) data if we get invalid JSON output.
* Updated anvil-join-anvil to not disable, and in fact enable, libvirtd on boot. With DRBD 9, the original fear of a user accidentally booting a VM that's running on the peer no longer is an issue. By enabling it and leaving it on, Striker dashboard users won't lose their virtual machine manager link unless the node powers off. Also enabled actually updating the job progress, completing this tool!

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-08-13 00:12:20 -04:00
Digimer
01974d7efe * Finished (though testing is needed) the updated ocf:alteeve:server resource agent. It now handles starting and stopping libvirtd and drbd daemons on-demand.
Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-16 00:33:00 -04:00
Digimer
dcd1fd1492 * Created Cluster->check_node_status() that checks the status of a node (in pacemaker).
* Created Cluster->get_peers() that figures out who the peer node (and DR host, if applicable) are.
* Updated Cluster->parse_cib() to dig out more information.
* Created Cluster->start_cluster() to start pacemaker (via pcsd) locally or on all (both) nodes.
* Started working on ocf:alteeve:server to start/stop the libvirtd/drbd daemons as needed, instead of having pacemaker do it.
* Got more work done on anvil-join-anvil. Node 2 now waits for the cluster to start, and node 1 will do setup as needed, then wait for the cluster to start as well.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-07-15 18:40:37 -04:00
Digimer
726a4374d1 * Renamed the database table 'host_keys' to 'ssh_keys' to better represent what it stores.
* Updated 'variables' -> 'variable_source_uuid' to type 'uuid' and removed the 'not null' constraint.
* Updated Database->insert_or_update_variables() to check/update 'variables_source_table' and 'variables_source_uuid'.
* Created the 'trusts' database table which will, when done, tell anvil-daemon which users@machines to trust (setup passwordkess SSH).
* Created (but not finished) System->manage_authorized_keys() and moved the logic over to it from anvil-daemon.
* Changed the host types "dashboard" to "striker".
* Moved the following methods from 'System' to 'Get';
** System->get_host_type to Get->host_type
** System->get_bridges to Get->bridges
** System->get_free_memory to Get->free_memory
** System->get_os_type to Get->os_type
** System->get_uptime to Get->uptime
* Updated striker to include the host_uuid for the 'node1', 'node2' and (if chosen) 'dr1' when running a job manifest.

Signed-off-by: Digimer <digimer@alteeve.ca>
2020-06-10 18:26:50 -04:00
Digimer
934c9b1286 * Updated logging to now log anything with 'priority' set to a new 'anvil.alert.log' file (while still also logging as normal to anvil.log). This should make it easier to watch for alert messages.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-22 21:17:30 -04:00
Digimer
7cdd2f60e9 * Created Network->download() to handle downloading a file on the local system. Created ->bridge_info() to parse 'bridge' output. Created ->load_ips() to load IP address information from the database (as opposed to ->get_ips() which queries a system).
* Created Database->insert_or_update_bridge_interfaces() to handle updating the new bridge_interfaces database table that records which network interfaces are connected to which bridges.
* Added 'bridge_mac' and 'bridge_mtu' to the 'bridges' table.
* Started Server->map_network which will, eventually, try to map MAC addresses to IPs and record server -> vnetX -> bridge data. Made getting a server status target-dependent.
* Worked on anvil-update-states to parse and record bridge data.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-12 00:32:32 -04:00
Digimer
3a86bed694 * Fixed tools/striker-initialize-host so that it set the hostname on the target, not locally.
* Updated System->host_name to work locally and on remote targets.
* Renamed all 'hostname' instances to 'host_name' to standardize on a spelling throughout the program.
* Removed use of and dependency on 'hostname'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-10-02 15:39:21 -04:00
Digimer
bc341809ca * Finished (for now) ocf:alteeve:server! It can boot, migrate and stop a server cleanly. It still checks to see if DRBD needs to be started and does so when needed, but it won't stop it anymore.
* Fixed a couple typos in tools/anvil-check-memory that prevented it from running.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-04 18:58:43 -04:00
Digimer
8a2c86d088 * Renamed striker-configure-host (back) to anvil-configure-host, and started updating it to work on any machine type.
* Created tools/anvil-check-memory to report how much RAM is used by a given program.
* Added documentation for some previously undocumented methods.
* Updated Database->archive_database() to take the 'tables' parameter.
* Updated Storage->scan_directory() to record a directory's mode and type, even when recursive isn't used.
* Finished System->check_memory().
* Updated ocf:alteeve:server to now NOT stop a DRBD resource unless 'stop_drbd_resources'.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-09-03 14:07:39 -04:00
Digimer
c0dd34334e * Fixed another bug in making ocf:alteeve:server work in pacemaker.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-21 01:14:10 -04:00
Digimer
ed2e83a1a4 * Fixed a few more bugs in 'ocf:alteeve:anvil', but it's still failing when invoked by pacemaker.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-20 23:46:59 -04:00
Digimer
f5caec52dc * Made DRBD->allow_two_primaries() smarter about finding the 'target_node_id' when it wasn't passed.
* Fixed a couple bugs, and now ocf:alteeve:server properly can pull and push servers between nodes.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-20 00:59:41 -04:00
Digimer
113a44ecc6 * Got 'migrate_to' working in ocf:alteeve:server. 'migrate_from' still needs work.
* Created DRBD->allow_two_primaries() and ->reload_defaults() that enables (and resets/disables) dual-primary operation (allow-two-primaries=yes), used to enable live migration.
* Created Remote->test_access() that simply verifies that a remote target can be accessed (as a given user).
* Created Server->migrate() that actually migrates a server. It can push already, and pull will be added next.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-16 01:41:47 -04:00
Digimer
7db542b9b0 * Fixed a bug where definition files that used '<source file='X'/>' instead of '<source dev='X'/>' for the backing block device for disks.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-14 01:48:36 -04:00
Digimer
873ed3e2b0 * Fixed some typo bugs.
* Added stop_drbd_resources() to ocf:alteeve:server.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-12 23:15:13 -04:00
Digimer
dff74102db * Create (but not yet tested) Server->shutdown() to, well, shutdown servers.
* Modified Server->boot() to now only work locally. Also updated it to optionally take the XML definition path for the server to boot.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-10 01:35:08 -04:00
Digimer
d224be9344 * Created DRBD->manage_resource() that allows for up/down/primary/secondary'ing a resource on a local or remote system.
* Created Server->boot() that starts a server and verifies that it did actually start.
* Got ocf:anvil:server smarter about starting DRBD resources, properly handing resources where auto-promote isn't enabled. The 'start' process is now complete (baring bugs).

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-08 22:58:21 -04:00
Digimer
b1ddf945e2 * Got ocf:alteeve:server working again to boot servers. It's now smarter, knowing when the server is running locally already (success), running on the other node (hard error) and running on DR (fatal error).
* Updated DRBD->get_devices() to store data all under 'drbd::config::<host>::x'.
* Created Server->find() that takes a target and collects the servers running on it.
* Updated System->check_storage() to redirect all calls STDERR to /dev/null to supress errors about failing to open /dev/drbdX when LVM's filter isn't setup.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-08 01:10:38 -04:00
Digimer
324ef351fe * Updated DRBD->get_devices() to properly identify the peer node, when run on an actual node in the cluster (not DR or Striker).
* Created System->active_lv() that, surprise, activates an inactive logical volume. Also created ->check_storage() that parses out the LVM data.
* Fixed a bug in tools/fence_pacemaker that was preventing it from compiling and running.
* Updated ocf:alteeve:server to validate the target server's storage.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-06 23:31:35 -04:00
Digimer
16f79ca244 * Created System->get_bridges() that gets a list of bridges (and connected interfaces, and data). Also created ->get_free_memory() that returns the amount of available RAM.
* Got ocf:alteeve:server validating bridges.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-03 01:38:12 -04:00
Digimer
4a93682447 * Started rebuilding ocf:alteeve:server using the new module methods.
Signed-off-by: Digimer <digimer@alteeve.ca>
2019-08-02 00:06:06 -04:00
Digimer
7a7e3db0c1 * Created DRBD->get_devices() that finds and maps the /dev/drbdX devices to their resources and backing LVs.
* Got Server->get_status() and ->_parse_definition() pulling out all but the device bus data from the XML files. Under devices, started parsing, with 'channel' devices now being parsed.
* Fixed a bug in several Storage->X calls where the test to see if a 'target' was the local host or not wasn't smart enough.
* Purged 'ocs:alteeve:server' to prepare for the rewrite where a lot of the logic is being moved into the Anvil::Tools modules as their functionality will be needed elsewhere anyway.

Signed-off-by: Digimer <digimer@alteeve.ca>
2019-07-30 02:10:04 -04:00