<?xml version="1.0" encoding="UTF-8"?>
<!--
Company: Alteeve's Niche, Inc.
License: GPL v2+
Author: Madison Kelly <mkelly@alteeve.ca>
NOTE: All string keys MUST be prefixed with the agent name! ie: 'scan_ipmitool_log_0001'.
-->
<words>
<meta version="3.0.0" languages="en_CA,jp"/>
<!-- Canadian English -->
<language name="en_CA" long_name="Canadian English" description="ScanCore scan agent that monitors sensors available via IPMI BMCs.">
<!-- Messages entries -->
<key name="scan_ipmitool_message_0001">No IPMI BMC was found on this host, nor were any other machines with IPMI found or accessible. Nothing to do.</key>
<key name="scan_ipmitool_message_0002">There was no IPMI sensor value units set for sensor: [#!variable!sensor!#] on the machine: [#!variable!host_name!#].</key>
<key name="scan_ipmitool_message_0003">There was no IPMI sensor value set for sensor: [#!variable!sensor!#] on the machine: [#!variable!host_name!#].</key>
<key name="scan_ipmitool_message_0004">
The sensor: [#!variable!sensor_name!#] has changed.
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
</key>
<key name="scan_ipmitool_message_0005">
The sensor: [#!variable!sensor_name!#] has changed.
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
- [#!variable!old_sensor_status!#] -> [#!variable!new_sensor_status!#]
- Thresholds:
- High critical: [#!variable!old_high_critical!#] -> [#!variable!new_high_critical!#]
- High warning: [#!variable!old_high_warning!#] -> [#!variable!new_high_warning!#]
- Low warning: [#!variable!old_low_warning!#] -> [#!variable!new_low_warning!#]
- Low critical: [#!variable!old_low_critical!#] -> [#!variable!new_low_critical!#]
</key>
<key name="scan_ipmitool_message_0006">
The temperature sensor: [#!variable!sensor_name!#] has gone critically high!
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
Note: If enough sensors go critical, this node will withdraw and power off!
</key>
<key name="scan_ipmitool_message_0007">
The temperature sensor: [#!variable!sensor_name!#] has entered a high warning state!
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
Note: If both nodes have enough thermal sensors go into 'warning' state, and
if load shedding is enabled, a node will power off to reduce thermal
output. If enough sensors reach critical levels, the node will withdraw
and power off.
</key>
<key name="scan_ipmitool_message_0008">
The temperature sensor: [#!variable!sensor_name!#] has returned to normal levels.
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
</key>
<key name="scan_ipmitool_message_0009">
The temperature sensor: [#!variable!sensor_name!#] has risen above critically low levels, but it is still in a warning state.
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
Note: If you are listening to 'critical' level alerts only, you will not get the alert telling you when the temperature is back to normal.
</key>
<key name="scan_ipmitool_message_0010">
The temperature sensor: [#!variable!sensor_name!#] has jumped a large amount in a short period of time!
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
</key>
<key name="scan_ipmitool_message_0011">
The temperature sensor: [#!variable!sensor_name!#] has gone critically low!
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
Note: If enough sensors go critical, this node will withdraw and power off!
</key>
<key name="scan_ipmitool_message_0012">
The temperature sensor: [#!variable!sensor_name!#] has entered a low warning state!
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
Note: If the temperature continues to drop, the sensor will go critical. If enough sensors go critical, the node will withdraw and power off.
</key>
<key name="scan_ipmitool_message_0013">
The temperature sensor: [#!variable!sensor_name!#] has returned to normal levels.
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
</key>
<key name="scan_ipmitool_message_0014">
The temperature sensor: [#!variable!sensor_name!#] has dropped below critically high levels, but it is still in a warning state.
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
Note: If you are listening to 'critical' level alerts only, you will not get the alert telling you when the temperature is back to normal.
</key>
<key name="scan_ipmitool_message_0015">There was no IPMI sensor value units set for sensor: [#!variable!sensor!#] on the machine: [#!variable!host_name!#].</key>
<key name="scan_ipmitool_message_0016">
The sensor: [#!variable!sensor_name!#] has changed.
- [#!variable!old_sensor_value!#] -> [#!variable!new_sensor_value!#]
- [#!variable!old_sensor_status!#] -> [#!variable!new_sensor_status!#]
- Thresholds:
- High critical: [#!variable!old_high_critical!#] -> [#!variable!new_high_critical!#]
- High warning: [#!variable!old_high_warning!#] -> [#!variable!new_high_warning!#]
- Low warning: [#!variable!old_low_warning!#] -> [#!variable!new_low_warning!#]
- Low critical: [#!variable!old_low_critical!#] -> [#!variable!new_low_critical!#]
</key>
<key name="scan_ipmitool_message_0017">There was no IPMI sensor value units set for sensor: [#!variable!sensor!#] on the machine: [#!variable!host_name!#].</key>
<key name="scan_ipmitool_message_0018">There was no IPMI sensor value set for sensor: [#!variable!sensor!#] on the machine: [#!variable!host_name!#].</key>
<key name="scan_ipmitool_message_0019">
The new sensor: [#!variable!sensor_name!#] has been found on the machine: [#!variable!host_name!#].
- Value: [#!variable!sensor_value!#], Status: [#!variable!sensor_status!#]
- Thresholds:
- High critical: [#!variable!high_critical!#]
- High warning: [#!variable!high_warning!#]
- Low warning: [#!variable!low_warning!#]
- Low critical: [#!variable!low_critical!#]
</key>
<key name="scan_ipmitool_message_0020">
The new sensor: [#!variable!sensor_name!#] has been found on the machine: [#!variable!host_name!#].
Warning: It is not in an OK state!
- Value: [#!variable!sensor_value!#], Status: [#!variable!sensor_status!#]
- Thresholds:
- High critical: [#!variable!high_critical!#]
- High warning: [#!variable!high_warning!#]
- Low warning: [#!variable!low_warning!#]
- Low critical: [#!variable!low_critical!#]
</key>
<!-- Log entries -->
<key name="scan_ipmitool_log_0001">Starting to read the IPMI sensor values for: [#!variable!host_name!#]</key>
<key name="scan_ipmitool_log_0002">Failed to query node: [#!variable!host_name!#]'s IPMI interface using the call: [#!variable!call!#]. Is the password correct?</key>
<key name="scan_ipmitool_log_0003">IPMI sensor values read from: [#!variable!host_name!#] in: [#!variable!time!#].</key>
<key name="scan_ipmitool_log_0004">The sensor named: [#!variable!sensor_name!#] appears to have vanished, but this is the first scan in which it was missing. This is generally harmless and usually just a sensor read issue.</key>
<key name="scan_ipmitool_log_0005">The sensor named: [#!variable!sensor_name!#] has returned.</key>
</language>
</words>