Test failure: TestStorageTieringContinueInxMigration.test_put_object_limit_lt
#246
Comments
For more context, I see hundreds of messages like this in the log:
{
"log_category": "database",
"log_level": "error",
"log_message": "Could not find a delay rule with id [10356].",
"request_api_name": "",
"request_api_number": 20013,
"request_api_version": "d",
"request_client_user": "rods",
"request_host": "172.25.0.3",
"request_proxy_user": "rods",
"request_release_version": "rods4.3.1",
"server_host": "2d81d8c90eae",
"server_pid": 2505,
"server_timestamp": "2023-12-19T22:54:19.628Z",
"server_type": "agent",
"server_zone": "tempZone"
}
{
"log_category": "api",
"log_level": "error",
"log_message": "Could not get delay rule information [rule id=[10356]]",
"request_api_name": "",
"request_api_number": 20013,
"request_api_version": "d",
"request_client_user": "rods",
"request_host": "172.25.0.3",
"request_proxy_user": "rods",
"request_release_version": "rods4.3.1",
"server_host": "2d81d8c90eae",
"server_pid": 2505,
"server_timestamp": "2023-12-19T22:54:19.628Z",
"server_type": "agent",
"server_zone": "tempZone"
}
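To quantify "hundreds of messages", the repeated entries can be tallied by message. The following is a minimal sketch, assuming the log is available as a string of JSON objects shaped like the entries above (either one per line or pretty-printed back to back). The helper names `iter_json_objects` and `count_messages` are hypothetical and not part of iRODS.

```python
import json
from collections import Counter


def iter_json_objects(text):
    """Yield each object from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def count_messages(text, level="error"):
    """Count how often each log_message appears at the given log_level."""
    return Counter(
        entry["log_message"]
        for entry in iter_json_objects(text)
        if entry.get("log_level") == level
    )
```

Calling `count_messages(log_text).most_common(5)` would then show whether the two delay rule messages dominate the error output.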
wait_for_empty_queue now allows the caller to specify a timeout in seconds as well as a function to run when a timeout occurs. If None is given for the function to run when a timeout occurs, the function raises a TimeoutError Exception. Existing uses of this function pass self.fail() to run on timeout with a brief message explaining the failure.
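The behavior described in that commit message can be sketched roughly as follows. This is an illustration only, not the code from the repository: the parameter names (`queue_is_empty`, `timeout_in_seconds`, `timeout_function`, `poll_interval`) are assumptions, and the real helper may check the delay queue differently.

```python
import time


def wait_for_empty_queue(queue_is_empty, timeout_in_seconds=60,
                         timeout_function=None, poll_interval=1.0):
    """Poll until queue_is_empty() returns True or the timeout elapses.

    All parameter names here are hypothetical. On timeout, call
    timeout_function if one was given; otherwise raise TimeoutError.
    """
    deadline = time.monotonic() + timeout_in_seconds
    while not queue_is_empty():
        if time.monotonic() >= deadline:
            if timeout_function is None:
                raise TimeoutError(
                    f"queue not empty after {timeout_in_seconds} seconds")
            return timeout_function()
        time.sleep(poll_interval)
    return None
```

In a `unittest` test case, the timeout handler would typically be wrapped in a callable so that it only runs on timeout, for example `timeout_function=lambda: self.fail("delay queue did not drain")`.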
Test has been updated such that it will eventually fail now. Still seeing intermittent failures, so the issue is real.
Saw this test fail and am posting some findings here... I saw an agent crash with signal 11, similar to what was described in #193, but not the same stacktrace. Here are the log messages with references to the PID of the agent which crashed:
{
"log_category": "legacy",
"log_level": "error",
"log_message": "apply_policy_for_tier_group :: no resources found for group [example_group_g2]",
"request_api_name": "EXEC_RULE_EXPRESSION_AN",
"request_api_number": 1206,
"request_api_version": "d",
"request_client_user": "rods",
"request_host": "192.168.96.3",
"request_proxy_user": "rods",
"request_release_version": "rods4.3.3",
"server_host": "202cf5def270",
"server_pid": 19597,
"server_timestamp": "2024-12-11T15:07:03.711Z",
"server_type": "agent",
"server_zone": "tempZone"
}
{
"log_category": "agent_factory",
"log_level": "error",
"log_message": "Agent process [19597] exited with status [11].",
"server_host": "202cf5def270",
"server_pid": 19249,
"server_timestamp": "2024-12-11T15:07:22.661Z",
"server_type": "agent_factory",
"server_zone": "tempZone"
}
{
"log_category": "server",
"log_level": "critical",
"log_message": " 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.3\n 1# 0x00007F8807D2D320 in /lib/x86_64-linux-gnu/libc.so.6\n 2# 0x00007F8803C148AB\n 3# 0x00007F8803C148B3\n 4# 0x00007F8803C148B3\n 5# 0x00007F8803C148B3\n 6# 0x00007F8803C148B3\n 7# 0x00007F8803C148B3\n 8# 0x00007F8803C148B3\n 9# 0x00007F8803C148B3\n10# 0x00007F8803C148B3\n11# 0x00007F8803C148B3\n12# 0x00007F8803C148B3\n13# 0x00007F8803C148C4\n14# 0x00007F8803C148B3\n15# 0x00007F8803C148B3\n16# 0x00007F8803C148B3\n17# 0x00007F8803C148B3\n18# 0x00007F8803C148B3\n19# 0x00007F8803C14875\n20# 0x00007F8803C029F5\n21# 0x00007F8803BF652C\n22# 0x00007F8803BFA291\n23# 0x00007F8803B8B15C\n24# 0x00007F8803BF0B0F\n25# 0x00007F8803BF0A64\n26# 0x00007F8803BF09D4\n27# 0x00007F8803BEFAB3\n28# 0x00007F880A648C3C in /lib/libirods_server.so.4.3.3\n29# std::__1::function<irods::error (std::__1::tuple<>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, MsParamArray*, irods::callback)>::operator()(std::__1::tuple<>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, MsParamArray*, irods::callback) const in /lib/libirods_server.so.4.3.3\n30# irods::pluggable_rule_engine<std::__1::tuple<> >::exec_rule_expression(std::__1::tuple<>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, MsParamArray*, irods::callback) in /lib/libirods_server.so.4.3.3\n31# irods::rule_engine_context_manager<std::__1::tuple<>, RuleExecInfo*, (irods::rule_execution_manager_pack)0>::exec_rule_expression(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, MsParamArray*) in /lib/libirods_server.so.4.3.3\n32# rsExecRuleExpression(RsComm*, ExecRuleExpression*) in /lib/libirods_server.so.4.3.3\n33# 0x00007F880AD8591B in /lib/libirods_server.so.4.3.3\n34# int 
std::__1::__invoke_void_return_wrapper<int, false>::__call<int (*&)(RsComm*, ExecRuleExpression*), RsComm*, ExecRuleExpression*>(int (*&)(RsComm*, ExecRuleExpression*), RsComm*&&, ExecRuleExpression*&&) in /lib/libirods_server.so.4.3.3\n35# 0x00007F880AD85877 in /lib/libirods_server.so.4.3.3\n36# std::__1::__function::__func<int (*)(RsComm*, ExecRuleExpression*), std::__1::allocator<int (*)(RsComm*, ExecRuleExpression*)>, int (RsComm*, ExecRuleExpression*)>::operator()(RsComm*&&, ExecRuleExpression*&&) in /lib/libirods_server.so.4.3.3\n37# 0x00007F880AC71D6F in /lib/libirods_server.so.4.3.3\n38# std::__1::function<int (RsComm*, ExecRuleExpression*)>::operator()(RsComm*, ExecRuleExpression*) const in /lib/libirods_server.so.4.3.3\n39# irods::api_call_adaptor<ExecRuleExpression*>::operator()(irods::plugin_context&, RsComm*, ExecRuleExpression*) in /lib/libirods_server.so.4.3.3\n40# 0x00007F880AC719D1 in /lib/libirods_server.so.4.3.3\n41# irods::error std::__1::__invoke_void_return_wrapper<irods::error, false>::__call<irods::api_call_adaptor<ExecRuleExpression*>&, irods::plugin_context&, RsComm*, ExecRuleExpression*>(irods::api_call_adaptor<ExecRuleExpression*>&, irods::plugin_context&, RsComm*&&, ExecRuleExpression*&&) in /lib/libirods_server.so.4.3.3\n42# 0x00007F880AC718DF in /lib/libirods_server.so.4.3.3\n43# std::__1::__function::__func<irods::api_call_adaptor<ExecRuleExpression*>, std::__1::allocator<irods::api_call_adaptor<ExecRuleExpression*> >, irods::error (irods::plugin_context&, RsComm*, ExecRuleExpression*)>::operator()(irods::plugin_context&, RsComm*&&, ExecRuleExpression*&&) in /lib/libirods_server.so.4.3.3\n44# 0x00007F880AC71E87 in /lib/libirods_server.so.4.3.3\n45# std::__1::function<irods::error (irods::plugin_context&, RsComm*, ExecRuleExpression*)>::operator()(irods::plugin_context&, RsComm*, ExecRuleExpression*) const in /lib/libirods_server.so.4.3.3\n46# int irods::api_entry::call_handler<ExecRuleExpression*>(RsComm*, ExecRuleExpression*) in 
/lib/libirods_server.so.4.3.3\n47# call_execRuleExpressionInp(irods::api_entry*, RsComm*, ExecRuleExpression*) in /lib/libirods_server.so.4.3.3\n48# rsApiHandler(RsComm*, int, BytesBuf*, BytesBuf*) in /lib/libirods_server.so.4.3.3\n49# readAndProcClientMsg(RsComm*, int) in /lib/libirods_server.so.4.3.3\n50# agentMain(RsComm*) in /lib/libirods_server.so.4.3.3\n51# runIrodsAgentFactory(sockaddr_un) in /lib/libirods_server.so.4.3.3\n52# main::$_5::operator()() const at /irods_source/server/main_server/src/rodsServer.cpp:1320\n53# main at /irods_source/server/main_server/src/rodsServer.cpp:1387\n54# 0x00007F8807D121CA in /lib/x86_64-linux-gnu/libc.so.6\n55# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n56# _start in /usr/sbin/irodsServer\n",
"server_host": "202cf5def270",
"server_pid": 19248,
"server_timestamp": "2024-12-11T15:07:32.570Z",
"server_type": "server",
"server_zone": "tempZone",
"stacktrace_agent_pid": "19597",
"stacktrace_timestamp_epoch_milliseconds": "651",
"stacktrace_timestamp_epoch_seconds": "1733929642",
"stacktrace_timestamp_utc": "2024-12-11T15:07:22.651Z"
}
The agent executes Shortly after, a stacktrace is logged. Here's what that looks like with the newlines actually expanded:
I thought I built with debugging symbols on, but, apparently not... Will try that again in a bit. I suspect that these issues may be resolved by taking a sweep over the codebase with ASAN.
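For anyone wanting to reproduce the two steps used above (collecting the entries that reference the crashed agent's PID, and expanding the `\n` escapes in the stacktrace), here is a minimal sketch. It assumes each log entry has already been parsed into a dict with the fields shown above. The function names are hypothetical.

```python
import json


def entries_for_pid(entries, pid):
    """Return entries that mention the given agent PID in any known place."""
    pid = str(pid)
    return [
        e for e in entries
        if str(e.get("server_pid")) == pid             # logged by the agent itself
        or e.get("stacktrace_agent_pid") == pid        # stacktrace logged on its behalf
        or f"[{pid}]" in e.get("log_message", "")      # e.g. "Agent process [19597] exited ..."
    ]


def expand_stacktrace(raw_entry):
    """Parse one JSON log entry and return its message with real newlines.

    json.loads already converts the \\n escape sequences in the string
    into newline characters, so the message can be printed directly.
    """
    return json.loads(raw_entry)["log_message"]
```

Printing `expand_stacktrace(raw_entry)` for the `critical` entry gives one frame per line, which makes the long run of unsymbolized frames near the top of the trace easier to see.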
I have observed two modes of failure for test_plugin_unified_storage_tiering.TestStorageTieringContinueInxMigration.test_put_object_limit_lt as of commit 511cba8 (NOTE: This commit is not necessarily the commit which introduced the problem, just the earliest one with which I tested).
We need to investigate whether this is due to the test or if this is a real issue. At a minimum, we need to ensure that the test will eventually fail by removing the infinite loop found here:
irods_capability_storage_tiering/packaging/test_plugin_unified_storage_tiering.py
Lines 162 to 171 in 511cba8