Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracker Checker (UDP): Don't stop on the first fail #1040

Closed
Tracked by #669
josecelano opened this issue Sep 12, 2024 · 5 comments · Fixed by #1064
Closed
Tracked by #669

Tracker Checker (UDP): Don't stop on the first fail #1040

josecelano opened this issue Sep 12, 2024 · 5 comments · Fixed by #1064
Assignees
Labels

Comments

@josecelano
Copy link
Member

josecelano commented Sep 12, 2024

Parent issue: #669

When you run the checker for multiple UDP trackers, it stops when the first tracker check fails.

How to reproduce

Run the checker without running the tracker:

TORRUST_CHECKER_CONFIG='{
    "udp_trackers": [
	"127.0.0.1:6969",
	"127.0.0.1:6969/",
	"127.0.0.1:6969/announce",
	"localhost:6969",
	"localhost:6969/",
	"localhost:6969/announce",
	"udp://127.0.0.1:6969",
	"udp://127.0.0.1:6969/",
	"udp://127.0.0.1:6969/announce",
	"udp://localhost:6969",
	"udp://localhost:6969/",
	"udp://localhost:6969/announce"
    ],
    "http_trackers": [],
    "health_checks": []
}' cargo run --bin tracker_checker
    Finished `dev` profile [optimized + debuginfo] target(s) in 0.12s
     Running `target/debug/tracker_checker`
2024-09-12T15:22:55.457880Z  INFO torrust_tracker::console::clients::checker::service: Running checks for trackers ...
[
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  }
]
@josecelano josecelano mentioned this issue Sep 12, 2024
21 tasks
@josecelano josecelano changed the title Tracker Checker: Don't stop on the first fail Tracker Checker (UDP): Don't stop on the first fail Sep 12, 2024
@abstralexis
Copy link
Contributor

abstralexis commented Oct 16, 2024

I've noticed here that it is using ?, which would be early-returning from the Service::run_checks.

while let Some(results) = checks.join_next().await {
check_results.append(&mut results?);
}

I wouldn't mind looking more into this issue. I suspect it involves writing a custom serializer/deserializer for Result<T, JoinError> and just passing results instead of results?; could look similar to this StackOverflow answer.
A dirty fix perhaps, but having an Option<JoinError> as some sort of flag whether the join errored could be an option. This lets us continue the while loop without exiting.

@abstralexis
Copy link
Contributor

Replacing the above with

while let Some(results) = checks.join_next().await {
    match results {
        Ok(mut r) => check_results.append(&mut r),
        Err(_) => (),
    };
}

Produces an identical output. Means it probably wasn't the issue of results?. Do you know where "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable." would be coming from?

@josecelano
Copy link
Member Author

I've noticed here that it is using ?, which would be early-returning from the Service::run_checks.

while let Some(results) = checks.join_next().await {
check_results.append(&mut results?);
}

I wouldn't mind looking more into this issue. I suspect it involves writing a custom serializer/deserializer for Result<T, JoinError> and just passing results instead of results?; could look similar to this StackOverflow answer.
A dirty fix perhaps, but having an Option<JoinError> as some sort of flag whether the join errored could be an option. This lets us continue the while loop without exiting.

Hi @abstralexis, I need some time to review that part of the code because it was done by @da2ce7 and I don't know it. That error comes from the UdpTrackerClient.

@abstralexis
Copy link
Contributor

Thanks - I traced where perhaps the error would end up if being sent from there during checks - should these continue instead of break?

let client = match Client::new(remote_addr, timeout).await {
Ok(client) => {
checks.results.push((Check::Setup, Ok(())));
client
}
Err(err) => {
checks.results.push((Check::Setup, Err(err)));
results.push(Err(checks));
break;
}
};
let transaction_id = TransactionId::new(1);
// Connect Remote
let connection_id = match client.send_connection_request(transaction_id).await {
Ok(connection_id) => {
checks.results.push((Check::Connect, Ok(())));
connection_id
}
Err(err) => {
checks.results.push((Check::Connect, Err(err)));
results.push(Err(checks));
break;
}
};

I think these would be causing the loop to exit completely, rather than check the next one. I will try to confirm later.

@abstralexis
Copy link
Contributor

Yep, I believe this was it. I now get:

[
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  },
  {
    "Udp": {
      "Err": {
        "remote_addr": "127.0.0.1:6969",
        "results": [
          [
            "Setup",
            {
              "Ok": null
            }
          ],
          [
            "Connect",
            {
              "Err": "Failed to receive a connect response, with error: Timeout while waiting for the socket to become readable."
            }
          ]
        ]
      }
    }
  }
]

Will submit a PR.

abstralexis added a commit to abstralexis/torrust-tracker that referenced this issue Oct 17, 2024
@josecelano josecelano linked a pull request Oct 17, 2024 that will close this issue
josecelano added a commit that referenced this issue Oct 17, 2024
a34f66e Fix #1040: `continue` when finding errors (abstralexis)

Pull request description:

  Fix the behaviour of UDP checks exiting on first fail (#1040). Was `break`ing instead of `continue`ing when iterating over each test.

ACKs for top commit:
  josecelano:
    ACK a34f66e

Tree-SHA512: 616c5671f2e6912fb7992794ca634d1402e7d4e712a5dbb5dab498ac67bbf622220f241d84f8ebc3129ba284dca587040a5cca2989023979d2068e458e5382e1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants