Understanding socket connection issues #554

Open
dlanderson opened this issue Aug 26, 2020 · 2 comments

@dlanderson
Contributor

What

We'd like to track socket connection issues with the core agent so we can identify issues and fix them proactively.

Why

The socket connection can fail for several reasons, including:

  • Unable to create the core_agent_dir
  • Unable to download the core agent (e.g., an outbound firewall prevents it)
  • Permission denied to an already existing core_agent_socket path
  • noexec is set on the volume where core_agent_dir resides
  • Less common cases, such as core_agent_dir residing on a bind mount inside a Docker container on a macOS host, where creating a socket fails with a "name too long" error (see "Can't run swarm tests in Docker for Mac", moby/moby#23545 (comment))
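
To make the list above concrete, here is a rough sketch of how some of these conditions could be detected up front. The helper name `diagnose_core_agent_dir`, the default socket file name, and the 104-byte path limit are assumptions for illustration and are not taken from the agent's code; the noexec check relies on `os.ST_NOEXEC`, which is not available on every platform.

```python
import errno
import os

# Assumption: sockaddr_un.sun_path is 108 bytes on Linux and 104 on macOS,
# so the smaller value is used as a conservative limit.
MAX_SOCKET_PATH_BYTES = 104


def diagnose_core_agent_dir(core_agent_dir, socket_name="core-agent.sock"):
    """Return a list of problems found (hypothetical helper, not agent API)."""
    problems = []
    try:
        os.makedirs(core_agent_dir, exist_ok=True)
    except OSError as exc:
        code = errno.errorcode.get(exc.errno, str(exc.errno))
        problems.append("cannot create core_agent_dir: " + code)
        return problems

    socket_path = os.path.join(core_agent_dir, socket_name)
    # The path plus its terminating NUL byte must fit in sun_path.
    if len(os.fsencode(socket_path)) >= MAX_SOCKET_PATH_BYTES:
        problems.append("socket path too long")
    if os.path.exists(socket_path) and not os.access(socket_path, os.R_OK | os.W_OK):
        problems.append("permission denied on existing socket path")
    # ST_NOEXEC is missing on some platforms; fall back to 0 (check skipped).
    noexec_flag = getattr(os, "ST_NOEXEC", 0)
    if hasattr(os, "statvfs") and os.statvfs(core_agent_dir).f_flag & noexec_flag:
        problems.append("noexec set on volume")
    return problems
```

A check like this cannot see a download failure caused by a firewall; that case would have to be reported from wherever the download happens.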

Whatever the cause, the end result is the same: the Python agent attempts to connect to the socket, fails, and gives up after a few attempts. We'll start by tracking when the connection attempts are exhausted.
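
As an illustration of "tracking when the connection attempts are exhausted", the sketch below retries a Unix socket connection and calls a callback once every attempt has failed. It is not the agent's actual retry loop; `connect_with_retries`, its parameters, and the `on_exhausted` callback are hypothetical names.

```python
import socket
import time


def connect_with_retries(socket_path, attempts=3, delay=1.0, on_exhausted=None):
    """Try to connect to a Unix socket; report once if every attempt fails."""
    last_error = None
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(socket_path)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
            if attempt < attempts:
                time.sleep(delay)
    # All attempts failed: this is the point where a diagnostic would be sent.
    if on_exhausted is not None:
        on_exhausted(last_error)
    return None
```

Passing the last exception to the callback keeps the error code (for example ENOENT or EACCES) available for the report.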

How

  1. At https://github.com/scoutapp/scout_apm_python/blob/master/src/scout_apm/core/socket.py#L190
  2. Report the following data in a POST to https://checkin.scoutapp.com/apps/diagnostics.scout
  3. With URL parameters: ?key=#{org_key}&name=#{app_name}
  4. With headers:
     Agent-Hostname: #{socket.gethostbyname()}
     Agent-Version: #{agent version}
     Content-Type: application/octet-stream
  5. With body data:
     {
       version: 1,
       type: socket,
       language: python,
       agent_version: #{agent_version},
       environment: #{app_environment},
       node: #{socket.gethostbyname()},
       time_since_startup: #{figure out how to determine this},
       agent_config: #{json object/hash of key/value of the agent's config options}
     }
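
The request described above could be assembled roughly as follows with the standard library. This is a sketch under several assumptions: the body is serialized as JSON, the hostname comes from `socket.gethostname()` (the steps write `socket.gethostbyname()`, which needs an argument), and `time_since_startup` is approximated by the time since this module was imported, since the issue leaves that open. `build_diagnostics_request` is a hypothetical name.

```python
import json
import socket
import time
from urllib.parse import urlencode
from urllib.request import Request

DIAGNOSTICS_URL = "https://checkin.scoutapp.com/apps/diagnostics.scout"
# Assumption: module import time stands in for agent startup time.
_STARTED_AT = time.monotonic()


def build_diagnostics_request(org_key, app_name, agent_version, environment, agent_config):
    """Build (but do not send) the diagnostics POST request."""
    hostname = socket.gethostname()  # assumption, see lead-in
    payload = {
        "version": 1,
        "type": "socket",
        "language": "python",
        "agent_version": agent_version,
        "environment": environment,
        "node": hostname,
        "time_since_startup": time.monotonic() - _STARTED_AT,
        "agent_config": agent_config,
    }
    url = DIAGNOSTICS_URL + "?" + urlencode({"key": org_key, "name": app_name})
    headers = {
        "Agent-Hostname": hostname,
        "Agent-Version": agent_version,
        "Content-Type": "application/octet-stream",
    }
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` with a short timeout; any error from that call should probably be swallowed so that diagnostics never break the host application.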
@adamchainz
Contributor

👍

@adamchainz
Contributor

With the current code structure, CoreAgentSocketThread isn't involved in any of the errors you listed, so it's not the right place to add this logging.

The right place is inside CoreAgentManager.

#482 tracks moving the download under the thread, to avoid blocking application startup. I will look into that as a follow-up of this.
