Enhance Raft Cluster Management with Health Checks, Dynamic Peer Management, and Security #642

Open
4 of 16 tasks
sinadarbouy opened this issue Dec 20, 2024 · 1 comment
Labels
epic To be broken down into multiple tasks
Comments

@sinadarbouy
Collaborator

sinadarbouy commented Dec 20, 2024

Description:

Our current Raft implementation needs several improvements. The following features need to be implemented:

  1. Raft Health Check Integration
    • Add Raft-specific health checks to the existing health check endpoint
    • Include leader election status in health checks
    • Add cluster state validation in health checks
    • Expose metrics about Raft cluster health
  2. Dynamic Peer Management
    • Implement gRPC endpoints for peer management:
      • AddPeer endpoint for adding new nodes to the cluster
      • RemovePeer endpoint for graceful node removal
      • Status endpoint to get current cluster membership
    • Add validation to ensure that only the leader node can modify cluster membership
    • Implement retry mechanism for failed peer additions
    • Add logging and monitoring for peer management operations
  3. Scale Management
    • Implement automated peer discovery during scale-up
    • Add graceful shutdown procedure during scale-down
  4. Security Improvements
    • Implement mTLS for gRPC communication between nodes
    • Implement token-based authentication for cluster management operations
    • Add audit logging for all cluster membership changes

Technical Considerations

  • The health check should indicate if the node is part of a stable cluster
  • Only the leader should be able to modify cluster membership
  • Authentication should be required for all cluster management operations
  • Scale operations should maintain cluster consistency
@mostafa
Member

mostafa commented Dec 22, 2024

@sinadarbouy Many constructors in the Raft library have a WithLogger variant (e.g., NewTCPTransportWithLogger) that accepts a custom logger, so Raft's logs don't end up in a separate format in the output. We also have an hc_log_adapter interface that translates between hclog and zerolog; it is already used in the plugins.

@mostafa mostafa moved this from ✨ New to 📋 Backlog in GatewayD Core Public Roadmap Dec 27, 2024
Projects
Status: 📋 Backlog
Development

No branches or pull requests

3 participants