You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've used your example code to add 593 of our hosts to the monitor tool, since it basically should do what I need (I need a statistics every 60 seconds which I would want to log away).
Of those 593 hosts only 255 are ever reported in the ticker (always 255, always the first 255 I add as targets). It makes no difference weather I use AddTarget or AddTargetDelayed, it's always 255.
Hosts which are certainly available are being reported as PacketLoss 100 %, while Wireshark sees the successful UDP packages:
The monitor reports the following: 2020/12/03 14:14:17 hanco: {PacketsSent:10 PacketsLost:10 Best:0 Worst:0 Median:NaN Mean:NaN StdDev:NaN}
Which is certainly the host and it can be pinged:
It is certainly true that several of the hosts are currently "down" because behind Firewalls and I'm not testing in the same zone, but I'd expect it to report the loss at the "right" hosts ... and all 593 instead of just 255 ofc ;-).
Current code, which pretty much reflects the example with some little changes:
package slaping
import (
"net"
"os"
"time"
"log"
"os/signal"
"syscall"
"github.com/digineo/go-ping"
"github.com/digineo/go-ping/monitor"
)
var (
pingInterval = 5 * time.Second
pingTimeout = 4 * time.Second
reportInterval = 60 * time.Second
size uint = 56
pinger *ping.Pinger
targets []string
)
func (p *SLAPing) PingHosts() {
// setup, we need to populate cache once before we start to actually have host and maint info
CMDBHosts, err := p.CMDB.GetSLAHosts()
if err != nil {
log.Println(err)
}
// Bind to sockets
if pi, err := ping.New("0.0.0.0", "::"); err != nil {
log.Printf("Unable to bind: %s\nRunning as root?\n", err)
os.Exit(2)
} else {
pinger = pi
}
pinger.SetPayloadSize(uint16(size))
defer pinger.Close()
// Create monitor
monitor := monitor.New(pinger, pingInterval, pingTimeout)
defer monitor.Stop()
// Add targets
targets := CMDBHosts
validtargets := 0
for i, target := range CMDBHosts {
ipAddr, err := net.ResolveIPAddr("", target)
if err != nil {
// currently irrelevant
continue
}
err = monitor.AddTarget(string([]byte{byte(i)}), *ipAddr)
//err = monitor.AddTargetDelayed(string([]byte{byte(i)}), *ipAddr, 10*time.Millisecond*time.Duration(i))
if err != nil {
// currently irrelevant
continue
}
validtargets = validtargets + 1
}
log.Printf("Total targets: %d, Valid targets: %d, Errors: %d\n", len(targets), validtargets, len(targets)-validtargets)
// Start report routine
ticker := time.NewTicker(reportInterval)
defer ticker.Stop()
go func() {
for range ticker.C {
for i, metrics := range monitor.ExportAndClear() {
log.Printf("%s: %+v\n", targets[[]byte(i)[0]], *metrics)
}
}
}()
// Handle SIGINT and SIGTERM.
// in other words we wait here until interrupted and let the ticker tick
ch := make(chan os.Signal, 1)
signal.Notify(ch, syscall.SIGINT, syscall.SIGTERM)
log.Println("received", <-ch)
}
[edit]
My initial thought was that it may be connected to the ICMP pings getting blocked by firewalls and threads are getting locked up due to the timeouts not properly working there, since it's an uncommon case and may not be that well tested.
Now, due to the result "shuffle" I'm not sure but I have systems in the results which actually are blocked by firewalls, going through the slice sequentially that could explain why it's always the same number, but I didn't see any max threads number to get locked in the library, is there a limit where the channel / parallel work of the library blocking?
[edit2]
I could test it now on a host which has the necessary firewall permissions to actually run this without getting blocked, the results are still the same. Something is wrong here, just can't make out what.
The text was updated successfully, but these errors were encountered:
Rewrote the code for now,, but it seems to "stall" overall. The issue is I can't find a way to set a timeout, and having 300/600 hosts behind a firewall it probably will wait forever.
In the monitor, AddTarget/AddTargetDelayed seeems to use p.timeout, but I don't see a way to set it, maybe I just overlooked that and it's there ( I really couldn't see it in that branch, if it's there sorry). But to test that I need a way to set the timeout.
Hi,
I've used your example code to add 593 of our hosts to the monitor tool, since it basically should do what I need (I need a statistics every 60 seconds which I would want to log away).
Of those 593 hosts only 255 are ever reported in the ticker (always 255, always the first 255 I add as targets). It makes no difference weather I use AddTarget or AddTargetDelayed, it's always 255.
Hosts which are certainly available are being reported as PacketLoss 100 %, while Wireshark sees the successful UDP packages:
The monitor reports the following:
2020/12/03 14:14:17 hanco: {PacketsSent:10 PacketsLost:10 Best:0 Worst:0 Median:NaN Mean:NaN StdDev:NaN}
Which is certainly the host and it can be pinged:
It is certainly true that several of the hosts are currently "down" because behind Firewalls and I'm not testing in the same zone, but I'd expect it to report the loss at the "right" hosts ... and all 593 instead of just 255 ofc ;-).
Current code, which pretty much reflects the example with some little changes:
[edit]
My initial thought was that it may be connected to the ICMP pings getting blocked by firewalls and threads are getting locked up due to the timeouts not properly working there, since it's an uncommon case and may not be that well tested.
Now, due to the result "shuffle" I'm not sure but I have systems in the results which actually are blocked by firewalls, going through the slice sequentially that could explain why it's always the same number, but I didn't see any max threads number to get locked in the library, is there a limit where the channel / parallel work of the library blocking?
[edit2]
I could test it now on a host which has the necessary firewall permissions to actually run this without getting blocked, the results are still the same. Something is wrong here, just can't make out what.
The text was updated successfully, but these errors were encountered: