Spotted one bug: a code defect causing measurable CPU waste that scales with server load.
(Why link libstdc++ into a Go binary anyway?)
The daemon consumes sustained CPU on otherwise idle servers. Measured on two independent servers:
server    CPU     uptime   CPU time   vs. realtime
server a   5.5%   ~30h      72h        2.4×
server b  22.1%   ~30h     366h       12.2×
The fact that impact scales proportionally with server load (audit.log write frequency) is itself diagnostic — it points directly to a read loop driven by incoming events rather than a fixed-cost background operation.
The binary opens an inotify file descriptor (fd=19) in O_NONBLOCK mode and relies on the Go runtime's netpoller to park the goroutine between events. The netpoller uses epoll internally.
However, inotify file descriptors have epoll semantics that this version of the Go runtime cannot satisfy: every epoll_ctl registration attempt fails.
$ strace -p 1268509 -e trace=epoll_wait,epoll_ctl -c -- sleep 10
% time     seconds  usecs/call     calls    errors syscall
100.00    0.000287           5        55        55 epoll_ctl     <- 100% error rate
  0.00    0.000000           0         0         0 epoll_wait    <- never called
epoll_wait is never invoked. The goroutine cannot park. As a result, it immediately retries the read after every EAGAIN, producing a tight spin loop.
The Go scheduler fires SIGURG signals continuously to preempt the spinning goroutine — which simply reschedules and spins again.
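The spin is easy to reproduce in isolation: a nonblocking inotify fd with no pending events answers every read with EAGAIN, so a loop that retries immediately burns CPU at full speed. A minimal Linux-only sketch (the loop bound and buffer size are arbitrary, not taken from the daemon):

```go
// Demonstrates the busy-wait: with O_NONBLOCK and no way to park,
// every read on a drained inotify fd comes straight back with EAGAIN.
package main

import (
	"fmt"
	"syscall"
)

// spinReads performs n reads on a drained nonblocking inotify fd and
// returns how many came back EAGAIN (all of them, if nothing is written).
func spinReads(n int) (eagain int, err error) {
	fd, err := syscall.InotifyInit1(syscall.IN_NONBLOCK)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(fd)

	buf := make([]byte, 65536)
	for i := 0; i < n; i++ {
		_, rerr := syscall.Read(fd, buf)
		if rerr == syscall.EAGAIN {
			eagain++ // a daemon retrying here spins as fast as the CPU allows
		}
	}
	return eagain, nil
}

func main() {
	got, err := spinReads(1000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d/1000 reads returned EAGAIN\n", got)
}
```

With nothing watched and nothing written, all 1000 reads return EAGAIN instantly; a goroutine that cannot park does exactly this, forever.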
Proof — strace read trace:
read(19, "\xc8\x02\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 65536) = 16
# ^ inotify IN_MODIFY event consumed (16 bytes = one inotify_event struct, no filename)
read(19, 0xc0005c7e53, 65536) = -1 EAGAIN ← fd drained, goroutine should park here
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1268509, si_uid=0} ---
--- SIGURG ... --- (×30+ more — Go scheduler attempting preemption)
read(19, "\xc8\x02\x00\x00\x02\x00...", 65536) = 16 ← next event, loop repeats
read(19, 0xc0005c7e53, 65536) = -1 EAGAIN
...
The decoded inotify event (wd=0x2c8, mask=0x2 = IN_MODIFY, cookie=0, name_len=0) shows the mechanism is correctly receiving write notifications on audit.log — event detection works fine. The failure is entirely in the goroutine's inability to park between events.
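The 16-byte header decodes mechanically (little-endian: int32 wd, then uint32 mask, cookie, and name length, per the kernel's struct inotify_event). As a sanity check, parsing the exact bytes from the trace above reproduces those fields:

```go
// Decodes the fixed 16-byte inotify_event header:
// struct inotify_event { int wd; uint32 mask; uint32 cookie; uint32 len; }
package main

import (
	"encoding/binary"
	"fmt"
)

type inotifyEvent struct {
	Wd     int32
	Mask   uint32
	Cookie uint32
	Len    uint32 // length of the trailing name field (0 here)
}

func decodeEvent(b []byte) inotifyEvent {
	return inotifyEvent{
		Wd:     int32(binary.LittleEndian.Uint32(b[0:4])),
		Mask:   binary.LittleEndian.Uint32(b[4:8]),
		Cookie: binary.LittleEndian.Uint32(b[8:12]),
		Len:    binary.LittleEndian.Uint32(b[12:16]),
	}
}

func main() {
	// The 16 bytes captured by strace above.
	raw := []byte{
		0xc8, 0x02, 0x00, 0x00, // wd     = 0x2c8
		0x02, 0x00, 0x00, 0x00, // mask   = 0x2 (IN_MODIFY)
		0x00, 0x00, 0x00, 0x00, // cookie = 0
		0x00, 0x00, 0x00, 0x00, // len    = 0 (no filename)
	}
	ev := decodeEvent(raw)
	fmt.Printf("wd=%#x mask=%#x cookie=%d len=%d\n", ev.Wd, ev.Mask, ev.Cookie, ev.Len)
	// -> wd=0x2c8 mask=0x2 cookie=0 len=0
}
```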
The standard Go pattern for inotify bypasses the netpoller entirely by using a blocking fd on a dedicated OS thread.
Suggested fixes:
a) use a blocking fd: the OS parks the thread between events, with no Go scheduler involvement
b) migrate to fsnotify
By the way: the imunify-agent-proxy binary is shipped to production unstripped, with debug symbols included.