imunify-auditd-log-reader bug

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • chris
    Junior Member
    • Apr 2017
    • 21

    #1

    imunify-auditd-log-reader bug

    Spotted 1 bug, code defect causing measurable CPU waste that scales with server load.

    (why link libstdc++ in a go binary anyway?)


    daemon consumes sustained CPU on otherwise idle servers. Measured across two independent servers:
    server a 5.5% ~30h 72h 2.4× realtime
    server b 22.1% ~30h 366h 12.2× realtime

    The fact that impact scales proportionally with server load (audit.log write frequency) is itself diagnostic — it points directly to a read loop driven by incoming events rather than a fixed-cost background operation.

    binary opens an inotify file descriptor (fd=19) in O_NONBLOCK mode and relies on the Go runtime's netpoller to park the goroutine between events. The netpoller uses epoll internally.
    However, inotify file descriptors have specific epoll semantics that this version of the Go runtime cannot satisfy — every epoll_ctl registration attempt fails.


    $ strace -p 1268509 -e trace=epoll_wait,epoll_ctl -c -- sleep 10 %

    time seconds usecs/call calls errors syscall
    100.00 0.000287 5 55 55 epoll_ctl <- 100% error rate
    0.00 0.000000 0 0 0 epoll_wait <- never called


    epoll_wait is never invoked. The goroutine cannot park. As a result, it immediately retries the read after every EAGAIN, producing a tight spin loop.
    The Go scheduler fires SIGURG signals continuously to preempt the spinning goroutine — which simply reschedules and spins again.

    Proof — strace read trace:

    read(19, "\xc8\x02\x00\x00\x02\x00\x00\x00\x00\x00\x00\ x00\ x00\x00\x00\x00", 65536) = 16
    # ^ inotify IN_WRITE event consumed (16 bytes = one inotify_event struct, no filename)

    read(19, 0xc0005c7e53, 65536) = -1 EAGAIN ← fd drained, goroutine should park here
    --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1268509, si_uid=0} ---
    --- SIGURG ... --- (×30+ more — Go scheduler attempting preemption)
    read(19, "\xc8\x02\x00\x00\x02\x00...", 65536) = 16 ← next event, loop repeats
    read(19, 0xc0005c7e53, 65536) = -1 EAGAIN
    ...​


    The decoded inotify event (wd=0x2c8, mask=0x2 = IN_WRITE, cookie=0, name_len=0) shows the mechanism is correctly receiving write notifications on audit.log — the event detection works fine. The failure is entirely in the goroutine's inability to park between events.

    The standard Go pattern for inotify bypasses the netpoller entirely by using a blocking fd on a dedicated OS thread.

    maybe it should a) use blocking mode, OS parks the thread, no go scheduler involvement
    or migrate it to fsnotify ?


    by the way, on imunify-agent-proxy you ship debug symbols in production and it's not stripped.
  • akots
    Administrator
    • Mar 2023
    • 124

    #2
    Hello Chris,
    Thank you for the detailed write-up and for the time spent narrowing this down.

    We would like to move this to a support ticket so we can collect the necessary environment details and verify whether this is a separate issue or something already addressed to our team recently. Based on your report, the behaviour appears to be tied to imunify-auditd-log-reader activity under audit log load, so we would like to inspect it on a case-by-case basis.

    Please submit a ticket here: https://cloudlinux.zendesk.com/hc/en-us/requests/new
    Also, please attach the Imunify360 doctor key to it: https://cloudlinux.zendesk.com/hc/en...w-to-obtain-it

    Comment

    Working...