Extremely high load and many dovecot processes

  • Extremely high load and many dovecot processes

    Hello, I'm a sysadmin and I manage a server running CLN 6.8 with cPanel build 60.
    Almost two weeks ago, this server (and several others) began to show an extremely high load, BUT the server is not slow; it works as if there were no problem. However, the load can rise up to 2k, and I start to see more than 700 dovecot processes. This started happening after the last 2 or 3 cPanel updates. I can't see what the problem is.
    The server has 2483 clients.

    Server specs:

    CPU:
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 16
    On-line CPU(s) list: 0-15
    Thread(s) per core: 2
    Core(s) per socket: 4
    Socket(s): 2
    NUMA node(s): 2
    Vendor ID: AuthenticAMD
    CPU family: 21
    Model: 1
    Model name: AMD Opteron(tm) Processor 4280
    Stepping: 2
    CPU MHz: 2800.029
    BogoMIPS: 5599.62
    Virtualization: AMD-V
    L1d cache: 16K
    L1i cache: 64K
    L2 cache: 2048K
    L3 cache: 6144K
    NUMA node0 CPU(s): 0-7
    NUMA node1 CPU(s): 8-15
    MemTotal: 65787584 kB
    SwapCached: 30556 kB
    MemCommitted: 4713684992 kB

    RAM:
    64gb

    CloudLinux Server release 6.8

    Has anything like this happened to anyone?

    thanks

  • #2
    Hi,

    The load average value is quite a bad thing to judge whether a server is overloaded. LA depends on the number of processes in the R or D states (running or waiting for some data). Here is what the internet says:

    > /proc/loadavg
    >
    > The first three fields in this file are load average figures giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1, 5, and 15 minutes. They are the same as the load average numbers given by uptime(1) and other programs. The fourth field consists of two numbers separated by a slash (/). The first of these is the number of currently executing kernel scheduling entities (processes, threads); this will be less than or equal to the number of CPUs. The value after the slash is the number of kernel scheduling entities that currently exist on the system. The fifth field is the PID of the process that was most recently created on the system.

    Which processes are actually in the R/D state? Are they still dovecot? What are they doing? You can catch them with something like:

    Code:
    ps axuwwww | egrep ' D| R' | grep -v grep | grep -v axuwwww
    Also, I would like to see the top header while the issue is occurring.
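
If the raw ps listing is too long to read with hundreds of processes, a grouped count can help. The following is only a sketch: `count_rd` is a hypothetical helper name, and it assumes the procps `ps` options `-eo state=,comm=` (one-letter state and command name, no header) are available on the server.

```shell
#!/bin/sh
# Summarize which commands are in state R (runnable) or D (uninterruptible
# sleep), i.e. the states that are counted in the load average.
# count_rd is a hypothetical helper: it reads "STATE COMMAND" pairs on stdin
# and prints "COUNT STATE COMMAND", largest count first.
count_rd() {
    awk '$1 ~ /^[RD]/ { n[substr($1, 1, 1) " " $2]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

# Live usage: sh this-script.sh --live
if [ "${1:-}" = "--live" ]; then
    cat /proc/loadavg
    # The trailing "=" after each column name suppresses the header line.
    ps -eo state=,comm= | count_rd
fi
```

Note that the ps process itself will normally show up in state R, so a single "R ps" entry is expected.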


    • #3
      I think I'm finding out what the problem is.

      This is what I see in top:

      Tasks: 5197 total, 30 running, 5142 sleeping, 4 stopped, 21 zombie
      Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st
      Cpu1 : 37.2%us, 20.2%sy, 0.0%ni, 19.9%id, 22.3%wa, 0.0%hi, 0.3%si, 0.0%st
      Cpu2 : 54.9%us, 13.4%sy, 0.0%ni, 21.5%id, 9.9%wa, 0.0%hi, 0.3%si, 0.0%st
      Cpu3 : 43.3%us, 12.2%sy, 0.0%ni, 24.8%id, 19.7%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu4 : 46.1%us, 15.2%sy, 0.0%ni, 22.9%id, 15.8%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu5 : 56.7%us, 12.2%sy, 0.0%ni, 29.4%id, 1.8%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu6 : 43.5%us, 19.6%sy, 1.5%ni, 0.0%id, 35.4%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu7 : 39.4%us, 21.2%sy, 2.4%ni, 21.8%id, 15.2%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu8 : 36.7%us, 31.3%sy, 2.7%ni, 7.2%id, 22.1%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu9 : 30.7%us, 33.3%sy, 1.8%ni, 0.6%id, 33.6%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu10 : 48.2%us, 21.0%sy, 0.6%ni, 27.5%id, 2.7%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu11 : 46.4%us, 22.9%sy, 0.3%ni, 1.5%id, 28.9%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu12 : 52.8%us, 15.2%sy, 0.3%ni, 24.8%id, 6.9%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu13 : 16.1%us, 60.0%sy, 0.6%ni, 23.0%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu14 : 37.1%us, 36.8%sy, 0.0%ni, 25.1%id, 0.9%wa, 0.0%hi, 0.0%si, 0.0%st
      Cpu15 : 42.7%us, 24.8%sy, 0.0%ni, 31.0%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st

      Just one core is busy with softirqs, at 100% si :S
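
To cross-check the si column in top, the per-CPU softirq counters can be summed. This is a rough sketch, assuming the kernel exposes /proc/softirqs (a header row of CPU names followed by one row of counters per softirq type); `softirq_totals` is a hypothetical helper name.

```shell
#!/bin/sh
# Sum the softirq counters per CPU.
# softirq_totals is a hypothetical helper: it reads /proc/softirqs-style
# text on stdin and prints one "CPUn TOTAL" line per CPU.
softirq_totals() {
    awk 'NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
         { for (i = 2; i <= NF; i++) total[i - 1] += $i }
         END { for (i = 1; i <= ncpu; i++) printf "%s %.0f\n", name[i], total[i] }'
}

# Live usage: sh this-script.sh --live
if [ "${1:-}" = "--live" ] && [ -r /proc/softirqs ]; then
    softirq_totals < /proc/softirqs
fi
```

The counters are cumulative since boot, so taking two samples a few seconds apart and comparing them gives a better picture of the current distribution than a single reading.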


      • #4
        I just confirmed it: for some reason, IRQ assignment is not being balanced and everything ends up on core 0. This is not happening on the servers with CLN 7. Could it be a kernel problem?

        I have this kernel:

        2.6.32-673.26.1.lve1.4.18.el6.x86_64


        • #5
          I have found the problem: irqbalance is not installed. But when I try to install it ....

          [~]# yum install irqbalance
          Loaded plugins: fastestmirror, protectbase, rhnplugin, universal-hooks
          Setting up Install Process
          Loading mirror speeds from cached hostfile
          * EA4: 74.50.120.123
          * cloudlinux-x86_64-server-6: ph-proxy.cl-mirror.net
          3 packages excluded due to repository protections
          Resolving Dependencies
          --> Running transaction check
          ---> Package irqbalance.x86_64 2:1.0.7-8.el6 will be installed
          --> Processing Dependency: libnuma.so.1(libnuma_1.1)(64bit) for package: 2:irqbalance-1.0.7-8.el6.x86_64
          --> Processing Dependency: libnuma.so.1()(64bit) for package: 2:irqbalance-1.0.7-8.el6.x86_64
          --> Running transaction check
          ---> Package numactl.x86_64 0:2.0.9-2.el6 will be installed
          --> Finished Dependency Resolution

          Dependencies Resolved

          ==========================================================================================
           Package        Arch      Version          Repository                    Size
          ==========================================================================================
          Installing:
           irqbalance     x86_64    2:1.0.7-8.el6    cloudlinux-x86_64-server-6    41 k
          Installing for dependencies:
           numactl        x86_64    2.0.9-2.el6      cloudlinux-x86_64-server-6    73 k

          Transaction Summary
          ==========================================================================================
          Install 2 Package(s)

          Total download size: 114 k
          Installed size: 0
          Is this ok [y/N]: y
          Downloading Packages:

          Error Downloading Packages:
          numactl-2.0.9-2.el6.x86_64: failed to retrieve getPackage/numactl-2.0.9-2.el6.x86_64.rpm from cloudlinux-x86_64-server-6
          error was [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
          2:irqbalance-1.0.7-8.el6.x86_64: failed to retrieve getPackage/irqbalance-1.0.7-8.el6.x86_64.rpm from cloudlinux-x86_64-server-6
          error was [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"


          • #6
            Try clearing the yum cache, then try again:

            Code:
            yum clean all
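
If clearing the cache gets rid of the 404 errors, the remaining steps might look like the sketch below. It assumes an EL6-style system where services are managed with chkconfig and service; `run` and `install_irqbalance` are hypothetical helper names, and DRY_RUN=1 only prints the commands instead of executing them.

```shell
#!/bin/sh
# run CMD...: print the command, and execute it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Hypothetical helper: refresh yum metadata, install irqbalance,
# enable it at boot, and start it now. Stops at the first failing step.
install_irqbalance() {
    run yum clean all &&
    run yum -y install irqbalance &&
    run chkconfig irqbalance on &&
    run service irqbalance start
}

# Usage (as root): sh this-script.sh --apply
# Preview only:    DRY_RUN=1 sh this-script.sh --apply
if [ "${1:-}" = "--apply" ]; then
    install_irqbalance
fi
```

Once the service is running, the si load in top should start to spread across cores if missing IRQ balancing was indeed the cause.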
