Server got overloaded even with CL limiting things. How do I troubleshoot this?

  • Server got overloaded even with CL limiting things. How do I troubleshoot this?

    I've had user accounts get hammered before, hit their limits, and then come back online after things cool down. I understand all of that and how the limits work. Today, however, something happened that has not happened to me since I switched over to a CL-based server. I started getting reports that sites were down, one in particular whose owner had sent out an email blast ten minutes earlier. I'm not sure whether that was part of the problem, but about ten minutes after she sent it out the server apparently got hammered badly. I couldn't get into WHM/cPanel, and even through my VPS's console terminal I couldn't log in or do anything because it was so slow to respond. The server panel showed around 85% CPU usage and very high network usage as well.

    I wound up just power cycling the VPS, and when it came back up everything seemed to be okay. I had also turned on Cloudflare's attack mode on the domain that sent out the email blast.

    Once I got back in I went to CL and looked to see if anyone had been hitting limits in the last 30 minutes and none of the accounts on the server seemed to be having any problems. Maybe a couple of faults here and there but overall nothing out of the ordinary.

    I'm hoping this was just a one-time deal, but if it were you, what would you do to troubleshoot this and find out what actually happened? If CL wasn't throttling anyone, why would the whole server lag out like that? I do not have MySQL Governor on this VPS, if that matters.

    My thanks for any insight or ideas you might have for me to look at.

  • #2
    First of all I would like to ensure lvestats are running and collecting at least some data, please show the output of a simple `lveinfo` command.

    Also, what are the LVE limits on this VPS? What are the server specs and how many accounts are there? If you are lucky enough to be able to log into the server during the overload, please run the `top` command and provide the header information, which includes uptime, load average, memory and CPU usage.
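
The load average in the `top` header is easier to judge relative to the number of cores. Below is a minimal sketch, assuming a Linux host where `/proc/loadavg` is readable; the function name `load_per_core` is hypothetical and is not part of any CloudLinux tool.

```shell
#!/usr/bin/env bash
# load_per_core FILE CORES
# Prints the 1-minute load average as a percentage of the core count.
# FILE is a file in /proc/loadavg format; CORES is the number of logical CPUs.
# (Hypothetical helper for illustration only.)
load_per_core() {
    local file="$1" cores="$2"
    if [ -z "$cores" ] || [ "$cores" -le 0 ]; then
        echo "core count must be a positive integer" >&2
        return 1
    fi
    # The first field of /proc/loadavg is the 1-minute load average.
    awk -v cores="$cores" '{ printf "%.1f\n", $1 / cores * 100 }' "$file"
}

# Example call on a live system:
#   load_per_core /proc/loadavg "$(nproc)"
```

If the server still responds at all, `top -b -n 1 | head -n 5` prints just the header once in batch mode, which is easier to copy than the interactive screen.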

    • #3
      There are about 50 accounts on the server. The server has an AMD Ryzen 9 7950X 16-Core Processor with 32 logical processors and 8 GB of memory.

      I have several packages set up. I have left out the custom packages that duplicate the limits shown here.
      Special root 300% 3G 15MB/s 2048 75 180 -
      Basic root 100% 1G 1MB/s 1024 20 100 -
      Deluxe root 100% 1G 1MB/s 1024 20 100 -
      Enterprise root 200% 2G 10MB/s 1024 40 100 -


      Here is lveinfo:
      HTML Code:
      +----+----+----+----+---+---+---+-----+-----+-----+-----+---+------+------+-----+------+------+------+-----+------+------+------+------+-----+-----+-----+
      |ID  |aCPU|mCPU|lCPU|aEP|mEP|lEP|aVMem|mVMem|lVMem|VMemF|EPf|aPMem |mPMem |lPMem|aNproc|mNproc|lNproc|PMemF|NprocF|aIO   |mIO   |lIO   |aIOPS|mIOPS|lIOPS|
      +----+----+----+----+---+---+---+-----+-----+-----+-----+---+------+------+-----+------+------+------+-----+------+------+------+------+-----+-----+-----+
      |1001|0   |2   |150 |0  |0  |20 |0B   |0B   |0B   |0    |0  |1.6MB |15.6MB|1.0GB|0     |0     |100   |0    |0     |0B    |0B    |8.0MB |0    |0    |1.0K |
      |1003|1   |1   |200 |0  |0  |40 |0B   |0B   |0B   |0    |0  |16.0MB|16.0MB|2.0GB|0     |0     |100   |0    |0     |0B    |0B    |10.0MB|0    |0    |1.0K |
      |1004|0   |0   |150 |0  |0  |20 |0B   |0B   |0B   |0    |0  |22.5MB|74.9MB|1.0GB|0     |0     |100   |0    |0     |191B  |956B  |8.0MB |0    |0    |1.0K |
      |1005|0   |1   |100 |0  |0  |20 |0B   |0B   |0B   |0    |0  |8.4MB |44.8MB|1.0GB|0     |0     |100   |0    |0     |427B  |3.8KB |5.0MB |0    |0    |1.0K |
      |1006|0   |1   |100 |0  |0  |20 |0B   |0B   |0B   |0    |0  |2.9MB |15.2MB|1.0GB|0     |0     |100   |0    |0     |16.2KB|155KB |5.0MB |1    |11   |1.0K |
      |1008|0   |0   |100 |0  |0  |20 |0B   |0B   |0B   |0    |0  |2.4MB |24.1MB|1.0GB|0     |0     |100   |0    |0     |0B    |0B    |5.0MB |0    |0    |1.0K |
      |1009|0   |1   |100 |0  |0  |20 |0B   |0B   |0B   |0    |0  |1.1MB |11.5MB|1.0GB|0     |0     |100   |0    |0     |0B    |0B    |5.0MB |0    |0    |1.0K |
      |1010|0   |0   |150 |0  |0  |20 |0B   |0B   |0B   |0    |0  |4.7MB |47.2MB|1.0GB|0     |0     |100   |0    |0     |0B    |0B    |8.0MB |0    |0    |1.0K |
      |1012|0   |0   |200 |0  |0  |40 |0B   |0B   |0B   |0    |0  |410B  |4.0KB |2.0GB|0     |0     |100   |0    |0     |0B    |0B    |10.0MB|0    |0    |1.0K |
      |1013|1   |2   |100 |0  |0  |20 |0B   |0B   |0B   |0    |0  |13.9MB|17.5MB|1.0GB|0     |0     |100   |0    |0     |1.8KB |12.4KB|5.0MB |0    |0    |1.0K |
      +----+----+----+----+---+---+---+-----+-----+-----+-----+---+------+------+-----+------+------+------+-----+------+------+------+------+-----+-----+-----+

      • #4
        The limits are OK and CPU looks good. However, I would say the server's RAM could be the bottleneck: when memory runs short, the system can start swapping aggressively, which leads to high IO and slows everything down.
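
One way to check the swapping theory is to look at how much swap is in use. This is a minimal sketch, assuming a Linux host with the standard `SwapTotal` and `SwapFree` fields in `/proc/meminfo`; the function name `swap_used_kb` is hypothetical.

```shell
#!/usr/bin/env bash
# swap_used_kb [FILE]
# Prints the amount of swap currently in use, in kB, computed as
# SwapTotal - SwapFree. FILE defaults to /proc/meminfo.
# (Hypothetical helper for illustration only.)
swap_used_kb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { free  = $2 }
        END           { print total - free }
    ' "$meminfo"
}

# Example call on a live system:
#   swap_used_kb
```

A large value here only shows that swap holds data. To see whether the system is actively swapping, `vmstat 1 5` prints five one-second samples; sustained non-zero values in the `si` and `so` columns indicate pages moving in and out of swap.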

        In any case, I would need more details about what is going on with the server while the issue is happening. We have a monitoring script that records the most relevant information when the load average rises. Please install it using these commands:

        Code:
        wget -O loadmonitoring.sh https://raw.githubusercontent.com/cloudlinux/tools/refs/heads/main/loadmonitoring/loadmonitoring.sh
        bash loadmonitoring.sh

        It installs the script, sets the threshold at which data collection starts (by default, when the load average rises above 25%), and configures a cron job to run it every minute. The resulting files are written to the /var/log/server-status/YYYY-MM-DD directory. As soon as the issue happens again, you will find the files there.
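
To find the collected files after an incident, something like the following could be used. It is a small sketch that relies only on the directory layout described above (/var/log/server-status/YYYY-MM-DD); the function name `status_dir_for` is hypothetical, and the names of the files inside the directory are not assumed.

```shell
#!/usr/bin/env bash
# status_dir_for [YYYY-MM-DD]
# Prints the directory for the given day, defaulting to today.
# (Hypothetical helper for illustration only.)
status_dir_for() {
    local day="${1:-$(date +%F)}"
    printf '/var/log/server-status/%s\n' "$day"
}

# Example: list today's files, newest first
#   ls -lt "$(status_dir_for)" | head
```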
