Performance bottlenecks of Linux traffic shaping (tc)

I have an interesting problem.



We have a Linux server running Debian Stretch. It has a single E5-2680 v2 CPU and Intel 82599ES 10G dual-port NICs. No other services (HTTP and so on) run on it.



Everything works well until I enable traffic shaping. The shaping rules consist of an htb qdisc on each of the bond0 and bond1 interfaces.



There is a three-step hashing filter:





  1. A four-rule filter that selects a large subnet. The rules look like this:



    protocol ip u32 ht 800:: match ip src 10.15.0.0/18 hashkey mask 0x0000ff00 at 12 link 1:



  2. About 60 rule tables that select a /24 out of that large subnet:



    protocol ip u32 ht 1:10 match ip src 10.15.16.0/24 hashkey mask 0x000000ff at 12 link 100:



  3. 256 rule tables that match the individual IP address, followed by the class and leaf qdisc that the address maps to:



    parent 1:0 prio 100 protocol ip u32 ht 100:20: match ip src 10.15.16.32 flowid 1:20
    class add dev bond0 parent 1:0 classid 1:1 htb rate 24Gbit ceil 24Gbit burst 4096k cburst 256k
    class add dev bond0 parent 1:1 classid 1:20 htb rate 102400Kbit burst 4096k cburst 256k
    qdisc add dev bond0 parent 1:20 pfifo_fast
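
As a sanity check on the bucket numbers in the rules above, the following is a minimal sketch of how a u32 hashkey appears to select a bucket: the 32-bit word at the given offset (offset 12 in the IP header is the source address) is ANDed with the mask and then shifted right by the mask's trailing zero bits. That folding rule is an assumption about the kernel's behaviour, and u32_bucket is a hypothetical helper written for illustration, not part of tc.

```shell
#!/bin/sh
# Sketch only: assumes bucket = (key & mask) >> (trailing zero bits of mask).
# u32_bucket is a hypothetical helper, not a tc command.
u32_bucket() {
    mask=$(( $2 ))              # accepts hex such as 0x0000ff00
    old_ifs=$IFS
    IFS=.
    set -- $1                   # split the dotted quad into $1..$4
    IFS=$old_ifs
    # Pack the four octets into one 32-bit key, as at offset 12 of the IP header.
    key=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    masked=$(( key & mask ))
    # Shift out the trailing zero bits of the mask.
    while [ "$mask" -ne 0 ] && [ $(( mask & 1 )) -eq 0 ]; do
        mask=$(( mask >> 1 ))
        masked=$(( masked >> 1 ))
    done
    printf '%x\n' "$masked"     # bucket number in hex, as tc handles are written
}

u32_bucket 10.15.16.32 0x0000ff00   # prints 10 (third octet, step 1 mask)
u32_bucket 10.15.16.32 0x000000ff   # prints 20 (fourth octet, step 2 mask)
```

Under that assumption, 10.15.16.32 lands in bucket 10 of table 1: and bucket 20 of table 100:, which agrees with the ht 1:10 and ht 100:20: handles used in the rules.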



With these rules enabled, total traffic tops out at around 10 Gbps, and ping latency then climbs to 100 ms or more. During that time the CPU is almost idle: uptime shows a load average of 0.07 and each core is only about 5% utilized. Ping latency returns to normal as soon as I clear the shaping rules.



Previously I used sfq instead of pfifo_fast as the leaf qdisc, and that did cause a high CPU load. I then switched to red, which gave slightly better results with no noticeable CPU load, and finally to pfifo_fast, which performs almost the same as red.




  • What other bottleneck could be there?

  • Has anyone tried using the Linux traffic shaper on a network faster than 10 Gbps?

  • Please add the brand and model of your NICs.
    – Rui F Ribeiro
    Dec 21 '17 at 23:37












  • @RuiFRibeiro added
    – Pentium100
    Dec 22 '17 at 4:32










  • FWIW, I had similar performance problems using tc to emulate a WAN. I don't remember the exact details, as it was a few years ago, but I was only using 100BASE-T Ethernet because that was about the bandwidth of the WAN connection I was emulating, so I was hoping I could just use tc to add latency. I still ran into performance problems similar to what you're describing once I started increasing the simulated latency, including severe bandwidth limitations well below what the underlying hardware supported without tc shaping.
    – Andrew Henle
    Feb 8 '18 at 12:47

linux networking performance tc






edited Feb 7 '18 at 11:31
Yaron

asked Dec 21 '17 at 21:11
Pentium100

1 Answer

Did you find a way to get around the problem?






answered 30 mins ago
mctailer