9 March 2012

Finally fixing bufferbloat

I got a new server for the flat to replace my old, long-dead Mini-ITX machine. It’s set up for various networking duties, the most important of which is routing. Now that I’ve got a real machine doing the routing, I’ve been able to do some traffic shaping to mitigate the bufferbloat problem in the flat – something I’ve wanted to do for a long time.

While the general routing stuff was pretty straightforward, the traffic shaping stuff in Linux scared me. Fortunately, Wonder Shaper exists and saves me trying to wrap my head around tc. It really is rather nice – simply alter a few variables and latency stays not-shit, even when the connection is under heavy load. Ah… :)

My ping times to some Google server on an empty connection are about 18ms. Once another machine saturates the uplink, by uploading a large file, that rises to around 320ms! With traffic shaping, they only rise by a few ms.
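If you want to repeat that comparison, here is a rough sketch of how the numbers can be pulled out of ping. The helper name `avg_rtt` is made up for this example, and it assumes the summary line that Linux ping (iputils) prints at the end of a run. Run it once on an idle connection and once while another machine is uploading.

```bash
#!/bin/bash
# avg_rtt: hypothetical helper, not part of the scripts in this post.
# Reads ping output on stdin and prints the average round-trip time in ms,
# taken from the "rtt min/avg/max/mdev = a/b/c/d ms" summary line.
avg_rtt() {
    awk -F'/' '/^(rtt|round-trip) / { print $5 }'
}

# Usage (the target host is an assumption; any host that answers pings will do):
#   ping -c 10 google.com | avg_rtt
```

Splitting on `/` puts the average in the fifth field, which is why the helper prints `$5`.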

EDIT: Take note of Dave Täht’s comments below (he is a lot more knowledgeable than me on these matters), highlighting problems with Wonder Shaper.

For my own benefit as much as anything else, here is my “just got it working” routing script:

#!/bin/bash
#
# /etc/rc.d/rc.nat:
#

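# Flush existing filter and NAT rules, then default every chain to ACCEPT
iptables -F
iptables -F -t nat
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT

# Masquerade (source-NAT) everything leaving via eth0
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# Accept forwarded packets of established connections (redundant while the
# FORWARD policy is ACCEPT, but needed if that policy is ever tightened)
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT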
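
The NAT rule only does anything if the kernel is actually forwarding packets between interfaces. The script above doesn't show where that gets switched on, so the check below is a sketch under the assumption that it happens elsewhere (for example in a sysctl config file). The function name `forwarding_enabled` and its optional path argument are made up for this example.

```bash
#!/bin/bash
# forwarding_enabled: hypothetical helper, not part of rc.nat.
# Succeeds when the given flag file contains "1". The path argument defaults
# to the kernel's IPv4 forwarding switch.
forwarding_enabled() {
    local flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

if ! forwarding_enabled; then
    echo "IPv4 forwarding is off; enable it with: sysctl -w net.ipv4.ip_forward=1" >&2
fi
```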

Now for the traffic shaping:

#!/bin/bash
#
# /etc/rc.d/rc.bufferbloat
#
# Wonder Shaper
# please read the README before filling out these values 
#
# Set the following values to somewhat less than your actual download
# and uplink speed. In kilobits. Also set the device that is to be shaped.

# Modem reports:
# DOWNLINK=21494
# UPLINK=2463

DOWNLINK=20000
UPLINK=2000

EIP="192.168.1.2/24"
IIP="10.0.0.1/24"
DEV=eth0

# low priority OUTGOING traffic - you can leave this blank if you want
# low priority source netmasks
NOPRIOHOSTSRC=

# low priority destination netmasks
NOPRIOHOSTDST=

# low priority source ports
NOPRIOPORTSRC=

# low priority destination ports
NOPRIOPORTDST=


# Now remove the following two lines :-)

#echo Please read the documentation in 'README' first
#exit

if [ "$1" = "status" ]
then
	tc -s qdisc ls dev $DEV
	tc -s class ls dev $DEV
	exit
fi


# clean existing down- and uplink qdiscs, hide errors
tc qdisc del dev $DEV root    2> /dev/null > /dev/null
tc qdisc del dev $DEV ingress 2> /dev/null > /dev/null

if [ "$1" = "stop" ] 
then 
	exit
fi


###### uplink

# install root HTB, point default traffic to 1:20:

tc qdisc add dev $DEV root handle 1: htb default 20

# shape everything at $UPLINK speed - this prevents huge queues in your
# DSL modem which destroy latency:

tc class add dev $DEV parent 1: classid 1:1 htb rate ${UPLINK}kbit burst 6k

# From server to internal network

tc class add dev $DEV parent 1:1 classid 1:5 htb rate 100000kbit \
   burst 6k prio 1

# high prio class 1:10:

tc class add dev $DEV parent 1:1 classid 1:10 htb rate ${UPLINK}kbit \
   burst 6k prio 1

# bulk & default class 1:20 - gets slightly less traffic, 
# and a lower priority:

tc class add dev $DEV parent 1:1 classid 1:20 htb rate $((9*UPLINK/10))kbit \
   burst 6k prio 2

tc class add dev $DEV parent 1:1 classid 1:30 htb rate $((8*UPLINK/10))kbit \
   burst 6k prio 2

# all get Stochastic Fairness:
tc qdisc add dev $DEV parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev $DEV parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev $DEV parent 1:30 handle 30: sfq perturb 10

# From server to internal network

tc filter add dev $DEV parent 1:0 protocol ip prio 10 u32 \
      match ip dst $IIP flowid 1:5

# TOS Minimum Delay (ssh, NOT scp) in 1:10:

tc filter add dev $DEV parent 1:0 protocol ip prio 10 u32 \
      match ip tos 0x10 0xfc flowid 1:10

# ICMP (ip protocol 1) in the interactive class 1:10 so we 
# can do measurements & impress our friends:
tc filter add dev $DEV parent 1:0 protocol ip prio 10 u32 \
        match ip protocol 1 0xff flowid 1:10

# To speed up downloads while an upload is going on, put ACK packets in
# the interactive class:

tc filter add dev $DEV parent 1: protocol ip prio 10 u32 \
   match ip protocol 6 0xff \
   match u8 0x05 0x0f at 0 \
   match u16 0x0000 0xffc0 at 2 \
   match u8 0x10 0xff at 33 \
   flowid 1:10

# rest is 'non-interactive' ie 'bulk' and ends up in 1:20

# some traffic however suffers a worse fate
for a in $NOPRIOPORTDST
do
	tc filter add dev $DEV parent 1: protocol ip prio 14 u32 \
	   match ip dport $a 0xffff flowid 1:30
done

for a in $NOPRIOPORTSRC
do
 	tc filter add dev $DEV parent 1: protocol ip prio 15 u32 \
	   match ip sport $a 0xffff flowid 1:30
done

for a in $NOPRIOHOSTSRC
do
 	tc filter add dev $DEV parent 1: protocol ip prio 16 u32 \
	   match ip src $a flowid 1:30
done

for a in $NOPRIOHOSTDST
do
 	tc filter add dev $DEV parent 1: protocol ip prio 17 u32 \
	   match ip dst $a flowid 1:30
done

# rest is 'non-interactive' ie 'bulk' and ends up in 1:20

tc filter add dev $DEV parent 1: protocol ip prio 18 u32 \
   match ip dst 0.0.0.0/0 flowid 1:20


########## downlink #############
# slow downloads down to somewhat less than the real speed  to prevent 
# queuing at our ISP. Tune to see how high you can set it.
# ISPs tend to have *huge* queues to make sure big downloads are fast
#
# attach ingress policer:

tc qdisc add dev $DEV handle ffff: ingress

# filter everything _from_ the Internet, drop everything that's
# coming in too fast:

tc filter add dev $DEV parent ffff: protocol ip prio 50 u32 match ip dst \
   $EIP police rate ${DOWNLINK}kbit burst 10k drop flowid :1


# This reduces the queues in the driver buffer:
/sbin/ethtool -G $DEV tx 20
/sbin/ip link set dev $DEV qlen 4
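
For reference, here is how the numbers in the script work out. With the values above, the shaped uplink is about 81% of what the modem reports and the shaped downlink about 93%, while the bulk and low-priority classes get 9/10 and 8/10 of the shaped uplink. The helpers `scale_rate` and `percent_of` are made up for this sketch and only redo the script's integer arithmetic.

```bash
#!/bin/bash
# Hypothetical helpers, not part of rc.bufferbloat.

# Scale a rate in kbit by PERCENT using integer arithmetic (rounds down).
scale_rate() {
    local rate_kbit=$1 percent=$2
    echo $(( rate_kbit * percent / 100 ))
}

# Percentage of the modem-reported rate that the shaped rate uses (rounds down).
percent_of() {
    local shaped_kbit=$1 reported_kbit=$2
    echo $(( shaped_kbit * 100 / reported_kbit ))
}

UPLINK=2000   # same value as in the script above

echo "uplink headroom:   $(percent_of 2000 2463)% of modem rate"
echo "downlink headroom: $(percent_of 20000 21494)% of modem rate"
echo "1:10 interactive:  ${UPLINK}kbit"
echo "1:20 bulk/default: $(scale_rate "$UPLINK" 90)kbit"   # 9*UPLINK/10
echo "1:30 low priority: $(scale_rate "$UPLINK" 80)kbit"   # 8*UPLINK/10
```

If the modem resyncs at a different speed, rerunning this with the new figures is a quick way to see whether the chosen values still leave some margin.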