tcp: reset reordering est. selectively on timeout
author Yuchung Cheng <ycheng@google.com>
Mon, 12 Aug 2013 23:41:25 +0000 (16:41 -0700)
committer David S. Miller <davem@davemloft.net>
Tue, 13 Aug 2013 23:08:33 +0000 (16:08 -0700)
On a retransmission timeout, the TCP sender unconditionally caps the
estimated degree of network reordering (tp->reordering) at the default
(sysctl_tcp_reordering). The rationale is that the timeout suggests the
estimate has grown too large for fast recovery to trigger (e.g., due to
an IP path change).

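But if, for example, the sender had only 2 packets outstanding, a
timeout says little about reordering: at most one packet can be SACKed,
which is too few to trigger fast recovery even with the default
estimate. A sender that learns about reordering on big writes but loses
packets on small writes will therefore spuriously retransmit again and
again: each timeout on a small write discards the learned estimate, so
the next big write triggers a false fast retransmit. This is especially
likely when reordering is more common on big writes.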

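Therefore the sender should suspect that tp->reordering is too high only
if it could have entered fast recovery with the (lower) default
estimate, i.e., if it was still in the Open or Disorder state and had at
least sysctl_tcp_reordering packets SACKed.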

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/tcp_input.c

index b61274b666f6556e9469dd06b6c84e88534fbc6f..e965cc7b87ffdd02c6260ab86c3febea6971a60b 100644
@@ -1877,8 +1877,13 @@ void tcp_enter_loss(struct sock *sk, int how)
        }
        tcp_verify_left_out(tp);
 
-       tp->reordering = min_t(unsigned int, tp->reordering,
-                              sysctl_tcp_reordering);
+       /* Timeout in disordered state after receiving substantial DUPACKs
+        * suggests that the degree of reordering is over-estimated.
+        */
+       if (icsk->icsk_ca_state <= TCP_CA_Disorder &&
+           tp->sacked_out >= sysctl_tcp_reordering)
+               tp->reordering = min_t(unsigned int, tp->reordering,
+                                      sysctl_tcp_reordering);
        tcp_set_ca_state(sk, TCP_CA_Loss);
        tp->high_seq = tp->snd_nxt;
        TCP_ECN_queue_cwr(tp);