net: sched: fix panic in rate estimators
authorEric Dumazet <edumazet@google.com>
Fri, 30 Jan 2015 01:30:12 +0000 (17:30 -0800)
committerDavid S. Miller <davem@davemloft.net>
Sun, 1 Feb 2015 01:49:37 +0000 (17:49 -0800)
Doing the following commands on a non idle network device
panics the box instantly, because cpu_bstats gets overwritten
by stats.

tc qdisc add dev eth0 root <your_favorite_qdisc>
... some traffic (one packet is enough) ...
tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc>

[  325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c
[  325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
[  325.369158] PGD 1fa7067 PUD 0
[  325.372254] Oops: 0000 [#1] SMP
[  325.375514] Modules linked in: ...
[  325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163
[  325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000
[  325.419518] RIP: 0010:[<ffffffff81541c9e>]  [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
[  325.428506] RSP: 0018:ffff881ff2fa7928  EFLAGS: 00010286
[  325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c
[  325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060
[  325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000
[  325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744
[  325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c
[  325.469536] FS:  00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000
[  325.477630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0
[  325.490510] Stack:
[  325.492524]  ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0
[  325.499990]  ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744
[  325.507460]  00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0
[  325.514956] Call Trace:
[  325.517412]  [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230
[  325.523250]  [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60
[  325.529349]  [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590
[  325.535117]  [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240
[  325.540963]  [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20
[  325.546532]  [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0
[  325.552145]  [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40
[  325.557558]  [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220
[  325.563317]  [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0

Lets play safe and not use an union : percpu 'pointers' are mostly read
anyway, and we have typically few qdiscs per host.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Fixes: 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe")
Signed-off-by: David S. Miller <davem@davemloft.net>
include/net/sch_generic.h

index 3d282cbb66bf1015d28f140ed3bc031ac43afaed..c605d305c577074d11bee6f19479dda8a4949ee3 100644 (file)
@@ -79,6 +79,9 @@ struct Qdisc {
        struct netdev_queue     *dev_queue;
 
        struct gnet_stats_rate_est64    rate_est;
+       struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+       struct gnet_stats_queue __percpu *cpu_qstats;
+
        struct Qdisc            *next_sched;
        struct sk_buff          *gso_skb;
        /*
@@ -86,15 +89,9 @@ struct Qdisc {
         */
        unsigned long           state;
        struct sk_buff_head     q;
-       union {
-               struct gnet_stats_basic_packed bstats;
-               struct gnet_stats_basic_cpu __percpu *cpu_bstats;
-       } __packed;
+       struct gnet_stats_basic_packed bstats;
        unsigned int            __state;
-       union {
-               struct gnet_stats_queue qstats;
-               struct gnet_stats_queue __percpu *cpu_qstats;
-       } __packed;
+       struct gnet_stats_queue qstats;
        struct rcu_head         rcu_head;
        int                     padded;
        atomic_t                refcnt;