GitHub/LineageOS/android_kernel_motorola_exynos9610.git
David S. Miller [Fri, 16 Jun 2017 15:58:38 +0000 (11:58 -0400)]
Merge branch 'bpf-xdp-Report-bpf_prog-ID-in-IFLA_XDP'

Martin KaFai Lau says:

====================
bpf: xdp: Report bpf_prog ID in IFLA_XDP

This is the first usage of the new bpf_prog ID.  It is for
reporting the ID of an xdp_prog through netlink.

It rides on the existing IFLA_XDP.  This patch adds IFLA_XDP_PROG_ID
for the bpf_prog ID reporting.

The series starts by changing generic_xdp.  After that, the hardware
drivers are changed one by one.  Jakub Kicinski mentioned that he
will soon introduce XDP_ATTACHED_HW (on top of the existing
XDP_ATTACHED_DRV and XDP_ATTACHED_SKB) and that he is going to
reuse prog_attached for this purpose.  Hence, this patch set keeps
prog_attached even though !!prog_id also implies that an xdp_prog
is attached.

I have tested with generic_xdp, mlx4 and mlx5.

v3:
1. Replace 'if' by '?' when checking the xdp_prog pointer
   as suggested by Jakub Kicinski (thanks!)

v2:
1. Remove READ_ONCE since it is already under rtnl lock
2. Keep prog_attached in 'struct netdev_xdp' as
   requested by Jakub Kicinski.  The existing prog_attached
   and the new prog_id are put under a struct for XDP_QUERY_PROG.
====================
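
The reporting logic described in this cover letter can be sketched in user space as follows. This is a minimal model, not the kernel code: struct model_bpf_prog, struct model_xdp_query and model_xdp_query_prog() are hypothetical names, and the only behaviour taken from the text is that prog_attached is kept next to the new prog_id and that the pointer check uses '?' rather than 'if'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bpf_prog; only the ID matters here. */
struct model_bpf_prog {
	uint32_t id;
};

/* Hypothetical stand-in for the XDP_QUERY_PROG reply. */
struct model_xdp_query {
	bool prog_attached;	/* kept for the planned XDP_ATTACHED_HW use */
	uint32_t prog_id;	/* newly reported; 0 means no program */
};

/* Fill the query reply from the currently attached program, if any. */
static void model_xdp_query_prog(const struct model_bpf_prog *xdp_prog,
				 struct model_xdp_query *reply)
{
	assert(reply != NULL);
	reply->prog_attached = xdp_prog != NULL;
	/* v3 style: a conditional expression instead of an if statement. */
	reply->prog_id = xdp_prog ? xdp_prog->id : 0;
}
```

With no program attached the model reports prog_attached == false and prog_id == 0, which is why !!prog_id carries the same information as prog_attached.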

Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:17 +0000 (17:29 -0700)]
bpf: qede: Report bpf_prog ID during XDP_QUERY_PROG

Add support to qede to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Mintz Yuval <Yuval.Mintz@cavium.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:16 +0000 (17:29 -0700)]
bpf: nfp: Report bpf_prog ID during XDP_QUERY_PROG

Add support to nfp to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:15 +0000 (17:29 -0700)]
bpf: ixgbe: Report bpf_prog ID during XDP_QUERY_PROG

Add support to ixgbe to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:14 +0000 (17:29 -0700)]
bpf: thunderx: Report bpf_prog ID during XDP_QUERY_PROG

Add support to thunderx to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Sunil Goutham <sgoutham@cavium.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:13 +0000 (17:29 -0700)]
bpf: bnxt: Report bpf_prog ID during XDP_QUERY_PROG

Add support to bnxt to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:12 +0000 (17:29 -0700)]
bpf: virtio_net: Report bpf_prog ID during XDP_QUERY_PROG

Add support to virtio_net to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:11 +0000 (17:29 -0700)]
bpf: mlx5e: Report bpf_prog ID during XDP_QUERY_PROG

Add support to mlx5e to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:10 +0000 (17:29 -0700)]
bpf: mlx4: Report bpf_prog ID during XDP_QUERY_PROG

Add support to mlx4 to report bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 16 Jun 2017 00:29:09 +0000 (17:29 -0700)]
net: Add IFLA_XDP_PROG_ID

Expose prog_id through IFLA_XDP_PROG_ID.  This patch
modifies generic_xdp.  The later patches will
modify the other drivers that support XDP.

prog_id is added to struct netdev_xdp.

An iproute2 patch will follow.  Here is what the 'ip link'
output will look like:
> ip link show eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Jun 2017 15:48:41 +0000 (11:48 -0400)]
Merge branch 'skb-accessor-cleanups'

Johannes Berg says:

====================
skb data accessors cleanup

Overnight, Fengguang's bot told me that it compiled all of its many
various configurations successfully, and I had done allyesconfig on
x86_64 myself yesterday to iron out the things I missed.

So now I think I'm happy with it.

My tree was based on your

    commit 3715c47bcda8bb56f7e2be27276282a2d0d48c09
    Merge: 18b6e7955d8f d8fbd27469fc
    Author: David S. Miller <davem@davemloft.net>
    Date:   Thu Jun 15 14:31:56 2017 -0400

        Merge branch 'r8152-support-new-chips'

when the compilation tests happened, but I've reviewed the changes
coming into net-next in the meantime and didn't see any new usages
of skb data accessors having come in.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Fri, 16 Jun 2017 12:29:24 +0000 (14:29 +0200)]
networking: add and use skb_put_u8()

Joe and Bjørn suggested that it'd be nicer to not have the
cast in the fairly common case of doing
*(u8 *)skb_put(skb, 1) = c;

Add skb_put_u8() for this case, and use it across the code,
using the following spatch:

    @@
    expression SKB, C, S;
    typedef u8;
    identifier fn = {skb_put};
    fresh identifier fn2 = fn ## "_u8";
    @@
    - *(u8 *)fn(SKB, S) = C;
    + fn2(SKB, C);

Note that due to the "S", the spatch isn't perfect: it should
have checked that S is 1, but there are also places that use a
sizeof expression like sizeof(var) or sizeof(u8) etc. It turns
out that nobody ever did something like
*(u8 *)skb_put(skb, 2) = c;

which would be wrong anyway since the second byte wouldn't be
initialized.
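
As an illustration of what the helper does, here is a minimal user-space model. struct model_skb, model_skb_put() and model_skb_put_u8() are hypothetical simplifications (one flat buffer, no headroom or tailroom tracking), not the kernel definitions; the assert() stands in for the kernel's overflow handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct model_skb {
	uint8_t data[64];
	size_t len;		/* bytes already in use */
};

/* Model of skb_put(): reserve len bytes at the tail, return their start. */
static void *model_skb_put(struct model_skb *skb, size_t len)
{
	void *tail = skb->data + skb->len;

	assert(len <= sizeof(skb->data) - skb->len);	/* no room otherwise */
	skb->len += len;
	return tail;
}

/* Model of skb_put_u8(): append one byte with no cast at the call site. */
static void model_skb_put_u8(struct model_skb *skb, uint8_t val)
{
	*(uint8_t *)model_skb_put(skb, 1) = val;
}
```

The cast still exists, but only once inside the helper; callers write model_skb_put_u8(skb, c) instead of repeating it.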

Suggested-by: Joe Perches <joe@perches.com>
Suggested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Fri, 16 Jun 2017 12:29:23 +0000 (14:29 +0200)]
networking: make skb_push & __skb_push return void pointers

It seems like a historic accident that these return unsigned char *,
which means casts are required more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).
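
The effect of the void * return type on callers can be shown with a small user-space model. Everything named model_ below is hypothetical (a flat buffer with an offset for the current head, and a made-up two-byte header); only the idea that a void * result can be assigned to any object pointer without a cast comes from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-byte header used only for this illustration. */
struct model_hdr {
	uint8_t type;
	uint8_t flags;
};

/* Hypothetical simplified skb: used bytes start at offset head in buf. */
struct model_skb {
	uint8_t buf[64];
	size_t head;	/* offset of the first used byte */
	size_t len;	/* number of used bytes */
};

/* Model of skb_push(): claim len bytes of headroom, return the new start. */
static void *model_skb_push(struct model_skb *skb, size_t len)
{
	assert(len <= skb->head);	/* not enough headroom otherwise */
	skb->head -= len;
	skb->len += len;
	return skb->buf + skb->head;
}

static void model_add_header(struct model_skb *skb, uint8_t type)
{
	/* With an unsigned char * return this needed (struct model_hdr *). */
	struct model_hdr *hdr = model_skb_push(skb, sizeof(*hdr));

	hdr->type = type;
	hdr->flags = 0;
}
```

Only call sites that dereference the result directly still need a cast, which matches the *(u8 *)fn(SKB, LEN) form produced by the spatch.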

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Fri, 16 Jun 2017 12:29:22 +0000 (14:29 +0200)]
networking: make skb_pull & friends return void pointers

It seems like a historic accident that these return unsigned char *,
which means casts are required more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = {
            skb_pull,
            __skb_pull,
            skb_pull_inline,
            __pskb_pull_tail,
            __pskb_pull,
            pskb_pull
    };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = {
            skb_pull,
            __skb_pull,
            skb_pull_inline,
            __pskb_pull_tail,
            __pskb_pull,
            pskb_pull
    };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Fri, 16 Jun 2017 12:29:21 +0000 (14:29 +0200)]
networking: make skb_put & friends return void pointers

It seems like a historic accident that these return unsigned char *,
which means casts are required more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Fri, 16 Jun 2017 12:29:20 +0000 (14:29 +0200)]
networking: introduce and use skb_put_data()

A common pattern with skb_put() is to just want to memcpy()
some data into the new space; introduce skb_put_data() for
this.

An spatch similar to the one for skb_put_zero() converts many
of the places using it:

    @@
    identifier p, p2;
    expression len, skb, data;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, len);
    |
    -memcpy(p, data, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb, data;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, sizeof(*p));
    |
    -memcpy(p, data, sizeof(*p));
    )

    @@
    expression skb, len, data;
    @@
    -memcpy(skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

(again, manually post-processed to retain some comments)
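
A minimal user-space sketch of the pattern being folded into one call. struct model_skb and the model_ functions are hypothetical simplifications of the kernel API; the model assumes the helper returns the start of the copied area, as the "p = skb_put_data(...)" lines in the spatch suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct model_skb {
	uint8_t data[64];
	size_t len;		/* bytes already in use */
};

/* Model of skb_put(): reserve len bytes at the tail, return their start. */
static void *model_skb_put(struct model_skb *skb, size_t len)
{
	void *tail = skb->data + skb->len;

	assert(len <= sizeof(skb->data) - skb->len);
	skb->len += len;
	return tail;
}

/* Model of skb_put_data(): reserve space and copy the data into it. */
static void *model_skb_put_data(struct model_skb *skb, const void *data,
				size_t len)
{
	void *tail = model_skb_put(skb, len);

	memcpy(tail, data, len);
	return tail;
}
```

A caller that used memcpy(skb_put(skb, len), data, len) collapses to a single model_skb_put_data(skb, data, len) call with the length written once.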

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Fri, 16 Jun 2017 12:29:19 +0000 (14:29 +0200)]
networking: convert many more places to skb_put_zero()

There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.

The following spatch found many more and also removes the
now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)
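
For reference, the conversion target can be modelled in user space like this. The names are hypothetical simplifications (struct model_skb is a flat buffer, not struct sk_buff); the sketch only shows the "reserve, then memset to zero" pair becoming one helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct model_skb {
	uint8_t data[64];
	size_t len;		/* bytes already in use */
};

/* Model of skb_put(): reserve len bytes at the tail, return their start. */
static void *model_skb_put(struct model_skb *skb, size_t len)
{
	void *tail = skb->data + skb->len;

	assert(len <= sizeof(skb->data) - skb->len);
	skb->len += len;
	return tail;
}

/* Model of skb_put_zero(): reserve space and clear it in one step. */
static void *model_skb_put_zero(struct model_skb *skb, size_t len)
{
	void *tail = model_skb_put(skb, len);

	memset(tail, 0, len);
	return tail;
}
```

Only the reserved bytes are cleared; bytes beyond the new tail are left untouched.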

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Jun 2017 15:37:13 +0000 (11:37 -0400)]
Merge branch 'r8152-adjust-runtime-suspend-resume'

Hayes Wang says:

====================
r8152: adjust runtime suspend/resume

v2:
For #1, replace GFP_KERNEL with GFP_NOIO for usb_submit_urb().

v1:
Improve the runtime suspend/resume flow and make the code
easier to read.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Tue, 13 Jun 2017 07:14:40 +0000 (15:14 +0800)]
r8152: move calling delay_autosuspend function

Move the delay_autosuspend() call in rtl8152_runtime_suspend() so that
it is made as late as possible.

The original flow is
   1. check if the driver/device is busy now.
   2. set wake events.
   3. enter runtime suspend.

If a wake event occurs between (1) and (2), the device may miss it. In
addition, to avoid a runtime resume occurring immediately after runtime
suspend, move the check to the end of rtl8152_runtime_suspend().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Tue, 13 Jun 2017 07:14:39 +0000 (15:14 +0800)]
r8152: split rtl8152_resume function

Split rtl8152_resume() into rtl8152_runtime_resume() and
rtl8152_system_resume().

Also, replace GFP_KERNEL with GFP_NOIO for usb_submit_urb().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Jun 2017 15:28:49 +0000 (11:28 -0400)]
tls: Depend upon INET not plain NET.

We refer to TCP et al. symbols so have to use INET as
the dependency.

   ERROR: "tcp_prot" [net/tls/tls.ko] undefined!
>> ERROR: "tcp_rate_check_app_limited" [net/tls/tls.ko] undefined!
   ERROR: "tcp_register_ulp" [net/tls/tls.ko] undefined!
   ERROR: "tcp_unregister_ulp" [net/tls/tls.ko] undefined!
   ERROR: "do_tcp_sendpages" [net/tls/tls.ko] undefined!

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Jun 2017 02:53:24 +0000 (22:53 -0400)]
Merge branch 'mlx4-XDP-performance-improvements'

Tariq Toukan says:

====================
mlx4 XDP performance improvements

This patchset contains data-path improvements, mainly for XDP_DROP
and XDP_TX cases.

Main patches:
* Patch 2 by Saeed allows enabling optimized A0 RX steering (in HW) when
  setting a single RX ring.
  With this configuration, HW packet-rate dramatically improves,
  reaching 28.1 Mpps in XDP_DROP case for both IPv4 (37% gain)
  and IPv6 (53% gain).
* Patch 6 enhances the XDP xmit function. Among other changes, now we
  ring one doorbell per NAPI. Patch gives 17% gain in XDP_TX case.
* Patch 7 obsoletes the NAPI of XDP_TX completion queue and integrates its
  poll into the respective RX NAPI. Patch gives 15% gain in XDP_TX case.

Series generated against net-next commit:
f7aec129a356 rxrpc: Cache the congestion window setting
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:40 +0000 (14:35 +0300)]
net/mlx4_en: Refactor mlx4_en_free_tx_desc

Some code re-ordering, functionally equivalent.

- The !tx_info->inl check is evaluated anyway in both flows
  (common case/end case). Run it first, as this might finish
  the flows earlier.
- dma_unmap calls are identical in both flows, so move them out
  of the if block into the common area.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:39 +0000 (14:35 +0300)]
net/mlx4_en: Replace TXBB_SIZE multiplications with shift operations

Define LOG_TXBB_SIZE, log of TXBB_SIZE, and use it with a shift
operation instead of a multiplication with TXBB_SIZE.
Operations are equivalent as TXBB_SIZE is a power of two.
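
The equivalence can be checked with a short user-space sketch. The value 6 for the log (a 64-byte TXBB) is an assumption made for illustration; the commit text only states that TXBB_SIZE is a power of two. The MODEL_ macros and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define MODEL_LOG_TXBB_SIZE	6
#define MODEL_TXBB_SIZE		(1u << MODEL_LOG_TXBB_SIZE)

/* Byte offset of a TXBB computed with a multiplication. */
static uint32_t model_txbb_offset_mul(uint32_t index)
{
	assert(index <= UINT32_MAX / MODEL_TXBB_SIZE);	/* no overflow */
	return index * MODEL_TXBB_SIZE;
}

/* Same offset computed with a shift by the log of the size. */
static uint32_t model_txbb_offset_shift(uint32_t index)
{
	assert(index <= (UINT32_MAX >> MODEL_LOG_TXBB_SIZE));	/* no overflow */
	return index << MODEL_LOG_TXBB_SIZE;
}
```

Both functions return the same value for every index in range, because multiplying by 2^k and shifting left by k are the same operation on unsigned integers.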

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:38 +0000 (14:35 +0300)]
net/mlx4_en: Increase default TX ring size

Increase the default TX ring size (from 512 to 1024) to match
the RX ring size.
This gives the XDP TX ring a better chance to keep up with the
rate of its RX ring in case of a high load of XDP_TX actions.

Tested:
The ethtool counter rx_xdp_tx_full used to increase; after applying
this patch it stopped increasing.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:37 +0000 (14:35 +0300)]
net/mlx4_en: Poll XDP TX completion queue in RX NAPI

Instead of having their own NAPIs, XDP TX completion queues get
polled within the corresponding RX NAPI.
This prevents any possible race on TX ring prod/cons indices,
between the context that issues the transmits (RX NAPI) and the
context that handles the completions (was previously done in
a separate NAPI).

This also improves performance, as it decreases the number
of NAPIs running on a CPU, saving the overhead of syncing
and switching between the contexts.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 12.0 Mpps | 13.8 Mpps |  15% |
IPv6 | 12.0 Mpps | 13.8 Mpps |  15% |
-------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:36 +0000 (14:35 +0300)]
net/mlx4_en: Improve XDP xmit function

Several performance improvements in XDP TX datapath,
including:
- Ring a single doorbell for the XDP TX ring per NAPI budget,
  instead of ringing it at a lower threshold (previously 8).
  This includes removing the flow of immediate doorbell ringing
  in case of a full TX ring.
- Compiler branch predictor hints.
- Calculate values at compile time rather than at run time.
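
As a rough illustration of the branch predictor hints mentioned in the list above: the kernel's likely()/unlikely() macros are built on __builtin_expect(), a GCC/Clang builtin. The sketch below uses local model_ copies of the macros and a hypothetical struct model_ring; it is not the mlx4 code.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the hint macros, prefixed to avoid any name clash. */
#define model_likely(x)		__builtin_expect(!!(x), 1)
#define model_unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical TX ring with a fixed number of free slots. */
struct model_ring {
	size_t free_slots;
	size_t queued;
};

/* Queue one packet; returns 0 on success, -1 if the ring is full. */
static int model_xmit(struct model_ring *ring)
{
	assert(ring != NULL);
	/* A full ring is the rare case, so hint the compiler accordingly. */
	if (model_unlikely(ring->free_slots == 0))
		return -1;
	ring->free_slots--;
	ring->queued++;
	return 0;
}
```

The hint does not change what the code computes; it only tells the compiler which path to lay out as the fall-through.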

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 10.3 Mpps | 12.0 Mpps |  17% |
IPv6 | 10.3 Mpps | 12.0 Mpps |  17% |
-------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:35 +0000 (14:35 +0300)]
net/mlx4_en: Improve stack xmit function

Several small code and performance improvements in stack TX datapath,
including:
- Compiler branch predictor hints.
- Minimize variable scope.
- Move tx_info non-inline flow handling to a separate function.
- Calculate data_offset at compile time rather than at run time
  (for !lso_header_size branch).
- Avoid the ternary operator ("?") when the value can be preset in a
  matching branch.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:34 +0000 (14:35 +0300)]
net/mlx4_en: Improve transmit CQ polling

Several small performance improvements in TX CQ polling,
including:
- Compiler branch predictor hints.
- Minimize variable scope.
- A more appropriate check of the cq type.
- Use boolean instead of int for a binary indication.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Packet-rate tests for both regular stack and XDP use cases:
No noticeable gain, no degradation.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:33 +0000 (14:35 +0300)]
net/mlx4_en: Improve receive data-path

Several small performance improvements in RX datapath,
including:
- Compiler branch predictor hints.
- Replace a multiplication with a shift operation.
- Minimize variable scope.
- Write-prefetch for packet header.
- Avoid the ternary operator ("?") when the value can be preset in a
  matching branch.
- Save a branch by updating RX ring doorbell within
  mlx4_en_refill_rx_buffers(), which now returns void.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON
(enable by ethtool -L <interface> rx 1).

XDP_DROP packet rate:
Same (28.1 Mpps), lower CPU utilization (from ~100% to ~92%).

Drop packets in TC:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 4.14 Mpps | 4.18 Mpps |   1% |
-------------------------------------

XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 10.1 Mpps | 10.3 Mpps |   2% |
IPv6 | 10.1 Mpps | 10.3 Mpps |   2% |
-------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Thu, 15 Jun 2017 11:35:32 +0000 (14:35 +0300)]
net/mlx4_en: Optimized single ring steering

Avoid touching RX QP RSS context when loading with only
one RX ring, to allow optimized A0 RX steering.

Enable by:
- loading mlx4_core with module param: log_num_mgm_entry_size = -6.
- then: ethtool -L <interface> rx 1

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

XDP_DROP packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 20.5 Mpps | 28.1 Mpps |  37% |
IPv6 | 18.4 Mpps | 28.1 Mpps |  53% |
-------------------------------------
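
The Gain column follows from the Before and After rates as (after - before) / before. A small sketch of that arithmetic, with gain_percent() as a hypothetical helper name:

```c
#include <assert.h>

/* Relative gain in percent between two packet rates given in Mpps. */
static double gain_percent(double before_mpps, double after_mpps)
{
	assert(before_mpps > 0.0);
	return (after_mpps - before_mpps) / before_mpps * 100.0;
}
```

For the rows above this gives roughly 37.1 for IPv4 (20.5 to 28.1 Mpps) and roughly 52.7 for IPv6 (18.4 to 28.1 Mpps), which round to the 37% and 53% shown in the table.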

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:31 +0000 (14:35 +0300)]
net/mlx4_en: Remove unused argument in TX datapath function

Remove owner argument, as it is obsolete and unused.
This also saves the overhead of calculating its value in data-path.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 15 Jun 2017 19:56:21 +0000 (14:56 -0500)]
atm: solos-pci: remove useless variable assignments

Value assigned to variable _data32_ at lines 1254 and 1257 is
overwritten at line 1260 before it can be used. This makes
such variable assignments useless.

Addresses-Coverity-ID: 1227049
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 19:06:54 +0000 (15:06 -0400)]
net: dsa: assign default CPU port to all ports

The current code only assigns the default cpu_dp to all user ports of
the switch to which the CPU port belongs. The user ports of the other
switches of the fabric thus don't have a default CPU port.

This patch fixes this by assigning the cpu_dp of all user ports of all
switches of the fabric when the tree is fully parsed.
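
The fix can be pictured with a small user-space model of a two-switch fabric. All names below (struct model_port, struct model_switch, model_assign_default_cpu()) are hypothetical and far simpler than the DSA data structures; the sketch only shows the default CPU port being assigned across every switch once the whole tree is known.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PORTS 4

enum model_port_type {
	MODEL_PORT_UNUSED,
	MODEL_PORT_USER,
	MODEL_PORT_CPU,
	MODEL_PORT_DSA,		/* link to another switch of the fabric */
};

struct model_port {
	enum model_port_type type;
	struct model_port *cpu_dp;	/* default CPU port, NULL if unset */
};

struct model_switch {
	struct model_port ports[MODEL_MAX_PORTS];
};

/* After the tree is fully parsed, point every user port of every
 * switch at the default CPU port. */
static void model_assign_default_cpu(struct model_switch *switches,
				     size_t nswitches,
				     struct model_port *cpu_dp)
{
	assert(cpu_dp != NULL && cpu_dp->type == MODEL_PORT_CPU);
	for (size_t s = 0; s < nswitches; s++) {
		for (size_t p = 0; p < MODEL_MAX_PORTS; p++) {
			struct model_port *port = &switches[s].ports[p];

			if (port->type == MODEL_PORT_USER)
				port->cpu_dp = cpu_dp;
		}
	}
}
```

Iterating over all switches, rather than only the one that owns the CPU port, is what gives the user ports of the other switches a default CPU port.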

Fixes: a29342e73911 ("net: dsa: Associate slave network device with CPU port")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:31:56 +0000 (14:31 -0400)]
Merge branch 'r8152-support-new-chips'

Hayes Wang says:

====================
r8152: support new chips

These patches are used to support new chips.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:04 +0000 (14:44 +0800)]
r8152: add byte_enable for ocp_read_word function

Add byte_enable for ocp_read_word() so that it reads only the
desired 2 bytes of data instead of 4 bytes.

This is used to avoid the issue which is described in
commit b4d99def0938 ("r8152: remove sram_read"). The
original method always reads 4 bytes of data, which may
cause problems when reading the PHY registers.

The new method is supported starting with RTL8153B, but it
doesn't affect the previous chips. On those chips the
byte_enable bits are reserved bits, and the hw ignores
them.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:03 +0000 (14:44 +0800)]
r8152: support RTL8153B

This patch adds support for two new RTL8153B chips.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:02 +0000 (14:44 +0800)]
r8152: support new chip 8050

The settings of the new chip are the same as for RTL8152, except that
its product ID is 0x8050.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:29:01 +0000 (14:29 -0400)]
Merge branch 'ibmvnic-LPM-bug-fixes'

Thomas Falcon says:

====================
ibmvnic: LPM bug fixes

This series of small patches is meant to resolve a number of
bugs, mostly occurring during an ibmvnic driver reset when
recovering from a logical partition migration (LPM).

The first patch ensures that RX buffer pools are properly
activated following an adapter reset by setting the proper
flag in the pool data structure.

The second patch uses netif_tx_disable to stop TX queues when
closing the device during a reset.

Third, fix a typo that resulted in partial sanitization of
TX/RX descriptor queues following a device reset.

Fourth, remove an ambiguous conditional check that was resulting
in a kernel panic as null RX/TX completion descriptors were being
processed during napi polling while the device is closing.

Finally, fix a condition where the napi polling routine exits
before it has completed its work budget without notifying the
upper network layers. This omission could result in the
napi_disable function sleeping indefinitely under certain conditions.

v2: Attempt to provide a proper cover letter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:09 +0000 (23:50 -0500)]
ibmvnic: Exit polling routine correctly during adapter reset

This patch fixes a bug where, in the case of a device reset,
the polling routine will never complete, causing napi_disable
to sleep indefinitely when attempting to close the device.
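
The underlying rule is the usual NAPI contract: a poll routine that does less work than its budget is expected to signal completion, otherwise the context stays scheduled and napi_disable keeps waiting. A minimal user-space model of that rule follows; struct model_napi, its resetting flag and the model_ functions are hypothetical, not the ibmvnic code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI context. */
struct model_napi {
	bool scheduled;		/* true until the poll routine completes */
	int pending;		/* packets waiting to be processed */
	bool resetting;		/* adapter reset in progress */
};

static void model_napi_complete(struct model_napi *napi)
{
	napi->scheduled = false;
}

/* Process up to budget packets; returns the amount of work done. */
static int model_poll(struct model_napi *napi, int budget)
{
	int done = 0;

	assert(budget > 0);
	while (done < budget && napi->pending > 0) {
		if (napi->resetting)
			break;	/* stop early, but still complete below */
		napi->pending--;
		done++;
	}
	/* Exiting under budget without this call would leave the context
	 * scheduled forever, which is the hang described above. */
	if (done < budget)
		model_napi_complete(napi);
	return done;
}
```

In the model, a poll during reset returns 0 and clears the scheduled flag, so anything waiting for the context to go idle can make progress.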

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Remove VNIC_CLOSING check from pending_scrq
Thomas Falcon [Thu, 15 Jun 2017 04:50:08 +0000 (23:50 -0500)]
ibmvnic: Remove VNIC_CLOSING check from pending_scrq

Fix a kernel panic resulting from data access of a NULL
pointer during device close. The pending_scrq routine is
meant to determine whether there is a valid sub-CRQ message
awaiting processing. When the device is closing, however,
there is a possibility that NULL messages can be processed
because pending_scrq will always return 1 even if there
is no valid message in the queue.

It's not clear what this closing state check was originally
meant to accomplish, so just remove it.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Sanitize entire SCRQ buffer on reset
Thomas Falcon [Thu, 15 Jun 2017 04:50:07 +0000 (23:50 -0500)]
ibmvnic: Sanitize entire SCRQ buffer on reset

Fixup a typo so that the entire SCRQ buffer is cleaned.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Ensure that TX queues are disabled in __ibmvnic_close
Thomas Falcon [Thu, 15 Jun 2017 04:50:06 +0000 (23:50 -0500)]
ibmvnic: Ensure that TX queues are disabled in __ibmvnic_close

Use netif_tx_disable to guarantee that TX queues are disabled
when __ibmvnic_close is called by the device reset routine.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Activate disabled RX buffer pools on reset
Thomas Falcon [Thu, 15 Jun 2017 04:50:05 +0000 (23:50 -0500)]
ibmvnic: Activate disabled RX buffer pools on reset

RX buffer pools are disabled while awaiting a device
reset if firmware indicates that the resource is closed.

This patch fixes a bug where pools were not being
subsequently enabled after the device reset, causing
the device to become inoperable.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosunvnet: restrict advertized checksum offloads to just IP
Shannon Nelson [Wed, 14 Jun 2017 22:43:37 +0000 (15:43 -0700)]
sunvnet: restrict advertized checksum offloads to just IP

As much as we'd like to play well with others, we really aren't
handling the checksums on non-IP protocol packets very well.  This
is easily seen when trying to do TCP over ipv6 - the checksums are
garbage.

Here we restrict the checksum feature flag to just IP traffic so
that we aren't given work we can't yet do.

Orabug: 2617539126259755

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'sched-act_tunnel_key-UDP-checksusm'
David S. Miller [Thu, 15 Jun 2017 18:21:04 +0000 (14:21 -0400)]
Merge branch 'sched-act_tunnel_key-UDP-checksusm'

Jiri Benc says:

====================
net: sched: act_tunnel_key: UDP checksums

Currently, the tunnel_key tc action does not set TUNNEL_CSUM, thus
transmitting packets with zero UDP checksum. This is inconsistent with how
we treat non-lwt UDP tunnels where the default is to fill in the UDP
checksum. Non-zero UDP checksum is the better default anyway for various
reasons previously discussed.

Make this configurable for the tunnel_key tc action with the default being
non-zero checksum. Saves a lot of surprises especially with IPv6.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: act_tunnel_key: make UDP checksum configurable
Jiri Benc [Wed, 14 Jun 2017 19:19:31 +0000 (21:19 +0200)]
net: sched: act_tunnel_key: make UDP checksum configurable

Allow requesting a zero UDP checksum for encapsulated packets. The attribute
is named "NO_CSUM", with the matching inverted meaning, so that a missing
attribute and an attribute set to 0 mean the same thing.
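
A minimal userspace sketch of why the inverted name is convenient. The flag value and the function name below are hypothetical and only illustrate the "missing equals 0" rule; they are not the act_tunnel_key implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit for this sketch; the kernel's TUNNEL_CSUM has its own value. */
#define SKETCH_TUNNEL_CSUM 0x01u

/*
 * 'no_csum_attr' points at the attribute payload, or is NULL when the
 * attribute was not supplied. Absent and 0 both mean "compute the checksum",
 * so no separate "was it present?" state is needed.
 */
static unsigned int sketch_tunnel_flags(const uint8_t *no_csum_attr)
{
    bool no_csum = no_csum_attr != NULL && *no_csum_attr != 0;

    return no_csum ? 0u : SKETCH_TUNNEL_CSUM;
}
```

With this shape, only an explicit non-zero NO_CSUM value turns the checksum off; every other input keeps the new default.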

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: act_tunnel_key: request UDP checksum by default
Jiri Benc [Wed, 14 Jun 2017 19:19:30 +0000 (21:19 +0200)]
net: sched: act_tunnel_key: request UDP checksum by default

There's currently no way to request (outer) UDP checksum with
act_tunnel_key. This is a problem especially for IPv6. Right now, tunnel_key
action with IPv6 does not work without going through hassles: both sides
have to have udp6zerocsumrx configured on the tunnel interface. This is
obviously not a good solution universally.

It makes more sense to compute the UDP checksum by default even for IPv4.
Just set the default to request the checksum when using act_tunnel_key.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: s2io: remove useless variable in fill_rx_buffers
Gustavo A. R. Silva [Thu, 15 Jun 2017 02:58:17 +0000 (21:58 -0500)]
net: s2io: remove useless variable in fill_rx_buffers

Remove useless variable rxd_index and code related.

Addresses-Coverity-ID: 1397691
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'dsa-prefix-Global-macros'
David S. Miller [Thu, 15 Jun 2017 18:07:51 +0000 (14:07 -0400)]
Merge branch 'dsa-prefix-Global-macros'

Vivien Didelot says:

====================
net: dsa: prefix Global macros

This patch series is the 2/3 step of the register definitions cleanup.
It brings no functional changes.

It prefixes and documents all Global (1) registers with MV88E6XXX_G1_
(or a specific model like MV88E6352_G1_STS_PPU_STATE), and prefers a
16-bit hexadecimal representation of the Marvell registers layout.

The next and last patchset will prefix the Global 2 registers.
====================

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: prefix Global Prio and Tag macros
Vivien Didelot [Thu, 15 Jun 2017 16:14:06 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Prio and Tag macros

Prefix and document the remaining Global IP and IEEE Priority and Core
Tag Type registers and give them a clear 16-bit register representation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: prefix Global Stats macros
Vivien Didelot [Thu, 15 Jun 2017 16:14:05 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Stats macros

Prefix and document the Global Stats Operation and Counter registers and
give them a clear 16-bit register representation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: prefix Global Monitor Control macros
Vivien Didelot [Thu, 15 Jun 2017 16:14:04 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Monitor Control macros

Prefix and document the Global Monitor Control Register macros
(which became the Global Monitor & MGMT Control Register with 88E6390)
and give a clear 16-bit register representation.

Use __bf_shf to get the shift value at compile time instead of adding
new defined macros for it.
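
As an illustration of deriving the shift from the mask, the sketch below mirrors the idea behind `__bf_shf` (lowest set bit of a constant mask, computed with a GCC/Clang builtin). The `SKETCH_*` names and the field layout are hypothetical and do not correspond to any real mv88e6xxx register.

```c
/* Same idea as the kernel's __bf_shf(): index of the lowest set bit of a mask. */
#define SKETCH_BF_SHF(mask) (__builtin_ffsll(mask) - 1)

/* Hypothetical 16-bit register field occupying bits 7:4. */
#define SKETCH_FIELD_MASK 0x00f0

/* Extract the field value from a raw register value. */
static unsigned int sketch_field_get(unsigned int reg)
{
    return (reg & SKETCH_FIELD_MASK) >> SKETCH_BF_SHF(SKETCH_FIELD_MASK);
}

/* Position a field value so it can be OR-ed into a register value. */
static unsigned int sketch_field_prep(unsigned int val)
{
    return (val << SKETCH_BF_SHF(SKETCH_FIELD_MASK)) & SKETCH_FIELD_MASK;
}
```

Because the mask is a compile-time constant, the compiler folds the shift to a constant as well, so only the mask needs to be defined and kept in sync with the datasheet.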

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: prefix Global Control macros
Vivien Didelot [Thu, 15 Jun 2017 16:14:03 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Control macros

Prefix and document the Global Control and Control 2 register macros
and give a clear 16-bit register representation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: prefix Global VTU macros
Vivien Didelot [Thu, 15 Jun 2017 16:14:02 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global VTU macros

Prefix and document the Global VTU register macros and give a clear
16-bit register representation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: prefix Global ATU macros
Vivien Didelot [Thu, 15 Jun 2017 16:14:01 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global ATU macros

Prefix and document the Global ATU Register macros and give a clear
16-bit register representation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: prefix Global Switch MAC macros
Vivien Didelot [Thu, 15 Jun 2017 16:14:00 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Switch MAC macros

Prefix and document the Global Switch MAC Address Register macros and
give clear 16-bit register representation.

At the same time, move mv88e6xxx_g1_set_switch_mac in global1.c, where
it belongs.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: mv88e6xxx: prefix Global Status macros
Vivien Didelot [Thu, 15 Jun 2017 16:13:59 +0000 (12:13 -0400)]
net: dsa: mv88e6xxx: prefix Global Status macros

Prefix and document the Global Status Register macros and give clear
16-bit register representation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoskbuff: make skb_put_zero() return void
Johannes Berg [Wed, 14 Jun 2017 20:17:20 +0000 (22:17 +0200)]
skbuff: make skb_put_zero() return void

It's nicer to return a void pointer (void *), since then there's no need
to cast to any structures. Currently none of the users have
a cast, but a number of future conversions do.
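
A userspace sketch of the ergonomics, assuming a flat byte buffer in place of an skb. `sketch_buf`, `sketch_hdr` and `sketch_put_zero` are hypothetical names, and the overflow handling is invented for the sketch rather than taken from the kernel helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat buffer standing in for an skb's data area. */
struct sketch_buf {
    uint8_t data[64];
    size_t len;
};

/* Hypothetical header type a caller might want to fill in. */
struct sketch_hdr {
    uint16_t type;
    uint16_t length;
};

/* Reserve 'len' zeroed bytes at the tail and return them as void *. */
static void *sketch_put_zero(struct sketch_buf *b, size_t len)
{
    void *tail;

    /* Sketch-only bounds check; the kernel helper handles overflow differently. */
    if (len > sizeof(b->data) - b->len)
        return NULL;

    tail = b->data + b->len;
    memset(tail, 0, len);
    b->len += len;
    return tail;
}
```

A caller can then write `struct sketch_hdr *h = sketch_put_zero(&b, sizeof(*h));` with no cast, which is what a `void *` return type buys over `unsigned char *`.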

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'net-ktls'
David S. Miller [Thu, 15 Jun 2017 16:12:41 +0000 (12:12 -0400)]
Merge branch 'net-ktls'

Dave Watson says:

====================
net: kernel TLS

This series adds support for kernel TLS encryption over TCP sockets.
A standard TCP socket is converted to a TLS socket using a setsockopt.
Only symmetric crypto is done in the kernel, as well as TLS record
framing.  The handshake remains in userspace, and the negotiated
cipher keys/iv are provided to the TCP socket.

We implemented support for this API in OpenSSL 1.1.0, the code is
available at https://github.com/Mellanox/tls-openssl/tree/master

It should work with any TLS library with similar modifications,
a test tool using gnutls is here: https://github.com/Mellanox/tls-af_ktls_tool

RFC patch to openssl:
https://mta.openssl.org/pipermail/openssl-dev/2017-June/009384.html

Changes from V2:

* EXPORT_SYMBOL_GPL in patch 1
* Ensure cleanup code always called before sk_stream_kill_queues to
  avoid warnings

Changes from V1:

* EXPORT_SYMBOL GPL in patch 2
* Add link to OpenSSL patch & gnutls example in documentation patch.
* sk_write_pending check was rolled in to wait_for_memory path,
  avoids special case and fixes lock imbalance issue.
* Unify flag handling for sendmsg/sendfile

Changes from RFC V2:

* Generic ULP (upper layer protocol) framework instead of TLS specific
  setsockopts
* Dropped Mellanox hardware patches, will come as separate series.
  Framework will work for both.

RFC V2:

http://www.mail-archive.com/netdev@vger.kernel.org/msg160317.html

Changes from RFC V1:

* Socket based on changing TCP proto_ops instead of crypto framework
* Merged code with Mellanox's hardware tls offload
* Zerocopy sendmsg support added - sendpage/sendfile is no longer
  necessary for zerocopy optimization

RFC V1:

http://www.mail-archive.com/netdev@vger.kernel.org/msg88021.html

* Socket based on crypto userspace API framework, required two
  sockets in userspace, one encrypted, one unencrypted.

Paper: https://netdevconf.org/1.2/papers/ktls.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotls: Documentation
Dave Watson [Wed, 14 Jun 2017 18:37:51 +0000 (11:37 -0700)]
tls: Documentation

Add documentation for the tcp ULP tls interface.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotls: kernel TLS support
Dave Watson [Wed, 14 Jun 2017 18:37:39 +0000 (11:37 -0700)]
tls: kernel TLS support

Software implementation of transport layer security, implemented using ULP
infrastructure.  tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.

Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete.  All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.

For user API, please see Documentation patch.

Pieces that can be shared between hw and sw implementation
are in tls_main.c

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions
Dave Watson [Wed, 14 Jun 2017 18:37:26 +0000 (11:37 -0700)]
tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions

Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to
sendpages while the socket is already locked.

tcp_sendpage is exported, but requires the socket lock to not be held already.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: ULP infrastructure
Dave Watson [Wed, 14 Jun 2017 18:37:14 +0000 (11:37 -0700)]
tcp: ULP infrastructure

Add the infrastructure for attaching Upper Layer Protocols (ULPs) over TCP
sockets. Based on a similar infrastructure in tcp_cong.  The idea is that any
ULP can add its own logic by changing the TCP proto_ops structure to its own
methods.

Example usage:

setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

modules will call:
tcp_register_ulp(&tcp_tls_ulp_ops);

to register/unregister their ulp, with an init function and name.

A list of registered ulps will be returned by tcp_get_available_ulp, which is
hooked up to /proc.  Example:

$ cat /proc/sys/net/ipv4/tcp_available_ulp
tls

There is currently no functionality to remove or chain ULPs, but
it should be possible to add these in the future if needed.
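
The userspace side of the example above could be wrapped as follows. This is a sketch, not part of the patch: `sketch_attach_tls_ulp` is a hypothetical helper name, and the fallback `TCP_ULP` definition is only there for libc headers that predate the option.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef SOL_TCP
#define SOL_TCP 6        /* same value as IPPROTO_TCP */
#endif
#ifndef TCP_ULP
#define TCP_ULP 31       /* value from the kernel UAPI header; older libc headers lack it */
#endif

/* Returns 0 on success, or a negative errno value if the ULP could not be attached. */
static int sketch_attach_tls_ulp(int sock)
{
    if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -errno;
    return 0;
}
```

On a kernel without ULP support, or without the tls ULP registered, the call is expected to fail and the helper returns the negated errno, so callers can fall back to doing TLS entirely in userspace.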

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'Broadcom-DTE-based-PTP-clock'
David S. Miller [Thu, 15 Jun 2017 16:07:16 +0000 (12:07 -0400)]
Merge branch 'Broadcom-DTE-based-PTP-clock'

Arun Parameswaran says:

====================
Add support for Broadcom DTE based PTP clock

This patchset adds support for the DTE based PTP clock for Broadcom SoCs.

The DTE nco based PTP clock can be used in both wired and wireless networks
for precision time-stamping purposes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoptp: Add a ptp clock driver for Broadcom DTE
Arun Parameswaran [Mon, 12 Jun 2017 20:26:01 +0000 (13:26 -0700)]
ptp: Add a ptp clock driver for Broadcom DTE

This patch adds a ptp clock driver for the Broadcom SoCs using
the Digital timing Engine (DTE) nco.

Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-binding: ptp: add bindings document for dte based ptp clock
Arun Parameswaran [Mon, 12 Jun 2017 20:26:00 +0000 (13:26 -0700)]
dt-binding: ptp: add bindings document for dte based ptp clock

Add device tree binding documentation for the Broadcom DTE
PTP clock driver.

Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Thu, 15 Jun 2017 15:31:37 +0000 (11:31 -0400)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 15 Jun 2017 09:09:47 +0000 (18:09 +0900)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) The netlink attribute passed in to dev_set_alias() is not
    necessarily NULL terminated, don't use strlcpy() on it. From
    Alexander Potapenko.

 2) Fix implementation of atomics in arm64 bpf JIT, from Daniel
    Borkmann.

 3) Correct the release of netdevs and driver private data in certain
    circumstances.

 4) Sanitize netlink message length properly in decnet, from Mateusz
    Jurczyk.

 5) Don't leak kernel data in rtnl_fill_vfinfo() netlink blobs. From
    Yuval Mintz.

 6) Hash secret is never initialized in ipv6 ILA translation code, from
    Arnd Bergmann. I guess those clang warnings about unused inline
    functions are useful for something!

 7) Fix endian selection in bpf_endian.h, from Daniel Borkmann.

 8) Sanitize sockaddr length before dereferencing any fields in AF_UNIX
    and CAIF. From Mateusz Jurczyk.

 9) Fix timestamping for GMAC3 chips in stmmac driver, from Mario
    Molitor.

10) Do not leak netdev on dev_alloc_name() errors in mac80211, from
    Johannes Berg.

11) Fix locking in sctp_for_each_endpoint(), from Xin Long.

12) Fix wrong memset size on 32-bit in snmp6, from Christian Perle.

13) Fix use after free in ip_mc_clear_src(), from WANG Cong.

14) Fix regressions caused by ICMP rate limiting changes in 4.11, from
    Jesper Dangaard Brouer.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
  i40e: Fix a sleep-in-atomic bug
  net: don't global ICMP rate limit packets originating from loopback
  net/act_pedit: fix an error code
  net: update undefined ->ndo_change_mtu() comment
  net_sched: move tcf_lock down after gen_replace_estimator()
  caif: Add sockaddr length check before accessing sa_family in connect handler
  qed: fix dump of context data
  qmi_wwan: new Telewell and Sierra device IDs
  net: phy: Fix MDIO_THUNDER dependencies
  netconsole: Remove duplicate "netconsole: " logging prefix
  igmp: acquire pmc lock for ip_mc_clear_src()
  r8152: give the device version
  net: rps: fix uninitialized symbol warning
  mac80211: don't send SMPS action frame in AP mode when not needed
  mac80211/wpa: use constant time memory comparison for MACs
  mac80211: set bss_info data before configuring the channel
  mac80211: remove 5/10 MHz rate code from station MLME
  mac80211: Fix incorrect condition when checking rx timestamp
  mac80211: don't look at the PM bit of BAR frames
  i40e: fix handling of HW ATR eviction
  ...

7 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Thu, 15 Jun 2017 08:54:51 +0000 (17:54 +0900)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a bug on sparc where we may dereference freed stack memory"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: Work around deallocated stack frame reference gcc bug on sparc.

7 years agoMerge tag 'acpi-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Thu, 15 Jun 2017 08:51:19 +0000 (17:51 +0900)]
Merge tag 'acpi-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These revert an ACPICA commit from the 4.11 cycle that causes problems
  to happen on some systems and add a protection against possible kernel
  crashes due to table reference counter imbalance.

  Specifics:

   - Revert a 4.11 ACPICA change that made assumptions which are not
     satisfied on some systems and caused the enumeration of resources
     to fail on them (Rafael Wysocki).

   - Add a mechanism to prevent tables from being unmapped prematurely
     due to reference counter overflows (Lv Zheng)"

* tag 'acpi-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance
  Revert "ACPICA: Disassembler: Enhance resource descriptor detection"

7 years agoMerge tag 'pm-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Thu, 15 Jun 2017 08:47:46 +0000 (17:47 +0900)]
Merge tag 'pm-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These revert a recent cpufreq schedutil governor change that turned
  out to be problematic and fix a few minor issues in cpufreq, cpuidle
  and the Exynos devfreq drivers.

  Specifics:

   - Revert a recent cpufreq schedutil governor change that caused some
     systems to behave undesirably (Rafael Wysocki).

   - Fix a cpufreq conservative governor issue introduced during the
     3.10 cycle that prevents it from working as expected in some
     situations (Tomasz Wilczyński).

   - Fix an error code path in the generic cpuidle driver for DT-based
     systems (Christophe Jaillet).

   - Fix three minor issues in devfreq drivers for Exynos (Arvind Yadav,
     Krzysztof Kozlowski)"

* tag 'pm-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: dt: Add missing 'of_node_put()'
  cpufreq: conservative: Allow down_threshold to take values from 1 to 10
  Revert "cpufreq: schedutil: Reduce frequencies slower"
  PM / devfreq: exynos-ppmu: Staticize event list
  PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
  PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable

7 years agoMerge branch 'for-4.12/driver-matching-fix' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 15 Jun 2017 08:44:41 +0000 (17:44 +0900)]
Merge branch 'for-4.12/driver-matching-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

Pull HID fix from Jiri Kosina:

 - ifdef-based bandaid for a long-standing issue with HID driver
   matching, avoiding regressions in cases where specific driver is not
   enabled in kernel .config, from Jiri Kosina

* 'for-4.12/driver-matching-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: let generic driver yield control iff specific driver has been enabled

7 years agoMerge tag 'media/v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Thu, 15 Jun 2017 08:37:40 +0000 (17:37 +0900)]
Merge tag 'media/v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - some build dependency issues at CEC core with randconfigs

 - fix an off by one error at vb2

 - a race fix at cec core

 - driver fixes at tc358743, sir_ir and rainshadow-cec

* tag 'media/v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] media/cec.h: use IS_REACHABLE instead of IS_ENABLED
  [media] cec: race fix: don't return -ENONET in cec_receive()
  [media] sir_ir: infinite loop in interrupt handler
  [media] cec-notifier.h: handle unreachable CONFIG_CEC_CORE
  [media] cec: improve MEDIA_CEC_RC dependencies
  [media] vb2: Fix an off by one error in 'vb2_plane_vaddr'
  [media] rainshadow-cec: Fix missing spin_lock_init()
  [media] tc358743: fix register i2c_rd/wr function fix

7 years agoi40e: Fix a sleep-in-atomic bug
Jia-Ju Bai [Wed, 14 Jun 2017 23:35:31 +0000 (16:35 -0700)]
i40e: Fix a sleep-in-atomic bug

The driver may sleep under a spin lock, and the function call path is:
i40e_ndo_set_vf_port_vlan (acquire the lock by spin_lock_bh)
  i40e_vsi_remove_pvid
    i40e_vlan_stripping_disable
      i40e_aq_update_vsi_params
        i40e_asq_send_command
          mutex_lock --> may sleep

To fix it, the spin lock is released before "i40e_vsi_remove_pvid", and
the lock is acquired again after this function.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'acpica-fixes'
Rafael J. Wysocki [Wed, 14 Jun 2017 23:52:32 +0000 (01:52 +0200)]
Merge branch 'acpica-fixes'

* acpica-fixes:
  ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance
  Revert "ACPICA: Disassembler: Enhance resource descriptor detection"

7 years agoMerge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-devfreq'
Rafael J. Wysocki [Wed, 14 Jun 2017 23:51:33 +0000 (01:51 +0200)]
Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-devfreq'

* pm-cpufreq:
  cpufreq: conservative: Allow down_threshold to take values from 1 to 10
  Revert "cpufreq: schedutil: Reduce frequencies slower"

* pm-cpuidle:
  cpuidle: dt: Add missing 'of_node_put()'

* pm-devfreq:
  PM / devfreq: exynos-ppmu: Staticize event list
  PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
  PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable

7 years agorxrpc: Cache the congestion window setting
David Howells [Wed, 14 Jun 2017 16:56:50 +0000 (17:56 +0100)]
rxrpc: Cache the congestion window setting

Cache the congestion window setting that was determined during a call's
transmission phase when it finishes so that it can be used by the next call
to the same peer, thereby shortcutting the slow-start algorithm.

The value is stored in the rxrpc_peer struct and is accessed without
locking.  Each call takes the value that happens to be there when it starts
and just overwrites the value when it finishes.
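
The caching scheme can be sketched as below. The struct and field names are hypothetical and the sketch is single-threaded; it only shows the "take what is there at start, overwrite at finish" behaviour, not the rxrpc data structures.

```c
/* Hypothetical per-peer record holding the last congestion window seen. */
struct sketch_peer {
    unsigned int cached_cwnd;
};

/* Hypothetical per-call state. */
struct sketch_call {
    struct sketch_peer *peer;
    unsigned int cwnd;
};

/* A new call starts from whatever value the peer record currently holds. */
static void sketch_call_start(struct sketch_call *c, struct sketch_peer *p)
{
    c->peer = p;
    c->cwnd = p->cached_cwnd;
}

/* A finishing call overwrites the cached value; the last writer wins. */
static void sketch_call_end(struct sketch_call *c)
{
    c->peer->cached_cwnd = c->cwnd;
}
```

Since the cached value is only a starting hint for the next call, an occasional stale or overwritten value costs some efficiency rather than correctness, which is presumably why no lock is taken.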

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: fix VF driver off-by-one bug when setting ethtool -C ethX rx-frames
Weilin Chang [Wed, 14 Jun 2017 16:11:31 +0000 (09:11 -0700)]
liquidio: fix VF driver off-by-one bug when setting ethtool -C ethX rx-frames

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: don't global ICMP rate limit packets originating from loopback
Jesper Dangaard Brouer [Wed, 14 Jun 2017 11:27:37 +0000 (13:27 +0200)]
net: don't global ICMP rate limit packets originating from loopback

Florian Weimer seems to have a glibc test-case which requires that
loopback interfaces do not get ICMP ratelimited.  This was broken by
commit c0303efeab73 ("net: reduce cycles spend on ICMP replies that
gets rate limited").

An ICMP response will usually be routed back out the same incoming
interface.  Thus, take advantage of this and skip global ICMP
ratelimit when the incoming device is loopback.  In the unlikely event
that the outgoing device is not loopback, due to strange routing policy
rules, ICMP rate limiting still works via peer ratelimiting via
icmpv4_xrlim_allow().  Thus, we should still comply with RFC1812
(section 4.3.2.8 "Rate Limiting").

This seems to fix the reproducer given by Florian, while still
avoiding an expensive and unneeded outgoing route lookup for
rate limited packets (in the non-loopback case).
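
A minimal sketch of the decision, assuming a simple token counter in place of the real global limiter. `sketch_global_limit` and `sketch_global_allow` are hypothetical names, and token refill over time is left out to keep the example short.

```c
#include <stdbool.h>

/* Hypothetical global bucket; the real limiter also refills tokens over time. */
struct sketch_global_limit {
    int tokens;
};

/*
 * Decide whether a reply passes the global limit. Packets that arrived on
 * loopback are never charged against the bucket; everything else consumes
 * one token, and is refused once the bucket is empty.
 */
static bool sketch_global_allow(struct sketch_global_limit *g,
                                bool in_dev_is_loopback)
{
    if (in_dev_is_loopback)
        return true;
    if (g->tokens <= 0)
        return false;
    g->tokens--;
    return true;
}
```

The check uses only the incoming device, so it stays as cheap as before: no outgoing route lookup is needed to decide that a packet should be dropped.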

Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited")
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: "H.J. Lu" <hjl.tools@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlxfw: fix a NULL dereference
Dan Carpenter [Wed, 14 Jun 2017 10:41:52 +0000 (13:41 +0300)]
net/mlxfw: fix a NULL dereference

If we hit this error path we end up returning ERR_PTR(0) which is NULL.
The caller is not expecting that so it results in a NULL dereference.

Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/act_pedit: fix an error code
Dan Carpenter [Wed, 14 Jun 2017 10:29:31 +0000 (13:29 +0300)]
net/act_pedit: fix an error code

I'm reviewing static checker warnings where we do ERR_PTR(0), which is
the same as NULL.  I'm pretty sure we intended to return ERR_PTR(-EINVAL)
here.  Sometimes these bugs lead to a NULL dereference but I don't
immediately see that problem here.
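
Why ERR_PTR(0) is a problem can be shown with a userspace re-creation of the error-pointer helpers. The `sketch_*` functions below imitate the idea behind ERR_PTR(), PTR_ERR() and IS_ERR(); they are an illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error pointers occupy the top 4095 addresses, as in the kernel convention. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for pointers in the reserved error range. */
static bool sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```

Here `sketch_err_ptr(0)` is just NULL and `sketch_is_err()` reports false for it, so a caller that only checks for error pointers treats the result as a valid object and may dereference it. Returning something like `sketch_err_ptr(-22)` (EINVAL) keeps the failure visible.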

Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: use skb_unref() in napi_consume_skb()
Paolo Abeni [Wed, 14 Jun 2017 09:48:48 +0000 (11:48 +0200)]
net: use skb_unref() in napi_consume_skb()

The commit 83ada39bb79d ("net: factor out a helper to decrement the
skb refcount") provided and used a helper for decrementing skb usage,
but I missed at least a spot for it.

This change removes some more duplicated code by reusing skb_unref() in
napi_consume_skb(), too. The helper uses an additional, unneeded
unlikely(!skb) test - napi_consume_skb() already checks it a few lines
above - but the compiler is smart enough to optimize the duplicated
test out.
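
The shape of such a helper can be sketched with C11 atomics. `sketch_obj` and `sketch_unref` are hypothetical, and the sketch is deliberately simpler than skb_unref(): it keeps only the NULL check and the "did we drop the last reference?" result.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted object. */
struct sketch_obj {
    atomic_int users;
};

/* Returns true when the caller dropped the last reference and must free. */
static bool sketch_unref(struct sketch_obj *obj)
{
    if (obj == NULL)
        return false;

    /* atomic_fetch_sub returns the previous value; 1 means we were last. */
    return atomic_fetch_sub(&obj->users, 1) == 1;
}
```

Callers then reduce to `if (!sketch_unref(obj)) return;` followed by the actual release path, which is the duplication the helper is meant to remove.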

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Wed, 14 Jun 2017 19:22:17 +0000 (15:22 -0400)]
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2017-06-14

Here's another batch of Bluetooth patches for the 4.13 kernel:

 - Fix for Broadcom controllers not supporting Event Mask Page 2
 - New QCA ROME USB ID for btusb
 - Fix for Security Manager Protocol to use constant-time memcmp
 - Improved support for TI WiLink chips

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Fix an off by one bug
Dan Carpenter [Wed, 14 Jun 2017 09:10:10 +0000 (12:10 +0300)]
qed: Fix an off by one bug

The p_l2_info->pp_qid_usage[] array has "p_l2_info->queues" elements so
the > here should be a >= or we write beyond the end of the array.

Fixes: bbe3f233ec5e ("qed: Assign a unique per-queue index to queue-cid")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-Add-support-for-cable-info-access'
David S. Miller [Wed, 14 Jun 2017 19:16:31 +0000 (15:16 -0400)]
Merge branch 'mlxsw-Add-support-for-cable-info-access'

Jiri Pirko says:

====================
mlxsw: Add support for cable info access

Add support for cable info access via ethtool. This is done by accessing
the SFP+/QSFP internal EEPROM.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Add support for access cable info via ethtool
Arkadi Sharshevsky [Wed, 14 Jun 2017 07:27:40 +0000 (09:27 +0200)]
mlxsw: spectrum: Add support for access cable info via ethtool

Add support for accessing cable info via ethtool.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: reg: Add MCIA register for cable info access
Arkadi Sharshevsky [Wed, 14 Jun 2017 07:27:39 +0000 (09:27 +0200)]
mlxsw: reg: Add MCIA register for cable info access

The MCIA register is used to access the SFP+ and QSFP connectors'
EEPROM. It will be used to query the cable info.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: update undefined ->ndo_change_mtu() comment
Magnus Damm [Wed, 14 Jun 2017 07:15:24 +0000 (16:15 +0900)]
net: update undefined ->ndo_change_mtu() comment

Update ->ndo_change_mtu() callback comment to remove text
about returning error in case of undefined callback. This
change makes the comment match the existing code behavior.

Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bpf-MIPS-infra'
David S. Miller [Wed, 14 Jun 2017 19:03:23 +0000 (15:03 -0400)]
Merge branch 'bpf-MIPS-infra'

David Daney says:

====================
bpf: Changes needed (or desired) for MIPS support

This is a grab bag of changes to the bpf testing infrastructure that I
developed while working on MIPS eBPF JIT support.  The change to
bpf_jit_disasm is probably universally beneficial; the others are more
MIPS specific.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosamples/bpf: Fix tracex5 to work with MIPS syscalls.
David Daney [Tue, 13 Jun 2017 23:49:38 +0000 (16:49 -0700)]
samples/bpf: Fix tracex5 to work with MIPS syscalls.

There are two problems:

1) On MIPS the __NR_* macros expand to an expression, which causes the
   sections of the object file to be named like:

  .
  .
  .
  [ 5] kprobe/(5000 + 1) PROGBITS        0000000000000000 000160 ...
  [ 6] kprobe/(5000 + 0) PROGBITS        0000000000000000 000258 ...
  [ 7] kprobe/(5000 + 9) PROGBITS        0000000000000000 000348 ...
  .
  .
  .

The fix here is to use the "asm_offsets" trick to evaluate the macros
in the C compiler and generate a header file with a usable form of the
macros.

2) MIPS syscall numbers start at 5000, so we need a bigger map to hold
the sub-programs.
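
A small sketch of problem 1, under stated assumptions: the EX_NR_*
macros are hypothetical stand-ins shaped like the MIPS __NR_*
definitions, and the snprintf() call is only a simplified stand-in for
the "asm_offsets" approach, which lets the compiler evaluate the
expression and then generates a header from the result. It shows why
stringifying the macro yields an unevaluated expression, and what the
evaluated form looks like:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical macros shaped like the MIPS __NR_* definitions. */
#define EX_NR_BASE   5000
#define EX_NR_WRITE  (EX_NR_BASE + 1)

#define STR(x)  #x
#define XSTR(x) STR(x)   /* expand the macro first, then stringify it */

/* What the preprocessor alone produces: the unevaluated expression. */
static const char *section_from_preprocessor(void)
{
	return "kprobe/" XSTR(EX_NR_WRITE);
}

/* What a generator can emit once the compiler has evaluated the value. */
static int format_evaluated_define(char *buf, size_t len)
{
	return snprintf(buf, len, "#define SYS_NR_WRITE %d", EX_NR_WRITE);
}
```

The first helper returns a name containing "(5000 + 1)", matching the
section names listed above; the second produces a plain integer that
is usable in a section name.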

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: Add MIPS support to samples/bpf.
David Daney [Tue, 13 Jun 2017 23:49:37 +0000 (16:49 -0700)]
bpf: Add MIPS support to samples/bpf.

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotest_bpf: Add test to make conditional jump cross a large number of insns.
David Daney [Tue, 13 Jun 2017 23:49:36 +0000 (16:49 -0700)]
test_bpf: Add test to make conditional jump cross a large number of insns.

On MIPS, conditional branches can only span 32k instructions.  To
exceed this limit in the JIT with the BPF maximum of 4k insns, we need
to choose eBPF insns that expand to more than 8 machine instructions.
Use BPF_LD_ABS as it is quite complex.  This forces the JIT to invert
the sense of the branch to branch around a long jump to the end.

This (somewhat) verifies that the branch inversion logic and target
address calculation of the long jumps are done correctly.
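
The arithmetic behind "more than 8 machine instructions" can be
sketched as follows. The constants are taken from the text above; the
macro and function names are hypothetical and this is not code from
test_bpf:

```c
#include <stdbool.h>

#define MIPS_BRANCH_SPAN_INSNS 32768u /* "32k instructions", per the text */
#define BPF_MAX_INSNS_EXAMPLE  4096u  /* "4k insns", per the text */

/*
 * True if n_bpf eBPF insns, each expanding to per_insn machine
 * instructions, produce more code than one conditional branch can span.
 */
static bool exceeds_branch_span(unsigned int n_bpf, unsigned int per_insn)
{
	return (unsigned long)n_bpf * per_insn > MIPS_BRANCH_SPAN_INSNS;
}
```

At 8 machine instructions per insn, 4096 insns fill the span exactly
(4096 * 8 = 32768) without exceeding it; 9 or more per insn go past it.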

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotools: bpf_jit_disasm: Handle large images.
David Daney [Tue, 13 Jun 2017 23:49:35 +0000 (16:49 -0700)]
tools: bpf_jit_disasm: Handle large images.

Dynamically allocate memory so that JIT images larger than the size of
the statically allocated array can be handled.
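
A minimal sketch of the general technique, assuming a hypothetical
growable buffer. This is not the bpf_jit_disasm code itself, only an
illustration of replacing a fixed-size array with realloc()-based
growth:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer for a JIT image of unknown size. */
struct image_buf {
	unsigned char *data;
	size_t len;   /* bytes in use */
	size_t cap;   /* bytes allocated */
};

/*
 * Append n bytes, doubling the capacity as needed. Returns 0 on success
 * or -1 if allocation fails (the existing contents stay valid).
 */
static int image_buf_append(struct image_buf *b, const void *src, size_t n)
{
	if (n == 0)
		return 0;
	if (b->len + n > b->cap) {
		size_t cap = b->cap ? b->cap : 64;
		unsigned char *p;

		while (cap < b->len + n)
			cap *= 2;
		p = realloc(b->data, cap);
		if (!p)
			return -1;
		b->data = p;
		b->cap = cap;
	}
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}
```

A zero-initialized struct image_buf is a valid empty buffer; the
caller releases it with free(b.data).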

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bpf-ctx-narrow'
David S. Miller [Wed, 14 Jun 2017 18:56:26 +0000 (14:56 -0400)]
Merge branch 'bpf-ctx-narrow'

Yonghong Song says:

====================
bpf: permit bpf program narrower loads for ctx fields

Today, if users try to access a ctx field through a narrower load, e.g.,
__be16 prot = __sk_buff->protocol, the verifier rejects the program.
This set contains the verifier change to permit such loads for
certain ctx fields as well as the new test cases in selftests/bpf.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoselftests/bpf: Add test cases to test narrower ctx field loads
Yonghong Song [Tue, 13 Jun 2017 22:52:14 +0000 (15:52 -0700)]
selftests/bpf: Add test cases to test narrower ctx field loads

Add test cases in test_verifier and test_progs.
Negative tests are added in test_verifier as well.
The test in test_progs compares the result of a narrower ctx field
load against the masked result of a normal full-field load, and
fails if they are not the same.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: permits narrower load from bpf program context fields
Yonghong Song [Tue, 13 Jun 2017 22:52:13 +0000 (15:52 -0700)]
bpf: permits narrower load from bpf program context fields

Currently, the verifier rejects a program if it contains a
narrower load from the bpf context structure. For example,
        __u8 h = __sk_buff->hash, or
        __u16 p = __sk_buff->protocol
        __u32 sample_period = bpf_perf_event_data->sample_period
which are narrower loads of a 4-byte or 8-byte field.

This patch solves the issue by:
  . Introduce a new parameter ctx_field_size to carry the
    field size of narrower load from prog type
    specific *__is_valid_access validator back to verifier.
  . The non-zero ctx_field_size for a memory access indicates
    (1). underlying prog type specific convert_ctx_accesses
         supporting non-whole-field access
    (2). the current insn is a narrower or whole field access.
  . In verifier, for such loads where load memory size is
    less than ctx_field_size, verifier transforms it
    to a full field load followed by proper masking.
  . Currently, __sk_buff and bpf_perf_event_data->sample_period
    support narrower loads.
  . Narrower stores are still not allowed as typical ctx stores
    are just normal stores.
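
The "full field load followed by proper masking" step can be sketched
at the value level. This is an illustration only: the function name is
hypothetical, the verifier rewrites instructions rather than calling a
helper, and the sketch assumes the narrower load selects the low-order
bytes of the field:

```c
#include <stdint.h>

/*
 * Emulate a narrower load of `size` bytes (1, 2, 4 or 8) from a field
 * whose full value is `full`: take the whole field, then mask it.
 */
static uint64_t narrow_from_full(uint64_t full, unsigned int size)
{
	if (size >= sizeof(full))
		return full;   /* whole-field access, nothing to mask */
	return full & ((UINT64_C(1) << (size * 8)) - 1);
}
```

For a 4-byte field holding 0x12345678, a 1-byte access under this
assumption yields 0x78 and a 2-byte access yields 0x5678.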

Because of this change, some tests in the verifier would fail, so
these tests are removed. As a bonus, rename some out-of-bounds
__sk_buff->cb accesses to the proper field name and remove two
redundant "skb cb oob" tests.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet_sched: move tcf_lock down after gen_replace_estimator()
WANG Cong [Tue, 13 Jun 2017 20:36:24 +0000 (13:36 -0700)]
net_sched: move tcf_lock down after gen_replace_estimator()

Laura reported a sleep-in-atomic kernel warning inside
tcf_act_police_init() which calls gen_replace_estimator() with
spinlock protection.

The spinlock is not necessary in this case: we already hold the RTNL
lock here, which is enough to protect against concurrent writers. The
reader, i.e. tcf_act_police(), makes its decision based on this rate
estimator; in the worst case we drop more or fewer packets than
necessary while the rate is changed in parallel, which is still
acceptable.

Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Nick Huber <nicholashuber@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomacvlan: propagate the mac address change status for lowerdev
Zhang Shengju [Tue, 13 Jun 2017 14:45:11 +0000 (22:45 +0800)]
macvlan: propagate the mac address change status for lowerdev

The macvlan dev should propagate the return value of mac address change for
lower device in the passthru mode, instead of always returning 0.
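
A minimal sketch of the behaviour change, with hypothetical names
throughout. It does not use the real netdev API and only contrasts
discarding the lower device's result with returning it:

```c
#include <errno.h>

/* Hypothetical lower-device hook: returns 0 or a negative errno. */
typedef int (*set_mac_fn)(const unsigned char *addr);

/* Example lower device that refuses every address. */
static int lower_rejects(const unsigned char *addr)
{
	(void)addr;
	return -EADDRNOTAVAIL;
}

/* Old behaviour: the lower device's result is dropped. */
static int passthru_set_mac_old(set_mac_fn lower, const unsigned char *addr)
{
	lower(addr);
	return 0;
}

/* New behaviour: the caller sees whether the change succeeded. */
static int passthru_set_mac(set_mac_fn lower, const unsigned char *addr)
{
	return lower(addr);
}
```

With lower_rejects() as the lower device, the old form reports success
while the new form reports the error.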

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 14 Jun 2017 18:26:21 +0000 (14:26 -0400)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2017-06-13

This series contains updates to ixgbe and ixgbevf only.

Jake completes his fix ups for our drivers with the ixgbe changes to
resolve a race condition in processing timestamp requests.  These fixes
are the same fixes Jake applied earlier to the other drivers, including
the added statistic to help administrators know when an application
timestamp request is ignored.

With all the recent ixgbe/ixgbevf changes and fixes, Tony bumps the
driver versions.  Then Tony provides a fix to resolve a static
analysis warning by changing a variable to unsigned integer since the
value can never be negative.

Emil fixes an issue for X550 devices where the qde parameter was being
ignored, so PFQDE.HIDE_VLAN was not being set.

Jeff Mahoney from SuSE fixes a possible kernel crash, where there was
a small window where tasks writing to the sriov_numvfs sysfs attribute
can sneak in after we call register_netdev().  So we need to call
pci_set_drvdata() before and not after register_netdev() to preserve the
intent of commit 0fb6a55cc31f ("ixgbe: fix crash on rmmod after probe
fail").
====================

Signed-off-by: David S. Miller <davem@davemloft.net>