GitHub/LineageOS/android_kernel_motorola_exynos9610.git
8 years agowan/fsl_ucc_hdlc: rewrite error handling to make it clearer
Zhao Qiang [Fri, 15 Jul 2016 02:38:25 +0000 (10:38 +0800)]
wan/fsl_ucc_hdlc: rewrite error handling to make it clearer

It was used err_xxx for labeled statement, it is
not easy to understand, now use free_xxx for labeled
statement.

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agowan/fsl_ucc_hdlc: remove reduplicative freed memory 'uhdlc_priv'
Zhao Qiang [Fri, 15 Jul 2016 02:38:24 +0000 (10:38 +0800)]
wan/fsl_ucc_hdlc: remove reduplicative freed memory 'uhdlc_priv'

'uhdlc_priv' has freed twice, drop the first one.

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ipmr/ip6mr: add support for keeping an entry age
Nikolay Aleksandrov [Thu, 14 Jul 2016 16:28:27 +0000 (19:28 +0300)]
net: ipmr/ip6mr: add support for keeping an entry age

In preparation for hardware offloading of ipmr/ip6mr we need an
interface that allows to check (and later update) the age of entries.
Relying on stats alone can show activity but not actual age of the entry,
furthermore when there're tens of thousands of entries a lot of the
hardware implementations only support "hit" bits which are cleared on
read to denote that the entry was active and shouldn't be aged out,
these can then be naturally translated into age timestamp and will be
compatible with the software forwarding age. Using a lastuse entry doesn't
affect performance because the members in that cache line are written to
along with the age.
Since all new users are encouraged to use ipmr via netlink, this is
exported via the RTA_EXPIRES attribute.
Also do a minor local variable declaration style adjustment - arrange them
longest to shortest.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Shrijeet Mukherjee <shm@cumulusnetworks.com>
CC: Satish Ashok <sashok@cumulusnetworks.com>
CC: Donald Sharp <sharpd@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agorndis_host: Set valid random MAC on buggy devices
Kristian Evensen [Thu, 14 Jul 2016 08:23:03 +0000 (10:23 +0200)]
rndis_host: Set valid random MAC on buggy devices

Some devices of the same type all export the same, random MAC address. This
behavior has been seen on the ZTE MF910, MF823 and MF831, and there are
probably more devices out there. Fix this by generating a valid random MAC
address if we read a random MAC from device.

Also, changed the memcpy() to ether_addr_copy(), as pointed out by
checkpatch.

Suggested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bridge-rx-simplify-fwd-consolidate'
David S. Miller [Sun, 17 Jul 2016 02:57:38 +0000 (19:57 -0700)]
Merge branch 'bridge-rx-simplify-fwd-consolidate'

Nikolay Aleksandrov says:

====================
net: bridge: simplify receive path and consolidate forwarding paths

This set tries to simplify the receive and forwarding paths. Patch 01 is
a trivial style adjustment, patch 02 removes one conditional from the
unicast fast path, patch 03 removes another conditional and more imporantly
removes the skb0/skb2 ambiguity about locally receiving the skb and
switches to a boolean called "local_rcv".
Patch 04 is the most important change which consolidates the forwarding
paths for locally originated and forwarded packets into __br_forward. This
allows us to remove the function pointers giving a minor performance boost,
more importantly it makes it much easier to reason about the forwarding
path and reduces the code duplication that was needed when making changes.
Also it allows the receive path to fully setup the environment prior to
calling any forwarding functions (i.e. to properly set unicast, local_rcv
and search for unicast/mcast dst).
Functionally everything should stay the same after this set.

I've done basic tests with unicast/multicast/broadcast Tx/Rx. Please
review carefully.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: bridge: remove _deliver functions and consolidate forward code
Nikolay Aleksandrov [Thu, 14 Jul 2016 03:10:02 +0000 (06:10 +0300)]
net: bridge: remove _deliver functions and consolidate forward code

Before this patch we had two flavors of most forwarding functions -
_forward and _deliver, the difference being that the latter are used
when the packets are locally originated. Instead of all this function
pointer passing and code duplication, we can just pass a boolean noting
that the packet was locally originated and use that to perform the
necessary checks in __br_forward. This gives a minor performance
improvement but more importantly consolidates the forwarding paths.
Also add a kernel doc comment to explain the exported br_forward()'s
arguments.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: bridge: drop skb2/skb0 variables and use a local_rcv boolean
Nikolay Aleksandrov [Thu, 14 Jul 2016 03:10:01 +0000 (06:10 +0300)]
net: bridge: drop skb2/skb0 variables and use a local_rcv boolean

Currently if the packet is going to be received locally we set skb0 or
sometimes called skb2 variables to the original skb. This can get
confusing and also we can avoid one conditional on the fast path by
simply using a boolean and passing it around. Thanks to Roopa for the
name suggestion.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: bridge: rearrange flood vs unicast receive paths
Nikolay Aleksandrov [Thu, 14 Jul 2016 03:10:00 +0000 (06:10 +0300)]
net: bridge: rearrange flood vs unicast receive paths

This patch removes one conditional from the unicast path by using the fact
that skb is NULL only when the packet is multicast or is local.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: bridge: minor style adjustments in br_handle_frame_finish
Nikolay Aleksandrov [Thu, 14 Jul 2016 03:09:59 +0000 (06:09 +0300)]
net: bridge: minor style adjustments in br_handle_frame_finish

Trivial style changes in br_handle_frame_finish.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp_timer.c: Add kernel-doc function descriptions
Richard Sailer [Sat, 16 Jul 2016 02:04:34 +0000 (04:04 +0200)]
tcp_timer.c: Add kernel-doc function descriptions

This adds kernel-doc style descriptions for 6 functions and
fixes 1 typo.

Signed-off-by: Richard Sailer <richard@weltraumpflege.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: ti: cpmac: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Fri, 15 Jul 2016 10:39:02 +0000 (12:39 +0200)]
net: ethernet: ti: cpmac: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

There was a check on CAP_NET_ADMIN in cpmac_set_settings, but this
check is already done in dev_ethtool, so no need to repeat it before
calling the generic function.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: ti: cpmac: use phydev from struct net_device
Philippe Reynes [Fri, 15 Jul 2016 10:39:01 +0000 (12:39 +0200)]
net: ethernet: ti: cpmac: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: amd: au1000_eth: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Fri, 15 Jul 2016 10:05:12 +0000 (12:05 +0200)]
net: ethernet: amd: au1000_eth: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

There was a check on CAP_NET_ADMIN in au1000_set_settings, but this
check is already done in dev_ethtool, so no need to repeat it before
calling the generic function.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: amd: au1000_eth: use phydev from struct net_device
Philippe Reynes [Fri, 15 Jul 2016 10:05:11 +0000 (12:05 +0200)]
net: ethernet: amd: au1000_eth: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: smsc9420: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Fri, 15 Jul 2016 08:36:21 +0000 (10:36 +0200)]
net: ethernet: smsc9420: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: smsc9420: use phydev from struct net_device
Philippe Reynes [Fri, 15 Jul 2016 08:36:20 +0000 (10:36 +0200)]
net: ethernet: smsc9420: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: ethoc: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Fri, 15 Jul 2016 07:59:12 +0000 (09:59 +0200)]
net: ethernet: ethoc: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: ethoc: use phydev from struct net_device
Philippe Reynes [Fri, 15 Jul 2016 07:59:11 +0000 (09:59 +0200)]
net: ethernet: ethoc: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: pasemi_mac: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Thu, 14 Jul 2016 21:44:53 +0000 (23:44 +0200)]
net: ethernet: pasemi_mac: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: pasemi_mac: use phydev from struct net_device
Philippe Reynes [Thu, 14 Jul 2016 21:44:52 +0000 (23:44 +0200)]
net: ethernet: pasemi_mac: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: xilinx: axienet: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Thu, 14 Jul 2016 17:45:58 +0000 (19:45 +0200)]
net: ethernet: xilinx: axienet: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: xilinx: axienet: use phydev from struct net_device
Philippe Reynes [Thu, 14 Jul 2016 17:45:57 +0000 (19:45 +0200)]
net: ethernet: xilinx: axienet: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: tc35815: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Thu, 14 Jul 2016 13:20:47 +0000 (15:20 +0200)]
net: ethernet: tc35815: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: tc35815: use phydev from struct net_device
Philippe Reynes [Thu, 14 Jul 2016 13:20:46 +0000 (15:20 +0200)]
net: ethernet: tc35815: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fixup for tracepoint napi:napi_poll
Jesper Dangaard Brouer [Fri, 15 Jul 2016 21:55:20 +0000 (23:55 +0200)]
net: fixup for tracepoint napi:napi_poll

The recent change to tracepoint napi:napi_poll changed the order of
the parameters that perf scripts sees, the printk was correct.  The
problem was that the new parameters (work and budget) were pushed
in front of dev_name.

The new parameters obviously need to be appended to keep backward
compatible.

Fixes: 1db19db7f5ff ("net: tracepoint napi:napi_poll add work and budget")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomacvtap: switch to use skb array
Jason Wang [Fri, 15 Jul 2016 07:46:31 +0000 (03:46 -0400)]
macvtap: switch to use skb array

This patch switch to use skb array instead of sk_receive_queue to
avoid spinlock contentions. Tests shows about 21% improvements for
guest rx pps:

Before: 1472731 pkts/s
After:  1786289 pkts/s

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomacvtap: avoid hash calculating for single queue
Jason Wang [Fri, 15 Jul 2016 07:46:30 +0000 (03:46 -0400)]
macvtap: avoid hash calculating for single queue

We decide the rxq through calculating its hash which is not necessary
if we only have one rx queue. So this patch skip this and just return
queue 0. Test shows 22% improving on guest rx pps.

Before: 1201504 pkts/s
After:  1472731 pkts/s

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bpf-event-output-helper-improvements'
David S. Miller [Fri, 15 Jul 2016 21:23:56 +0000 (14:23 -0700)]
Merge branch 'bpf-event-output-helper-improvements'

Daniel Borkmann says:

====================
BPF event output helper improvements

This set adds improvements to the BPF event output helper to
support non-linear data sampling, here specifically, for skb
context. For details please see individual patches. The set
is based against net-next tree.

v1 -> v2:
  - Integrated and adapted Peter's diff into patch 1, updated
    the remaining ones accordingly. Thanks Peter!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: avoid stack copy and use skb ctx for event output
Daniel Borkmann [Thu, 14 Jul 2016 16:08:05 +0000 (18:08 +0200)]
bpf: avoid stack copy and use skb ctx for event output

This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.

We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, perf: split bpf_perf_event_output
Daniel Borkmann [Thu, 14 Jul 2016 16:08:04 +0000 (18:08 +0200)]
bpf, perf: split bpf_perf_event_output

Split the bpf_perf_event_output() helper as a preparation into
two parts. The new bpf_perf_event_output() will prepare the raw
record itself and test for unknown flags from BPF trace context,
where the __bpf_perf_event_output() does the core work. The
latter will be reused later on from bpf_event_output() directly.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoperf, events: add non-linear data support for raw records
Daniel Borkmann [Thu, 14 Jul 2016 16:08:03 +0000 (18:08 +0200)]
perf, events: add non-linear data support for raw records

This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.

If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.

This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.

The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].

  [1] http://thread.gmane.org/gmane.linux.network/421294

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agorxrpc: checking for IS_ERR() instead of NULL
Dan Carpenter [Thu, 14 Jul 2016 14:47:01 +0000 (15:47 +0100)]
rxrpc: checking for IS_ERR() instead of NULL

The rxrpc_lookup_peer() function returns NULL on error, it never returns
error pointers.

Fixes: 8496af50eb38 ('rxrpc: Use RCU to access a peer's service connection tree')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: phy: micrel: Add KSZ8041FTL fiber mode support
Philipp Zabel [Thu, 14 Jul 2016 14:29:43 +0000 (16:29 +0200)]
net: phy: micrel: Add KSZ8041FTL fiber mode support

We can't detect the FXEN (fiber mode) bootstrap pin, so configure
it via a boolean device tree property "micrel,fiber-mode".
If it is enabled, auto-negotiation is not supported.
The only available modes are 100base-fx (full duplex and half duplex).

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agowan/fsl_ucc_hdlc: info leak in uhdlc_ioctl()
Dan Carpenter [Thu, 14 Jul 2016 11:16:53 +0000 (14:16 +0300)]
wan/fsl_ucc_hdlc: info leak in uhdlc_ioctl()

There is a 2 byte struct whole after line.loopback so we need to clear
that out to avoid disclosing stack information.

Fixes: c19b6d246a35 ('drivers/net: support hdlc function for QE-UCC')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'rds-enable-mprds'
David S. Miller [Fri, 15 Jul 2016 18:36:58 +0000 (11:36 -0700)]
Merge branch 'rds-enable-mprds'

Sowmini Varadhan says:

====================
RDS: TCP: Enable mprds for rds-tcp

The third, and final, installment for mprds-tcp changes.

In Patch 3 of this set, if the transport support t_mp_capable,
we hash outgoing traffic across multiple paths.  Additionally, even if
the transport is MP capable, we may be peering with some node that does
not support mprds, or supports a different number of paths. This
necessitates RDS control plane changes so that both peers agree
on the number of paths to be used for the rds-tcp connection.
Patch 3 implements all these changes, which are documented in patch 5
of the series.

Patch 1 of this series is a bug fix for a race-condition
that has always existed, but is now more easily encountered with mprds.
Patch 2 is code refactoring. Patches 4 and 5 are Documentation updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDocumentation: RDS: Document Multipath RDS (mprds)
Sowmini Varadhan [Thu, 14 Jul 2016 10:51:05 +0000 (03:51 -0700)]
Documentation: RDS: Document Multipath RDS (mprds)

Document the design of mprds, covering a brief description
of the motivation, data-structures and modifications to the
RDS control plane.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDocumentation: RDS: updates for SO_RDS_TRANSPORT socket option
Sowmini Varadhan [Thu, 14 Jul 2016 10:51:04 +0000 (03:51 -0700)]
Documentation: RDS: updates for SO_RDS_TRANSPORT socket option

Update the documentation to describe the changes added by
commit 8ba38460f363 ("net/rds Add getsockopt support for SO_RDS_TRANSPORT")

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRDS: TCP: Enable multipath RDS for TCP
Sowmini Varadhan [Thu, 14 Jul 2016 10:51:03 +0000 (03:51 -0700)]
RDS: TCP: Enable multipath RDS for TCP

Use RDS probe-ping to compute how many paths may be used with
the peer, and to synchronously start the multiple paths. If mprds is
supported, hash outgoing traffic to one of multiple paths in rds_sendmsg()
when multipath RDS is supported by the transport.

CC: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()
Sowmini Varadhan [Thu, 14 Jul 2016 10:51:02 +0000 (03:51 -0700)]
RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()

Some code duplication in rds_tcp_reset_callbacks() can be avoided
by having the function call rds_tcp_restore_callbacks() and
rds_tcp_set_callbacks().

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRDS: TCP: avoid bad page reference in rds_tcp_listen_data_ready
Sowmini Varadhan [Thu, 14 Jul 2016 10:51:01 +0000 (03:51 -0700)]
RDS: TCP: avoid bad page reference in rds_tcp_listen_data_ready

As the existing comments in rds_tcp_listen_data_ready() indicate,
it is possible under some race-windows to get to this function with the
accept() socket. If that happens, we could run into a sequence whereby

   thread 1 thread 2

rds_tcp_accept_one() thread
sets up new_sock via ->accept().
The sk_user_data is now
sock_def_readable
data comes in for new_sock,
->sk_data_ready is called, and
we land in rds_tcp_listen_data_ready
rds_tcp_set_callbacks()
takes the sk_callback_lock and
sets up sk_user_data to be the cp
read_lock sk_callback_lock
ready = cp
unlock sk_callback_lock
page fault on ready

In the above sequence, we end up with a panic on a bad page reference
when trying to execute (*ready)(). Instead we need to call
sock_def_readable() safely, which is what this patch achieves.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodevlink: fix trace format string
Arnd Bergmann [Thu, 14 Jul 2016 09:37:29 +0000 (11:37 +0200)]
devlink: fix trace format string

Including devlink.h on ARM and probably other 32-bit architectures results in
a harmless warning:

In file included from ../include/trace/define_trace.h:95:0,
                 from ../include/trace/events/devlink.h:51,
                 from ../net/core/devlink.c:30:
include/trace/events/devlink.h: In function 'trace_raw_output_devlink_hwmsg':
include/trace/events/devlink.h:42:12: error: format '%lu' expects argument of type 'long unsigned int', but argument 10 has type 'size_t {aka unsigned int}' [-Werror=format=]

The correct format string for 'size_t' is %zu, not %lu, this works on all
architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: e5224f0fe2ac ("devlink: add hardware messages tracing facility")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotracing: change owner name to driver name for devlink hwmsg tracepoint
Jiri Pirko [Thu, 14 Jul 2016 09:37:28 +0000 (11:37 +0200)]
tracing: change owner name to driver name for devlink hwmsg tracepoint

Turned on that driver->owner which is struct module is not available when
modules are disabled. Better to depend on a driver name which is
always available.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: e5224f0fe2 ("devlink: add hardware messages tracing facility")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum_router: Return -ENOENT in case of error
Christophe Jaillet [Thu, 14 Jul 2016 06:18:45 +0000 (08:18 +0200)]
mlxsw: spectrum_router: Return -ENOENT in case of error

'vr' should be a valid pointer here, so returning 'PTR_ERR(vr)' is wrong.
Return an explicit error code (-ENOENT) instead.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: ll_temac: use phy_ethtool_{get|set}_link_ksettings
Philippe Reynes [Wed, 13 Jul 2016 23:48:52 +0000 (01:48 +0200)]
net: ethernet: ll_temac: use phy_ethtool_{get|set}_link_ksettings

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: ll_temac: use phydev from struct net_device
Philippe Reynes [Wed, 13 Jul 2016 23:48:51 +0000 (01:48 +0200)]
net: ethernet: ll_temac: use phydev from struct net_device

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'wireless-drivers-next-for-davem-2016-07-13' of git://git.kernel.org/pub...
David S. Miller [Thu, 14 Jul 2016 23:27:42 +0000 (16:27 -0700)]
Merge tag 'wireless-drivers-next-for-davem-2016-07-13' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.8

Major changes:

iwlwifi

* more work on the RX path for the 9000 device series
* some more dynamic queue allocation work
* SAR BIOS implementation
* some work on debugging capabilities
* added support for GCMP encryption
* data path rework in preparation for new HW
* some cleanup to remove transport dependency on mac80211
* support for MSIx in preparation for new HW
* lots of work in preparation for HW support (9000 and a000 series)

mwifiex

* implement get_tx_power and get_antenna cfg80211 operation callbacks

wl18xx

* add support for 64bit clock

rtl8xxxu

* aggregation support (optional for now)

Also wireless-drivers is merged to fix some conflicts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'pktgen-scripts'
David S. Miller [Thu, 14 Jul 2016 22:19:52 +0000 (15:19 -0700)]
Merge branch 'pktgen-scripts'

Jesper Dangaard Brouer says:

====================
pktgen samples: new scripts and removing older samples

This patchset is adding some pktgen sample scripts that I've been
using for a while[1], and they seams to relevant for more people.

Patchset also remove some of the older style pktgen samples.

[1] https://github.com/netoptimizer/network-testing/tree/master/pktgen
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopktgen: remove sample script pktgen.conf-1-1-rdos
Jesper Dangaard Brouer [Wed, 13 Jul 2016 20:06:15 +0000 (22:06 +0200)]
pktgen: remove sample script pktgen.conf-1-1-rdos

Removing the pktgen sample script pktgen.conf-1-1-rdos, because
it does not contain anything that is not covered by the other and
newer style sample scripts.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopktgen: add sample script pktgen_sample05_flow_per_thread.sh
Jesper Dangaard Brouer [Wed, 13 Jul 2016 20:06:10 +0000 (22:06 +0200)]
pktgen: add sample script pktgen_sample05_flow_per_thread.sh

This pktgen sample script is useful for scalability testing a
receiver.  The script will simply generate one flow per
thread (option -t N) using the thread number as part of the
source IP-address.

The single flow sample (pktgen_sample03_burst_single_flow.sh)
have become quite popular, but it is important that developers
also make sure to benchmark scalability of multiple receive
queues.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopktgen: add sample script pktgen_sample04_many_flows.sh
Jesper Dangaard Brouer [Wed, 13 Jul 2016 20:06:04 +0000 (22:06 +0200)]
pktgen: add sample script pktgen_sample04_many_flows.sh

Adding a pktgen sample script that demonstrates how to use pktgen
for simulating flows.  Script will generate a certain number of
concurrent flows ($FLOWS) and each flow will contain $FLOWLEN
packets, which will be send back-to-back, before switching to a
new flow, due to flag FLOW_SEQ.

This script obsoletes the old sample script 'pktgen.conf-1-1-flows',
which is removed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'mlx5-bulk-flow-stats-sriov-tc-offloads'
David S. Miller [Thu, 14 Jul 2016 20:34:30 +0000 (13:34 -0700)]
Merge branch 'mlx5-bulk-flow-stats-sriov-tc-offloads'

Saeed Mahameed says:

====================
Mellanox 100G mlx5 Bulk flow statistics and SRIOV TC offloads

This series from Amir and Or deals with two enhancements for the mlx5 TC offloads.

The 1st two patches add bulk reading of flow counters. Few bulk counter queries are
used instead of issuing thousands firmware commands per second to get statistics of all
flows set to HW.

The next patches add TC based SRIOV offloading to mlx5, as a follow up for the e-switch
offloads mode and the VF representors. When the e-switch is set to the (new) "offloads"
mode, we can now offload TC/flower drop and forward rules, the forward action we offload
is TC mirred/redirect.

The above is done by the VF representor netdevices exporting the setup_tc ndo where from
there we're re-using and enhancing the existing mlx5 TC offloads sub-module which now
works for both the NIC and the SRIOV cases.

The series is applied on top b38a75d2d324 ('mlxsw: core: Trace EMAD messages')
and it has no merge issues with the on-going net submission ('mlx5 tx timeout watchdog fixes')

V2:
    - Fixed compilation warning.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add TC offload support for the VF representors netdevice
Or Gerlitz [Thu, 14 Jul 2016 07:32:46 +0000 (10:32 +0300)]
net/mlx5e: Add TC offload support for the VF representors netdevice

The VF representors support only TC filter/action offloads
(not mqprio) and this is enabled for them by default.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add TC HW support for FDB (SRIOV e-switch) offloads
Or Gerlitz [Thu, 14 Jul 2016 07:32:45 +0000 (10:32 +0300)]
net/mlx5e: Add TC HW support for FDB (SRIOV e-switch) offloads

Enhance the TC offload code such that when the eswitch exists and it's
mode being SRIOV offloads, we do TC actions parsing and setup targeted
for eswitch. Next, we add the offloaded flow to the HW e-switch (fdb).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads
Or Gerlitz [Thu, 14 Jul 2016 07:32:44 +0000 (10:32 +0300)]
net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads

Add the setup code that parses the TC actions needed to support offloading drop
and mirred/redirect for SRIOV e-switch. We can redirect between two devices if
they belong to the same HW switch, compare the switchdev HW ID attribute to
enforce that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/switchdev: Export the same parent ID service function
Or Gerlitz [Thu, 14 Jul 2016 07:32:43 +0000 (10:32 +0300)]
net/switchdev: Export the same parent ID service function

This helper serves to know if two switchdev port netdevices belong to the
same HW ASIC, e.g to figure out if forwarding offload is possible between them.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Adjustments in the TC offload code towards reuse for SRIOV
Or Gerlitz [Thu, 14 Jul 2016 07:32:42 +0000 (10:32 +0300)]
net/mlx5e: Adjustments in the TC offload code towards reuse for SRIOV

Towards reusing the TC offloads code for an SRIOV use-case, change some of the
helper functions to have _nic in their names so it's clear what's NIC unique
and what's general. Also group together the NIC related helpers so we can easily
branch per the use-case in downstream patch.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: E-Switch, Add API to configure rules for the offloaded mode
Or Gerlitz [Thu, 14 Jul 2016 07:32:41 +0000 (10:32 +0300)]
net/mlx5: E-Switch, Add API to configure rules for the offloaded mode

This allows for upper levels in the driver, e.g the TC offload code to add
e-switch offloaded steering rules. The caller provides the rule spec for
matching, action, source and destination vports.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: E-Switch, Use two priorities for SRIOV offloads mode
Or Gerlitz [Thu, 14 Jul 2016 07:32:40 +0000 (10:32 +0300)]
net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode

In the offloads mode, some slow path rules are added by the driver (e.g
send-to-vport), while offloaded rules are to be added from upper layers.

The slow path rules have lower priority and we don't want matching on
offloaded rules to suffer from extra steering hops related to the slow
path rules.

We use two priorities, one for offloaded rules (fast path), and one for
the control rules (slow path). To allow for that, we enable two priorities
for the FDB namespace in the FS core code.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Offload TC flow counters only when supported
Or Gerlitz [Thu, 14 Jul 2016 07:32:39 +0000 (10:32 +0300)]
net/mlx5e: Offload TC flow counters only when supported

Currenly, the code that programs the flow actions into the firmware
doesn't check if was actually asked to offload the statistics, fix that.

Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Introduce bulk reading of flow counters
Amir Vadai [Thu, 14 Jul 2016 07:32:38 +0000 (10:32 +0300)]
net/mlx5: Introduce bulk reading of flow counters

This commit utilize the ability of ConnectX-4 to bulk read flow counters.
Few bulk counter queries could be done instead of issuing thousands of
firmware commands per second to get statistics of all flows set to HW,
such as those programmed when we offload tc filters.

Counters are stored sorted by hardware id, and queried in blocks (id +
number of counters).

Due to hardware requirement, start of block and number of counters in a
block must be four aligned.

Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Store counters in rbtree instead of list
Amir Vadai [Thu, 14 Jul 2016 07:32:37 +0000 (10:32 +0300)]
net/mlx5: Store counters in rbtree instead of list

In order to use bulk counters, we need to have counters sorted by id.

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'nps_ent-coding-style'
David S. Miller [Thu, 14 Jul 2016 03:59:07 +0000 (20:59 -0700)]
Merge branch 'nps_ent-coding-style'

Elad Kanfi says:

====================
Code style fixes

Fix all checkpatch warnings and errors, and reuse code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: nps_enet: code reuse
Elad Kanfi [Wed, 13 Jul 2016 13:58:07 +0000 (16:58 +0300)]
net: nps_enet: code reuse

Add inline function that checks if there is a pending tx packet.

Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: nps_enet: fix coding style issues
Elad Kanfi [Wed, 13 Jul 2016 13:58:06 +0000 (16:58 +0300)]
net: nps_enet: fix coding style issues

Fix following coding style problems :

ERROR: else should follow close brace '}'
+ }
+ else { /* !dst_is_aligned */

WARNING: Missing a blank line after declarations
+ u32 buf = nps_enet_reg_get(priv, NPS_ENET_REG_RX_BUF);
+ put_unaligned_be32(buf, reg);

WARNING: Missing a blank line after declarations
+ u32 buf;
+ ioread32_rep(priv->regs_base + NPS_ENET_REG_RX_BUF, &buf, 1);

CHECK: Blank lines aren't necessary before a close brace '}'
+
+ }

total: 1 errors, 2 warnings, 1 checks, 683 lines checked

Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'sctp-gso-frags-from-chunk'
David S. Miller [Thu, 14 Jul 2016 01:10:15 +0000 (18:10 -0700)]
Merge branch 'sctp-gso-frags-from-chunk'

Marcelo Ricardo Leitner says:

====================
sctp: allow GSO frags to access the chunk too

Patchset is named after the most important fix in it. First two patches
are preparing the grounds for the 3rd patch.

After the 3rd, they are not strictly logically related to the patchset,
but I kept them together as they depend on each other.

More details on patch changelogs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: only check for ECN if peer is using it
Marcelo Ricardo Leitner [Wed, 13 Jul 2016 18:09:00 +0000 (15:09 -0300)]
sctp: only check for ECN if peer is using it

Currently only read-only checks are performed up to the point on where
we check if peer is ECN capable, checks which we can avoid otherwise.
The flag ecn_ce_done is only used to perform this check once per
incoming packet, and nothing more.

Thus this patch moves the peer check up.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: do not clear chunk->ecn_ce_done flag
Marcelo Ricardo Leitner [Wed, 13 Jul 2016 18:08:59 +0000 (15:08 -0300)]
sctp: do not clear chunk->ecn_ce_done flag

We should not clear that flag when switching to a new skb from a GSO skb
because it would cause ECN processing to happen multiple times per GSO
skb, which is not wanted. Instead, let it be processed once per chunk.
That is, in other words, once per IP header available.

Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: avoid identifying address family many times for a chunk
Marcelo Ricardo Leitner [Wed, 13 Jul 2016 18:08:58 +0000 (15:08 -0300)]
sctp: avoid identifying address family many times for a chunk

Identifying address family operations during rx path is not something
expensive but it's ugly to the eye to have it done multiple times,
specially when we already validated it during initial rx processing.

This patch takes advantage of the now shared sctp_input_cb and make the
pointer to the operations readily available.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: allow GSO frags to access the chunk too
Marcelo Ricardo Leitner [Wed, 13 Jul 2016 18:08:57 +0000 (15:08 -0300)]
sctp: allow GSO frags to access the chunk too

SCTP will try to access original IP headers on sctp_recvmsg in order to
copy the addresses used. There are also other places that do similar access
to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO
support") they aren't always there because they are only present in the
header skb.

SCTP handles the queueing of incoming data by cloning the incoming skb
and limiting to only the relevant payload. This clone has its cb updated
to something different and it's then queued on socket rx queue. Thus we
need to fix this in two moments.

For rx path, not related to socket queue yet, this patch uses a
partially copied sctp_input_cb to such GSO frags. This restores the
ability to access the headers for this part of the code.

Regarding the socket rx queue, it removes iif member from sctp_event and
also add a chunk pointer on it.

With these changes we're always able to reach the headers again.

The biggest change here is that now the sctp_chunk struct and the
original skb are only freed after the application consumed the buffer.
Note however that the original payload was already like this due to the
skb cloning.

For iif, SCTP's IPv4 code doesn't use it, so no change is necessary.
IPv6 now can fetch it directly from original's IPv6 CB as the original
skb is still accessible.

In the future we probably can simplify sctp_v*_skb_iif() stuff, as
sctp_v4_skb_iif() was called but it's return value not used, and now
it's not even called, but such cleanup is out of scope for this change.

Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: reorder sctp_ulpevent and shrink msg_flags
Marcelo Ricardo Leitner [Wed, 13 Jul 2016 18:08:56 +0000 (15:08 -0300)]
sctp: reorder sctp_ulpevent and shrink msg_flags

The next patch needs 8 bytes in there. sctp_ulpevent has a hole due to
bad alignment; msg_flags is using 4 bytes while it actually uses only 2, so
we shrink it, and iif member (4 bytes) which can be easily fetched from
another place once the next patch is there, so we remove it and thus
creating space for 8 bytes.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: allow others to use sctp_input_cb
Marcelo Ricardo Leitner [Wed, 13 Jul 2016 18:08:55 +0000 (15:08 -0300)]
sctp: allow others to use sctp_input_cb

We process input path in other files too and having access to it is
nice, so move it to a header where it's shared.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: bgmac: Remove redundant dev_err call in bgmac_probe()
Wei Yongjun [Wed, 13 Jul 2016 12:46:57 +0000 (12:46 +0000)]
net: ethernet: bgmac: Remove redundant dev_err call in bgmac_probe()

There is a error message within devm_ioremap_resource
already, so remove the dev_err call to avoid redundant
error message.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agostmmac: dwmac-socfpga: remove redundant dev_err call in socfpga_dwmac_parse_data()
Wei Yongjun [Wed, 13 Jul 2016 12:46:40 +0000 (12:46 +0000)]
stmmac: dwmac-socfpga: remove redundant dev_err call in socfpga_dwmac_parse_data()

There is a error message within devm_ioremap_resource
already, so remove the dev_err call to avoid redundant
error message.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: vrf: Address comments from last documentation update
David Ahern [Thu, 14 Jul 2016 00:28:16 +0000 (18:28 -0600)]
net: vrf: Address comments from last documentation update

Comments from Frank Kellerman on last doc update:
- extra whitespace in front of a neigh show command
- convert the brief link example to 'vrf red'

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Wed, 13 Jul 2016 23:05:43 +0000 (16:05 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2016-07-13

Here's our main bluetooth-next pull request for the 4.8 kernel:

 - Fixes and cleanups in 802.15.4 and 6LoWPAN code
 - Fix out of bounds issue in btmrvl driver
 - Fixes to Bluetooth socket recvmsg return values
 - Use crypto_cipher_encrypt_one() instead of crypto_skcipher
 - Cleanup of Bluetooth connection sysfs interface
 - New Authentication failure reson code for Disconnected mgmt event
 - New USB IDs for Atheros, Qualcomm and Intel Bluetooth controllers

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: vrf: Documentation update
David Ahern [Tue, 12 Jul 2016 21:04:23 +0000 (15:04 -0600)]
net: vrf: Documentation update

Update vrf documentation for changes made to 4.4 - 4.8 kernels
and iproute2 support for vrf keyword.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoBluetooth: Increment management interface revision
Johan Hedberg [Wed, 13 Jul 2016 07:57:18 +0000 (10:57 +0300)]
Bluetooth: Increment management interface revision

Increment the mgmt revision due to the recently added new
reason code for the Disconnected event.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
8 years agoBluetooth: Add Authentication Failed reason to Disconnected Mgmt event
Szymon Janc [Tue, 12 Jul 2016 00:12:16 +0000 (02:12 +0200)]
Bluetooth: Add Authentication Failed reason to Disconnected Mgmt event

If link is disconnected due to Authentication Failure (PIN or Key
Missing status) userspace will be notified about this with proper error
code. Many LE profiles define "PIN or Key Missing" status as indication
of remote lost bond so this allows userspace to take action on this.

@ Device Connected: 88:63:DF:88:0E:83 (1) flags 0x0000
        02 01 1a 05 03 0a 18 0d 18 0b 09 48 65 61 72 74  ...........Heart
        20 52 61 74 65                                    Rate
> HCI Event: Command Status (0x0f) plen 4
      LE Read Remote Used Features (0x08|0x0016) ncmd 1
        Status: Success (0x00)
> ACL Data RX: Handle 3585 flags 0x02 dlen 11
      ATT: Read By Group Type Request (0x10) len 6
        Handle range: 0x0001-0xffff
        Attribute group type: Primary Service (0x2800)
> HCI Event: LE Meta Event (0x3e) plen 12
      LE Read Remote Used Features (0x04)
        Status: Success (0x00)
        Handle: 3585
        Features: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
          LE Encryption
< HCI Command: LE Start Encryption (0x08|0x0019) plen 28
        Handle: 3585
        Random number: 0x0000000000000000
        Encrypted diversifier: 0x0000
        Long term key: 26201cd479a0921b6f949f0b1fa8dc82
> HCI Event: Command Status (0x0f) plen 4
      LE Start Encryption (0x08|0x0019) ncmd 1
        Status: Success (0x00)
> HCI Event: Encryption Change (0x08) plen 4
        Status: PIN or Key Missing (0x06)
        Handle: 3585
        Encryption: Disabled (0x00)
< HCI Command: Disconnect (0x01|0x0006) plen 3
        Handle: 3585
        Reason: Authentication Failure (0x05)
> HCI Event: Command Status (0x0f) plen 4
      Disconnect (0x01|0x0006) ncmd 1
        Status: Success (0x00)
> HCI Event: Disconnect Complete (0x05) plen 4
        Status: Success (0x00)
        Handle: 3585
        Reason: Connection Terminated By Local Host (0x16)
@ Device Disconnected: 88:63:DF:88:0E:83 (1) reason 4

@ Device Connected: C4:43:8F:A3:4D:83 (0) flags 0x0000
        08 09 4e 65 78 75 73 20 35                       ..Nexus 5
> HCI Event: Command Status (0x0f) plen 4
      Authentication Requested (0x01|0x0011) ncmd 1
        Status: Success (0x00)
> HCI Event: Link Key Request (0x17) plen 6
        Address: C4:43:8F:A3:4D:83 (LG Electronics)
< HCI Command: Link Key Request Reply (0x01|0x000b) plen 22
        Address: C4:43:8F:A3:4D:83 (LG Electronics)
        Link key: 080812e4aa97a863d11826f71f65a933
> HCI Event: Command Complete (0x0e) plen 10
      Link Key Request Reply (0x01|0x000b) ncmd 1
        Status: Success (0x00)
        Address: C4:43:8F:A3:4D:83 (LG Electronics)
> HCI Event: Auth Complete (0x06) plen 3
        Status: PIN or Key Missing (0x06)
        Handle: 75
@ Authentication Failed: C4:43:8F:A3:4D:83 (0) status 0x05
< HCI Command: Disconnect (0x01|0x0006) plen 3
        Handle: 75
        Reason: Remote User Terminated Connection (0x13)
> HCI Event: Command Status (0x0f) plen 4
      Disconnect (0x01|0x0006) ncmd 1
        Status: Success (0x00)
> HCI Event: Disconnect Complete (0x05) plen 4
        Status: Success (0x00)
        Handle: 75
        Reason: Connection Terminated By Local Host (0x16)
@ Device Disconnected: C4:43:8F:A3:4D:83 (0) reason 4

Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
8 years agomlxsw: core: Trace EMAD messages
Jiri Pirko [Tue, 12 Jul 2016 16:05:04 +0000 (18:05 +0200)]
mlxsw: core: Trace EMAD messages

Trace EMAD messages going down to HW and up from HW. Devlink needs to be
registered before EMAD init so the trace function can be called
with valid devlink handle.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v1->v2:
- Use trace_devlink_hwmsg directly
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodevlink: add hardware messages tracing facility
Jiri Pirko [Tue, 12 Jul 2016 16:05:03 +0000 (18:05 +0200)]
devlink: add hardware messages tracing facility

Define a tracepoint and allow user to trace messages going to and from
hardware associated with devlink instance.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Fix non static symbol warning
Wei Yongjun [Tue, 12 Jul 2016 15:24:10 +0000 (15:24 +0000)]
net: dsa: Fix non static symbol warning

Fixes the following sparse warning:

net/dsa/dsa2.c:680:6: warning:
 symbol '_dsa_unregister_switch' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodwc_eth_qos: fix missing clk_disable_unprepare() on error in dwceqos_probe()
Wei Yongjun [Tue, 12 Jul 2016 11:43:37 +0000 (11:43 +0000)]
dwc_eth_qos: fix missing clk_disable_unprepare() on error in dwceqos_probe()

Fix missing clk_disable_unprepare() call before return
from dwceqos_probe() in the error handling case of invalid
fixed-link.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: mediatek: fix non static symbol warnings
Wei Yongjun [Tue, 12 Jul 2016 11:36:44 +0000 (11:36 +0000)]
net: mediatek: fix non static symbol warnings

Fixes the following sparse warnings:

drivers/net/ethernet/mediatek/mtk_eth_soc.c:79:5: warning:
 symbol '_mtk_mdio_write' was not declared. Should it be static?
drivers/net/ethernet/mediatek/mtk_eth_soc.c:98:5: warning:
 symbol '_mtk_mdio_read' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agorxrpc: Fix error handling in af_rxrpc_init()
Wei Yongjun [Tue, 12 Jul 2016 11:21:17 +0000 (11:21 +0000)]
rxrpc: Fix error handling in af_rxrpc_init()

security initialized after alloc workqueue, so we should exit security
before destroy workqueue in the error handing.

Fixes: 648af7fca159 ("rxrpc: Absorb the rxkad security module")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agostmmac: dwmac-socfpga: fix wrong pointer passed to PTR_ERR()
Wei Yongjun [Tue, 12 Jul 2016 11:00:09 +0000 (11:00 +0000)]
stmmac: dwmac-socfpga: fix wrong pointer passed to PTR_ERR()

PTR_ERR should access the value just tested by IS_ERR, otherwise
the wrong error code will be returned.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotools: hv: Add a script to help bonding synthetic and VF NICs
Haiyang Zhang [Tue, 12 Jul 2016 00:06:42 +0000 (17:06 -0700)]
tools: hv: Add a script to help bonding synthetic and VF NICs

This script helps to create bonding network devices based on synthetic NIC
(the virtual network adapter usually provided by Hyper-V) and the matching
VF NIC (SRIOV virtual function). So the synthetic NIC and VF NIC can
function as one network device, and fail over to the synthetic NIC if VF is
down.

Mayjor distros (RHEL, Ubuntu, SLES) supported by Hyper-V are supported by
this script.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoiwlwifi: add missing type declaration
Arnd Bergmann [Mon, 11 Jul 2016 20:49:53 +0000 (22:49 +0200)]
iwlwifi: add missing type declaration

The iwl-debug.h header relies in implicit inclusion of linux/device.h and
we get a lot of warnings without that:

drivers/net/wireless/intel/iwlwifi/iwl-debug.h:44:23: error: 'struct device' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
 void __iwl_err(struct device *dev, bool rfkill_prefix, bool only_trace,
                       ^~~~~~
In file included from drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.h:66:0,
                 from drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.c:68:
drivers/net/wireless/intel/iwlwifi/iwl-trans.h: In function 'iwl_trans_tx':
drivers/net/wireless/intel/iwlwifi/iwl-trans.h:1030:348: error: passing argument 1 of '__iwl_err' from incompatible pointer type [-Werror=incompatible-pointer-types]
   IWL_ERR(trans, "%s bad state = %d\n", __func__, trans->state);
                                                                                                                                                                                                                                                                                                                                                            ^
In file included from drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.c:67:0:
drivers/net/wireless/intel/iwlwifi/iwl-debug.h:44:6: note: expected 'struct device *' but argument is of type 'struct device *'
 void __iwl_err(struct device *dev, bool rfkill_prefix, bool only_trace,
      ^~~~~~~~~

The easiest workaround is to just declare 'struct device' before its first use,
rather than including the entire header file.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 21cb3222fe56 ("iwlwifi: decouple PCIe transport from mac80211")
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
8 years agomrf24j40: avoid uninitialized byte in SPI transfer to radio.
Walter Mack [Tue, 12 Jul 2016 03:02:16 +0000 (20:02 -0700)]
mrf24j40: avoid uninitialized byte in SPI transfer to radio.

isr function issues SPI read command to mrf to obtain INTSTAT.
SPI transfer is 2 bytes, but value of 2nd byte is not defined.
This had the effect that only the first ISR worked as intended. The
second ISR read incorrect INTSTAT values. Observed on Raspberry PI B+.

Signed-off-by: Walter Mack <wmack@componentsw.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
8 years agoipv4: af_inet: make it explicitly non-modular
Paul Gortmaker [Mon, 11 Jul 2016 20:37:51 +0000 (16:37 -0400)]
ipv4: af_inet: make it explicitly non-modular

The Makefile controlling compilation of this file is obj-y,
meaning that it currently is never being built as a module.

Since MODULE_ALIAS is a no-op for non-modular code, we can simply
remove the MODULE_ALIAS_NETPROTO variant used here.

We replace module.h with kmod.h since the file does make use of
request_module() in order to load other modules from here.

We don't have to worry about init.h coming in via the removed
module.h since the file explicitly includes init.h already.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: bgmac: Fix return value check in bgmac_probe()
Wei Yongjun [Tue, 12 Jul 2016 00:17:28 +0000 (00:17 +0000)]
net: ethernet: bgmac: Fix return value check in bgmac_probe()

In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoBluetooth: Add support of 13d3:3490 AR3012 device
Dmitry Tunin [Mon, 11 Jul 2016 22:35:18 +0000 (01:35 +0300)]
Bluetooth: Add support of 13d3:3490 AR3012 device

T: Bus=01 Lev=01 Prnt=01 Port=07 Cnt=05 Dev#= 5 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=13d3 ProdID=3490 Rev=00.01
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

BugLink: https://bugs.launchpad.net/bugs/1600623
Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
8 years agob53: Fix build warning.
David S. Miller [Mon, 11 Jul 2016 21:30:52 +0000 (14:30 -0700)]
b53: Fix build warning.

   drivers/net/dsa/b53/b53_srab.c: In function 'b53_srab_probe':
>> drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      pdata->chip_id = (u32)of_id->data;
                       ^

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: make inode code explicitly non-modular
Paul Gortmaker [Mon, 11 Jul 2016 16:51:01 +0000 (12:51 -0400)]
bpf: make inode code explicitly non-modular

The Kconfig currently controlling compilation of this code is:

init/Kconfig:config BPF_SYSCALL
init/Kconfig:   bool "Enable bpf() system call"

...meaning that it currently is not being built as a module by anyone.

Lets remove the couple traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.

Note that MODULE_ALIAS is a no-op for non-modular code.

We replace module.h with init.h since the file does use __init.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: check idx is -ENOSPC before using it is an index
Colin Ian King [Mon, 11 Jul 2016 15:54:20 +0000 (16:54 +0100)]
nfp: check idx is -ENOSPC before using it is an index

idx can be returned as -ENOSPC, so we should check for this first
before using it as an index into nn->vxlan_usecnt[] to avoid an
out of bounds array offset read.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: smc91x: ACPI Enable lan91x adapters
Jeremy Linton [Mon, 11 Jul 2016 15:28:40 +0000 (10:28 -0500)]
net: smc91x: ACPI Enable lan91x adapters

Enable lan91x adapters in some ARM machines and models
when booted with an ACPI kernel.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodrivers/net: fixup comments after "Future-proof tunnel offload handlers"
Sabrina Dubroca [Mon, 11 Jul 2016 11:12:28 +0000 (13:12 +0200)]
drivers/net: fixup comments after "Future-proof tunnel offload handlers"

Some comments weren't updated to reflect the renaming of ndo's and the
change of arguments.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMAINTAINERS: release Scott from being a rocker maintainer
Jiri Pirko [Sun, 10 Jul 2016 07:42:44 +0000 (09:42 +0200)]
MAINTAINERS: release Scott from being a rocker maintainer

As requested by Scott, removing him.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotunnels: correct conditional build of MPLS and IPv6
Simon Horman [Sun, 10 Jul 2016 01:20:11 +0000 (10:20 +0900)]
tunnels: correct conditional build of MPLS and IPv6

Using a combination if #if conditionals and goto labels to unwind
tunnel4_init seems unwieldy. This patch takes a simpler approach of
directly unregistering previously registered protocols when an error
occurs.

This fixes a number of problems with the current implementation
including the potential presence of labels when they are unused
and the potential absence of unregister code when it is needed.

Fixes: 8afe97e5d416 ("tunnels: support MPLS over IPv4 tunnels")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'sctp-rfc7496-support'
David S. Miller [Mon, 11 Jul 2016 20:25:39 +0000 (13:25 -0700)]
Merge branch 'sctp-rfc7496-support'

Xin Long says:

====================
sctp: implement rfc7496 in sctp

This patchset implements "Additional Policies for the Partially Reliable
Stream Control Transmission Protocol Extension" described on RFC7496.

The Partially Reliable SCTP (PR-SCTP) extension defined in [RFC3758]
provides a generic method for senders to abandon user messages. The
decision to abandon a user message is sender side only, and the exact
condition is called a "PR-SCTP policy". This patchset implements 3
policies:

 1. Timed Reliability:  This allows the sender to specify a timeout for
    a user message after which the SCTP stack abandons the user message.

 2. Limited Retransmission Policy:  Allows limitation of the number of
    retransmissions.

 3. Priority Policy:  Allows removal of lower-priority messages if space
    for higher-priority messages is needed in the send buffer.

Patch 1-3 add some sockopts in sctp to set/get pr_sctp policy status.
Patch 4-6 implement these 3 policies one by one.
====================

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: implement prsctp PRIO policy
Xin Long [Sat, 9 Jul 2016 11:47:45 +0000 (19:47 +0800)]
sctp: implement prsctp PRIO policy

prsctp PRIO policy is a policy to abandon lower priority chunks when
asoc doesn't have enough snd buffer, so that the current chunk with
higher priority can be queued successfully.

Similar to TTL/RTX policy, we will set the priority of the chunk to
prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
So if PRIO policy is enabled, msg->expire_at won't work.

asoc->sent_cnt_removable will record how many chunks can be checked to
remove. If priority policy is enabled, when the chunk is queued into
the out_queue, we will increase sent_cnt_removable. When the chunk is
moved to abandon_queue or dequeue and free, we will decrease
sent_cnt_removable.

In sctp_sendmsg, we will check if there is enough snd buffer for current
msg and if sent_cnt_removable is not 0. Then try to abandon chunks in
sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and
free chunks from out_queue in right order until the abandon+free size >
msg_len - sctp_wfree. For the abandon size, we have to wait until it
sends FORWARD TSN, receives the sack and the chunks are really freed.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>