Kejian Yan [Wed, 9 Nov 2016 18:13:46 +0000 (18:13 +0000)]
net: hns: add fuzzy match of tcam table for hns
Since there are not enough TCAM table entries for VLAN and multicast
addresses, HNSv2 needs to add support for fuzzy matching of TCAM tables.
To add fuzzy matching of TCAM, we add a property to mask the bits to
be fuzzy matched.
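As a rough illustration of what masking bits for a fuzzy match means, the sketch below compares two MAC addresses only on the bits selected by a mask. It is not the HNS driver code: the helper name mac_fuzzy_match and the idea of a per-byte mask array are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical helper: true when addr and entry agree on every bit that is
 * set in mask. Bits cleared in mask are "don't care", so a single table
 * entry can cover a whole group of addresses. */
static bool mac_fuzzy_match(const uint8_t addr[DEMO_ETH_ALEN],
                            const uint8_t entry[DEMO_ETH_ALEN],
                            const uint8_t mask[DEMO_ETH_ALEN])
{
    for (int i = 0; i < DEMO_ETH_ALEN; i++) {
        if ((addr[i] ^ entry[i]) & mask[i])
            return false;
    }
    return true;
}
```

With a mask of ff:ff:ff:00:00:00, for instance, one entry would match every address sharing the same first three bytes, which is how a small table can serve many multicast addresses.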
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Wed, 9 Nov 2016 18:13:45 +0000 (18:13 +0000)]
Doc: hisi: hns adds mc-mac-mask property
Since there are not enough TCAM table entries for every VLAN and multicast
address, HNS needs to add support for fuzzy matching of TCAM tables. This
adds the property to mask the bits to be fuzzy matched, so update the
bindings document.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 9 Nov 2016 19:24:22 +0000 (11:24 -0800)]
tcp: remove unaligned accesses from tcp_get_info()
After commit 6ed46d1247a5 ("sock_diag: align nlattr properly when
needed"), tcp_get_info() gets 64bit aligned memory, so we can avoid
the unaligned helpers.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Nov 2016 03:15:28 +0000 (22:15 -0500)]
Merge tag 'batadv-next-for-davem-20161108-v2' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
pull request for net-next: batman-adv 2016-11-08 v2
This feature and cleanup patchset includes the following changes:
- netlink and code cleanups by Sven Eckelmann (3 patches)
- Cleanup and minor fixes by Linus Luessing (3 patches)
- Speed up multicast update intervals, by Linus Luessing
- Avoid (re)broadcast in meshes for some easy cases,
by Linus Luessing
- Clean up tx return state handling, by Sven Eckelmann (6 patches)
- Fix some special mac address handling cases, by Sven Eckelmann
(3 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Nov 2016 02:20:01 +0000 (21:20 -0500)]
Merge branch 'PHC-freq-fine-tuning'
Richard Cochran says:
====================
PHC frequency fine tuning
This series expands the PTP Hardware Clock subsystem by adding a
method that passes the frequency tuning word to the drivers
without dropping the low order bits. Keeping those bits is useful for
drivers whose frequency resolution is higher than 1 ppb.
The appended script (below) runs a simple demonstration of the
improvement. This test needs two Intel i210 PCIe cards installed in
the same PC, with their SDP0 pins connected by copper wire. Measuring
the estimated offset (from the ptp4l servo) and the true offset (from
the PPS) over one hour yields the following statistics.
|        | Est. Before   | Est. After    | True Before   | True After    |
|--------+---------------+---------------+---------------+---------------|
| min    | -5.200000e+01 | -1.600000e+01 | -3.100000e+01 | -1.000000e+00 |
| max    | +5.700000e+01 | +2.500000e+01 | +8.500000e+01 | +4.000000e+01 |
| pk-pk: | +1.090000e+02 | +4.100000e+01 | +1.160000e+02 | +4.100000e+01 |
| mean   | +6.472222e-02 | +1.277778e-02 | +2.422083e+01 | +1.826083e+01 |
| stddev | +1.158006e+01 | +4.581982e+00 | +1.207708e+01 | +4.981435e+00 |
Here the numbers are in units of nanoseconds, and the ~20 nanosecond PPS
offset is due to input/output delays on the i210's external interface
logic.
With the series applied, both the peak to peak error and the standard
deviation improve by a factor of more than two. These two graphs show
the improvement nicely.
http://linuxptp.sourceforge.net/fine-tuning/fine-est.png
http://linuxptp.sourceforge.net/fine-tuning/fine-tru.png
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 8 Nov 2016 21:49:18 +0000 (22:49 +0100)]
ptp: dp83640: Use the high resolution frequency method.
The dp83640 has a frequency resolution of about 0.029 ppb.
This patch lets users of the device benefit from the
increased frequency resolution when tuning the clock.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 8 Nov 2016 21:49:17 +0000 (22:49 +0100)]
ptp: igb: Use the high resolution frequency method.
The 82580 and related devices offer a frequency resolution of about
0.029 ppb. This patch lets users of the device benefit from the
increased frequency resolution when tuning the clock.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 8 Nov 2016 21:49:16 +0000 (22:49 +0100)]
ptp: Introduce a high resolution frequency adjustment method.
The internal PTP Hardware Clock (PHC) interface limits the resolution for
frequency adjustments to one part per billion. However, some hardware
devices allow finer adjustment, and making use of the increased resolution
improves synchronization measurably on such devices.
This patch adds an alternative method that allows finer frequency tuning
by passing the scaled ppm value to PHC drivers. This value comes from
user space, and it has a resolution of about 0.015 ppb. We also deprecate
the older method, anticipating its removal once existing drivers have been
converted over.
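To make the resolution figures concrete, here is a small sketch of the unit conversion, assuming the usual scaled ppm convention of a 16-bit binary fraction (65536 units per ppm). The function names are made up for the example and are not the PHC driver interface.

```c
/* Scaled ppm carries a 16-bit binary fraction: 65536 units == 1 ppm. */
#define DEMO_SCALED_PPM_PER_PPM 65536.0

/* Conversion that keeps the fractional part of the ppb value. */
static double demo_scaled_ppm_to_ppb_exact(long scaled_ppm)
{
    return (double)scaled_ppm * 1000.0 / DEMO_SCALED_PPM_PER_PPM;
}

/* What a whole-ppb interface can express: the fraction is truncated,
 * which is the loss of low order bits the new method avoids. */
static long demo_scaled_ppm_to_ppb_truncated(long scaled_ppm)
{
    return (long)demo_scaled_ppm_to_ppb_exact(scaled_ppm);
}
```

One scaled ppm unit is 1000 / 65536 ppb, i.e. roughly 0.015 ppb, so any adjustment smaller than 66 units disappears entirely when it is rounded down to whole ppb.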
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Suggested-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 8 Nov 2016 19:07:28 +0000 (11:07 -0800)]
net: napi_hash_add() is no longer exported
There are no more users except from net/core/dev.c
napi_hash_add() can now be static.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 8 Nov 2016 19:06:53 +0000 (11:06 -0800)]
bnxt_en: do not call napi_hash_add()
This is automatically done from netif_napi_add(), and we want to not
export napi_hash_add() anymore in the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Tue, 8 Nov 2016 15:40:28 +0000 (16:40 +0100)]
bpf: Remove unused but set variables
Remove the unused but set variables min_set and max_set in
adjust_reg_min_max_vals to fix the following warning when building with
'W=1':
kernel/bpf/verifier.c:1483:7: warning: variable ‘min_set’ set but not used [-Wunused-but-set-variable]
There is no warning about max_set being unused, but since it is only
used in the assignment of min_set it can be removed as well.
They were introduced in commit 484611357c19 ("bpf: allow access into map
value arrays") but seem to have never been used.
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Tue, 8 Nov 2016 15:24:03 +0000 (17:24 +0200)]
tc_act: Remove tcf_act macro
tc_act macro addressed a non existing field, and was not used in the
kernel source.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Nov 2016 01:40:13 +0000 (20:40 -0500)]
Merge branch 'ipv6-sr'
David Lebrun says:
====================
net: add support for IPv6 Segment Routing
v5:
- Check SRH validity when adding a new route with lwtunnels and
when setting an IPV6_RTHDR socket option.
- Check that hdr->segments_left is not out of bounds when processing
an SR-enabled packet.
- Add __ro_after_init attribute to seg6_genl_policy structure.
- Add CONFIG_IPV6_SEG6_INLINE option to enable or disable
direct header insertion.
v4:
- Change @cleanup in ipv6_srh_rcv() from int to bool
- Move checksum helper functions into header file
- Add common definition for SR TLVs
- Add comments for HMAC computation algorithm
- Use rhashtable to store HMAC infos instead of linked list
- Remove packed attribute for struct sr6_tlv_hmac
- Use dst cache only if CONFIG_DST_CACHE is enabled
v3:
- Fix compilation for CONFIG_IPV6={n,m}
v2:
- Remove packed attribute from sr6 struct and replaced unaligned
16-bit flags with two 8-bit flags.
- SR code now included by default. Option CONFIG_IPV6_SEG6_HMAC
exists for HMAC support (which requires crypto dependencies).
- Replace "hidden" calls to mutex_{un,}lock to direct calls.
- Fix reverse xmas tree coding style.
- Fix cast-from-void*'s.
- Update skb->csum to account for SR modifications.
- Add dst_cache in seg6_output.
Segment Routing (SR) is a source routing paradigm, architecturally
defined in draft-ietf-spring-segment-routing-09 [1]. The IPv6 flavor of
SR is defined in draft-ietf-6man-segment-routing-header-02 [2].
The main idea is that an SR-enabled packet contains a list of segments,
which represent mandatory waypoints. Each waypoint is called a segment
endpoint. The SR-enabled packet is routed normally (e.g. shortest path)
between the segment endpoints. A node that inserts an SRH into a packet
is called an ingress node, and a node that is the last segment endpoint
is called an egress node.
From an IPv6 viewpoint, an SR-enabled packet contains an IPv6 extension
header, which is a Routing Header type 4, defined as follows:
struct ipv6_sr_hdr {
__u8 nexthdr;
__u8 hdrlen;
__u8 type;
__u8 segments_left;
__u8 first_segment;
__u8 flag_1;
__u8 flag_2;
__u8 reserved;
struct in6_addr segments[0];
};
The first 4 bytes of the SRH are consistent with the Routing Header
definition in RFC 2460. The type is set to `4' (SRH).
Each segment is encoded as an IPv6 address. The segments are encoded in
reverse order: segments[0] is the last segment of the path, and
segments[first_segment] is the first segment of the path.
segments[segments_left] points to the currently active segment and
segments_left is decremented at each segment endpoint.
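The reverse ordering and the role of segments_left can be sketched as follows. This is a simplified model, not the kernel's processing code: struct demo_srh, demo_srh_advance() and the use of integer IDs in place of IPv6 addresses are assumptions for illustration.

```c
#include <stdint.h>

#define DEMO_MAX_SEGS 8

/* Simplified stand-in for the SRH fields involved in segment advancing. */
struct demo_srh {
    uint8_t segments_left;           /* index of the currently active segment */
    uint8_t first_segment;           /* index of the first segment of the path */
    int     segments[DEMO_MAX_SEGS]; /* placeholder IDs instead of in6_addr */
};

/* Called at a segment endpoint. Returns the index of the next active
 * segment, or -1 if the header is inconsistent or no segment is left. */
static int demo_srh_advance(struct demo_srh *srh)
{
    if (srh->first_segment >= DEMO_MAX_SEGS ||
        srh->segments_left > srh->first_segment)
        return -1;                   /* segments_left out of bounds */
    if (srh->segments_left == 0)
        return -1;                   /* already at the last segment */
    srh->segments_left--;
    return srh->segments_left;
}
```

For a path A, B, C the array holds {C, B, A}, first_segment is 2, and the packet starts with segments_left equal to 2; each endpoint decrements it until index 0 (C) is active.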
There are two ways for a packet to receive an SRH, which we call
encap mode and inline mode. In the encap mode, the packet is encapsulated
in an outer IPv6 header that contains the SRH. The inner (original) packet
is not modified. A virtual tunnel is thus created between the ingress node
(the node that encapsulates) and the egress node (the last segment of the path).
Once an encapsulated SR packet reaches the egress node, the node decapsulates
the packet and performs a routing decision on the inner packet. This kind of
SRH insertion is intended for use by routers that encapsulate in-transit
packets.
The second SRH insertion method, the inline mode, acts by directly inserting
the SRH right after the IPv6 header of the original packet. For this method,
if a particular flag (SR6_FLAG_CLEANUP) is set, then the penultimate segment
endpoint must strip the SRH from the packet before forwarding it to the last
segment endpoint. This insertion method is intended for use by end hosts;
however, it is also used for in-transit packets by some industry actors.
Note that directly inserting extension headers may break several mechanisms
such as Path MTU Discovery, IPSec AH, etc. For this reason, this insertion
method is only available if CONFIG_IPV6_SEG6_INLINE is enabled.
Finally, the SRH may contain TLVs after the segments list. Several types of
TLVs are defined, but we currently consider only the HMAC TLV. This TLV is
an answer to the deprecation of RH0 and makes it possible to ensure the
authenticity and integrity of the SRH. The HMAC text contains the flags, the
first_segment index, the full list of segments, and the source address of the
packet. While SR is intended mostly for use within a single administrative
domain, the HMAC TLV allows verification of SR packets coming from an
untrusted source.
This patch series implements support for the IPv6 flavor of SR and is
logically divided into the following components:
(1) Data plane support (patch 01). This patch adds a function
in net/ipv6/exthdrs.c to handle the Routing Header type 4.
It enables the kernel to act as a segment endpoint, by supporting
the following operations: decrementation of the segments_left field,
cleanup flag support (removal of the SRH if we are the penultimate
segment endpoint) and decapsulation of the inner packet as an egress
node.
(2) Control plane support (patches 02..03 and 07..09). These patches enable
SRH insertion on locally emitted and/or forwarded packets, both with
encap mode and with inline mode. The SRH insertion is controlled through
the lightweight tunnels mechanism. Furthermore, patch 08 enables the
applications to insert an SRH on a per-socket basis, through the
setsockopt() system call. The mechanism to specify a per-socket
Routing Header was already defined for RH0 and no special modification
was performed on this side. However, the code to actually push the RH
onto the packets had to be adapted for the SRH specifications.
(3) HMAC support (patches 04..06). These patches add support for the
HMAC TLV verification for the dataplane part, and generation for
the control plane part. Two hashing algorithms are supported
(SHA-1 as legacy and SHA-256 as required by the IETF draft), but
additional algorithms can be easily supported by simply adding an
entry into an array.
[1] https://tools.ietf.org/html/draft-ietf-spring-segment-routing-09
[2] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:22 +0000 (14:59 +0100)]
ipv6: sr: add documentation file for per-interface sysctls
This patch adds documentation for some SR-related per-interface
sysctls.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:21 +0000 (14:59 +0100)]
ipv6: sr: add support for SRH injection through setsockopt
This patch adds support for per-socket SRH injection with the setsockopt
system call through the IPPROTO_IPV6, IPV6_RTHDR options.
The SRH is pushed through the ipv6_push_nfrag_opts function.
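A userspace sketch of what per-socket injection could look like is given below. It builds an SRH following the layout quoted in the series cover letter and hands it to setsockopt() with IPPROTO_IPV6 and IPV6_RTHDR. The struct demo_ipv6_sr_hdr copy and both helper names are assumptions for the example; real code would use the definitions from the kernel's UAPI headers, and whether the kernel accepts the header depends on its configuration.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Local copy of the SRH layout from the cover letter (illustration only). */
struct demo_ipv6_sr_hdr {
    uint8_t nexthdr;
    uint8_t hdrlen;
    uint8_t type;
    uint8_t segments_left;
    uint8_t first_segment;
    uint8_t flag_1;
    uint8_t flag_2;
    uint8_t reserved;
    struct in6_addr segments[];
};

/* Build an SRH for n waypoints given in travel order. Caller frees. */
static struct demo_ipv6_sr_hdr *demo_build_srh(const struct in6_addr *path,
                                               uint8_t n, size_t *len)
{
    struct demo_ipv6_sr_hdr *srh;

    if (n == 0 || n > 127)          /* hdrlen must fit in 8 bits */
        return NULL;
    *len = sizeof(*srh) + (size_t)n * sizeof(struct in6_addr);
    srh = calloc(1, *len);
    if (!srh)
        return NULL;
    srh->hdrlen = (uint8_t)(2 * n); /* 8-octet units, first 8 octets excluded */
    srh->type = 4;                  /* Routing Header type 4 (SRH) */
    srh->segments_left = (uint8_t)(n - 1);
    srh->first_segment = (uint8_t)(n - 1);
    for (uint8_t i = 0; i < n; i++) /* segments are stored in reverse order */
        srh->segments[n - 1 - i] = path[i];
    return srh;
}

/* Attach the SRH to an AF_INET6 socket; returns setsockopt()'s result. */
static int demo_attach_srh(int fd, const struct demo_ipv6_sr_hdr *srh, size_t len)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, srh, (socklen_t)len);
}
```

The nexthdr field is left at zero in this sketch on the assumption that the kernel fills it in when the header is pushed onto outgoing packets.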
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:20 +0000 (14:59 +0100)]
ipv6: add source address argument for ipv6_push_nfrag_opts
This patch prepares for insertion of SRH through setsockopt().
The new source address argument is used when an HMAC field is
present in the SRH, which must be filled. The HMAC signature
process requires the source address as input text.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:19 +0000 (14:59 +0100)]
ipv6: sr: add calls to verify and insert HMAC signatures
This patch enables the verification of the HMAC signature for transiting
SR-enabled packets, and its insertion on encapsulated/injected SRH.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:18 +0000 (14:59 +0100)]
ipv6: sr: implement API to control SR HMAC structure
This patch provides an implementation of the genetlink commands
to associate a given HMAC key identifier with a hashing algorithm
and a secret.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:42 +0000 (14:57 +0100)]
ipv6: sr: add core files for SR HMAC support
This patch adds the necessary functions to compute and check the HMAC signature
of an SR-enabled packet. Two HMAC algorithms are supported: hmac(sha1) and
hmac(sha256).
In order to avoid dynamic memory allocation for each HMAC computation,
a per-cpu ring buffer is allocated for this purpose.
A new per-interface sysctl called seg6_require_hmac is added, allowing a
user-defined policy for processing HMAC-signed SR-enabled packets.
A value of -1 means that the HMAC field will always be ignored.
A value of 0 means that if an HMAC field is present, its validity will
be enforced (the packet is dropped if the signature is incorrect).
Finally, a value of 1 means that any SR-enabled packet that does not
contain an HMAC signature or whose signature is incorrect will be dropped.
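The three policy values can be summarised as a small decision function. This is only a restatement of the rules above in code; demo_seg6_hmac_accept() is a hypothetical name, and treating values above 1 like 1 is an assumption.

```c
#include <stdbool.h>

/* Returns true if the packet should be accepted under the given
 * seg6_require_hmac value.
 *   -1: the HMAC field is always ignored
 *    0: an HMAC, if present, must be valid
 *    1: an HMAC must be present and valid */
static bool demo_seg6_hmac_accept(int require_hmac, bool has_hmac,
                                  bool hmac_valid)
{
    if (require_hmac < 0)
        return true;
    if (!has_hmac)
        return require_hmac == 0;
    return hmac_valid;
}
```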
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:41 +0000 (14:57 +0100)]
ipv6: sr: add support for SRH encapsulation and injection with lwtunnels
This patch creates a new type of interfaceless lightweight tunnel (SEG6),
enabling the encapsulation and injection of SRH within locally emitted
packets and forwarded packets.
From a configuration viewpoint, a seg6 tunnel would be configured as follows:
ip -6 ro ad fc00::1/128 encap seg6 mode encap segs fc42::1,fc42::2,fc42::3 dev eth0
Any packet whose destination address is fc00::1 would thus be encapsulated
within an outer IPv6 header containing the SRH with three segments, and would
actually be routed to the first segment of the list. If `mode inline' was
specified instead of `mode encap', then the SRH would be directly inserted
after the IPv6 header without outer encapsulation.
The inline mode is only available if CONFIG_IPV6_SEG6_INLINE is enabled. This
feature was made configurable because direct header insertion may break
several mechanisms such as PMTUD or IPSec AH.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:40 +0000 (14:57 +0100)]
ipv6: sr: add code base for control plane support of SR-IPv6
This patch adds the necessary hooks and structures to provide support
for SR-IPv6 control plane, essentially the Generic Netlink commands
that will be used for userspace control over the Segment Routing
kernel structures.
The genetlink commands provide control over two different structures:
tunnel source and HMAC data. The tunnel source is the source address
that will be used by default when encapsulating packets into an
outer IPv6 header + SRH. If the tunnel source is set to :: then an
address of the outgoing interface will be selected as the source.
The HMAC commands currently just return ENOTSUPP and will be implemented
in a future patch.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:39 +0000 (14:57 +0100)]
ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)
Implement minimal support for processing of SR-enabled packets
as described in
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02.
This patch implements the following operations:
- Intermediate segment endpoint: advancing the active segment (decrementing segments_left) and rerouting.
- Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH
and routing of inner packet.
- Cleanup flag support for SR-inlined packets: removal of SRH if we are the
penultimate segment endpoint.
A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled
packets. Default is deny.
This patch does not provide support for HMAC-signed packets.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 8 Nov 2016 13:31:38 +0000 (14:31 +0100)]
net: mii: report 0 for unknown lp_advertising
The newly introduced mii_ethtool_get_link_ksettings function sets
lp_advertising to an uninitialized value when BMCR_ANENABLE is not
set:
drivers/net/mii.c: In function 'mii_ethtool_get_link_ksettings':
drivers/net/mii.c:224:2: error: 'lp_advertising' may be used uninitialized in this function [-Werror=maybe-uninitialized]
As documented in include/uapi/linux/ethtool.h, the value is
expected to be zero when we don't know it, so let's initialize
it to that.
Fixes: bc8ee596afe8 ("net: mii: add generic function to support ksetting support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Beulich [Tue, 8 Nov 2016 07:45:53 +0000 (00:45 -0700)]
xen-netback: prefer xenbus_scanf() over xenbus_gather()
For single items being collected this should be preferred as being more
typesafe (as the compiler can check format string and to-be-written-to
variable match) and more efficient (requiring one less parameter to be
passed).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Mon, 7 Nov 2016 06:51:23 +0000 (14:51 +0800)]
igmp: Document sysctl force_igmp_version
There is some difference between force_igmp_version and force_mld_version.
Add documentation to make users aware of this.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asbjørn Sloth Tønnesen [Mon, 7 Nov 2016 20:39:28 +0000 (20:39 +0000)]
net: l2tp: fix negative assignment to unsigned int
recv_seq, send_seq and lns_mode are all defined as
unsigned int foo:1;
Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asbjørn Sloth Tønnesen [Mon, 7 Nov 2016 20:39:27 +0000 (20:39 +0000)]
net: l2tp: cleanup: remove redundant condition
These assignments follow this pattern:
unsigned int foo:1;
struct nlattr *nla = info->attrs[bar];
if (nla)
foo = nla_get_flag(nla); /* expands to: foo = !!nla */
This could be simplified to: if (nla) foo = 1;
but let's just remove the condition and use the macro,
foo = nla_get_flag(nla);
Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asbjørn Sloth Tønnesen [Mon, 7 Nov 2016 20:39:26 +0000 (20:39 +0000)]
net: l2tp: netlink: l2tp_nl_tunnel_send: set UDP6 checksum flags
This patch causes the proper attribute flags to be set,
in the case that IPv6 UDP checksums are disabled, so that
userspace, i.e. `ip l2tp show tunnel`, knows about it.
Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asbjørn Sloth Tønnesen [Mon, 7 Nov 2016 20:39:25 +0000 (20:39 +0000)]
net: l2tp: only set L2TP_ATTR_UDP_CSUM if AF_INET
Only set L2TP_ATTR_UDP_CSUM in l2tp_nl_tunnel_send()
when it's running over IPv4.
This prepares the code to also have IPv6 specific attributes.
Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asbjørn Sloth Tønnesen [Mon, 7 Nov 2016 20:39:24 +0000 (20:39 +0000)]
net: l2tp: change L2TP_ATTR_UDP_ZERO_CSUM6_{RX, TX} attribute types
The attributes L2TP_ATTR_UDP_ZERO_CSUM6_RX and
L2TP_ATTR_UDP_ZERO_CSUM6_TX are used as flags,
but are defined as u8 in a comment.
This patch redocuments them as flags.
Adding nla_policy entries would break API, so not doing that.
CC: Tom Herbert <therbert@google.com>
Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 7 Nov 2016 19:12:27 +0000 (11:12 -0800)]
net-gro: avoid reorders
Receiving a GSO packet in dev_gro_receive() is not uncommon
in stacked devices, or devices partially implementing LRO/GRO
like bnx2x. GRO is implementing the aggregation the device
was not able to do itself.
Current code causes reorders, like in following case :
For a given flow where sender sent 4 packets P1,P2,P3,P4
Receiver might receive P1 as a single packet, stored in GRO engine.
Then P2-P4 are received as a single GSO packet, immediately given to
upper stack, while P1 is held in GRO engine.
This patch will make sure P1 is given to upper stack, then P2-P4
immediately after.
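The ordering rule can be sketched with a toy model of a single flow. This is not the dev_gro_receive() code: struct demo_gro, the one-packet hold slot and the integer packet IDs are all simplifying assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PKTS 16

struct demo_gro {
    int    held;                     /* packet held for aggregation, or -1 */
    int    delivered[DEMO_MAX_PKTS]; /* order handed to the upper stack */
    size_t n_delivered;
};

static void demo_deliver(struct demo_gro *g, int pkt)
{
    if (g->n_delivered < DEMO_MAX_PKTS)
        g->delivered[g->n_delivered++] = pkt;
}

/* Hand any held packet to the upper stack. */
static void demo_flush(struct demo_gro *g)
{
    if (g->held >= 0) {
        demo_deliver(g, g->held);
        g->held = -1;
    }
}

/* Receive one packet of the flow; is_gso marks an already aggregated one. */
static void demo_gro_receive(struct demo_gro *g, int pkt, bool is_gso)
{
    demo_flush(g);                   /* older data always goes up first */
    if (is_gso)
        demo_deliver(g, pkt);        /* GSO packets are not held */
    else
        g->held = pkt;               /* keep it around for aggregation */
}
```

Without the flush before delivering the GSO packet, P2-P4 would reach the upper stack while P1 was still held, which is the reordering described above.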
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Nov 2016 18:59:17 +0000 (13:59 -0500)]
Merge branch 'sfc-udp-rss'
Edward Cree says:
====================
sfc: enable 4-tuple UDP RSS hashing
EF10 based NICs have configurable RSS hash fields, and can be made to take the
ports into the hash on UDP (they already do so for TCP). This patch series
enables this, in order to improve spreading of UDP traffic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 3 Nov 2016 22:12:58 +0000 (22:12 +0000)]
sfc: report 4-tuple UDP hashing to ethtool, if it's enabled
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 3 Nov 2016 22:12:27 +0000 (22:12 +0000)]
sfc: enable 4-tuple RSS hashing for UDP
This improves UDP spreading, and also slightly improves GRO performance
of encapsulated TCP on 7000 series NICs.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Nov 2016 18:41:57 +0000 (13:41 -0500)]
Merge branch 'mlx5-SRIOV-offload-tunnel_key-set-release'
Saeed Mahameed says:
====================
Mellanox 100G SRIOV offloads tunnel_key set/release
From Hadar Hen Zion:
This series further enhances the SRIOV TC offloads of mlx5 to handle the
TC tunnel_key release and set actions.
This serves a common use-case in virtualization systems where the virtual
switch encapsulates packets (tunnel_key set action) sent from VMs with
outer headers corresponding to the local/remote host IPs and decapsulates
(tunnel_key release) outer headers before the packets are received by the
VM.
We use the new E-Switch switchdev mode and TC tunnel_key set/release
action to achieve that also in SW defined SRIOV environments by
offloading TC rules that contain these actions along with forwarding
(TC mirred/redirect action) the packets.
The first six patches are adding the needed support in flow dissector,
flower and tc for offloading tunnel_key actions:
- The first three patches are adding the needed help functions
and enums
- The next three patches in the series are adding UDP port attribute
to tunnel_key release and set actions.
The addition of UDP ports would allow the HW driver to make sure they are
given (say) a VXLAN tunnel to offload (mlx5e uses that).
Patches 7-10 are mlx5 preparations for tunnel_key actions offloads support.
Patch #11 adds mlx5e support to offload tunnel_key release action, and the
last two patches (#12-13) add mlx5e support to tc tunnel_key set action.
Currently in order to offload tc tunnel_key release action, the tc rule
should be placed on top of the mlx5e offloading (uplink) interface instead
of the shared tunnel interface. The resolution between the tunnel interface
to the HW netdevice will be implemented in a follow up series.
This series was generated against commit 94edc86bf13f ("Merge branch 'dwmac-sti-refactor-cleanup'")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:48 +0000 (15:14 +0200)]
net/mlx5e: Add basic TC tunnel set action for SRIOV offloads
In mlx5 HW, encapsulation is offloaded by the steering rule having
index into an encapsulation table containing the entire set of headers
to be added by the HW. The driver sets these headers in a buffer when we
are offloading the action.
The code maintains mlx5_encap_entry for each encap header it has
encountered when attempting to offload TC tunnel set action.
This entry maintains a linked list of all the flows sharing the same
encap header. When the last flow is removed from the list, the encap
entry is removed.
The actual encap_header is allocated by the driver in the hardware only
if we have layer two neighbour info when the encap entry is created.
While the flow is in the driver, the driver holds a reference on the
neighbour.
When a new flow with encap action is inserted, the code first checks if
the required encap entry exists according to the tunnel set parameters.
If it does the encap is shared, otherwise a new mlx5_encap_entry is
created.
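The share-or-create behaviour can be sketched as below. This is not the mlx5e implementation: the key fields, the fixed-size table and the flow counter (used here instead of the linked list of flows mentioned above) are assumptions chosen to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENCAPS 4

/* Hypothetical tunnel set parameters used as the lookup key. */
struct demo_tun_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint32_t tun_id;
    uint16_t dst_port;
};

struct demo_encap_entry {
    struct demo_tun_key key;
    unsigned int n_flows;            /* 0 means the slot is free */
};

static struct demo_encap_entry demo_encaps[DEMO_MAX_ENCAPS];

static int demo_key_equal(const struct demo_tun_key *a,
                          const struct demo_tun_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->tun_id == b->tun_id && a->dst_port == b->dst_port;
}

/* Reuse the entry matching key if there is one, otherwise create it.
 * Returns NULL when the table is full. */
static struct demo_encap_entry *demo_encap_attach(const struct demo_tun_key *key)
{
    struct demo_encap_entry *free_slot = NULL;

    for (size_t i = 0; i < DEMO_MAX_ENCAPS; i++) {
        struct demo_encap_entry *e = &demo_encaps[i];

        if (e->n_flows && demo_key_equal(&e->key, key)) {
            e->n_flows++;            /* share the existing encap */
            return e;
        }
        if (!e->n_flows && !free_slot)
            free_slot = e;
    }
    if (!free_slot)
        return NULL;
    free_slot->key = *key;
    free_slot->n_flows = 1;
    return free_slot;
}

/* Drop one flow; the entry becomes free when the last flow is gone. */
static void demo_encap_detach(struct demo_encap_entry *e)
{
    if (e && e->n_flows)
        e->n_flows--;
}
```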
TC action parsing implementation in the driver assumes that tunnel set
action is provided in the same order set by the user, e.g. before the
mirred_redirect action.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:47 +0000 (15:14 +0200)]
net/mlx5e: Add ndo_udp_tunnel_add to VF representors
By implementing this ndo, the host stack will set the vxlan udp port
also to VF representor netdevices. This will allow the TC offload code
in the driver when it gets a tunnel key set action to identify the UDP
port as vxlan, and hence the rule will be a candidate for offloading.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:46 +0000 (15:14 +0200)]
net/mlx5e: Add TC tunnel release action for SRIOV offloads
Enhance the parsing of offloaded TC rules to set HW matching on outer
(encapsulation) headers.
Parse TC tunnel release action and set it as mlx5 decap action when the
required capabilities are supported.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:45 +0000 (15:14 +0200)]
net/mlx5: Support encap id when setting new steering entry
In order to support steering rules which add encapsulation headers,
encap_id parameter is needed.
Add new mlx5_flow_act struct which holds action related parameter:
action, flow_tag and encap_id. Use mlx5_flow_act struct when adding a new
steering rule.
This patch doesn't change any functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:44 +0000 (15:14 +0200)]
net/mlx5: Add creation flags when adding new flow table
When creating flow tables, allow the caller to specify creation flags.
Currently no flags are used and as such this patch doesn't add any new
functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:43 +0000 (15:14 +0200)]
net/mlx5: Check max encap header size capability
Instead of comparing to a const value, check the value of max encap
header size capability as reported by the Firmware.
Fixes: 575ddf5888ea ('net/mlx5: Introduce alloc_encap and dealloc_encap commands')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:42 +0000 (15:14 +0200)]
net/mlx5: Move alloc/dealloc encap commands declarations to common header file
The alloc and dealloc encap commands will be used in the mlx5e driver,
as such, declare them in a common header file.
Also, rename the functions: mlx5_cmd_{de}alloc_encap is replaced with
mlx5_encap_{de}alloc.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:41 +0000 (15:14 +0200)]
net/sched: act_tunnel_key: Add UDP dst port option
The current tunnel set action supports only IP addresses and key
options. Add UDP dst port option.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:40 +0000 (15:14 +0200)]
net/dst: Add dst port to dst_metadata utility functions
Add dst port parameter to __ip_tun_set_dst and __ipv6_tun_set_dst
utility functions.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:39 +0000 (15:14 +0200)]
net/sched: cls_flower: Add UDP port to tunnel parameters
The current IP tunneling classification supports only IP addresses and key.
Enhance UDP based IP tunneling classification parameters by adding UDP
src and dst port.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:38 +0000 (15:14 +0200)]
net/sched: cls_flower: Allow setting encapsulation fields as used key
When encapsulation field is set, mark it as used key for the flow
dissector. This will be used by offloading drivers.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:37 +0000 (15:14 +0200)]
flow_dissector: Add enums for encapsulation keys
New encapsulation keys were added to the flower classifier, which allow
classification according to outer (encapsulation) headers attributes
such as key and IP addresses.
In order to expose those attributes outside flower, add
corresponding enums in the flow dissector.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:36 +0000 (15:14 +0200)]
net/sched: act_tunnel_key: add helper inlines to access tcf_tunnel_key
Needed for drivers to pick the relevant action when offloading tunnel
key act.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Sun, 6 Nov 2016 15:16:25 +0000 (00:16 +0900)]
net: core: add missing check for uid_range in rule_exists.
Without this check, it is not possible to create two rules that
are identical except for their UID ranges. For example:
root@net-test:/# ip rule add prio 1000 lookup 300
root@net-test:/# ip rule add prio 1000 uidrange 100-200 lookup 300
RTNETLINK answers: File exists
root@net-test:/# ip rule add prio 1000 uidrange 100-199 lookup 100
root@net-test:/# ip rule add prio 1000 uidrange 200-299 lookup 200
root@net-test:/# ip rule add prio 1000 uidrange 300-399 lookup 100
RTNETLINK answers: File exists
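The effect of the missing comparison can be sketched as below. This is a minimal Python model, not the kernel code: the Rule fields and this rule_exists() helper are hypothetical stand-ins, and the check_uid_range switch exists only to show the behavior with and without the check.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical, heavily reduced model of a policy routing rule."""
    prio: int
    table: int
    uid_range: Optional[Tuple[int, int]] = None  # None: no uidrange given


def rule_exists(rules: List[Rule], new: Rule, check_uid_range: bool = True) -> bool:
    """Return True if an equivalent rule is already installed."""
    for r in rules:
        if r.prio != new.prio or r.table != new.table:
            continue
        # The comparison the commit message describes as missing:
        # rules that differ only in their UID range are not duplicates.
        if check_uid_range and r.uid_range != new.uid_range:
            continue
        return True
    return False
```

With check_uid_range=False the second command of the first example is reported as a duplicate, which corresponds to the "File exists" answer shown above.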
Tested: https://android-review.googlesource.com/#/c/299980/
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 6 Nov 2016 15:12:27 +0000 (17:12 +0200)]
qed: Prevent stack corruption on MFW interaction
Driver uses a union for copying data to & from management firmware
when interacting with it.
Problem is that the function always copies sizeof(union) while commit
2edbff8dcb5d ("qed: Learn resources from management firmware") is casting
a union element which is of smaller size [24 bytes instead of 88 bytes].
Also, the union contains some inappropriate elements which increase its
size [should have been 32-bytes]. While this shouldn't corrupt other
PF messages to the MFW [as management firmware enforces permissions so
that each PF is allowed to write only to its own mailbox] we fix this
here as well.
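The size mismatch can be illustrated with a small ctypes model. This is only a sketch: the type names below are hypothetical and the layouts are placeholders chosen to reproduce the 24-byte and 88-byte sizes quoted above, not the real qed structures.

```python
import ctypes


class SmallElem(ctypes.Structure):
    """Hypothetical 24-byte union element (6 x 32-bit words)."""
    _fields_ = [("words", ctypes.c_uint32 * 6)]


class LargestElem(ctypes.Structure):
    """Hypothetical 88-byte union element (22 x 32-bit words)."""
    _fields_ = [("words", ctypes.c_uint32 * 22)]


class MfwUnion(ctypes.Union):
    """A union is as large as its largest member."""
    _fields_ = [("small", SmallElem), ("largest", LargestElem)]


def overrun_bytes(dest_type, copy_len: int) -> int:
    """Bytes written past the end of dest_type when copy_len bytes are copied into it."""
    return max(0, copy_len - ctypes.sizeof(dest_type))
```

Copying sizeof(MfwUnion) bytes into an object that is only a SmallElem gives overrun_bytes(SmallElem, ctypes.sizeof(MfwUnion)) == 64, that is, 64 bytes written beyond the object.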
Fixes: 2edbff8dcb5d ("qed: Learn resources from management firmware")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 6 Nov 2016 14:02:32 +0000 (15:02 +0100)]
net: 3com: typhoon: fix typhoon_get_link_ksettings
When moving from typhoon_get_settings to typhoon_get_link_ksettings
in commit f7a5537cd2a5 ("net: 3com: typhoon: use new api
ethtool_{get|set}_link_ksettings"), we use a local variable supported
but we forgot to update the struct ethtool_link_ksettings with
this value.
We also initialize advertising to zero, because otherwise it may
be uninitialized if no case of the switch (tp->xcvr_select) is used.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 6 Nov 2016 13:57:04 +0000 (14:57 +0100)]
net: xgbe: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sat, 5 Nov 2016 23:26:41 +0000 (00:26 +0100)]
net: amd: pcnet32: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sat, 5 Nov 2016 19:17:03 +0000 (20:17 +0100)]
net: amd8111e: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sat, 5 Nov 2016 15:17:54 +0000 (16:17 +0100)]
net: alteon: acenic: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sat, 5 Nov 2016 13:05:39 +0000 (14:05 +0100)]
net: adaptec: starfire: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Nov 2016 18:21:25 +0000 (13:21 -0500)]
Merge branch 'stmmac-dwmac-rk-PM'
Joachim Eastwood says:
====================
stmmac: dwmac-rk: convert to standard PM/remove functions
This patch set aims to remove the init/exit callbacks from the
dwmac-rk driver and instead use standard PM callbacks. Eventually
the init/exit callbacks will be deprecated and removed from all
drivers dwmac-* except for dwmac-generic. Drivers will be refactored
to use standard PM and remove callbacks.
This conversion was pretty straightforward, but it would be really nice
if some chromium people could test suspend/resume with this patch set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Sat, 5 Nov 2016 13:04:52 +0000 (14:04 +0100)]
Revert "net: stmmac: allow to split suspend/resume from init/exit callbacks"
Instead of adding hooks inside stmmac_platform it is better to just use
the standard PM callbacks within the specific dwmac-driver. This is only
used by the dwmac-rk driver.
This reverts commit cecbc5563a02 ("stmmac: allow to split suspend/resume
from init/exit callbacks").
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Sat, 5 Nov 2016 13:04:51 +0000 (14:04 +0100)]
stmmac: dwmac-rk: absorb rk_gmac_init into probe
Since rk_gmac_init() only calls another function, move this
function call into probe so rk_gmac_init() can be removed.
Since commit cecbc5563a02 ("stmmac: allow to split suspend/resume
from init/exit callbacks") the init hook is no longer used in
dwmac-rk so this can be removed.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Sat, 5 Nov 2016 13:04:50 +0000 (14:04 +0100)]
stmmac: dwmac-rk: turn exit into standard driver remove callback
Convert the exit hook into a standard driver remove function as
the hook doesn't really buy us anything extra.
Eventually the exit hook will be deprecated in favor of the driver
remove function.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Sat, 5 Nov 2016 13:04:49 +0000 (14:04 +0100)]
stmmac: dwmac-rk: turn resume/suspend into standard PM callbacks
Use standard PM resume/suspend callbacks instead of the hooks in
stmmac_platform. This gives the driver more control and flexibility
when implementing PM functionality. The hooks in stmmac_platform
also don't buy us anything extra.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Nov 2016 18:02:28 +0000 (13:02 -0500)]
Merge branch 'tcp_get_info-locking'
Eric Dumazet says:
====================
tcp: tcp_get_info() locking changes
This short series prepares tcp_get_info() for more detailed infos.
In order to not slow down fast path, our goal is to use the normal
socket spinlock instead of custom synchronization.
All we need to ensure is that tcp_get_info() is not called with the
ehash lock held, which might deadlock, since packet processing would
acquire the spinlocks in the reverse order.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 4 Nov 2016 18:54:32 +0000 (11:54 -0700)]
tcp: no longer hold ehash lock while calling tcp_get_info()
We had various problems in the past in tcp_get_info() and used
specific synchronization to avoid deadlocks.
We would like to add more instrumentation points for TCP, and
avoiding grabbing the socket lock in tcp_get_info() was too costly.
Being able to lock the socket allows us to provide a consistent set
of fields.
inet_diag_dump_icsk() can make sure ehash locks are not
held any more when tcp_get_info() is called.
We can remove syncp added in commit d654976cbf85
("tcp: fix a potential deadlock in tcp_get_info()"), but we need
to use lock_sock_fast() instead of spin_lock_bh() since TCP input
path can now be run from process context.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 4 Nov 2016 18:54:31 +0000 (11:54 -0700)]
tcp: shortcut listeners in tcp_get_info()
Being lockless in tcp_get_info() is hard, because we need to add
specific synchronization in TCP fast path, like seqcount.
The following patch will change inet_diag_dump_icsk() to no longer
hold any lock for non listeners, so that we can properly acquire
socket lock in tcp_get_info() and let it return more consistent counters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Nov 2016 17:50:56 +0000 (12:50 -0500)]
Merge branch 'Meson-GXL-internal-phy'
Neil Armstrong says:
====================
ARM64: Add Internal PHY support for Meson GXL
The Amlogic Meson GXL SoCs have an internal RMII PHY that is muxed with the
external RGMII pins.
In order to support switching between the two PHY links, support for
extended register sizes must be added to mdio-mux-mmioreg.
The DT related patches submitted as RFC in [3] will be sent in a separate
patchset due to multiple patchsets and DTSI migrations.
Changes since v2 RFC patchset at : [3]
- Change phy Kconfig/Makefile alphabetic order
- GXL dtsi cleanup
Changes since original RFC patchset at : [2]
- Remove meson8b experimental phy switching
- Switch to mdio-mux-mmioreg with extended size support
- Add internal phy support for S905x and p231
- Add external PHY support for p230
[1] http://lkml.kernel.org/r/1477932286-27482-1-git-send-email-narmstrong@baylibre.com
[2] http://lkml.kernel.org/r/1477060838-14164-1-git-send-email-narmstrong@baylibre.com
[3] http://lkml.kernel.org/r/1477932987-27871-1-git-send-email-narmstrong@baylibre.com
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Fri, 4 Nov 2016 15:51:23 +0000 (16:51 +0100)]
net: phy: Add Meson GXL Internal PHY driver
Add driver for the Internal RMII PHY found in the Amlogic Meson GXL SoCs.
This PHY seems to only implement some standard registers and needs some
workarounds to provide autoneg values from vendor registers.
Some magic values are currently used to configure the PHY, and this is a
temporary setup until clarification about these register names and
register fields is provided by Amlogic.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Fri, 4 Nov 2016 15:51:22 +0000 (16:51 +0100)]
net: mdio-mux-mmioreg: Add support for 16bit and 32bit register sizes
In order to support PHY switching on Amlogic GXL SoCs, add support for
16bit and 32bit register sizes.
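What a width-aware mux select could look like is sketched below in Python. It assumes the mux is switched by a masked read-modify-write of a single register and that the register is little-endian; mux_select() and its parameters are hypothetical names for illustration, not the driver's API.

```python
import struct

# Supported register widths in bytes, mapped to struct formats.
# Assumption: little-endian register layout.
_FORMATS = {1: "<B", 2: "<H", 4: "<I"}


def mux_select(regs: bytearray, offset: int, iosize: int, mask: int, child: int) -> int:
    """Select a mux child by rewriting only the bits covered by mask.

    Raises KeyError for an unsupported register width.
    """
    fmt = _FORMATS[iosize]
    (old,) = struct.unpack_from(fmt, regs, offset)
    new = (old & ~mask) | (child & mask)
    if new != old:  # skip the write when the mux already points at child
        struct.pack_into(fmt, regs, offset, new)
    return new
```

For example, with regs = bytearray(b"\xff\xff\xff\xff"), mux_select(regs, 0, 4, 0x80000000, 0) clears only the top bit and leaves the other 31 bits untouched.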
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Nov 2016 17:47:50 +0000 (12:47 -0500)]
Merge branch 'rds-tcp-fixes'
Sowmini Varadhan says:
====================
RDS: TCP: bug fixes
A couple of bug fixes identified during testing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Fri, 4 Nov 2016 17:04:12 +0000 (10:04 -0700)]
RDS: TCP: start multipath acceptor loop at 0
The for() loop in rds_tcp_accept_one() assumes that the 0'th
rds_tcp_conn_path is UP and starts multipath accepts at index 1.
But this assumption may not always be true: if the 0'th path
has failed (ERROR or DOWN state) an incoming connection request
should be used to resurrect this path.
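The difference between starting the scan at index 1 and at index 0 can be sketched as follows. This is a simplified Python model: pick_accept_path() and the string state names are hypothetical, and the real code works on rds_tcp_conn_path state transitions rather than a list of strings.

```python
from typing import List, Optional


def pick_accept_path(states: List[str], start: int = 0) -> Optional[int]:
    """Return the index of the first path that an incoming connection can revive.

    A path is a candidate when it is in the DOWN or ERROR state.
    """
    for i in range(start, len(states)):
        if states[i] in ("DOWN", "ERROR"):
            return i
    return None
```

With states = ["ERROR", "UP", "UP"], a scan starting at 1 finds nothing to resurrect, while a scan starting at 0 returns path 0.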
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Fri, 4 Nov 2016 17:04:11 +0000 (10:04 -0700)]
RDS: TCP: report addr/port info based on TCP socket in rds-info
The socket argument passed to rds_tcp_tc_info() is a PF_RDS socket,
so it is incorrect to report the address port info based on
rds_getname() as part of TCP state report.
Invoke inet_getname() for the t_sock associated with the
rds_tcp_connection instead.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Eckelmann [Sat, 6 Aug 2016 15:04:23 +0000 (17:04 +0200)]
batman-adv: Reject unicast packet with zero/mcast dst address
A unicast batman-adv packet cannot be transmitted to a multicast or zero
mac address. So reject incoming packets which still have these classes of
addresses as destination mac address in the outer ethernet header.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Sat, 6 Aug 2016 15:04:22 +0000 (17:04 +0200)]
batman-adv: Disallow zero and mcast src address for mgmt frames
The routing check for management frames is validating the source mac
address in the outer ethernet header. It rejects every source mac address
which is a broadcast address. But it also has to reject the zero-mac
address and multicast mac addresses.
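The three address classes named above can be told apart with two small predicates. The Python sketch below is an illustration only; the function names are hypothetical and not the kernel helpers. It relies on the fact that the least significant bit of the first octet marks a group (multicast) address, which also covers the broadcast address.

```python
def is_multicast(mac: bytes) -> bool:
    """True for multicast addresses, including broadcast (group bit set)."""
    return bool(mac[0] & 0x01)


def is_zero(mac: bytes) -> bool:
    """True for the all-zero address 00:00:00:00:00:00."""
    return mac == bytes(6)


def is_valid_source(mac: bytes) -> bool:
    """A usable source address is a 6-byte, non-zero unicast address."""
    return len(mac) == 6 and not is_multicast(mac) and not is_zero(mac)
```

For example, is_valid_source(bytes.fromhex("01005e000001")) is False because the group bit is set, even though the address is not the broadcast address.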
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Sat, 6 Aug 2016 15:04:21 +0000 (17:04 +0200)]
batman-adv: Disallow mcast src address for data frames
The routing checks are validating the source mac address of the outer
ethernet header. They reject every source mac address which is a broadcast
address. But they also have to reject any multicast mac addresses.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
[sw@simonwunderlich.de: fix commit message typo]
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Sun, 17 Jul 2016 19:04:05 +0000 (21:04 +0200)]
batman-adv: Remove dev_queue_xmit return code exception
No caller of batadv_send_skb_to_orig is expecting the results to be -1
(-EPERM) anymore when the skbuff was not consumed. They will instead expect
that the skbuff is always consumed. Having such return code filter is
therefore not needed anymore.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Sun, 17 Jul 2016 19:04:04 +0000 (21:04 +0200)]
batman-adv: Consume skb in receive handlers
Receiving functions in Linux consume the supplied skbuff. Doing the same in
the batadv_rx_handler functions makes the behavior more similar to the rest
of the Linux network code.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Soheil Hassas Yeganeh [Thu, 3 Nov 2016 22:24:27 +0000 (18:24 -0400)]
sock: do not set sk_err in sock_dequeue_err_skb
Do not set sk_err when dequeuing errors from the error queue.
Doing so results in:
a) Bugs: By overwriting existing sk_err values, it possibly
hides legitimate errors. It is also incorrect when local
errors are queued with ip_local_error. That happens in the
context of a system call, which already returns the error
code.
b) Inconsistent behavior: When there are pending errors on
the error queue, sk_err is sometimes 0 (e.g., for
the first timestamp on the error queue) and sometimes
set to an error code (after dequeuing the first
timestamp).
c) Suboptimality: Setting sk_err to ENOMSG on simple
TX timestamps can abort parallel reads and writes.
Removing this line doesn't break userspace. This is because
userspace code cannot rely on sk_err for detecting whether
there is something on the error queue. Except for ICMP messages
received for UDP and RAW, sk_err is not set at enqueue time,
and as a result sk_err can be 0 while there are plenty of
errors on the error queue.
For ICMP packets in UDP and RAW, sk_err is set when they are
enqueued on the error queue, but that does not result in aborting
reads and writes. For such cases, sk_err is only readable via
getsockopt(SO_ERROR) which will reset the value of sk_err on
its own. More importantly, prior to this patch,
recvmsg(MSG_ERRQUEUE) has a race on setting sk_err (i.e.,
sk_err is set by sock_dequeue_err_skb without atomic ops or
locks) which can store 0 in sk_err even when we have ICMP
messages pending. Removing this line from sock_dequeue_err_skb
eliminates that race.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 8 Nov 2016 01:15:56 +0000 (20:15 -0500)]
Merge branch 'IFF_NO_QUEUE-semantics'
Jesper Dangaard Brouer says:
====================
qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
This patchset is a cleanup for IFF_NO_QUEUE devices. It will
hopefully help userspace get a more consistent behavior when attaching
qdisc to such virtual devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Thu, 3 Nov 2016 13:56:11 +0000 (14:56 +0100)]
qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.
Why should the kernel catch such a misconfiguration? Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device. The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).
This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device. The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc. The OpenShift project didn't remember this warning
and attached a qdisc; this was caught and fixed in [2].
[1] https://github.com/docker/libcontainer/pull/193
[2] https://github.com/openshift/origin/pull/11126
Instead of fixing every userspace program that used this loophole and
forgot to reset the tx_queue_len prior to attaching a qdisc, let's
catch the misconfiguration on the kernel side.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Thu, 3 Nov 2016 13:56:06 +0000 (14:56 +0100)]
net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len
The flag IFF_NO_QUEUE marks virtual device drivers that don't need a
default qdisc attached, given they will be backed by a physical device
that already has a qdisc attached for pushback.
It is still supported to attach a qdisc to an IFF_NO_QUEUE device, as
this can be useful for different policy reasons (e.g. bandwidth
limiting containers). For this to work, the tx_queue_len needs to have
a sane value, because some qdiscs inherit/copy the tx_queue_len
(namely, pfifo, bfifo, gred, htb, plug and sfb).
Commit a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling
ether_setup()") caught situations where some drivers didn't initialize
tx_queue_len. The problem with the commit was choosing 1 as the
fallback value.
A qdisc queue length of 1 causes more harm than good, because it
creates hard to debug situations for userspace. It gives userspace a
false sense of a working config after attaching a qdisc. As low
volume traffic (that doesn't activate the qdisc policy) works,
like ping, while traffic that e.g. needs shaping cannot reach the
configured policy levels, given the queue length is too small.
This patch changes the value to DEFAULT_TX_QUEUE_LEN, given other
IFF_NO_QUEUE devices (that call ether_setup()) also use this value.
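Why a queue length of 1 looks fine for ping but fails under load can be sketched with a toy tail-drop FIFO. This is a simplified Python model, not a qdisc implementation: enqueue_burst() is a hypothetical helper, no packet is dequeued while the burst arrives, and the value 1000 for DEFAULT_TX_QUEUE_LEN is assumed here.

```python
from collections import deque
from typing import Tuple

DEFAULT_TX_QUEUE_LEN = 1000  # assumed value of the constant


def enqueue_burst(limit: int, burst: int) -> Tuple[int, int]:
    """Offer `burst` back-to-back packets to a FIFO holding at most `limit`.

    Returns (queued, dropped).
    """
    queue = deque()
    dropped = 0
    for pkt in range(burst):
        if len(queue) >= limit:
            dropped += 1  # tail drop: queue is already full
        else:
            queue.append(pkt)
    return len(queue), dropped
```

A single packet fits even with limit 1, so enqueue_burst(1, 1) returns (1, 0), whereas a burst of ten gives (1, 9). With DEFAULT_TX_QUEUE_LEN the same burst is queued in full.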
Fixes: a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Thu, 3 Nov 2016 13:56:01 +0000 (14:56 +0100)]
net: make default TX queue length a defined constant
The default TX queue length of Ethernet devices has been a magic
constant of 1000, ever since the initial git import.
Looking back in historical trees[1][2] the value used to be 100,
with the same comment "Ethernet wants good queues". The commit[3]
that changed this from 100 to 1000 didn't describe why, but from
conversations with Robert Olsson it seems that it was changed
when Ethernet devices went from 100Mbit/s to 1Gbit/s: because the
link speed increased tenfold, the queue size was also adjusted. This
value later caused much heartache for the bufferbloat community.
This patch merely moves the value into a defined constant.
[1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/
[2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/
[3] https://git.kernel.org/tglx/history/c/98921832c232
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Nov 2016 18:24:42 +0000 (13:24 -0500)]
Merge branch 'udp-fwd-mem-sched-on-dequeue'
Paolo Abeni says:
====================
udp: do fwd memory scheduling on dequeue
After commit 850cbaddb52d ("udp: use it's own memory accounting schema"),
the udp code needs to acquire the receive queue spinlock twice on dequeue.
This patch series removes the need for the second lock at skb free time,
moving the udp memory scheduling inside the dequeue operation; the skb
destructor field is not used anymore and an additional sk argument is added
to ip_cmsg_recv_offset() to cope with null skb->sk after dequeue.
Many thanks to Eric Dumazet for suggesting pretty much all of the above.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 4 Nov 2016 10:28:59 +0000 (11:28 +0100)]
udp: do fwd memory scheduling on dequeue
A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses this argument to perform memory
reclaiming on dequeue, so that the UDP protocol no longer
sets skb->destructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
The in kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.
Overall, this allows acquiring only once the receive queue
lock on dequeue.
Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.
nr sinks    vanilla    patched
1           440        560
3           2150       2300
6           3650       3800
9           4450       4600
12          6250       6450
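The relative gain per row of the table above can be worked out with a few lines of Python. This is only arithmetic on the numbers as reported; the helper name gain_percent() is made up for the illustration, and no unit is assumed since the table does not state one.

```python
# (nr sinks, vanilla, patched) rows copied from the table above.
RESULTS = [
    (1, 440, 560),
    (3, 2150, 2300),
    (6, 3650, 3800),
    (9, 4450, 4600),
    (12, 6250, 6450),
]


def gain_percent(vanilla: float, patched: float) -> float:
    """Relative improvement of patched over vanilla, in percent."""
    return (patched - vanilla) / vanilla * 100.0


def gains(results=RESULTS):
    """Return (nr_sinks, gain rounded to one decimal) for every row."""
    return [(n, round(gain_percent(v, p), 1)) for n, v, p in results]
```

The gain is largest with a single sink (about 27%) and shrinks to roughly 3% at 12 sinks.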
v1 -> v2:
- do rmem and allocated memory scheduling under the receive lock
- do bulk scheduling in first_packet_length() and in udp_destruct_sock()
- avoid the typedef for the dequeue callback
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 4 Nov 2016 10:28:58 +0000 (11:28 +0100)]
net/sock: add an explicit sk argument for ip_cmsg_recv_offset()
So that we can use it even after orphaning the skbuff.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 3 Nov 2016 16:25:00 +0000 (09:25 -0700)]
net: Update raw socket bind to consider l3 domain
Binding a raw socket to a local address fails if the socket is bound
to an L3 domain:
$ vrf-test -s -l 10.100.1.2 -R -I red
error binding socket: 99: Cannot assign requested address
Update raw_bind to consider whether sk_bound_dev_if is bound to an L3
domain and use inet_addr_type_table to look up the address.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Nov 2016 18:11:23 +0000 (13:11 -0500)]
Merge branch 'ns2-amac'
Jon Mason says:
====================
add NS2 support to bgmac
Changes in v6:
* Use a common bgmac_phy_connect_direct (per Rafal Milecki)
* Rebased on latest net-next
* Added Reviewed-by to the relevant patches
Changes in v5:
* Change a pr_err to netdev_err (per Scott Branden)
* Reword the lane swap binding documentation (per Andrew Lunn)
Changes in v4:
* Actually send out the lane swap binding doc patch (Per Scott Branden)
* Remove unused #define (Per Andrew Lunn)
Changes in v3:
* Clean-up the bgmac DT binding doc (per Rob Herring)
* Document the lane swap binding and make it generic (Per Andrew Lunn)
Changes in v2:
* Remove the PHY power-on (per Andrew Lunn)
* Misc PHY clean-ups regarding comments and #defines (per Andrew Lunn)
This results in none of the original PHY code from Vikas being
present. So, I'm removing him as an author and giving him
"Inspired-by" credit.
* Move PHY lane swapping to PHY driver (per Andrew Lunn and Florian
Fainelli)
* Remove bgmac sleep (per Florian Fainelli)
* Re-add bgmac chip reset (per Florian Fainelli and Ray Jui)
* Rebased on latest net-next
* Added patch for bcm54xx_auxctl_read, which is used in the BCM54810
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:11:02 +0000 (01:11 -0400)]
arm64: dts: NS2: add AMAC ethernet support
Add support for the AMAC ethernet to the Broadcom Northstar2 SoC device
tree
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:11:01 +0000 (01:11 -0400)]
net: ethernet: bgmac: add NS2 support
Add support for the variant of amac hardware present in the Broadcom
Northstar2 based SoCs. Northstar2 requires an additional register to be
configured with the port speed/duplexity (NICPM). This can be added to
the link callback to hide it from the instances that do not use this.
Also, clearing of the pending interrupts on init is required due to
observed issues on some platforms.
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:11:00 +0000 (01:11 -0400)]
net: ethernet: bgmac: device tree phy enablement
Change the bgmac driver to allow for PHYs defined by the device tree
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:59 +0000 (01:10 -0400)]
Documentation: devicetree: net: add NS2 bindings to amac
Clean-up the documentation to the bgmac-amac driver, per suggestion by
Rob Herring, and add details for NS2 support.
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:58 +0000 (01:10 -0400)]
net: phy: broadcom: Add BCM54810 PHY entry
The BCM54810 PHY requires some semi-unique configuration, which results
in some additional configuration in addition to the standard config.
Also, some users of the BCM54810 require the PHY lanes to be swapped.
Since there is no way to detect this, add a device tree query to see if
it is applicable.
Inspired-by: Vikas Soni <vsoni@broadcom.com>
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:57 +0000 (01:10 -0400)]
Documentation: devicetree: add PHY lane swap binding
Add the documentation for PHY lane swapping. This is a boolean entry to
notify the phy device drivers that the TX/RX lanes need to be swapped.
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:56 +0000 (01:10 -0400)]
net: phy: broadcom: add bcm54xx_auxctl_read
Add a helper function to read the AUXCTL register for the BCM54xx. This
mirrors the bcm54xx_auxctl_write function already present in the code.
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Nov 2016 03:00:17 +0000 (22:00 -0500)]
Merge branch 'dwmac-sti-refactor-cleanup'
Joachim Eastwood says:
====================
stmmac: dwmac-sti refactor+cleanup
This patch set aims to remove the init/exit callbacks from the
dwmac-sti driver and instead use standard PM callbacks. Doing this
will also allow us to cleanup the driver.
Eventually the init/exit callbacks will be deprecated and removed
from all drivers dwmac-* except for dwmac-generic. Drivers will be
refactored to use standard PM and remove callbacks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Fri, 4 Nov 2016 17:54:11 +0000 (18:54 +0100)]
stmmac: dwmac-sti: remove unused priv dev member
The dev member of struct sti_dwmac is not used anywhere in the driver
so let's just remove it.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Fri, 4 Nov 2016 17:54:10 +0000 (18:54 +0100)]
stmmac: dwmac-sti: clean up and rename sti_dwmac_init
Rename sti_dwmac_init to sti_dwmac_set_mode which is a better
description for what it really does.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Fri, 4 Nov 2016 17:54:09 +0000 (18:54 +0100)]
stmmac: dwmac-sti: move clk_prepare_enable out of init and add error handling
Add clock error handling to probe and in the process move clock enabling
out of sti_dwmac_init() to make this easier.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
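A minimal user-space sketch of the probe-time clock error handling described in the commit above, assuming a mock clock type. All names (probe_clk, probe_clk_enable, probe_set_mode, probe_sketch) and the error values are hypothetical; the actual driver uses the kernel clock API, which is not reproduced here. The point is the ordering: enable the clock in probe, check the result, and undo the enable if a later step fails.

```c
#include <assert.h>

/* Hypothetical stand-in for a clock; fail_enable forces an error. */
struct probe_clk {
	int enabled;
	int fail_enable;
};

static int probe_clk_enable(struct probe_clk *clk)
{
	if (clk->fail_enable)
		return -5;	/* arbitrary negative error code */
	clk->enabled = 1;
	return 0;
}

static void probe_clk_disable(struct probe_clk *clk)
{
	clk->enabled = 0;
}

/* Stand-in for the mode setup step that runs once the clock is on. */
static int probe_set_mode(int fail)
{
	return fail ? -22 : 0;
}

/* Probe: the clock is enabled here, not inside the mode setup step,
 * so each failure has one obvious place to unwind from. */
static int probe_sketch(struct probe_clk *clk, int fail_mode)
{
	int ret;

	ret = probe_clk_enable(clk);
	if (ret)
		return ret;

	ret = probe_set_mode(fail_mode);
	if (ret)
		goto disable_clk;

	return 0;

disable_clk:
	probe_clk_disable(clk);
	return ret;
}
```

Keeping the enable call in probe means the unwind label only has to undo steps that probe itself performed.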
Joachim Eastwood [Fri, 4 Nov 2016 17:54:08 +0000 (18:54 +0100)]
stmmac: dwmac-sti: move st,gmac_en parsing to sti_dwmac_parse_data
The sti_dwmac_init() function is called both from probe and resume.
Since DT properties don't change between suspend/resume cycles move
parsing of this parameter into sti_dwmac_parse_data() where it belongs.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Fri, 4 Nov 2016 17:54:07 +0000 (18:54 +0100)]
stmmac: dwmac-sti: add PM ops and resume function
Implement PM callbacks and driver remove in the driver instead
of relying on the init/exit hooks in stmmac_platform. This gives
the driver more flexibility in how the code is organized.
Eventually the init/exit callbacks will be deprecated in favor
of the standard PM callbacks and driver remove function.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
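A minimal user-space sketch of the callback arrangement described in the commit above, assuming a simplified ops table. The types and names (pm_dev, pm_ops_sketch, sketch_suspend, sketch_resume, core_suspend, core_resume) are hypothetical and stand in for the kernel's PM infrastructure, which is not reproduced here. It shows a driver supplying its own suspend and resume functions through a table, with resume re-enabling the clock and re-applying the mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device state touched by the callbacks. */
struct pm_dev {
	int clk_on;
	int mode_applied;
};

/* Simplified ops table; the driver fills in what it implements. */
struct pm_ops_sketch {
	int (*suspend)(struct pm_dev *dev);
	int (*resume)(struct pm_dev *dev);
};

/* Driver-owned suspend: hardware state is lost, clock goes off. */
static int sketch_suspend(struct pm_dev *dev)
{
	dev->mode_applied = 0;
	dev->clk_on = 0;
	return 0;
}

/* Driver-owned resume: clock back on, then re-apply the mode. */
static int sketch_resume(struct pm_dev *dev)
{
	dev->clk_on = 1;
	dev->mode_applied = 1;
	return 0;
}

static const struct pm_ops_sketch sketch_pm_ops = {
	.suspend = sketch_suspend,
	.resume = sketch_resume,
};

/* Core side: call the driver's callback if one was provided. */
static int core_suspend(const struct pm_ops_sketch *ops, struct pm_dev *dev)
{
	return (ops && ops->suspend) ? ops->suspend(dev) : 0;
}

static int core_resume(const struct pm_ops_sketch *ops, struct pm_dev *dev)
{
	return (ops && ops->resume) ? ops->resume(dev) : 0;
}
```

Because the driver owns both callbacks, it decides the order of clock and mode handling itself rather than fitting into fixed init/exit hooks.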
Joachim Eastwood [Fri, 4 Nov 2016 17:54:06 +0000 (18:54 +0100)]
stmmac: dwmac-sti: remove clk NULL checks
Since sti_dwmac_parse_data() sets dwmac->clk to NULL if no clock was
provided in DT and NULL is a valid clock there is no need to check for
NULL before using this clock.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
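A minimal user-space sketch of the "NULL is a valid clock" convention that the commit above relies on. The mock types and functions (demo_clk, demo_clk_enable, demo_clk_disable, demo_dwmac, demo_start) are hypothetical and only imitate the behaviour the commit message describes; they are not the kernel clock API. When the enable and disable helpers accept NULL and do nothing, callers need no NULL check of their own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clock with a simple enable counter. */
struct demo_clk {
	int enable_count;
};

/* NULL means "no clock": succeed without doing anything. */
static int demo_clk_enable(struct demo_clk *clk)
{
	if (!clk)
		return 0;
	clk->enable_count++;
	return 0;
}

static void demo_clk_disable(struct demo_clk *clk)
{
	if (!clk)
		return;
	assert(clk->enable_count > 0);
	clk->enable_count--;
}

/* Hypothetical driver state; clk stays NULL when none was provided. */
struct demo_dwmac {
	struct demo_clk *clk;
};

/* Caller side: no "if (clk)" guard is needed around either call. */
static int demo_start(struct demo_dwmac *dwmac)
{
	return demo_clk_enable(dwmac->clk);
}

static void demo_stop(struct demo_dwmac *dwmac)
{
	demo_clk_disable(dwmac->clk);
}
```

The NULL handling lives in one place, inside the clock helpers, so every caller takes the same path whether or not a clock exists.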
Joachim Eastwood [Fri, 4 Nov 2016 17:54:05 +0000 (18:54 +0100)]
stmmac: dwmac-sti: remove useless of_node check
Since dwmac-sti is a DT-only driver, checking for the OF node is not necessary.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>