David S. Miller [Tue, 22 Nov 2016 15:04:19 +0000 (10:04 -0500)]
Merge branch 'mlxsw-thermal-zone'
Jiri Pirko says:
====================
mlxsw: core: Implement thermal zone
Implement thermal zone for mlxsw based HW.
The first patch is just a register dependency for the second patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Tue, 22 Nov 2016 10:24:13 +0000 (11:24 +0100)]
mlxsw: core: Implement thermal zone
Implement thermal zone for mlxsw based HW. It uses temperature sensor
provided by ASIC (the same as mlxsw hwmon interface) to report current
temp to thermal core. The ASIC's PWM is then used to control speed
of system fans registered as cooling devices.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 22 Nov 2016 10:24:12 +0000 (11:24 +0100)]
mlxsw: reg: Add Management Fan Speed Limit register
The MFSL register is used to configure the fan speed event / interrupt
notification mechanism. Fan speed threshold are defined for both
under-speed and over-speed.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 22 Nov 2016 14:55:32 +0000 (09:55 -0500)]
Merge branch 'mv88e6390-initial-support'
Andrew Lunn says:
====================
Start adding support for mv88e6390
This is the first patchset implementing support for the mv88e6390
family. This is a new generation of switch devices and has numerous
incompatible changes to the registers. These patches allow the switch
to the detected during probe, and makes the statistics unit work.
These patches are insufficient to make the mv88e6390 functional. More
patches will follow.
v2:
Move stats code into global1
Change DT compatible string to mv88e6190
Fixed mv88e6351 stats which v1 had broken
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:27:05 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Move g1 stats code in global1.[ch]
Move the stats functions which access global 1 registers into
global1.c.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:27:04 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Implement mv88e6390 get_stats
The mv88e6390 uses a different bit to select between bank0 and bank1
of the statistics. So implement an ops function for this, and pass the
selector bit to the generic stats read function. Also, the histogram
selection has moved for the mv88e6390, so abstract its selection as
well.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:27:03 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Add stats_get_stats to ops structure
Different families have different sets of statistics. Abstract this
using a stats_get_stats op. The mv88e6390 needs a different
implementation, which will be added later.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:27:02 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Add stats_get_sset_count|string to ops structure
Different families have different sets of statistics. Abstract this
using a stats_get_sset_count and stats_get_strings op. Each stat has a
bitmap, and the ops implementer uses a bit map mask to count the
statistics which apply for the family, or return the list of strings.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
Rename functions to avoid _ prefix.
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:27:01 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Add mv88e6390 statistics unit init
The statistics unit on the mv88e6390 needs the histogram mode to be
configured in a different register compared to other devices. Add an
ops to do this.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
Rename to mv88e6390_g1_stats_set_histogram
Move into global1.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:27:00 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Add mv88e6390 stats snapshot operation
The MV88E6390 has a control register for what the histogram statistics
actually contain. This means the stat_snapshot method should not set
this information. So implement the 6390 stats_snapshot function without
these bits.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:26:59 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Add comment about family a device belongs to
Knowing the family of device belongs to helps with picking the ops
implementation which is appropriate to the device. So add a comment to
each structure of ops.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:26:58 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Abstract stats_snapshot into ops structure
Taking a stats snapshot differs between same families. Abstract this
into an ops member. At the same time, move the code into global1.[ch],
since the registers are in the global1 range.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:26:57 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Add the mv88e6390 family
With the devices added to the tables, the probe will recognize the
switch. This however is not sufficient to make it work properly, other
changes are needed because of incompatibilities.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:26:56 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Fix unused variable warning by using variable
_mv88e6xxx_stats_wait() did not check the return value from
mv88e6xxx_g1_read(), so the compiler complained about set but unused
err.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 21 Nov 2016 22:26:55 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Take switch out of reset before probe
The switch needs to be taken out of reset before we can read its ID
register on the MDIO bus.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhumika Goyal [Mon, 21 Nov 2016 20:30:14 +0000 (02:00 +0530)]
net: ieee802154: constify ieee802154_ops structures
Declare the structure ieee802154_ops as const as it is only passed as an
argument to the function ieee802154_alloc_hw. This argument is of type
const struct ieee802154_ops *, so ieee80254_ops structures having this
property can be declared as const.
Done using Coccinelle:
@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct ieee802154_ops i@p = {...};
@ok1@
identifier r1.i;
position p;
expression e1;
@@
ieee802154_alloc_hw(e1,&i@p)
@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p
@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
static
+const
struct ieee802154_ops i={...};
@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct ieee802154_ops i;
The before and after size details of the affected files are:
text data bss dec hex filename
8669 1176 16 9861 2685 drivers/net/ieee802154/
adf7242.o
8805 1048 16 9869 268d drivers/net/ieee802154/
adf7242.o
text data bss dec hex filename
7211 2296 32 9539 2543 drivers/net/ieee802154/atusb.o
7339 2160 32 9531 253b drivers/net/ieee802154/atusb.o
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Nov 2016 19:05:50 +0000 (14:05 -0500)]
Merge branch 'geneve-lwt-efficiency'
Pravin B Shelar says:
====================
geneve: Use LWT more effectively.
Following patch series make use of geneve LWT code path for
geneve netdev type of device.
This allows us to simplify geneve module without changing any
functionality.
v2-v3:
Rebase against latest net-next.
v1-v2:
Fix warning reported by kbuild test robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 21 Nov 2016 19:03:01 +0000 (11:03 -0800)]
geneve: Optimize geneve device lookup.
Rather than comparing 64-bit tunnel-id, compare tunnel vni
which is 24-bit id. This also save conversion from vni
to tunnel id on each tunnel packet receive.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 21 Nov 2016 19:03:00 +0000 (11:03 -0800)]
geneve: Remove redundant socket checks.
Geneve already has check for device socket in route
lookup function. So no need to check it in xmit
function.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 21 Nov 2016 19:02:59 +0000 (11:02 -0800)]
geneve: Merge ipv4 and ipv6 geneve_build_skb()
There are minimal difference in building Geneve header
between ipv4 and ipv6 geneve tunnels. Following patch
refactors code to unify it.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 21 Nov 2016 19:02:58 +0000 (11:02 -0800)]
geneve: Unify LWT and netdev handling.
Current geneve implementation has two separate cases to handle.
1. netdev xmit
2. LWT xmit.
In case of netdev, geneve configuration is stored in various
struct geneve_dev members. For example geneve_addr, ttl, tos,
label, flags, dst_cache, etc. For LWT ip_tunnel_info is passed
to the device in ip_tunnel_info.
Following patch uses ip_tunnel_info struct to store almost all
of configuration of a geneve netdevice. This allows us to unify
most of geneve driver code around ip_tunnel_info struct.
This dramatically simplify geneve code, since it does not
need to handle two different configuration cases. Removes
duplicate code, single code path can handle either type
of geneve devices.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Nov 2016 18:20:17 +0000 (13:20 -0500)]
Merge branch 'tcp-cong-undo_cwnd-mandatory'
Florian Westphal says:
====================
tcp: make undo_cwnd mandatory for congestion modules
highspeed, illinois, scalable, veno and yeah congestion control algorithms
don't provide a 'cwnd_undo' function. This makes the stack default to a
'reno undo' which doubles cwnd. However, the ssthresh implementation of
these algorithms do not halve the slowstart threshold. This causes similar
issue as the one fixed for dctcp in
ce6dd23329b1e ("dctcp: avoid bogus
doubling of cwnd after loss").
In light of this it seems better to remove the fallback and make undo_cwnd
mandatory.
First patch fixes those spots where reno undo seems incorrect by providing
.cwnd_undo functions, second patch removes the fallback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 21 Nov 2016 13:18:38 +0000 (14:18 +0100)]
tcp: make undo_cwnd mandatory for congestion modules
The undo_cwnd fallback in the stack doubles cwnd based on ssthresh,
which un-does reno halving behaviour.
It seems more appropriate to let congctl algorithms pair .ssthresh
and .undo_cwnd properly. Add a 'tcp_reno_undo_cwnd' function and wire it
up for all congestion algorithms that used to rely on the fallback.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 21 Nov 2016 13:18:37 +0000 (14:18 +0100)]
tcp: add cwnd_undo functions to various tcp cc algorithms
congestion control algorithms that do not halve cwnd in their .ssthresh
should provide a .cwnd_undo rather than rely on current fallback which
assumes reno halving (and thus doubles the cwnd).
All of these do 'something else' in their .ssthresh implementation, thus
store the cwnd on loss and provide .undo_cwnd to restore it again.
A followup patch will remove the fallback and all algorithms will
need to provide a .cwnd_undo function.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Nov 2016 18:16:59 +0000 (13:16 -0500)]
Merge branch 'bridge-igmpv3-mldv2-support'
Nikolay Aleksandrov says:
====================
bridge: add support for IGMPv3 and MLDv2 querier
This patch-set adds support for IGMPv3 and MLDv2 querier in the bridge.
Two new options which can be toggled via netlink and sysfs are added that
control the version per-bridge:
multicast_igmp_version - default 2, can be set to 3
multicast_mld_version - default 1, can be set to 2 (this option is
disabled if CONFIG_IPV6=n)
Note that the names do not include "querier", I think that these options
can be re-used later as more IGMPv3 support is added to the bridge so we
can avoid adding more options to switch between v2 and v3 behaviour.
The set uses the already existing br_ip{4,6}_multicast_alloc_query
functions and adds the appropriate header based on the chosen version.
For the initial support I have removed the compatibility implementation
(RFC3376 sec 7.3.1, 7.3.2; RFC3810 sec 8.3.1, 8.3.2), because there are
some details that we need to sort out.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Mon, 21 Nov 2016 12:03:25 +0000 (13:03 +0100)]
bridge: mcast: add MLDv2 querier support
This patch adds basic support for MLDv2 queries, the default is MLDv1
as before. A new multicast option - multicast_mld_version, adds the
ability to change it between 1 and 2 via netlink and sysfs.
The MLD option is disabled if CONFIG_IPV6 is disabled.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Mon, 21 Nov 2016 12:03:24 +0000 (13:03 +0100)]
bridge: mcast: add IGMPv3 query support
This patch adds basic support for IGMPv3 queries, the default is IGMPv2
as before. A new multicast option - multicast_igmp_version, adds the
ability to change it between 2 and 3 via netlink and sysfs. The option
struct member is in a 4 byte hole in net_bridge.
There also a few minor style adjustments in br_multicast_new_group and
br_multicast_add_group.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Mon, 21 Nov 2016 00:26:38 +0000 (08:26 +0800)]
driver: macvlan: Remove duplicated IFF_UP condition check in macvlan_forward_source
The function macvlan_forward_source_one has already checked the flag
IFF_UP, so needn't check it outside in macvlan_forward_source too.
Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 20 Nov 2016 17:24:36 +0000 (09:24 -0800)]
mlx4: avoid unnecessary dirtying of critical fields
While stressing a 40Gbit mlx4 NIC with busy polling, I found false
sharing in mlx4 driver that can be easily avoided.
This patch brings an additional 7 % performance improvement in UDP_RR
workload.
1) If we received no frame during one mlx4_en_process_rx_cq()
invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons
2) Do not refill rx buffers if we have plenty of them.
This avoids false sharing and allows some bulk/batch optimizations.
Page allocator and its locks will thank us.
Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined
cpu handling NIC IRQ should be changed. We should return budget-1
instead, to not fool net_rx_action() and its netdev_budget.
v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 19 Nov 2016 22:57:11 +0000 (14:57 -0800)]
bnx2: use READ_ONCE() instead of barrier()
barrier() is a big hammer compared to READ_ONCE(),
and requires comments explaining what is protected.
READ_ONCE() is more precise and compiler should generate
better overall code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 19 Nov 2016 01:18:03 +0000 (17:18 -0800)]
udp: avoid one cache line miss in recvmsg()
UDP_SKB_CB(skb)->partial_cov is located at offset 66 in skb,
requesting a cold cache line being read in cpu cache.
We can avoid this cache line miss for UDP sockets,
as partial_cov has a meaning only for UDPLite.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Nov 2016 16:25:59 +0000 (11:25 -0500)]
Merge branch 'mlx5-bpf-refcnt-fixes'
Daniel Borkmann says:
====================
Couple of BPF refcount fixes for mlx5
Various mlx5 bugs on eBPF refcount handling found during review.
Last patch in series adds a __must_check to BPF helpers to make
sure we won't run into it again w/o compiler complaining first.
v2 -> v3:
- Just reworked patch 2/4 so we don't need bpf_prog_sub().
- Rebased, rest as is.
v1 -> v2:
- After discussion with Alexei, we agreed upon rebasing the
patches against net-next.
- Since net-next, I've also added the __must_check to enforce
future users to check for errors.
- Fixed up commit message #2.
- Simplify assignment from patch #1 based on Saeed's feedback
on previous set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 19 Nov 2016 00:45:03 +0000 (01:45 +0100)]
bpf: add __must_check attributes to refcount manipulating helpers
Helpers like bpf_prog_add(), bpf_prog_inc(), bpf_map_inc() can fail
with an error, so make sure the caller properly checks their return
value and not just ignores it, which could worst-case lead to use
after free.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 19 Nov 2016 00:45:02 +0000 (01:45 +0100)]
bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
mlx5e_xdp_set() is currently the only place where we drop reference on the
prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
need to make sure that we eventually release that reference, for example,
in case the netdev is dismantled, otherwise we leak the program.
Fixes:
86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 19 Nov 2016 00:45:01 +0000 (01:45 +0100)]
bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
There are multiple issues in mlx5e_xdp_set():
1) The batched bpf_prog_add() is currently not checked for errors. When
doing so, it should be done at an earlier point in time to makes sure
that we cannot fail anymore at the time we want to set the program for
each channel. The batched refs short-cut can only be performed when we
don't need to perform a reset for changing the rq type and the device
was in opened state. In case the device was not in opened state, then
the next mlx5e_open_locked() will aquire the refs from the control prog
via mlx5e_create_rq(), same when we need to perform a reset.
2) When swapping the priv->xdp_prog, then no extra reference count must be
taken since we got that from call path via dev_change_xdp_fd() already.
Otherwise, we'd never be able to release the program. Also, bpf_prog_add()
without checking the return code could fail.
Fixes:
86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 19 Nov 2016 00:45:00 +0000 (01:45 +0100)]
bpf, mlx5: fix mlx5e_create_rq taking reference on prog
In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add() but
without checking the return value. bpf_prog_add() can fail since
92117d8443bc
("bpf: fix refcnt overflow"), so we really must check it. Take the reference
right when we assign it to the rq from priv->xdp_prog, and just drop the
reference on error path. Destruction in mlx5e_destroy_rq() looks good, though.
Fixes:
86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Nov 2016 02:16:14 +0000 (21:16 -0500)]
Merge branch 'mV88e6xxx-interrupt-fixes'
Andrew Lunn says:
====================
Fixes for the MV88e6xxx interrupt code
The interrupt code was never tested with a board who's probing
resulted in an -EPROBE_DEFFERED. So the clean up paths never got
tested. I now do have -EPROBE_DEFFERED, and things break badly during
cleanup. These are the fixes.
This is fixing code in net-next.
v2:
Fix typo pointed out by David Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 20 Nov 2016 19:14:19 +0000 (20:14 +0100)]
net: dsa: mv88e6xxx: Hold the mutex while freeing g1 interrupts
Freeing interrupts requires switch register access to mask the
interrupts. Hence we must hold the register mutex.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 20 Nov 2016 19:14:18 +0000 (20:14 +0100)]
net: dsa: mv88e6xxx: Fix releasing for the global2 interrupts
It is not possible to use devm_request_threaded_irq() because we have
two stacked interrupt controllers in one device. The lower interrupt
controller cannot be removed until the upper is fully removed. This
happens too late with the devm API, resulting in error messages about
removing a domain while there is still an active interrupt. Swap to
using request_threaded_irq() and manage the release of the interrupt
manually.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 20 Nov 2016 19:14:17 +0000 (20:14 +0100)]
net: dsa: mv88e6xxx: Fix cleanup on error for g1 interrupt setup
On error, remask the interrupts, release all maps, and remove the
domain. This cannot be done using the mv88e6xxx_g1_irq_free() because
some of these actions are not idempotent.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 20 Nov 2016 19:14:16 +0000 (20:14 +0100)]
net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt
Fix the g1 interrupt free code such that is masks any further
interrupts, and then releases the interrupt.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 20 Nov 2016 19:14:15 +0000 (20:14 +0100)]
net: dsa: mv88e6xxx: Fix unconditional irq freeing
Trying to remove an IRQ domain that was not created results in an
Opps. Add the necessary checks that the irqs were created before
freeing them.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 20 Nov 2016 19:14:14 +0000 (20:14 +0100)]
net: dsa: mv88e6xxx: Fix typos when removing g1 interrupts
Simple typos, s/g2/g1/
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Sat, 19 Nov 2016 01:08:08 +0000 (04:08 +0300)]
net: fix bogus cast in skb_pagelen() and use unsigned variables
1) cast to "int" is unnecessary:
u8 will be promoted to int before decrementing,
small positive numbers fit into "int", so their values won't be changed
during promotion.
Once everything is int including loop counters, signedness doesn't
matter: 32-bit operations will stay 32-bit operations.
But! Someone tried to make this loop smart by making everything of
the same type apparently in an attempt to optimise it.
Do the optimization, just differently.
Do the cast where it matters. :^)
2) frag size is unsigned entity and sum of fragments sizes is also
unsigned.
Make everything unsigned, leave no MOVSX instruction behind.
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-4 (-4)
function old new delta
skb_cow_data 835 834 -1
ip_do_fragment 2549 2548 -1
ip6_fragment 3130 3128 -2
Total: Before=
154865032, After=
154865028, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Sat, 19 Nov 2016 00:59:07 +0000 (03:59 +0300)]
netlink: smaller nla_attr_minlen table
Length of a netlink attribute may be u16 but lengths of basic attributes
are much smaller, so small we can save 16 bytes of .rodata and pocket
change inside .text.
16-bit is worse on x86-64 than 8-bit because of operand size override prefix.
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-19 (-19)
function old new delta
validate_nla 418 417 -1
nla_policy_len 66 64 -2
nla_attr_minlen 32 16 -16
Total: Before=
154865051, After=
154865032, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Sat, 19 Nov 2016 00:54:35 +0000 (03:54 +0300)]
netlink: use "unsigned int" in nla_next()
->nla_len is unsigned entity (it's length after all) and u16,
thus it can't overflow when being aligned into int/unsigned int.
(nlmsg_next has the same code, but I didn't yet convince myself
it is correct to do so).
There is pointer arithmetic in this function and offset being
unsigned is better:
add/remove: 0/0 grow/shrink: 1/64 up/down: 5/-309 (-304)
function old new delta
nl80211_set_wiphy 1444 1449 +5
team_nl_cmd_options_set 997 995 -2
tcf_em_tree_validate 872 870 -2
switchdev_port_bridge_setlink 352 350 -2
switchdev_port_br_afspec 312 310 -2
rtm_to_fib_config 428 426 -2
qla4xxx_sysfs_ddb_set_param 2193 2191 -2
qla4xxx_iface_set_param 4470 4468 -2
ovs_nla_free_flow_actions 152 150 -2
output_userspace 518 516 -2
...
nl80211_set_reg 654 649 -5
validate_scan_freqs 148 142 -6
validate_linkmsg 288 282 -6
nl80211_parse_connkeys 489 483 -6
nlattr_set 231 224 -7
nf_tables_delsetelem 267 260 -7
do_setlink 3416 3408 -8
netlbl_cipsov4_add_std 1672 1659 -13
nl80211_parse_sched_scan 2902 2888 -14
nl80211_trigger_scan 1738 1720 -18
do_execute_actions 2821 2738 -83
Total: Before=
154865355, After=
154865051, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Sat, 19 Nov 2016 00:47:56 +0000 (03:47 +0300)]
net: make struct napi_alloc_cache::skb_count unsigned int
size_t is way too much for an integer not exceeding 64.
Space savings: 10 bytes!
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-10 (-10)
function old new delta
napi_consume_skb 165 163 -2
__kfree_skb_flush 56 53 -3
__kfree_skb_defer 97 92 -5
Total: Before=
154865639, After=
154865629, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 19 Nov 2016 16:13:05 +0000 (11:13 -0500)]
Merge tag 'batadv-next-for-davem-
20161119' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature patchset includes the following changes:
- 6 patches adding functionality to detect a WiFi interface under
other virtual interfaces, like VLANs. They introduce a cache for
the detected the WiFi configuration to avoid RTNL locking in
critical sections. Patches have been prepared by Marek Lindner
and Sven Eckelmann
- Enable automatic module loading for genl requests, by Sven Eckelmann
- Fix a potential race condition on interface removal. This is not
happening very often in practice, but requires bigger changes to fix,
so we are sending this to net-next. By Linus Luessing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Fri, 18 Nov 2016 23:40:42 +0000 (15:40 -0800)]
af_packet: Use virtio_net_hdr_from_skb() directly.
Remove static function __packet_rcv_vnet(), which only called
virtio_net_hdr_from_skb() and BUG()ged out if an error code was
returned. Instead, call virtio_net_hdr_from_skb() from the former
call sites of __packet_rcv_vnet() and actually use the error handling
code that is already there.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Fri, 18 Nov 2016 23:40:41 +0000 (15:40 -0800)]
af_packet: Use virtio_net_hdr_to_skb().
Use the common virtio_net_hdr_to_skb() instead of open coding it.
Other call sites were changed by commit
fd2a0437dc, but this one was
missed, maybe because it is split in two parts of the source code.
Interim comparisons of 'vnet_hdr->gso_type' still work as both the
vnet_hdr and skb notion of gso_type is zero when there is no gso.
Fixes:
fd2a0437dc ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Fri, 18 Nov 2016 23:40:40 +0000 (15:40 -0800)]
virtio_net: Do not clear memory for struct virtio_net_hdr twice.
virtio_net_hdr_from_skb() clears the memory for the header, so there
is no point for the callers to do the same.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Fri, 18 Nov 2016 23:40:39 +0000 (15:40 -0800)]
virtio_net.h: Fix comment.
Fix incorrent comment after the final #endif.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Fri, 18 Nov 2016 23:40:38 +0000 (15:40 -0800)]
virtio_net: Simplify call sites for virtio_net_hdr_{from, to}_skb().
No point storing the return value of virtio_net_hdr_to_skb() or
virtio_net_hdr_from_skb() to a variable when the value is used only
once as a boolean in an immediately following if statement.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Fri, 18 Nov 2016 11:07:40 +0000 (16:37 +0530)]
cxgb4: Allocate Tx queues dynamically
Allocate resources dynamically for Upper layer driver's (ULD) like
cxgbit, iw_cxgb4, cxgb4i and chcr. The resources allocated include Tx
queues which are allocated when ULD register with cxgb4 driver and freed
while un-registering. The Tx queues which are shared by ULD shall be
allocated by first registering driver and un-allocated by last
unregistering driver.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 18 Nov 2016 11:47:35 +0000 (14:47 +0300)]
liquidio CN23XX: bitwise vs logical AND typo
We obviously intended a bitwise AND here, not a logical one.
Fixes:
8c978d059224 ("liquidio CN23XX: Mailbox support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung Huh [Thu, 17 Nov 2016 22:10:02 +0000 (22:10 +0000)]
lan78xx: relocate mdix setting to phy driver
Relocate mdix code to phy driver to be called at config_init().
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 18:54:22 +0000 (13:54 -0500)]
Merge branch 'net-marvell-freescale-compile-test'
Florian Fainelli says:
====================
net: Enable COMPILE_TEST for Marvell & Freescale drivers
This patch series allows building the Freescale and Marvell Ethernet network
drivers with COMPILE_TEST.
Changes in v4:
- add proper HAS_DMA to fix build errors on m32r
- provide an inline stub for mvebu_mbus_get_dram_win_info
- added an additional patch to fix build errors with mv88e6xxx on m32r
Changes in v3:
- reorder patches to avoid introducing a build warning between commits
Changes in v2:
- rename register define clash when building for i386 (spotted by LKP)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:14 +0000 (11:19 -0800)]
net: dsa: mv88e6xxx: Select IRQ_DOMAIN
Some architectures may not define IRQ_DOMAIN (like m32r), fixes
undefined references to IRQ_DOMAIN functions.
Fixes:
dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:13 +0000 (11:19 -0800)]
net: marvell: Allow drivers to be built with COMPILE_TEST
All Marvell Ethernet drivers actually build fine with COMPILE_TEST with
a few warnings. We need to add a few HAS_DMA dependencies to fix linking
failures on problematic architectures like m32r.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:12 +0000 (11:19 -0800)]
bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info
In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST,
provide an inline stub for mvebu_mbus_get_dram_win_info().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:11 +0000 (11:19 -0800)]
net: fsl: Allow most drivers to be built with COMPILE_TEST
There are only a handful of Freescale Ethernet drivers that don't
actually build with COMPILE_TEST:
* FEC, for which we would need to define a default register layout if no
supported architecture is defined
* UCC_GETH which depends on PowerPC cpm.h header (which could be moved
to a generic location)
* GIANFAR needs to depend on HAS_DMA to fix linking failures on some
architectures (like m32r)
We need to fix an unmet dependency to get there though:
warning: (FSL_XGMAC_MDIO) selects OF_MDIO which has unmet direct
dependencies (OF && PHYLIB)
which would result in CONFIG_OF_MDIO=[ym] without CONFIG_OF to be set.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:10 +0000 (11:19 -0800)]
net: gianfar_ptp: Rename FS bit to FIPERST
FS is a global symbol used by the x86 32-bit architecture, fixes builds
re-definitions:
>> drivers/net/ethernet/freescale/gianfar_ptp.c:75:0: warning: "FS"
>> redefined
#define FS (1<<28) /* FIPER start indication */
In file included from arch/x86/include/uapi/asm/ptrace.h:5:0,
from arch/x86/include/asm/ptrace.h:6,
from arch/x86/include/asm/math_emu.h:4,
from arch/x86/include/asm/processor.h:11,
from include/linux/mutex.h:19,
from include/linux/kernfs.h:13,
from include/linux/sysfs.h:15,
from include/linux/kobject.h:21,
from include/linux/device.h:17,
from
drivers/net/ethernet/freescale/gianfar_ptp.c:23:
arch/x86/include/uapi/asm/ptrace-abi.h:15:0: note: this is the
location of the previous definition
#define FS 9
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 17 Nov 2016 14:43:37 +0000 (08:43 -0600)]
amd-xgbe: Update connection validation for backplane mode
Update the connection type enumeration for backplane mode and return
an error when there is a mismatch between the mode and the connection
type.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 17:12:15 +0000 (12:12 -0500)]
Merge branch 'ethtool-phy-downshift'
Allan W. Nielsen says:
====================
Adding PHY-Tunables and downshift support
(This is a re-post of the v3 patch set with a new cover letter - I was not
aware that the cover letters was used a commit comments in merge commits).
This series add support for PHY tunables, and uses this facility to
configure downshifting. The downshifting mechanism is implemented for MSCC
phys.
This series tries to address the comments provided back in mid October when
this feature was posted along with fast-link-failure. Fast-link-failure has
been separated out, but we would like to pick continue on that if/when we
agree on how the phy-tunables and downshifting should be done.
The proposed generic interface is similar to
ETHTOOL_GTUNABLE/ETHTOOL_STUNABLE, it uses the same type
(ethtool_tunable/tunable_type_id) but a new enum (phy_tunable_id) is added
to reflect the PHY tunable.
The implementation just call the newly added function pointers in
get_tunable/set_tunable phy_device structure.
To configure downshifting, the ethtool_tunable structure is used. 'id' must
be set to 'ETHTOOL_PHY_DOWNSHIFT', 'type_id' must be set to
'ETHTOOL_TUNABLE_U8' and 'data' value configure the amount of downshift
re-tries.
If configured to DOWNSHIFT_DEV_DISABLE, then downshift is disabled If
configured to DOWNSHIFT_DEV_DEFAULT_COUNT, then it is up to the device to
choose a device-specific re-try count.
Tested on Beaglebone Black with VSC 8531 PHY.
Change set:
v0:
- Link Speed downshift and Fast Link failure-2 features coded by using
Device tree.
v1:
- Split the Downshift and FLF2 features in different set of patches.
- Removed DT access and implemented IOCTL access suggested by Andrew.
- Added function pointers in get_tunable/set_tunable phy_device structure
v2:
- Added trace message with a hist is printed when downshifting clould not
be eanbled with the requested count
- (ethtool) Syntax is changed from "--set-phy-tunable downshift on|off|%d"
to "--set-phy-tunable [downshift on|off [count N]]" - as requested by
Andrew.
v3:
- Fixed Spelling in "net: phy: Add downshift get/set support in Microsemi
PHYs driver"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:24 +0000 (13:07 +0100)]
net: phy: Add downshift get/set support in Microsemi PHYs driver
Implements the phy tunable function pointers and implement downshift
functionality for MSCC PHYs.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:23 +0000 (13:07 +0100)]
ethtool: Core impl for ETHTOOL_PHY_DOWNSHIFT tunable
Adding validation support for the ETHTOOL_PHY_DOWNSHIFT. Functional
implementation needs to be done in the individual PHY drivers.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:22 +0000 (13:07 +0100)]
ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables
For operation in cabling environments that are incompatible with
1000BASE-T, PHY device may provide an automatic link speed downshift
operation. When enabled, the device automatically changes its 1000BASE-T
auto-negotiation to the next slower speed after a configured number of
failed attempts at 1000BASE-T. This feature is useful in setting up in
networks using older cable installations that include only pairs A and B,
and not pairs C and D.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:21 +0000 (13:07 +0100)]
ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE
Adding get_tunable/set_tunable function pointer to the phy_driver
structure, and uses these function pointers to implement the
ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:20 +0000 (13:07 +0100)]
ethtool: (uapi) Add ETHTOOL_PHY_GTUNABLE and ETHTOOL_PHY_STUNABLE
Defines a generic API to get/set phy tunables. The API is using the
existing ethtool_tunable/tunable_type_id types which is already being used
for mac level tunables.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 17:08:58 +0000 (12:08 -0500)]
Merge branch 'mlx5-next'
Saeed Mahameed says:
====================
Mellanox 100G mlx5 update 2016-11-15
This series contains four humble mlx5 features.
From Gal,
- Add the support for PCIe statistics and expose them in ethtool
From Huy,
- Add the support for port module events reporting and statistics
- Add the support for driver version setting into FW (for display purposes only)
From Mohamad,
- Extended the command interface cache flexibility
This series was generated against commit
6a02f5eb6a8a ("Merge branch 'mlxsw-i2c")
V2:
- Changed plain "unsigned" to "unsigned int"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Thu, 17 Nov 2016 11:46:02 +0000 (13:46 +0200)]
net/mlx5e: Expose PCIe statistics to ethtool
This patch exposes two groups of PCIe counters:
- Performance counters.
- Timers and states counters.
Queried with ethtool -S <devname>.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Thu, 17 Nov 2016 11:46:01 +0000 (13:46 +0200)]
net/mlx5: Add MPCNT register infrastructure
Add the needed infrastructure for future use of MPCNT register.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Nov 2016 11:46:00 +0000 (13:46 +0200)]
net/mlx5: Set driver version into firmware
If driver_version capability bit is enabled, set driver version
to firmware after the init HCA command, for display purposes.
Example of driver version: "Linux,mlx5_core,3.0-1"
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Thu, 17 Nov 2016 11:45:59 +0000 (13:45 +0200)]
net/mlx5: Set driver version infrastructure
Add driver_version capability bit is enabled, and set driver
version command in mlx5_ifc firmware header. The only purpose
of this command is to store a driver version/OS string in FW
to be reported and displayed in various management systems,
such as IPMI/BMC.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Nov 2016 11:45:58 +0000 (13:45 +0200)]
net/mlx5e: Add port module event counters to ethtool stats
Add port module event counters to ethtool -S command
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Nov 2016 11:45:57 +0000 (13:45 +0200)]
net/mlx5: Add handling for port module event
For each asynchronous port module event:
1. print with ratelimit to the dmesg log
2. increment the corresponding event counter
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Nov 2016 11:45:56 +0000 (13:45 +0200)]
net/mlx5: Port module event hardware structures
Add hardware structures and constants definitions needed for module
events support.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mohamad Haj Yahia [Thu, 17 Nov 2016 11:45:55 +0000 (13:45 +0200)]
net/mlx5: Make the command interface cache more flexible
Add more cache command size sets and more entries for each set based on
the current commands set different sizes and commands frequency.
Fixes:
e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 16:55:39 +0000 (11:55 -0500)]
Merge branch 'sfc-tso-v2'
Edward Cree says:
====================
sfc: Firmware-Assisted TSO version 2
The firmware on 8000 series SFC NICs supports a new TSO API ("FATSOv2"), and
7000 series NICs will also support this in an imminent release. This series
adds driver support for this TSO implementation.
The series also removes SWTSO, as it's now equivalent to GSO. This does not
actually remove very much code, because SWTSO was grotesquely intertwingled
with FATSOv1, which will also be removed once 7000 series supports FATSOv2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 17 Nov 2016 10:52:36 +0000 (10:52 +0000)]
sfc: remove Software TSO
It gives no advantage over GSO now that xmit_more exists. If we find
ourselves unable to handle a TSO skb (because our TXQ doesn't have a
TSOv2 context and the NIC doesn't support TSOv1), hand it back to GSO.
Also do that if the TSO handler fails with EINVAL for any other reason.
As Falcon-architecture NICs don't support any firmware-assisted TSO,
they no longer advertise TSO feature flags at all.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 17 Nov 2016 10:52:07 +0000 (10:52 +0000)]
sfc: handle failure to allocate TSOv2 contexts
If we fail to init the TXQ because of insufficient TSOv2 contexts,
try again with TSOv2 disabled.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bert Kenward [Thu, 17 Nov 2016 10:51:54 +0000 (10:51 +0000)]
sfc: Firmware-Assisted TSO version 2
Add support for FATSOv2 to the driver. FATSOv2 offloads far more of the task
of TCP segmentation to the firmware, such that we now just pass a single
super-packet to the NIC. This means TSO has a great deal in common with a
normal DMA transmit, apart from adding a couple of option descriptors.
NIC-specific checks have been moved off the fast path and in to
initialisation where possible.
This also moves FATSOv1/SWTSO to a new file (tx_tso.c). The end of transmit
and some error handling is now outside TSO, since it is common with other
code.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 17 Nov 2016 10:51:39 +0000 (10:51 +0000)]
sfc: Update EF10 register definitions
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 17 Nov 2016 10:51:30 +0000 (10:51 +0000)]
sfc: Update MCDI protocol definitions
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Thu, 17 Nov 2016 01:58:21 +0000 (04:58 +0300)]
netns: make struct pernet_operations::id unsigned int
Make struct pernet_operations::id unsigned.
There are 2 reasons to do so:
1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.
2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.
"int" being used as an array index needs to be sign-extended
to 64-bit before being used.
void f(long *p, int i)
{
g(p[i]);
}
roughly translates to
movsx rsi, esi
mov rdi, [rsi+...]
call g
MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.
Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:
static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}
And this function is used a lot, so those sign extensions add up.
Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]
However, overall balance is in negative direction:
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=
154856051, After=
154854321, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 17:10:42 +0000 (09:10 -0800)]
udp: enable busy polling for all sockets
UDP busy polling is restricted to connected UDP sockets.
This is because sk_busy_loop() only takes care of one NAPI context.
There are cases where it could be extended.
1) Some hosts receive traffic on a single NIC, with one RX queue.
2) Some applications use SO_REUSEPORT and associated BPF filter
to split the incoming traffic on one UDP socket per RX
queue/thread/cpu
3) Some UDP sockets are used to send/receive traffic for one flow, but
they do not bother with connect()
This patch records the napi_id of first received skb, giving more
reach to busy polling.
Tested:
lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done
Before patch :
27867 28870 37324 41060 41215
36764 36838 44455 41282 43843
After patch :
73920 73213 70147 74845 71697
68315 68028 75219 70082 73707
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Nov 2016 18:35:19 +0000 (13:35 -0500)]
Merge branch 'rds-ha-failover-fixes'
Sowmini Varadhan says:
====================
RDS: TCP: HA/Failover fixes
This series contains a set of fixes for bugs exposed when
we ran the following in a loop between a test machine pair:
while (1); do
# modprobe rds-tcp on test nodes
# run rds-stress in bi-dir mode between test machine pair
# modprobe -r rds-tcp on test nodes
done
rds-stress in bi-dir mode will cause both nodes to initiate
RDS-TCP connections at almost the same instant, exposing the
bugs fixed in this series.
Without the fixes, rds-stress reports sporadic packet drops,
and packets arriving out of sequence. After the fixes,we have
been able to run the test overnight, without any issues.
Each patch has a detailed description of the root-cause fixed
by the patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 16 Nov 2016 21:29:50 +0000 (13:29 -0800)]
RDS: TCP: Force every connection to be initiated by numerically smaller IP address
When 2 RDS peers initiate an RDS-TCP connection simultaneously,
there is a potential for "duelling syns" on either/both sides.
See commit
241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") for a description of this
condition, and the arbitration logic which ensures that the
numerically large IP address in the TCP connection is bound to the
RDS_TCP_PORT ("canonical ordering").
The rds_connection should not be marked as RDS_CONN_UP until the
arbitration logic has converged for the following reason. The sender
may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
and since the sender removes all datagrams from the rds_connection's
cp_retrans queue based on TCP acks. If the TCP ack was sent from
a tcp socket that got reset as part of duel aribitration (but
before data was delivered to the receivers RDS socket layer),
the sender may end up prematurely freeing the datagram, and
the datagram is no longer reliably deliverable.
This patch remedies that condition by making sure that, upon
receipt of 3WH completion state change notification of TCP_ESTABLISHED
in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
if, and only if, the IP addresses and ports for the connection are
canonically ordered. In all other cases, rds_tcp_state_change will
force an rds_conn_path_drop(), and rds_queue_reconnect() on
both peers will restart the connection to ensure canonical ordering.
A side-effect of enforcing this condition in rds_tcp_state_change()
is that rds_tcp_accept_one_path() can now be refactored for simplicity.
It is also no longer possible to encounter an RDS_CONN_UP connection in
the arbitration logic in rds_tcp_accept_one().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 16 Nov 2016 21:29:49 +0000 (13:29 -0800)]
RDS: TCP: Track peer's connection generation number
The RDS transport has to be able to distinguish between
two types of failure events:
(a) when the transport fails (e.g., TCP connection reset)
but the RDS socket/connection layer on both sides stays
the same
(b) when the peer's RDS layer itself resets (e.g., due to module
reload or machine reboot at the peer)
In case (a) both sides must reconnect and continue the RDS messaging
without any message loss or disruption to the message sequence numbers,
and this is achieved by rds_send_path_reset().
In case (b) we should reset all rds_connection state to the
new incarnation of the peer. Examples of state that needs to
be reset are next expected rx sequence number from, or messages to be
retransmitted to, the new incarnation of the peer.
To achieve this, the RDS handshake probe added as part of
commit
5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
is enhanced so that sender and receiver of the RDS ping-probe
will add a generation number as part of the RDS_EXTHDR_GEN_NUM
extension header. Each peer stores local and remote generation
numbers as part of each rds_connection. Changes in generation
number will be detected via incoming handshake probe ping
request or response and will allow the receiver to reset rds_connection
state.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 16 Nov 2016 21:29:48 +0000 (13:29 -0800)]
RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list
As noted in rds_recv_incoming() sequence numbers on data packets
can decreas for the failover case, and the Rx path is equipped
to recover from this, if the RDS_FLAG_RETRANSMITTED is set
on the rds header of an incoming message with a suspect sequence
number.
The RDS_FLAG_RETRANSMITTED is predicated on the RDS_FLAG_RETRANSMITTED
flag in the rds_message, so make sure the flag is set on messages
queued for retransmission.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 16 Nov 2016 19:09:41 +0000 (20:09 +0100)]
net: stmmac: replace if (netif_msg_type) by their netif_xxx counterpart
As sugested by Joe Perches, we could replace all
if (netif_msg_type(priv)) dev_xxx(priv->devices, ...)
by the simpler macro netif_xxx(priv, hw, priv->dev, ...)
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 16 Nov 2016 19:09:40 +0000 (20:09 +0100)]
net: stmmac: replace hardcoded function name by __func__
Some printing have the function name hardcoded.
It is better to use __func__ instead.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 16 Nov 2016 19:09:39 +0000 (20:09 +0100)]
net: stmmac: replace all pr_xxx by their netdev_xxx counterpart
The stmmac driver use lots of pr_xxx functions to print information.
This is bad since we cannot know which device logs the information.
(moreover if two stmmac device are present)
Furthermore, it seems that it assumes wrongly that all logs will always
be subsequent by using a dev_xxx then some indented pr_xxx like this:
kernel: sun7i-dwmac
1c50000.ethernet: no reset control found
kernel: Ring mode enabled
kernel: No HW DMA feature register supported
kernel: Normal descriptors
kernel: TX Checksum insertion supported
So this patch replace all pr_xxx by their netdev_xxx counterpart.
Excepts for some printing where netdev "cause" unpretty output like:
sun7i-dwmac
1c50000.ethernet (unnamed net_device) (uninitialized): no reset control found
In those case, I keep dev_xxx.
In the same time I remove some "stmmac:" print since
this will be a duplicate with that dev_xxx displays.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 17 Nov 2016 17:48:30 +0000 (09:48 -0800)]
net_sched: sch_fq: use hash_ptr()
When I wrote sch_fq.c, hash_ptr() on 64bit arches was awful,
and I chose hash_32().
Linus Torvalds and George Spelvin fixed this issue, so we can
use hash_ptr() to get more entropy on 64bit arches with Terabytes
of memory, and avoid the cast games.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 14:21:34 +0000 (06:21 -0800)]
net/mlx5e: remove napi_hash_del() calls
Calling napi_hash_del() after netif_napi_del() is pointless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 13:49:22 +0000 (05:49 -0800)]
net/mlx4_en: remove napi_hash_del() call
There is no need calling napi_hash_del()+synchronize_rcu() before
calling netif_napi_del()
netif_napi_del() does this already.
Using napi_hash_del() in a driver is useful only when dealing with
a batch of NAPI structures, so that a single synchronize_rcu() can
be used. mlx4_en_deactivate_cq() is deactivating a single NAPI.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Nov 2016 04:29:05 +0000 (23:29 -0500)]
Merge branch 'mlxsw-i2c'
Jiri Pirko says:
====================
mlxsw: Introduce support for I2C bus
Vadim says:
This patchset adds I2C access support for SwitchX, SwitchX2, SwitchIB,
SwitchIB2 and Spectrum silicones.
It contains:
- Small changes in mlxsw core code, needed for I2C bus support;
- I2C driver, which obtains I2C input/output mailboxes setting and
provides command interface implementation.
- Minimal driver, which works on top of I2C driver and allows running
of mlxsw command interface over I2C bus;
Use case:
On system, which does not have PCI to ASIC (BMC), hwmon functionality
(sensors, pwm, tacho) will be available through I2C.
Usage (manual probing):
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device
Sysfs interface:
/sys/bus/i2c/devices/2-0048/hwmon/hwmon5/pwm1
/sys/bus/i2c/devices/2-0048/hwmon/hwmon5/temp1_input
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 16 Nov 2016 14:20:46 +0000 (15:20 +0100)]
mlxsw: minimal: Add I2C support for Mellanox ASICs
Add I2C access support for Mellanox ASICs:
- Virtual Protocol Interconnect switches SwitchX, SwitchX2,
providing InfiniBand, Ethernet and Fibre Channel connectivity;
- Infiniband switches SwitchIB, SwitchIB2:
- Ethernet switch Spectrum.
Example of probing activation:
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 16 Nov 2016 14:20:45 +0000 (15:20 +0100)]
mlxsw: Invoke driver's init/fini methods only if defined
We are going to add a minimal driver on top of the mlxsw core
infrastructure, which will be mainly used for hardware monitoring in
Baseboard management controller (BMC) installations.
Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not
initialize the ASIC and therefore doesn't need to implement the init() and
fini() methods in its 'mlxsw_driver' struct.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 16 Nov 2016 14:20:44 +0000 (15:20 +0100)]
mlxsw: Introduce support for I2C bus
Add I2C bus implementation for Mellanox Technologies Switch ASICs.
This includes command interface implementation using input / out mailboxes,
whose location is retrieved from the firmware during probe time.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>