Florian Fainelli [Mon, 28 Nov 2016 02:45:12 +0000 (18:45 -0800)]
Documentation: net: phy: remove description of function pointers
Remove the function pointer documentation, which duplicates information
found in include/linux/phy.h. Maintaining documentation in two
different locations just does not work, and the code is less likely to
be outdated.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Färber [Sun, 27 Nov 2016 22:26:28 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
so free the same amount. This will be 8 or 9 in practice, less than 16.
Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 20:09:37 +0000 (15:09 -0500)]
Merge branch 'mlx5-DCBX-and-ethtool-updates'
Saeed Mahameed says:
====================
Mellanox 100G mlx5 DCBX and ethtool updates
This series provides the following mlx5 updates:
From Huy:
DCBX CEE API and DCBX firmware/host modes support.
- 1st patch ensures the dcbnl_rtnl_ops is published only when the qos
capability bit is on.
- 2nd patch adds the support for CEE interfaces into mlx5 dcbnl_rtnl_ops
- 3rd patch refactors ETS query to read ETS configuration directly from
firmware rather than having a software shadow of it. The existing IEEE
interfaces stay the same.
- 4th patch adds the support for MLX5_REG_DCBX_PARAM and MLX5_REG_DCBX_APP
firmware commands to manipulate mlx5 DCBX mode.
- 5th patch adds the driver support for the new DCBX firmware. This keeps
the driver compatible with both the old and the new firmware. With the new
DCBX firmware, qos settings can be controlled by either firmware or software
depending on the DCBX mode.
From Kamal and Saeed:
- mlx5 self-test support.
From Shaker:
- Private flag to give the user the ability to enable/disable mlx5 CQE
compression.
V1->V2:
- Check ETS capability where needed in:
("net/mlx5e: Read ETS settings directly from firmware")
- Fix return value of mlx5e_dcbnl_switch_to_host_mode in:
("net/mlx5e: ConnectX-4 firmware support for DCBX")
- Update commit message of:
("net/mlx5e: ConnectX-4 firmware support for DCBX")
- Fix two sparse static check warnings in en_selftest.c
This series was generated against commit:
e5f12b3f5ebb ("Merge branch 'mlxsw-trap-groups-and-policers'")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaker Daibes [Sun, 27 Nov 2016 15:02:12 +0000 (17:02 +0200)]
net/mlx5e: Add CQE compression user control
The user can now override the automatic driver decision using the
rx_cqe_compress flag, which is the preference for CQE compression.
The flag is initialized with the automatic driver decision.
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaker Daibes [Sun, 27 Nov 2016 15:02:11 +0000 (17:02 +0200)]
net/mlx5e: Moves pflags to priv->params
pflags is a configuration parameter for the netdev, so it naturally belongs
in priv->params.
Also introduce MLX5E_GET_PFLAG.
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Sun, 27 Nov 2016 15:02:10 +0000 (17:02 +0200)]
net/mlx5e: Add support for loopback selftest
Extend the self diagnostic tests to support loopback test.
The loopback test doesn't require the offline flag, it will use the
generic dev_queue_xmit and a dedicated packet_type to capture and verify
mlx5e selftest loopback packets.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kamal Heib [Sun, 27 Nov 2016 15:02:09 +0000 (17:02 +0200)]
net/mlx5e: Add support for ethtool self diagnostics test
The self diagnostics test implementation includes the following features:
1. Link Test: Check that link is in up state.
2. Speed Test: Check that link was negotiated correctly.
3. Health Test: Check the device health.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:08 +0000 (17:02 +0200)]
net/mlx5e: Add DCBX control interface
Use setdcbx interface to set the DCBX mode to firmware or os.
If setdcbx is called with mode value of zero, the DCBX mode
is set to firmware.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:07 +0000 (17:02 +0200)]
net/mlx5e: ConnectX-4 firmware support for DCBX
DCBX is controlled by firmware by default when the dcbx capability bit
is set. In this mode, firmware is responsible for reading/sending the
TLV packets from/to the remote partner.
This patch sets up the infrastructure to move between HOST/FW DCBX
control mode.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:06 +0000 (17:02 +0200)]
net/mlx5: Add DCBX firmware commands support
Add set/query commands for DCBX_PARAM register
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:05 +0000 (17:02 +0200)]
net/mlx5e: Read ETS settings directly from firmware
Issue description:
The current implementation saves the ETS settings from the user in
a temporary soft copy and returns these settings when the user
queries the ETS settings.
With the new DCBX firmware, the ETS settings can be changed
by firmware when the DCBX is in firmware controlled mode. Therefore,
the user can obtain wrong values from the temporary soft copy.
Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
creation.
b. When reading ETS setting from FW, if the traffic class bandwidth
is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This
implementation solves the scenarios when the DCBX is in FW control
and willing bit is on which means the ETS setting is dictated
by remote switch.
Also check ETS capability where needed.
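The tc_tsa rule in step 2 can be sketched as follows. This is an
illustrative Python model, not the driver code: the function name
derive_tc_tsa is hypothetical, and the two constants are local stand-ins
whose numeric values are assumed for the example.

```python
# Local stand-ins for IEEE_8021QAZ_TSA_ETS / IEEE_8021QAZ_TSA_VENDOR.
# The numeric values are assumptions made for this sketch.
TSA_ETS = 2
TSA_VENDOR = 255


def derive_tc_tsa(tc_bw_percent, current_tsa):
    """Return the per-traffic-class TSA list after reading ETS from firmware.

    A class whose bandwidth share is below 100% is reported as ETS;
    any other class keeps the value it already had (initially VENDOR).
    """
    return [
        TSA_ETS if bw < 100 else tsa
        for bw, tsa in zip(tc_bw_percent, current_tsa)
    ]


# At netdev creation every class starts as VENDOR.
initial = [TSA_VENDOR] * 3
print(derive_tc_tsa([50, 50, 100], initial))
```

With a remote switch dictating a 50/50 split on the first two classes,
only those two classes are reported as ETS.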
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:04 +0000 (17:02 +0200)]
net/mlx5e: Support DCBX CEE API
Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.
Note:
priority group in CEE is equivalent to traffic class in ConnectX-4
hardware spec.
bw allocation per priority in CEE is not supported because ConnectX-4
only supports bw allocation per traffic class.
user priority in CEE does not have an equivalent term in ConnectX-4.
Therefore, user priority to priority mapping in CEE is not supported.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Sun, 27 Nov 2016 15:02:03 +0000 (17:02 +0200)]
net/mlx5e: Add qos capability check
Make sure firmware supports qos before exposing the DCB API.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 25 Nov 2016 04:37:26 +0000 (12:37 +0800)]
virtio-net: enable multiqueue by default
We currently use a single queue even if multiqueue is enabled and let the
admin enable it through ethtool later. This is done to avoid a possible
regression (small packet TCP stream transmission), but it looks like
overkill since:
- a single queue user can disable multiqueue when launching qemu
- it brings extra trouble for management, since an extra admin tool is
needed in the guest to enable multiqueue
- multiqueue performs much better than single queue in most
cases
So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable all of the queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.
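The rule above amounts to taking the smaller of the two counts. A minimal
Python sketch, where default_queue_pairs is a hypothetical name used only
for illustration and not a function from the driver:

```python
def default_queue_pairs(num_queues, num_vcpus):
    """Number of queue pairs enabled by default under the rule above.

    If the device offers no more queues than there are vCPUs, use all of
    them; otherwise cap the count at the number of vCPUs.
    """
    return min(num_queues, num_vcpus)


print(default_queue_pairs(4, 8))   # fewer queues than vCPUs
print(default_queue_pairs(16, 8))  # more queues than vCPUs
```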
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Jeremy Eder <jeder@redhat.com>
Cc: Marko Myllynen <myllynen@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 16:58:57 +0000 (11:58 -0500)]
Merge branch 'MV88E6097-fixes'
Stefan Eichenberger says:
====================
Fix support for the MV88E6097
This patchset fixes the following two issues for the MV88E6097:
- Add missing definition of g1_irqs
- Add missing comment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Eichenberger [Fri, 25 Nov 2016 08:41:30 +0000 (09:41 +0100)]
net: dsa: mv88e6xxx: add missing comment for MV88E6097
Add a missing comment for the MV88E6097 because of unification.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Eichenberger [Fri, 25 Nov 2016 08:41:29 +0000 (09:41 +0100)]
net: dsa: mv88e6xxx: add g1_irqs definition for MV88E6097
Add the missing definition of g1_irqs for MV88E6097.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 01:38:49 +0000 (20:38 -0500)]
Merge branch 'bpf-misc-next'
Daniel Borkmann says:
====================
BPF cleanups and misc updates
This patch set adds a couple of cleanups in the first few patches,
exposes owner_prog_type for array maps as well as mlocked mem
for maps in fdinfo, allows for mount permissions in fs and
fixes various outstanding issues in selftests and samples.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:09 +0000 (01:28 +0100)]
bpf: fix multiple issues in selftest suite and samples
1) The test_lru_map and test_lru_dist fail to build on my machine since
the sys/resource.h header is not included.
2) test_verifier fails in one test case where we try to call an invalid
function, since the verifier log output changed wrt printing function
names.
3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
retrieving the number of possible CPUs. This is broken at least in our
scenario and really just doesn't work.
glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
if that fails, depending on the config, it either tries to count CPUs
in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
If /proc/cpuinfo has some issue, it returns just 1 worst case. This
oddity is nothing new [1], but semantics/behaviour seems to be settled.
_SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
that fails it looks into /proc/stat for cpuX entries, and if also that
fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
unlikely all breaks down).
While that might match num_possible_cpus() from the kernel in some
cases, it's really not guaranteed with CPU hotplugging, and can result
in a buffer overflow since the array in user space could have too few
slots, and on a percpu map lookup, the kernel will write beyond
the memory of the value buffer.
William Tu reported such mismatches:
[...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
happens when CPU hotadd is enabled. For example, in Fusion when
setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
-smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
will be 2 [2]. [...]
Documentation/cputopology.txt says /sys/devices/system/cpu/possible
outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
own implementation. Later, we could add support to bpf(2) for passing
a mask via CPU_SET(3), for example, to just select a subset of CPUs.
BPF samples code needs this fix as well (at least so that people stop
copying this). Thus, define bpf_num_possible_cpus() once in selftests
and import it from there for the sample code to avoid duplicating it.
The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.
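The idea of counting CPUs from /sys/devices/system/cpu/possible can be
sketched as below. This is a Python illustration under assumptions, not
the C helper added by the patch: the function names are hypothetical, and
the parser assumes the file holds a kernel cpulist string such as "0-3".

```python
def count_cpus_in_list(cpulist):
    """Count the CPUs named by a cpulist string such as "0-3" or "0-1,4-5"."""
    total = 0
    for chunk in cpulist.strip().split(","):
        if not chunk:
            continue
        first, _, last = chunk.partition("-")
        # A chunk without "-" names a single CPU.
        total += (int(last) if last else int(first)) - int(first) + 1
    return total


def num_possible_cpus(path="/sys/devices/system/cpu/possible"):
    """Read the possible-CPU list from sysfs and return how many CPUs it names."""
    with open(path) as f:
        return count_cpus_in_list(f.read())


# With "-smp 2, maxcpus=4" the possible list covers four CPUs even though
# only two are present, so per-CPU value buffers need four slots.
print(count_cpus_in_list("0-3\n"))
```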
After all three issues are fixed, the test suite runs fine again:
# make run_tests | grep self
selftests: test_verifier [PASS]
selftests: test_maps [PASS]
selftests: test_lru_map [PASS]
selftests: test_kmod.sh [PASS]
[1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html
Fixes: 3059303f59cf ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes: 86af8b4191d2 ("Add sample for adding simple drop program to link")
Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes: e15596717948 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes: ebb676daa1a3 ("bpf: Print function name in addition to function id")
Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:08 +0000 (01:28 +0100)]
bpf: allow for mount options to specify permissions
Since we recently converted the BPF filesystem over to use mount_nodev(),
we now have the possibility to also hold mount options in sb's s_fs_info.
This work implements mount options support for specifying permissions on
the sb's inode, which will be used by tc when it manually needs to mount
the fs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:07 +0000 (01:28 +0100)]
bpf: add owner_prog_type and accounted mem to array map's fdinfo
Allow for checking the owner_prog_type of a program array map. In some
cases bpf(2) can return -EINVAL /after/ the verifier passed and did all
the rewrites of the bpf program.
The reason that lets us fail at this late stage is that program array
maps are incompatible. Allow users to inspect this earlier after they
got the map fd through BPF_OBJ_GET command. tc will get support for this.
Also, display how much we charged the map with regards to RLIMIT_MEMLOCK.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:06 +0000 (01:28 +0100)]
bpf: reuse dev_is_mac_header_xmit for redirect
Commit dcf800344a91 ("net/sched: act_mirred: Refactor detection whether
dev needs xmit at mac header") added dev_is_mac_header_xmit(); since it's
also useful elsewhere, move it to if_arp.h and reuse it for BPF.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:05 +0000 (01:28 +0100)]
bpf: drop useless bpf_fd member from cls/act
After setup we don't need to keep the user space fd number around anymore.
It has no useful meaning for anyone, so just remove it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 26 Nov 2016 00:28:04 +0000 (01:28 +0100)]
bpf: drop unnecessary context cast from BPF_PROG_RUN
For a long time now, bpf_func has not been only about struct sk_buff * as
input. Make it generic as void *, so that callers don't
need to cast for it each time they call BPF_PROG_RUN().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 25 Nov 2016 10:43:04 +0000 (13:43 +0300)]
sfc: remove unneeded variable
We don't use ->heap_buf after commit 46d1efd852cc ("sfc: remove Software
TSO") so let's remove the last traces.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Nov 2016 01:26:59 +0000 (20:26 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2016-11-25' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.10
Major changes:
iwlwifi
* finalize and enable dynamic queue allocation
* use dev_coredumpmsg() to prevent locking the driver
* small fix to pass the AID to the FW
* use FW PS decisions with multi-queue
ath9k
* add device tree bindings
* switch to use mac80211 intermediate software queues to reduce
latency and fix bufferbloat
wl18xx
* allow scanning in AP mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Thu, 24 Nov 2016 05:04:08 +0000 (07:04 +0200)]
netdevice: fix sparse warning for HARD_TX_LOCK
sparse warns about context imbalance in any code
that uses HARD_TX_LOCK/UNLOCK - this is because it's
unable to determine that flags don't change so
lock and unlock are paired.
Seems easy enough to fix by adding __acquire/__release
calls.
With this patch af_packet.c is now sparse-clean.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ulrik De Bie [Wed, 23 Nov 2016 20:11:04 +0000 (21:11 +0100)]
ptp: gianfar: Use high resolution frequency method.
This patch depends on commit d8d263541913 ("ptp: Introduce a high
resolution frequency adjustment method.")
The gianfar devices offer a frequency resolution of about 0.46 ppb
(this depends on the actual value of tmr_add; 0x80000000 is assumed for
the calculation). This patch lets users of the device benefit from the
increased frequency resolution when tuning the clock. Thanks to rounding,
the maximum error between the requested frequency and the applied frequency
will then be about 0.23 ppb.
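The two figures can be reproduced with a short calculation. This Python
sketch assumes that one least significant bit of tmr_add changes the
frequency by a relative 1/tmr_add, which is an interpretation of the text
above; the function names are illustrative only.

```python
TMR_ADD = 0x80000000  # addend value assumed in the commit message


def resolution_ppb(tmr_add):
    """Relative frequency step of one addend LSB, in parts per billion."""
    return 1e9 / tmr_add


def max_rounding_error_ppb(tmr_add):
    """Worst-case error when the request is rounded to the nearest step."""
    return resolution_ppb(tmr_add) / 2


print(f"resolution: {resolution_ppb(TMR_ADD):.4f} ppb")
print(f"max error:  {max_rounding_error_ppb(TMR_ADD):.4f} ppb")
```

Rounding to the nearest step rather than truncating is what halves the
worst-case error.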
Tested on a v3.3.8 kernel on a real gianfar device. Verified compilation
on net-next (currently at v4.9-rc5).
Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 23 Nov 2016 17:46:52 +0000 (09:46 -0800)]
mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()
Per RX ring packets/bytes counters are not protected by global
priv->stats_lock.
Better not confuse the reader, and use READ_ONCE() to show we read
these counters without surrounding synchronization.
Interrupt moderation is best effort, and we do not really care about
ultra precise counters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 27 Nov 2016 04:42:21 +0000 (23:42 -0500)]
Merge git://git./linux/kernel/git/davem/net
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.
Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next', so
simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 27 Nov 2016 01:21:13 +0000 (17:21 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs splice fix from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix default_file_splice_read()
Al Viro [Sun, 27 Nov 2016 01:05:42 +0000 (20:05 -0500)]
fix default_file_splice_read()
Botched calculation of the number of pages. As a result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 26 Nov 2016 23:28:34 +0000 (15:28 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here is a revert and two bugfixes for the I2C designware driver.
Please note that we are still hunting down a regression for the
i2c-octeon driver. While there is a fix pending, we have unclear
feedback from the testers currently. An rc8 would be quite helpful
for this case"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: designware: do not disable adapter after transfer"
i2c: designware: fix rx fifo depth tracking
i2c: designware: report short transfers
Linus Torvalds [Sat, 26 Nov 2016 23:26:20 +0000 (15:26 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
"This resolves the ksyms issues by reverting the commit which
introduced the breakage"
There was what I consider to be a better fix, but it's late in the rc
game, so I'll take the revert.
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
Revert "arm: move exports to definitions"
Linus Torvalds [Sat, 26 Nov 2016 21:05:05 +0000 (13:05 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix leak in fsl/fman driver, from Dan Carpenter.
2) Call flow dissector initcall earlier than any networking driver can
register and start to use it, from Eric Dumazet.
3) Some dup header fixes from Geliang Tang.
4) TIPC link monitoring compat fix from Jon Paul Maloy.
5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
Florian Fainelli.
6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman
Mashak.
7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
tipc: resolve connection flow control compatibility problem
mvpp2: use correct size for memset
net/mlx5: drop duplicate header delay.h
net: ieee802154: drop duplicate header delay.h
ibmvnic: drop duplicate header seq_file.h
fsl/fman: fix a leak in tgec_free()
net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
tipc: improve sanity check for received domain records
tipc: fix compatibility bug in link monitoring
net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
dwc_eth_qos: drop duplicate headers
net sched filters: fix filter handle ID in tfilter_notify_chain()
net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
bnxt: do not busy-poll when link is down
udplite: call proper backlog handlers
ipv6: bump genid when the IFA_F_TENTATIVE flag is clear
net/mlx4_en: Free netdev resources under state lock
net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"
rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()
bnxt_en: Fix a VXLAN vs GENEVE issue
...
Linus Torvalds [Sat, 26 Nov 2016 20:24:47 +0000 (12:24 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- Fix a crash that occurs at driver initialization if the memory region
is already busy (request_mem_region() fails).
- Fix a vma validation check that mistakenly allows a private device-
dax mapping to be established. Device-dax explicitly forbids private
mappings so it can guarantee a given fault granularity and backing
memory type.
Both of these fixes have soaked in -next and are tagged for -stable.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: fail all private mapping attempts
device-dax: check devm_nsio_enable() return value
Linus Torvalds [Sat, 26 Nov 2016 20:18:59 +0000 (12:18 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"Four fixes for bugs found by syzkaller on x86, all for stable"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: check for pic and ioapic presence before use
KVM: x86: fix out-of-bounds accesses of rtc_eoi map
KVM: x86: drop error recovery in em_jmp_far and em_ret_far
KVM: x86: fix out-of-bounds access in lapic
Linus Torvalds [Sat, 26 Nov 2016 19:24:03 +0000 (11:24 -0800)]
Merge tag 'powerpc-4.9-6' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Set missing wakeup bit in LPCR on POWER9
- Fix the early OPAL console wrappers
- Fixup kernel read only mapping
Fixes for code merged this cycle:
- Fix missing CRCs, add more asm-prototypes.h declarations"
* tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fixup kernel read only mapping
powerpc/boot: Fix the early OPAL console wrappers
powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
powerpc: Set missing wakeup bit in LPCR on POWER9
Jon Paul Maloy [Thu, 24 Nov 2016 23:47:07 +0000 (18:47 -0500)]
tipc: resolve connection flow control compatibility problem
In commit 10724cc7bb78 ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using messages as the base unit when it senses that the peer
doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement), is 512. This was tested against
the previous version, which uses an acknowledge frequency of one ack per
256 received messages, and found to work fine.
However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for acknowledge. This would also work fine if
it weren't for the fact that if the first sent message on a 4.6+ server
side is an empty SYNACK, this one is also counted as a sent message,
while it is not counted as a received message on a legacy 3.15-receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver only has registered 511 read messages. Hence, the legacy
receiver is not triggered to send an acknowledge, with a permanently
blocked sender as result.
We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by making a minimal change to the
condition used for determining connection congestion.
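The off-by-one can be reproduced with a toy model. The Python sketch below
is a simplification written for illustration, not the TIPC code: it only
counts messages, treats the empty SYNACK as sent but never received, and
uses hypothetical names throughout.

```python
WINDOW = 512     # sender's flow control window, in messages
ACK_EVERY = 512  # legacy receiver acknowledges once per 512 received messages


def simulate(allow_one_extra, data_to_send=2000):
    """Return how many data messages get through before the sender stalls."""
    sent_unacked = 1  # the empty SYNACK is counted by the sender...
    rcv_unacked = 0   # ...but not by the legacy receiver
    delivered = 0
    limit = WINDOW + 1 if allow_one_extra else WINDOW
    while delivered < data_to_send:
        if sent_unacked >= limit:
            return delivered  # sender blocks and no acknowledge is coming
        sent_unacked += 1
        rcv_unacked += 1
        delivered += 1
        if rcv_unacked >= ACK_EVERY:
            sent_unacked -= rcv_unacked  # acknowledge reopens the window
            rcv_unacked = 0
    return delivered


print(simulate(allow_one_extra=False))  # stalls with 511 messages received
print(simulate(allow_one_extra=True))   # all 2000 messages get through
```

In the model the sender stays one message ahead for the whole connection,
which is why a single extra slot is enough to keep acknowledges flowing.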
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 26 Nov 2016 02:22:28 +0000 (21:22 -0500)]
Merge branch 'mlxsw-trap-groups-and-policers'
Jiri Pirko says:
====================
mlxsw: traps, trap groups and policers
Nogah says:
For a packet to be sent from the HW to the cpu, it needs to be trapped.
For a trap to be active it should be assigned to a trap group.
Those trap groups can have policers, to limit the packet rate (the max
number of packets that can be sent to the cpu in a time slot, the rest
will be discarded) or the data rate (the same, but the count is not by the
number of packets but by their total length in bytes).
This patchset rearranges the trap setting API, rewrites the traps and the
trap groups list in spectrum and assigns them policers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:47 +0000 (10:33 +0100)]
mlxsw: spectrum: Add policers for trap groups
Configure policers and connect them to trap groups.
While many trap groups share a policer configuration they don't share
the actual policer, because each trap group represents a different
flow / protocol and we don't want one of them to be able to exceed its
rate at the expense of another.
For example, if STP and LLDP get to send 128 packets/sec each, and we
put them in one 256 packets/sec policer, one can send 200 packets while
the other only 50.
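The difference between a shared policer and one policer per trap group can
be sketched as follows. This is a simplified Python model of a single
one-second window with first-come, first-served admission; the function
names and the offered loads are illustrative assumptions.

```python
def police_separately(offered, limit_per_group):
    """Each trap group has its own policer with the same rate limit."""
    return {group: min(n, limit_per_group) for group, n in offered.items()}


def police_shared(offered, shared_limit):
    """All trap groups draw from one policer; earlier traffic wins."""
    remaining = shared_limit
    admitted = {}
    for group, n in offered.items():
        admitted[group] = min(n, remaining)
        remaining -= admitted[group]
    return admitted


offered = {"STP": 200, "LLDP": 128}  # packets offered in one second
print(police_separately(offered, 128))
print(police_shared(offered, 256))
```

With separate policers neither protocol can take more than its own 128
packets, while the shared policer lets the first flow crowd out the second.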
Note that IP2ME covers lots of flows, so its limit is set to match the
cpu ability to handle data.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:46 +0000 (10:33 +0100)]
mlxsw: reg: Add QoS Policer Configuration Register
The QPCR register is used to create and control policers.
A policer can discard or change the color of packets that are
trapped by a specific trap.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:45 +0000 (10:33 +0100)]
mlxsw: resources: Add max cpu policers resource
Add a new resource to resources query: max cpu policers which tells us how
many policers can be used to limit the data rate to the cpu port.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:44 +0000 (10:33 +0100)]
mlxsw: Create a different trap group list for each device
Trap groups can be used to control traps priority, both in terms of
which trap "wins" if a packet matches two traps (priority) and in terms
of packets from which trap group will be scheduled to the cpu first (tc).
They can also be used to set rate limiters (policers) on them (will be
added in the next patches).
Currently, we support two trap groups. In Spectrum we want a better
resolution, so every protocol / flow will have a different trap group,
so we can control its parameters separately. Once the policers are
implemented, it will also allow us to limit the rate of each protocol by
itself.
This patch changes the trap group list to include:
* the emad trap group, which is shared for all the devices.
* Switchx2's trap groups, which are a copy of the current trap groups.
* Spectrum's new trap groups, in order to match the above guidelines.
(Switchib is using only the emad trap group, so it requires no changes).
This patch also includes new configuration for Spectrum's trap groups,
with primary priority order within them.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:43 +0000 (10:33 +0100)]
mlxsw: spectrum: Add BGP trap
Add a trap for BGP protocol that was previously trapped by the generic trap
for IP2ME. This trap will allow us to have better control (over priority
and rate) of the traffic.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:42 +0000 (10:33 +0100)]
mlxsw: Change trap groups setting
Trap groups have many options which we currently set to default values.
In the next patches we will use many of them with non-default values.
Some of these options have no default value, so this patch sets them as
params for the trap group set function. Others almost always use the same
values, so the set function will use these default values. In the rare cases
when other values are needed, these values can be set
directly (using the macros for fields in registers).
Parameters without default value:
TC - the traffic class for packets that hit this trap group.
(old default is the max tc)
priority - if one packet hits multiple trap groups, the group with the
higher priority will "catch" it. (old default is 0)
policer - limit rate policer (old default is disabled)
Default parameters:
swid - switch id, relevant for the emad trap only, ignored on Spectrum.
(new default is 0)
rdq - CPU receive descriptor queue (new default is identical to trap
group id)
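As a rough illustration of the split described above (not the driver's actual
code), the sketch below models a trap group setter in which TC, priority and
policer are explicit arguments, while swid and rdq fall back to the new
defaults. The struct, field and function names are hypothetical, and a negative
policer id is an assumed way of saying "policer disabled".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a trap group configuration; not the register layout. */
struct trap_group_cfg {
	uint8_t group_id;
	uint8_t tc;           /* traffic class for packets that hit this group */
	uint8_t priority;     /* the higher priority group "catches" the packet */
	bool policer_enabled; /* rate limit policer on or off */
	uint8_t policer_id;
	uint8_t swid;         /* switch id, default 0 */
	uint8_t rdq;          /* CPU receive descriptor queue, default group_id */
};

/* Parameters without a default are passed in; the rest get the defaults. */
static struct trap_group_cfg trap_group_pack(uint8_t group_id, uint8_t tc,
					     uint8_t priority, int policer_id)
{
	struct trap_group_cfg cfg = {
		.group_id = group_id,
		.tc = tc,
		.priority = priority,
		.policer_enabled = policer_id >= 0,
		.policer_id = policer_id >= 0 ? (uint8_t)policer_id : 0,
		.swid = 0,
		.rdq = group_id,
	};

	return cfg;
}
```

A caller that needs a non-default swid or rdq would overwrite the field after
packing, which mirrors the "set directly" escape hatch mentioned above.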
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:41 +0000 (10:33 +0100)]
mlxsw: resources: Add max trap groups resource
Add the max number of trap groups to resource query.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:40 +0000 (10:33 +0100)]
mlxsw: core: Change emad trap group settings
Currently, the emad trap init is done in the core. In the future we will
want to make some changes to the trap groups, according to device type.
This commit creates a driver function to create the trap group for the
emad, so later it can be changed by devices. It also changes the emad
registration to use the new generic functions.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:39 +0000 (10:33 +0100)]
mlxsw: Add option to choose trap group
Currently, we set the trap group to a pre-determined option, based on whether
it is an rx or event trap.
This commit adds a possibility to choose the trap group, so it can be set
to different values in the following patches.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:38 +0000 (10:33 +0100)]
mlxsw: Change trap set function
Change trap setting function so instead of determining the trap group by
trap id, it gets it as a parameter (so later we can have different trap
groups for Spectrum and Switchx2).
Add "is_ctrl" parameter to the trap setting function. It controls whether
the trapped packets wait in a designated control buffer or in their
default one. This parameter is ignored by Switchx2 and Switchib.
Add these parameters to the traps array in Spectrum, Switchx2 and
Switchib.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:37 +0000 (10:33 +0100)]
mlxsw: switchib: Use generic listener struct for events
Change the event handling in Switchib to be compatible with Spectrum and
Switchx2.
Use the generic listener struct for the events. Init and fini them by loop
(and not by calling each event by its name).
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:36 +0000 (10:33 +0100)]
mlxsw: switchx2: Use generic listener struct for events
Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:35 +0000 (10:33 +0100)]
mlxsw: spectrum: Use generic listener struct for events
Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:34 +0000 (10:33 +0100)]
mlxsw: core: Introduce generic macro for event
Create a macro for creating the generic listener struct for events,
similar to the one for rx traps.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:33 +0000 (10:33 +0100)]
mlxsw: switchx2: Use generic listener struct for rx traps
Reorganize the traps to use the new generic listener struct and
functions. Use macros to shorten the traps list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:32 +0000 (10:33 +0100)]
mlxsw: spectrum: Use generic listener struct for rx traps
Replace the old rx listener struct definitions by the generic ones.
Use the new generic registering / unregistering functions for them.
Add some macros to organize the trap list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:31 +0000 (10:33 +0100)]
mlxsw: core: Expose generic macros for rx trap
In Spectrum, there is a macro to arrange the traps list.
This macro is useful for everyone who is using rx traps.
Create a similar macro in core.h for creating the generic listener struct
for rx traps.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:30 +0000 (10:33 +0100)]
mlxsw: core: Create a generic function to register / unregister traps
We have 2 types of HW traps to handle, rx traps and events.
The registration workflow for both is very similar. So it only makes
sense to create one function to handle both.
This patch creates a struct to hold the data for both cases. It also
creates registration and unregistration functions that take this
generic struct as input.
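A minimal sketch of the idea, assuming a tagged union is an acceptable model:
one struct carries either an rx handler or an event handler, and a single pair
of register / unregister functions takes it. All names here are hypothetical
and the fixed-size table is only there to keep the example self-contained.

```c
#include <stddef.h>

enum listener_kind { LISTENER_RX, LISTENER_EVENT };

/* One struct for both HW trap types; the kind says which handler is valid. */
struct listener {
	enum listener_kind kind;
	int trap_id;
	union {
		void (*rx)(const void *pkt, size_t len, void *priv);
		void (*event)(const char *payload, void *priv);
	} func;
};

#define MAX_LISTENERS 16

static struct listener registered[MAX_LISTENERS];
static size_t nr_registered;

/* Register either kind of listener; returns 0 on success, -1 if full. */
static int listener_register(const struct listener *l)
{
	if (nr_registered == MAX_LISTENERS)
		return -1;
	registered[nr_registered++] = *l;
	return 0;
}

/* Unregister by (kind, trap_id); returns 0 on success, -1 if not found. */
static int listener_unregister(const struct listener *l)
{
	for (size_t i = 0; i < nr_registered; i++) {
		if (registered[i].kind == l->kind &&
		    registered[i].trap_id == l->trap_id) {
			/* Fill the hole with the last entry. */
			registered[i] = registered[--nr_registered];
			return 0;
		}
	}
	return -1;
}
```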
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Fri, 25 Nov 2016 09:33:29 +0000 (10:33 +0100)]
mlxsw: spectrum: Remove unused traps
Since commit 99724c18fc66 ("mlxsw: spectrum: Introduce support for
router interfaces") we no longer rely on flooding traffic to the CPU in
order to trap packets intended for the host itself. Therefore, the FDB
MC trap can be removed.
Remove traps for protocols that are not supported yet.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 24 Nov 2016 16:28:12 +0000 (17:28 +0100)]
mvpp2: use correct size for memset
gcc-7 detects a short memset in mvpp2, introduced in the original
merge of the driver:
drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_cls_init':
drivers/net/ethernet/marvell/mvpp2.c:3296:2: error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]
The result seems to be that we write uninitialized data into the
flow table registers, although we did not get any warning about
that uninitialized data usage.
Using sizeof() lets us initialize the entire array instead.
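The class of bug can be shown in isolation. The sketch below is not the mvpp2
code; the struct, the table and its size are made up for the illustration. It
keeps the short form in a comment only, so the block builds cleanly with
-Werror.

```c
#include <stddef.h>
#include <string.h>

#define NUM_ENTRIES 4 /* arbitrary size for the illustration */

struct flow_entry {
	unsigned int data[4];
};

static struct flow_entry table[NUM_ENTRIES];

/*
 * Short form, flagged by -Wmemset-elt-size:
 *     memset(table, 0, NUM_ENTRIES);
 * It clears only NUM_ENTRIES bytes, which is part of the first element.
 */
static void table_clear(void)
{
	/* sizeof(table) is the size of the whole array in bytes. */
	memset(table, 0, sizeof(table));
}

/* Return 1 if every byte of the table is zero, 0 otherwise. */
static int table_is_zero(void)
{
	const unsigned char *p = (const unsigned char *)table;

	for (size_t i = 0; i < sizeof(table); i++)
		if (p[i])
			return 0;
	return 1;
}
```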
Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 24 Nov 2016 13:58:33 +0000 (21:58 +0800)]
net/mlx5: drop duplicate header delay.h
Drop duplicate header delay.h from mlx5/core/main.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 24 Nov 2016 13:58:32 +0000 (21:58 +0800)]
net: ieee802154: drop duplicate header delay.h
Drop duplicate header delay.h from adf7242.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Thu, 24 Nov 2016 13:58:29 +0000 (21:58 +0800)]
ibmvnic: drop duplicate header seq_file.h
Drop duplicate header seq_file.h from ibmvnic.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 24 Nov 2016 11:20:43 +0000 (14:20 +0300)]
fsl/fman: fix a leak in tgec_free()
We set "tgec->cfg" to NULL before passing it to kfree(). There is no
need to set it to NULL at all. Let's just delete it.
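Why clearing the pointer first leaks can be sketched in userspace, with free()
standing in for kfree(). The types, the allocation counter and the function
names below are hypothetical; only the ordering of the two statements is the
point.

```c
#include <stddef.h>
#include <stdlib.h>

struct mac_cfg { int dummy; };
struct mac { struct mac_cfg *cfg; };

static int live_allocs; /* number of allocations not yet freed */

static void *counted_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p)
		live_allocs--;
	free(p); /* free(NULL) is a no-op, just like kfree(NULL) */
}

static struct mac *mac_new(void)
{
	struct mac *m = counted_alloc(sizeof(*m));

	if (!m)
		return NULL;
	m->cfg = counted_alloc(sizeof(*m->cfg));
	return m;
}

/* Leaky order: the pointer is lost before it is freed. */
static void mac_free_leaky(struct mac *m)
{
	m->cfg = NULL;
	counted_free(m->cfg);
	counted_free(m);
}

/* Fixed order: just free it, no need to clear it at all. */
static void mac_free_fixed(struct mac *m)
{
	counted_free(m->cfg);
	counted_free(m);
}
```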
Fixes: 57ba4c9b56d8 ("fsl/fman: Add FMan MAC support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 24 Nov 2016 11:03:45 +0000 (14:03 +0300)]
net/mlx5: remove a duplicate condition
We verified that MLX5_FLOW_CONTEXT_ACTION_COUNT was set on the first
line of the function so we don't need to check again here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Thu, 24 Nov 2016 09:55:06 +0000 (10:55 +0100)]
net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
command and likewise it shouldn't require the CAP_NET_ADMIN capability.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 26 Nov 2016 01:21:24 +0000 (20:21 -0500)]
Merge branch 'thunderx-new-features'
Sunil Goutham says:
====================
net: thunderx: Support for 80xx, RED, PFC etc.
This patch series adds support for SLM modules present on 80xx
silicon, enables random early discard, backpressure generation,
PFC and some ethtool changes to display supported link modes etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Thu, 24 Nov 2016 09:18:03 +0000 (14:48 +0530)]
net: thunderx: Pause frame support
Enable pause frames on both Rx and Tx side, configure pause
interval etc. Also support for enable/disable pause frames
on Rx/Tx via ethtool has been added.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Thu, 24 Nov 2016 09:18:02 +0000 (14:48 +0530)]
net: thunderx: Configure RED and backpressure levels
This patch enables moving average calculation of Rx pkt's resources
and configures RED and backpressure levels for both CQ and RBDR.
Also initialize SQ's CQ_LIMIT properly.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanneeru Srinivasulu [Thu, 24 Nov 2016 09:18:01 +0000 (14:48 +0530)]
net: thunderx: Add ethtool support for supported ports and link modes.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Thu, 24 Nov 2016 09:18:00 +0000 (14:48 +0530)]
net: thunderx: 80xx BGX0 configuration changes
On 80xx only one lane of DLM0 and DLM1 (of BGX0) can be used,
so even though the lmac count may be 2, LMAC1 should use the
serdes lane of DLM1. Since it's not possible to distinguish
80xx from 81xx, as the PCI devids are the same, this patch adds this
config support by relying on what firmware configures the
lmacs with.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 24 Nov 2016 04:46:09 +0000 (23:46 -0500)]
tipc: improve sanity check for received domain records
In commit 35c55c9877f8 ("tipc: add neighbor monitoring framework") we
added a data area to the link monitor STATE messages under the
assumption that previous versions did not use any such data area.
For versions older than Linux 4.3 this assumption is not correct. In
those versions, all STATE messages sent out from a node inadvertently
contain a 16 byte data area containing a string, a leftover from
previous RESET messages which were using this during the setup phase.
This string serves no purpose in STATE messages, and should not be there.
Unfortunately, this data area is delivered to the link monitor
framework, where a sanity check catches that it is not a correct domain
record, and drops it. It also issues a rate limited warning about the
event.
Since such events occur much more frequently than anticipated, we now
choose to remove the warning in order to not fill the kernel log with
useless contents. We also make the sanity check stricter, to further
reduce the risk that such data is inadvertently admitted.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 24 Nov 2016 02:05:26 +0000 (21:05 -0500)]
tipc: fix compatibility bug in link monitoring
Commit 817298102b0b ("tipc: fix link priority propagation") introduced a
compatibility problem between TIPC versions newer than Linux 4.6 and
those older than Linux 4.4. In versions later than 4.4, link STATE
messages only contain a non-zero link priority value when the sender
wants the receiver to change its priority. This has the effect that the
receiver resets itself in order to apply the new priority. This works
well, and is consistent with the said commit.
However, in versions older than 4.4 a valid link priority is present in
all sent link STATE messages, leading to cyclic link establishment and
reset on the 4.6+ node.
We fix this by adding a test that the received value should not only
be valid, but also differ from the current value in order to cause the
receiving link endpoint to reset.
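The added test can be sketched as a small predicate. This is an illustration
of the condition described above, not the TIPC code; the function name is
hypothetical and the valid range of 1..31 is an assumption made for the
example.

```c
#include <stdbool.h>

#define MAX_LINK_PRI 31 /* assumed upper bound of a valid priority */

/*
 * Reset only when the received priority is valid AND differs from the
 * current one. A pre-4.4 peer that repeats the same valid priority in
 * every STATE message then no longer triggers a reset each time.
 */
static bool should_reset_for_priority(unsigned int received,
				      unsigned int current)
{
	bool valid = received >= 1 && received <= MAX_LINK_PRI;

	return valid && received != current;
}
```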
Reported-by: Amar Nv <amar.nv005@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung Huh [Wed, 23 Nov 2016 23:10:33 +0000 (23:10 +0000)]
phy: fix error case of phy_led_triggers_(un)register
When phy_init_hw() fails in phy_attach_direct():
- phy_detach() calls phy_led_triggers_unregister() without a
previous call to phy_led_triggers_register().
- phy_led_triggers_register() is still called, causing a memory leak.
Fixes: 2e0bc452f472 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 23 Nov 2016 23:08:13 +0000 (00:08 +0100)]
net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
The mvneta driver advertises it supports IFF_UNICAST_FLT. However, it
actually does not. The hardware probably does support it, but there is
no code to configure the filter. As a quick and simple fix, remove the
flag. This will cause the core to fall back to promiscuous mode.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: b50b72de2f2f ("net: mvneta: enable features before registering the driver")
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 26 Nov 2016 00:47:15 +0000 (16:47 -0800)]
Merge branch 'parisc-4.9-4' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"On parisc we were still seeing occasional random segmentation faults
and memory corruption on SMP machines. Dave Anglin then looked again
at the TLB related code and found two issues in the PCI DMA and
generic TLB flush functions.
Then, in our startup code we had some timing of the cache and TLB
functions to calculate a threshold when to use a complete TLB/cache
flush or just to flush a specific range. This code produced a race
with newly started CPUs and thus led to occasional kernel crashes
(due to stale TLB/cache entries). The patch by Dave fixes this issue
by flushing the local caches before starting secondary CPUs and by
removing the race.
The last problem fixed by this series is that we quite often suffered
from hung tasks and self-detected stalls on the CPUs. It was somehow
clear that this was related to the (in v4.7) newly introduced cr16
clocksource and the own implementation of sched_clock(). I replaced
the open-coded sched_clock() function and switched to the generic
sched_clock() implementation which seems to have fixed this issue as
well.
All patches have been successfully tested on a variety of machines,
including our debian buildd servers.
All patches (besides the small pr_cont fix) are tagged for stable
releases"
* 'parisc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Also flush data TLB in flush_icache_page_asm
parisc: Fix race in pci-dma.c
parisc: Switch to generic sched_clock implementation
parisc: Fix races in parisc_setup_cache_timing()
parisc: Fix printk continuations in system detection
Eric Dumazet [Wed, 23 Nov 2016 16:44:56 +0000 (08:44 -0800)]
net: properly flush delay-freed skbs
Typical NAPI drivers use napi_consume_skb(skb) at TX completion time.
This puts the skb in a special percpu queue, napi_alloc_cache, to get bulk
frees.
It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE
limit quite often, with skbs that were queued hundreds of usec earlier.
I measured this can take ~6000 nsec to perform one flush.
__kfree_skb_flush() can be called from two points right now :
1) From net_tx_action(), but only for skbs that were queued to
sd->completion_queue.
-> Irrelevant for NAPI drivers in normal operation.
2) From net_rx_action(), but only under high stress or if RPS/RFS has a
pending action.
This patch changes net_rx_action() to perform the flush in all cases and
after more urgent operations happened (like kicking remote CPUs for
RPS/RFS).
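The behaviour change can be modelled with a toy deferred-free cache. This is a
sketch under stated assumptions, not the kernel implementation: the struct and
function names are hypothetical, CACHE_SIZE merely stands in for
NAPI_SKB_CACHE_SIZE, and nothing is actually freed, only counted.

```c
#include <stddef.h>

#define CACHE_SIZE 64 /* stand-in for NAPI_SKB_CACHE_SIZE */

struct free_cache {
	void *items[CACHE_SIZE];
	size_t count;      /* items waiting for a bulk free */
	size_t bulk_frees; /* number of bulk flushes performed */
};

/* Flush whatever is queued; a no-op when the cache is empty. */
static void cache_flush(struct free_cache *c)
{
	if (!c->count)
		return;
	c->count = 0;
	c->bulk_frees++;
}

/* TX completion path: queue the item, flush only when the cache is full. */
static void cache_defer_free(struct free_cache *c, void *item)
{
	c->items[c->count++] = item;
	if (c->count == CACHE_SIZE)
		cache_flush(c);
}

/*
 * Model of the change: at the end of every rx action, after the more
 * urgent work is done, flush unconditionally so items do not linger.
 */
static void rx_action_end(struct free_cache *c)
{
	cache_flush(c);
}
```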
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 25 Nov 2016 23:53:45 +0000 (15:53 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull keys fixes from James Morris:
"From David:
- Fix mpi_powm()'s handling of a number with a zero exponent
[CVE-2016-8650].
Integrate my and Andrey's patches for mpi_powm() and use
mpi_resize() instead of RESIZE_IF_NEEDED() - the latter adds a
duplicate check into the execution path of a trivial case we
don't normally expect to be taken.
- Fix double free in X.509 error handling"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
X.509: Fix double free in x509_cert_parse() [ver #3]
Linus Torvalds [Fri, 25 Nov 2016 23:44:47 +0000 (15:44 -0800)]
Fix subtle CONFIG_MODVERSIONS problems
CONFIG_MODVERSIONS has been broken for pretty much the whole 4.9 series,
and quite frankly, nobody has cared very deeply. We absolutely know how
to fix it, and it's not _complicated_, but it's not exactly pretty
either.
This oneliner fixes it without the ugliness, and allows for further
future cleanups.
"We've secretly replaced their regular MODVERSIONS with nothing at
all, let's see if they notice"
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 25 Nov 2016 23:16:51 +0000 (15:16 -0800)]
Merge tag 'acpi-4.9-rc7' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Two ACPI fixes for 4.9-rc7.
One of them reverts a recent ACPI commit that attempted to improve
reboot/power-off on some systems, but introduced problems elsewhere,
and the other one fixes kernel builds with the new WDAT watchdog
driver enabled in some configurations.
Specifics:
- Revert the recent commit that caused the ACPI _PTS method to be
executed in the power-off/reboot code path (as per the
specification) in an attempt to improve things on some systems
(apparently expecting _PTS to be executed in that code path), but
broke power-off/reboot on at least one other machine (Rafael
Wysocki).
- Fix kernel builds with the new WDAT watchdog driver enabled in some
configurations by explicitly selecting WATCHDOG_CORE when enabling
the WDAT watchdog driver (Mika Westerberg)"
* tag 'acpi-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
watchdog: wdat_wdt: Select WATCHDOG_CORE
Revert "ACPI: Execute _PTS before system reboot"
Rafael J. Wysocki [Thu, 24 Nov 2016 23:13:56 +0000 (00:13 +0100)]
MAINTAINERS: Add bug tracking system location entry type
Following the kernel Bugzilla discussion during the Kernel Summit
(https://lwn.net/Articles/705245/), add bug tracking system location
entry type (B) to MAINTAINERS and populate it for several subsystems
known to be using the kernel BZ actively (and add the upstream BZ for
ACPICA too).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jarkko Nikula [Fri, 25 Nov 2016 15:22:27 +0000 (17:22 +0200)]
Revert "i2c: designware: do not disable adapter after transfer"
This reverts commit 0317e6c0f1dc1ba86b8d9dccc010c5e77b8355fa.
Srinivas recently reported that the touchscreen and touchpad stopped working
on a Haswell based machine in the Linux 4.9-rc series, with timeout errors from
i2c_designware:
[ 16.508013] i2c_designware INT33C3:00: controller timed out
[ 16.508302] i2c_hid i2c-MSFT0001:02: failed to change power setting.
[ 17.532016] i2c_designware INT33C3:00: controller timed out
[ 18.556022] i2c_designware INT33C3:00: controller timed out
[ 18.556315] i2c_hid i2c-ATML1000:00: failed to retrieve report from device.
I managed to reproduce similar errors on another Haswell based machine
where touchscreen initialization fails in roughly 1/5 to 1/2 of boots.
Since the root cause for these errors is not clear yet and debugging is
ongoing, it's better to revert this commit as we are close to release.
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
David S. Miller [Fri, 25 Nov 2016 21:26:12 +0000 (16:26 -0500)]
Merge branch 'cgroup-bpf'
Daniel Mack says:
====================
Add eBPF hooks for cgroups
This is v9 of the patch set to allow eBPF programs for network
filtering and accounting to be attached to cgroups, so that they apply
to all sockets of all tasks placed in that cgroup. The logic can also
be extended for other cgroup based eBPF logic.
Again, only minor details are updated in this version.
Changes from v8:
* Move the egress hooks into ip_finish_output() and ip6_finish_output()
so they run after the netfilter hooks. For IPv4 multicast, add a new
ip_mc_finish_output() callback that is invoked on success by
netfilter, and call the eBPF program from there.
Changes from v7:
* Replace the static inline function cgroup_bpf_run_filter() with
two specific macros for ingress and egress. This addresses David
Miller's concern regarding skb->sk vs. sk in the egress path.
Thanks a lot to Daniel Borkmann and Alexei Starovoitov for the
suggestions.
Changes from v6:
* Rebased to 4.9-rc2
* Add EXPORT_SYMBOL(__cgroup_bpf_run_filter). The kbuild test robot
now succeeds in building this version of the patch set.
* Switch from bpf_prog_run_save_cb() to bpf_prog_run_clear_cb() to not
tamper with the contents of skb->cb[]. Pointed out by Daniel
Borkmann.
* Use sk_to_full_sk() in the egress path, as suggested by Daniel
Borkmann.
* Renamed BPF_PROG_TYPE_CGROUP_SOCKET to BPF_PROG_TYPE_CGROUP_SKB, as
requested by David Ahern.
* Added Alexei's Acked-by tags.
Changes from v5:
* The eBPF programs now operate on L3 rather than on L2 of the packets,
and the egress hooks were moved from __dev_queue_xmit() to
ip*_output().
* For BPF_PROG_TYPE_CGROUP_SOCKET, disallow direct access to the skb
through BPF_LD_[ABS|IND] instructions, but hook up the
bpf_skb_load_bytes() access helper instead. Thanks to Daniel Borkmann
for the help.
Changes from v4:
* Plug an skb leak when dropping packets due to eBPF verdicts in
__dev_queue_xmit(). Spotted by Daniel Borkmann.
* Check for sk_fullsock(sk) in __cgroup_bpf_run_filter() so we don't
operate on timewait or request sockets. Suggested by Daniel Borkmann.
* Add missing @parent parameter in kerneldoc of __cgroup_bpf_update().
Spotted by Rami Rosen.
* Include linux/jump_label.h from bpf-cgroup.h to fix a kbuild error.
Changes from v3:
* Dropped the _FILTER suffix from BPF_PROG_TYPE_CGROUP_SOCKET_FILTER,
renamed BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS to
BPF_CGROUP_INET_{IN,E}GRESS and alias BPF_MAX_ATTACH_TYPE to
__BPF_MAX_ATTACH_TYPE, as suggested by Daniel Borkmann.
* Dropped the attach_flags member from the anonymous struct for BPF
attach operations in union bpf_attr. They can be added later on via
CHECK_ATTR. Requested by Daniel Borkmann and Alexei.
* Release old_prog at the end of __cgroup_bpf_update rather than at
the beginning to fix a race gap between program updates and their
users. Spotted by Daniel Borkmann.
* Plugged an skb leak when dropping packets on the egress path.
Spotted by Daniel Borkmann.
* Add cgroups@vger.kernel.org to the loop, as suggested by Rami Rosen.
* Some minor coding style adaptations not worth mentioning in particular.
Changes from v2:
* Fixed the RCU locking details Tejun pointed out.
* Assert bpf_attr.flags == 0 in BPF_PROG_DETACH syscall handler.
Changes from v1:
* Moved all bpf specific cgroup code into its own file, and stub
out related functions for !CONFIG_CGROUP_BPF as static inline nops.
This way, the call sites are not cluttered with #ifdef guards while
the feature remains compile-time configurable.
* Implemented the new scheme proposed by Tejun. Per cgroup, store one
set of pointers that are pinned to the cgroup, and one for the
programs that are effective. When a program is attached or detached,
the change is propagated to all the cgroup's descendants. If a
subcgroup has its own pinned program, skip the whole subbranch in
order to allow delegation models.
* The hookup for egress packets is now done from __dev_queue_xmit().
* A static key is now used in both the ingress and egress fast paths
to keep performance penalties close to zero if the feature is
not in use.
* Overall cleanup to make the accessors use the program arrays.
This should make it much easier to add new program types, which
will then automatically follow the pinned vs. effective logic.
* Fixed locking issues, as pointed out by Eric Dumazet and Alexei
Starovoitov. Changes to the program array are now done with
xchg() and are protected by cgroup_mutex.
* eBPF programs are now expected to return 1 to let the packet pass,
not >= 0. Pointed out by Alexei.
* Operation is now limited to INET sockets, so local AF_UNIX sockets
are not affected. The enum members are renamed accordingly. In case
other socket families should be supported, this can be extended in
the future.
* The sample program learned to support both ingress and egress, and
can now optionally make the eBPF program drop packets by making it
return 0.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Wed, 23 Nov 2016 15:52:30 +0000 (16:52 +0100)]
samples: bpf: add userspace example for attaching eBPF programs to cgroups
Add a simple userspace program to demonstrate the new API to attach eBPF
programs to cgroups. This is what it does:
* Create arraymap in kernel with 4 byte keys and 8 byte values
* Load eBPF program
The eBPF program accesses the map passed in to store two pieces of
information. The number of invocations of the program, which maps
to the number of packets received, is stored to key 0. Key 1 is
incremented on each iteration by the number of bytes stored in
the skb.
* Detach any eBPF program previously attached to the cgroup
* Attach the new program to the cgroup using BPF_PROG_ATTACH
* Once a second, read map[0] and map[1] to see how many packets and
bytes were seen on any socket of tasks in the given cgroup.
The program takes a cgroup path as 1st argument, and either "ingress"
or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
which will make the generated eBPF program return 0 instead of 1, so
the kernel will drop the packet.
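What the generated eBPF program does can be approximated in plain C. This is
only a userspace model of the description above, with a two-slot array
standing in for the kernel arraymap; the names are hypothetical.

```c
#include <stdint.h>

enum { KEY_PACKETS = 0, KEY_BYTES = 1 };

/* Stand-in for the arraymap with 4 byte keys and 8 byte values. */
static uint64_t counters[2];

/*
 * Model of one program invocation: count the packet, add its length,
 * then return 1 to let it pass or 0 when "drop" was requested.
 */
static int count_and_decide(uint32_t skb_len, int drop)
{
	counters[KEY_PACKETS] += 1;
	counters[KEY_BYTES] += skb_len;
	return drop ? 0 : 1;
}
```

Reading counters[KEY_PACKETS] and counters[KEY_BYTES] once a second
corresponds to the polling loop of the sample.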
libbpf gained two new wrappers for the new syscall commands.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Wed, 23 Nov 2016 15:52:29 +0000 (16:52 +0100)]
net: ipv4, ipv6: run cgroup eBPF egress programs
If the cgroup associated with the receiving socket has eBPF
programs installed, run them from ip_output(), ip6_output() and
ip_mc_output(). In these functions we have two socket contexts
as per 7026b1ddb6b8 ("netfilter: Pass socket pointer down through
okfn()."). We explicitly need to use sk instead of skb->sk here,
since otherwise the same program would run multiple times on egress
when encap devices are involved, which is not desired in our case.
eBPF programs used in this context are expected to either return 1 to
let the packet pass, or != 1 to drop them. The programs have access to
the skb through bpf_skb_load_bytes(), and the payload starts at the
network headers (L3).
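The verdict rule above is strict: only the value 1 lets the packet through. A
minimal sketch of how a caller might map the program's return value to a
result, where the helper name is hypothetical and the choice of -EPERM as the
drop code is an assumption of this example:

```c
#include <errno.h>

/* 1 means pass (0); anything else, including 2 or -1, means drop. */
static int verdict_to_errno(int prog_ret)
{
	return prog_ret == 1 ? 0 : -EPERM;
}
```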
Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
the feature is unused.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Wed, 23 Nov 2016 15:52:28 +0000 (16:52 +0100)]
net: filter: run cgroup eBPF ingress programs
If the cgroup associated with the receiving socket has eBPF
programs installed, run them from sk_filter_trim_cap().
eBPF programs used in this context are expected to either return 1 to
let the packet pass, or != 1 to drop them. The programs have access to
the skb through bpf_skb_load_bytes(), and the payload starts at the
network headers (L3).
Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
the feature is unused.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Wed, 23 Nov 2016 15:52:27 +0000 (16:52 +0100)]
bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands
Extend the bpf(2) syscall by two new commands, BPF_PROG_ATTACH and
BPF_PROG_DETACH which allow attaching and detaching eBPF programs
to a target.
On the API level, the target could be anything that has an fd in
userspace, hence the name of the field in union bpf_attr is called
'target_fd'.
When called with BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS, the target is
expected to be a valid file descriptor of a cgroup v2 directory which
has the bpf controller enabled. These are the only use-cases
implemented by this patch at this point, but more can be added.
If a program of the given type already exists in the given cgroup,
the program is swapped atomically, so userspace does not have to drop
an existing program first before installing a new one, which would
otherwise leave a gap in which no program is attached.
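The "no gap" property of an atomic swap can be illustrated with C11 atomics.
This is a userspace sketch, not the kernel code: struct prog, the attached
slot and attach_prog() are hypothetical names, and atomic_exchange() stands in
for the kernel's exchange primitive.

```c
#include <stdatomic.h>
#include <stddef.h>

struct prog {
	int id;
};

/* The single attachment slot; starts out empty (NULL). */
static _Atomic(struct prog *) attached;

/*
 * Install new_prog and return the program it replaced. A reader loading
 * the slot sees either the old or the new program, never an empty slot
 * in between, so there is no window without a program.
 */
static struct prog *attach_prog(struct prog *new_prog)
{
	return atomic_exchange(&attached, new_prog);
}
```

The caller would release the returned old program only after the swap, once
no user can still be running it.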
For more information on the propagation logic to subcgroups, please
refer to the bpf cgroup controller implementation.
The API is guarded by CAP_NET_ADMIN.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Wed, 23 Nov 2016 15:52:26 +0000 (16:52 +0100)]
cgroup: add support for eBPF programs
This patch adds two sets of eBPF program pointers to struct cgroup.
One set is for programs that are directly pinned to a cgroup, and one
is for programs that are effective for it.
To illustrate the logic behind that, assume the following example
cgroup hierarchy.
  A - B - C
        \ D - E
If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.
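
The inheritance rule in this example can be modelled with a small
self-contained sketch. It is a simplification, not the kernel
implementation: struct cg_node and effective_prog() are hypothetical names,
and the sketch looks up the effective program by walking towards the root on
demand, whereas the patch stores a separate set of effective pointers in
struct cgroup.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cgroup (not struct cgroup). */
struct cg_node {
	struct cg_node *parent;	/* NULL for the root */
	const char *pinned;	/* program attached directly here, or NULL */
};

/*
 * Return the program that is effective for cg: its own pinned program if
 * it has one, otherwise the one pinned to the nearest ancestor, or NULL
 * if nothing is attached anywhere on the path to the root.
 */
static const char *effective_prog(const struct cg_node *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->pinned)
			return cg->pinned;
	return NULL;
}
```

With only B pinned, the lookup yields B's program for B, C, D and E and
nothing for A. Once D pins its own program, D and E resolve to it while B
and C keep B's program.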
Attaching and detaching programs will be done through the bpf(2)
syscall. For now, ingress and egress inet socket filtering are the
only supported use-cases.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Wed, 23 Nov 2016 15:52:25 +0000 (16:52 +0100)]
bpf: add new prog type for cgroup socket filtering
This program type is similar to BPF_PROG_TYPE_SOCKET_FILTER, except that
it does not allow BPF_LD_[ABS|IND] instructions and hooks up the
bpf_skb_load_bytes() helper.
Programs of this type will be attached to cgroups for network filtering
and accounting.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Fri, 25 Nov 2016 21:24:07 +0000 (22:24 +0100)]
Merge branches 'acpi-sleep-fixes' and 'acpi-wdat-fixes'
* acpi-sleep-fixes:
Revert "ACPI: Execute _PTS before system reboot"
* acpi-wdat-fixes:
watchdog: wdat_wdt: Select WATCHDOG_CORE
David S. Miller [Fri, 25 Nov 2016 21:17:12 +0000 (16:17 -0500)]
Merge tag 'linux-can-fixes-for-4.9-20161123' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2016-11-23
this is a pull request for net/master.
The patch by Oliver Hartkopp for the broadcast manager (bcm) fixes the
CAN-FD support, which may cause an out-of-bounds access otherwise.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Wed, 23 Nov 2016 14:24:35 +0000 (22:24 +0800)]
dwc_eth_qos: drop duplicate headers
Drop duplicate headers types.h and delay.h from dwc_eth_qos.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 23 Nov 2016 11:02:44 +0000 (11:02 +0000)]
cxgb4: fix memory leak on txq_info
Currently if txq_info->uldtxq cannot be allocated then
txq_info->txq is being kfree'd (which is redundant because it
is NULL) instead of txq_info. Fix this by instead kfree'ing
txq_info.
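
The error path in question follows a common pattern, sketched below in
userspace C. This is an illustration under assumptions, not the driver
code: struct txq_info here is a cut-down hypothetical stand-in,
txq_info_create() is an invented name, and calloc()/free() stand in for the
kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for the driver structure. */
struct txq_info {
	void *uldtxq;
	size_t ntxq;
};

/* Allocate the container and its queue array, or return NULL. */
static struct txq_info *txq_info_create(size_t ntxq)
{
	struct txq_info *info = calloc(1, sizeof(*info));

	if (!info)
		return NULL;

	info->ntxq = ntxq;
	info->uldtxq = calloc(ntxq, 64);
	if (!info->uldtxq) {
		/*
		 * The member is NULL here, so freeing it would be a no-op
		 * and the container would leak. Free the container itself.
		 */
		free(info);
		return NULL;
	}
	return info;
}
```

On success the caller owns both allocations and releases info->uldtxq before
info.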
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 25 Nov 2016 19:36:35 +0000 (11:36 -0800)]
Merge tag 'mfd-fixes-4.9.1' of git://git./linux/kernel/git/lee/mfd
Pull MFD fixes from Lee Jones:
"Received a couple of last minute fixes for v4.9.
The patches from Viresh are fixing issues displayed in KernelCI"
* tag 'mfd-fixes-4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: wm8994-core: Don't use managed regulator bulk get API
mfd: wm8994-core: Disable regulators before removing them
mfd: syscon: Support native-endian regmaps
Linus Torvalds [Fri, 25 Nov 2016 19:31:01 +0000 (11:31 -0800)]
Merge tag 'media/v4.9-4' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"Fix for the firmware load logic of the tuner-xc2028 driver"
* tag 'media/v4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
xc2028: Fix use-after-free bug properly
Linus Torvalds [Fri, 25 Nov 2016 18:51:35 +0000 (10:51 -0800)]
Merge tag 'drm-fixes-for-v4.9-rc7' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Seems to be quietening down nicely, a few mediatek, one exynos and one
hdlcd fix, along with two amd fixes"
* tag 'drm-fixes-for-v4.9-rc7' of git://people.freedesktop.org/~airlied/linux:
gpu/drm/exynos/exynos_hdmi - Unmap region obtained by of_iomap
drm/mediatek: fix null pointer dereference
drm/mediatek: fixed the calc method of data rate per lane
drm/mediatek: fix a typo of DISP_OD_CFG to OD_RELAYMODE
drm/radeon: fix power state when port pm is unavailable (v2)
drm/amdgpu: fix power state when port pm is unavailable
drm/arm: hdlcd: fix plane base address update
drm/amd/powerplay: avoid out of bounds access on array ps.
John David Anglin [Fri, 25 Nov 2016 01:18:14 +0000 (20:18 -0500)]
parisc: Also flush data TLB in flush_icache_page_asm
This is the second issue I noticed in reviewing the parisc TLB code.
The fic instruction may use either the instruction or data TLB in
flushing the instruction cache. Thus, on machines with a split TLB, we
should also flush the data TLB after setting up the temporary alias
registers.
Although this has no functional impact, I changed the pdtlb and pitlb
instructions to consistently use the index register %r0. These
instructions do not support integer displacements.
Tested on rp3440 and c8000.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Helge Deller <deller@gmx.de>
John David Anglin [Fri, 25 Nov 2016 01:06:32 +0000 (20:06 -0500)]
parisc: Fix race in pci-dma.c
We are still troubled by occasional random segmentation faults and
memory corruption on SMP machines. This causes quite a few
package builds to fail on the Debian buildd machines for parisc. When
gcc-6 failed to build three times in a row, I looked again at the TLB
related code. I found a couple of issues. This is the first.
In general, we need to ensure page table updates and corresponding TLB
purges are atomic. The attached patch fixes an instance in pci-dma.c
where the page table update was not guarded by the TLB lock.
Tested on rp3440 and c8000. So far, no further random segmentation
faults have been observed.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Tue, 22 Nov 2016 17:08:30 +0000 (18:08 +0100)]
parisc: Switch to generic sched_clock implementation
Drop the open-coded sched_clock() function and replace it with the provided
GENERIC_SCHED_CLOCK implementation. We have seen quite a few hung tasks in the
past, which seem to be fixed by this patch.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Helge Deller <deller@gmx.de>
John David Anglin [Mon, 21 Nov 2016 02:12:36 +0000 (21:12 -0500)]
parisc: Fix races in parisc_setup_cache_timing()
Helge reported to me the following startup crash:
[ 0.000000] Linux version 4.8.0-1-parisc64-smp (debian-kernel@lists.debian.org) (gcc version 5.4.1 20161019 (GCC) ) #1 SMP Debian 4.8.7-1 (2016-11-13)
[ 0.000000] The 64-bit Kernel has started...
[ 0.000000] Kernel default page size is 4 KB. Huge pages enabled with 1 MB physical and 2 MB virtual size.
[ 0.000000] Determining PDC firmware type: System Map.
[ 0.000000] model 9000/785/J5000
[ 0.000000] Total Memory: 2048 MB
[ 0.000000] Memory: 2018528K/2097152K available (9272K kernel code, 3053K rwdata, 1319K rodata, 1024K init, 840K bss, 78624K reserved, 0K cma-reserved)
[ 0.000000] virtual kernel memory layout:
[ 0.000000] vmalloc : 0x0000000000008000 - 0x000000003f000000 (1007 MB)
[ 0.000000] memory : 0x0000000040000000 - 0x00000000c0000000 (2048 MB)
[ 0.000000] .init : 0x0000000040100000 - 0x0000000040200000 (1024 kB)
[ 0.000000] .data : 0x0000000040b0e000 - 0x0000000040f533e0 (4372 kB)
[ 0.000000] .text : 0x0000000040200000 - 0x0000000040b0e000 (9272 kB)
[ 0.768910] Brought up 1 CPUs
[ 0.992465] NET: Registered protocol family 16
[ 2.429981] Releasing cpu 1 now, hpa=fffffffffffa2000
[ 2.635751] CPU(s): 2 out of 2 PA8500 (PCX-W) at 440.000000 MHz online
[ 2.726692] Setting cache flush threshold to 1024 kB
[ 2.729932] Not-handled unaligned insn 0x43ffff80
[ 2.798114] Setting TLB flush threshold to 140 kB
[ 2.928039] Unaligned handler failed, ret = -1
[ 3.000419]       _______________________________
[ 3.000419]      < Your System ate a SPARC! Gah! >
[ 3.000419]       -------------------------------
[ 3.000419]              \   ^__^
[ 3.000419]                  (__)\       )\/\
[ 3.000419]                   U  ||----w |
[ 3.000419]                      ||     ||
[ 9.340055] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-1-parisc64-smp #1 Debian 4.8.7-1
[ 9.448082] task: 00000000bfd48060 task.stack: 00000000bfd50000
[ 9.528040]
[ 10.760029] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004025d154 000000004025d158
[ 10.868052] IIR: 43ffff80 ISR: 0000000000340000 IOR: 000001ff54150960
[ 10.960029] CPU: 1 CR30: 00000000bfd50000 CR31: 0000000011111111
[ 11.052057] ORIG_R28: 000000004021e3b4
[ 11.100045] IAOQ[0]: irq_exit+0x94/0x120
[ 11.152062] IAOQ[1]: irq_exit+0x98/0x120
[ 11.208031] RP(r2): irq_exit+0xb8/0x120
[ 11.256074] Backtrace:
[ 11.288067] [<00000000402cd944>] cpu_startup_entry+0x1e4/0x598
[ 11.368058] [<0000000040109528>] smp_callin+0x2c0/0x2f0
[ 11.436308] [<00000000402b53fc>] update_curr+0x18c/0x2d0
[ 11.508055] [<00000000402b73b8>] dequeue_entity+0x2c0/0x1030
[ 11.584040] [<00000000402b3cc0>] set_next_entity+0x80/0xd30
[ 11.660069] [<00000000402c1594>] pick_next_task_fair+0x614/0x720
[ 11.740085] [<000000004020dd34>] __schedule+0x394/0xa60
[ 11.808054] [<000000004020e488>] schedule+0x88/0x118
[ 11.876039] [<0000000040283d3c>] rescuer_thread+0x4d4/0x5b0
[ 11.948090] [<000000004028fc4c>] kthread+0x1ec/0x248
[ 12.016053] [<0000000040205020>] end_fault_vector+0x20/0xc0
[ 12.092239] [<00000000402050c0>] _switch_to_ret+0x0/0xf40
[ 12.164044]
[ 12.184036] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-1-parisc64-smp #1 Debian 4.8.7-1
[ 12.244040] Backtrace:
[ 12.244040] [<000000004021c480>] show_stack+0x68/0x80
[ 12.244040] [<00000000406f332c>] dump_stack+0xec/0x168
[ 12.244040] [<000000004021c74c>] die_if_kernel+0x25c/0x430
[ 12.244040] [<000000004022d320>] handle_unaligned+0xb48/0xb50
[ 12.244040]
[ 12.632066] ---[ end trace 9ca05a7215c7bbb2 ]---
[ 12.692036] Kernel panic - not syncing: Attempted to kill the idle task!
We have the insn 0x43ffff80 in IIR but from IAOQ we should have:
4025d150: 0f f3 20 df ldd,s r19(r31),r31
4025d154: 0f 9f 00 9c ldw r31(ret0),ret0
4025d158: bf 80 20 58 cmpb,*<> r0,ret0,4025d18c <irq_exit+0xcc>
Cpu0 has just completed running parisc_setup_cache_timing:
[ 2.429981] Releasing cpu 1 now, hpa=fffffffffffa2000
[ 2.635751] CPU(s): 2 out of 2 PA8500 (PCX-W) at 440.000000 MHz online
[ 2.726692] Setting cache flush threshold to 1024 kB
[ 2.729932] Not-handled unaligned insn 0x43ffff80
[ 2.798114] Setting TLB flush threshold to 140 kB
[ 2.928039] Unaligned handler failed, ret = -1
From the backtrace, cpu1 is in smp_callin:
void __init smp_callin(void)
{
        int slave_id = cpu_now_booting;

        smp_cpu_init(slave_id);
        preempt_disable();

        flush_cache_all_local(); /* start with known state */
        flush_tlb_all_local(NULL);

        local_irq_enable(); /* Interrupts have been off until now */

        cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
So, it has just flushed its caches and the TLB. It would seem either the
flushes in parisc_setup_cache_timing or smp_callin have corrupted kernel
memory.
The attached patch reworks parisc_setup_cache_timing to remove the races
in setting the cache and TLB flush thresholds. It also corrects the
number of bytes flushed in the TLB calculation.
The patch flushes the cache and TLB on cpu0 before starting the
secondary processors so that they are started from a known state.
Tested with a few reboots on c8000.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Helge Deller <deller@gmx.de>