Kristian Evensen [Thu, 21 Jul 2016 09:10:06 +0000 (11:10 +0200)]
cdc_ether: Improve ZTE MF823/831/910 handling
The firmware in several ZTE devices (at least the MF823/831/910
modems/mifis) uses OS fingerprinting to determine which type of device to
export. In addition, these devices export a REST API which can be used to
control the type of device. So far, on Linux, the devices have been seen as
RNDIS or CDC Ether.
When CDC Ether is used, devices of the same type are, as with RNDIS,
exported with the same, bogus random MAC address. In addition, the devices
(at least on all firmware revisions I have found) use the bogus MAC when
sending traffic routed from external networks. And as a final feature, the
devices sometimes export the link state incorrectly. There are also
references online to several other ZTE devices displaying this behavior,
with several different PIDs and MAC addresses.
This patch tries to improve the handling of ZTE devices by doing the
following:
* Create a new driver_info-struct that is used by ZTE devices that do not
have an explicit entry in the product table. This struct is the same as the
default cdc_ether driver info, but a new bind- and an rx_fixup-function
have been added.
* In the new bind function, we check if we have read a random MAC from the
device. If we have, then we generate a new random MAC address. This will
ensure that all devices get a unique MAC.
* The rx_fixup-function replaces the destination MAC address in the skb
with that of the device. I have not seen a revision of these devices that
behaves correctly (i.e., sets the right destination MAC), so I chose not to
do any comparison with for example the known, bogus addresses.
* The MF823/MF831/MF910 sometimes export cdc carrier on twice on connect
(the correct behavior is off then on). Work around this by manually setting
carrier to off if an on-notification is received and the NOCARRIER-bit is
not set.
The carrier workaround will affect all devices, but it should also take
care of similar mistakes made by other manufacturers. I tried to think of,
look for, and test for problems/regressions that could be introduced by
this behavior, but could not find any. However, my familiarity with this
code path is not that great, so there could be something I have overlooked.
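The carrier workaround in the last item can be modelled as a small state machine. This is a userspace sketch under stated assumptions, not the driver code: `struct link_state`, `link_change()` and `handle_connection_notify()` are hypothetical names, and the `carrier` flag stands in for the NOCARRIER bit being clear.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a net device's carrier state. */
struct link_state {
	bool carrier;   /* true means the NOCARRIER bit is clear */
	int log[16];    /* recorded transitions: 1 = on, 0 = off */
	size_t nlog;
};

/* Record a carrier transition, as a link-change helper would report it. */
static void link_change(struct link_state *ls, bool on)
{
	ls->carrier = on;
	if (ls->nlog < sizeof(ls->log) / sizeof(ls->log[0]))
		ls->log[ls->nlog++] = on ? 1 : 0;
}

/* Handle a CDC NETWORK_CONNECTION notification. An "on" that arrives
 * while the carrier is already on is preceded by a forced "off", so
 * upper layers always see an off -> on transition. */
static void handle_connection_notify(struct link_state *ls, bool on)
{
	if (on && ls->carrier)
		link_change(ls, false);
	link_change(ls, on);
}
```

Feeding two "on" notifications in a row records on, off, on, which is the sequence a correctly behaving device would have produced.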
I have tested this patch with multiple revisions of all three devices, and
they behave as expected. In other words, they all got a valid, random MAC,
the correct operational state and I can receive/send traffic without
problems. I also tested with some other cdc_ether devices I have and did
not find any problems/regressions caused by the two general changes.
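The bind and rx_fixup behaviour described in the list above can be sketched in userspace C. It assumes, for illustration only, that a "random" MAC read from the device is recognised by its locally administered bit (0x02 in the first octet); the message does not spell out the exact test. All function names are hypothetical, and rand() is used only for brevity, not as a recommended entropy source.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6
#define ETH_HLEN 14 /* destination MAC + source MAC + EtherType */

/* Assumption: the bogus firmware MAC is detected by its locally
 * administered bit (0x02 in the first octet). */
static int mac_is_locally_administered(const uint8_t mac[ETH_ALEN])
{
	return (mac[0] & 0x02) != 0;
}

/* Fill mac with a random unicast, locally administered address. */
static void mac_randomize(uint8_t mac[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	mac[0] &= 0xfe; /* clear the multicast bit */
	mac[0] |= 0x02; /* set the locally administered bit */
}

/* Bind-time step: only replace the address if it looks firmware-generated. */
static void zte_bind_fixup_mac(uint8_t dev_mac[ETH_ALEN])
{
	if (mac_is_locally_administered(dev_mac))
		mac_randomize(dev_mac);
}

/* RX step: overwrite the destination MAC (first 6 bytes of the frame)
 * with the device's own address, without comparing against known bogus
 * addresses. Frames shorter than an Ethernet header are left alone.
 * Returns nonzero to keep the frame, following the usbnet rx_fixup
 * convention. */
static int zte_rx_fixup(uint8_t *frame, size_t len,
			const uint8_t dev_mac[ETH_ALEN])
{
	if (len < ETH_HLEN)
		return 1;
	memcpy(frame, dev_mac, ETH_ALEN);
	return 1;
}
```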
v3->v4:
* Forgot to remove unused variables, sorry about that (thanks David
Miller).
v2->v3:
* I had forgotten to remove the random MAC generation from usbnet_cdc_bind()
(thanks Oliver).
* Rework logic in the ZTE bind-function a bit.
v1->v2:
* Only generate random MAC for ZTE devices (thanks Oliver Neukum).
* Set random MAC and do RX fixup for all ZTE devices that do not have a
product-entry, as the bogus MAC has been seen on devices with several
different PIDs/MAC addresses. In other words, it seems to be the default
behavior of ZTE CDC Ether devices (thanks Lars Melin).
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Jul 2016 05:02:36 +0000 (22:02 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next,
they are:
1) Count pre-established connections as active in "least connection"
schedulers, so that pre-established connections are taken into account
and backend servers are not overloaded under peak demand, from Michal
Kubecek via Simon Horman.
2) Address a race condition when resizing the conntrack table by caching
the bucket size when fully iterating over the hashtable in these
three possible scenarios: 1) dump via /proc/net/nf_conntrack,
2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
From Liping Zhang.
3) Revisit early_drop() path to perform lockless traversal on conntrack
eviction under stress, use del_timer() as synchronization point to
avoid two CPUs evicting the same entry, from Florian Westphal.
4) Move NAT hlist_head to nf_conn object; this simplifies the existing
NAT extension and it doesn't increase size, thanks to recent patches
that align nf_conn, from Florian.
5) Use rhashtable for the by-source NAT hashtable, also from Florian.
6) Don't allow --physdev-is-out from OUTPUT chain, just as
--physdev-out is not allowed either, from Hangbin Liu.
7) Automatically enable nf_conntrack counters if the user tries to
match ct bytes/packets from nftables, from Liping Zhang.
8) Remove possible_net_t fields in nf_tables set objects since we just
simply pass the net pointer to the backend set type implementations.
9) Fix possible off-by-one in h323, from Toby DiPasquale.
10) early_drop() may be called from the ctnetlink path, so we must hold
the RCU read-side lock there too; this amends Florian's patch #3
coming in this batch, from Liping Zhang.
11) Use binary search to validate jump offset in x_tables, this
addresses the O(n!) validation that was introduced recently to
resolve security issues with unprivileged namespaces, from Florian.
12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.
13) Three updates for nft_log: Fix log prefix leak in error path. Bail
out on loglevel larger than debug in nft_log and set the new
NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.
14) Allow filtering rule dumps in nf_tables based on table and chain
names.
15) Simplify connlabel to always use 128 bits to store labels and
get rid of unused function in xt_connlabel, from Florian.
16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
helper, by Gao Feng.
17) Put back x_tables module reference in nft_compat on error, from
Liping Zhang.
18) Add a reference count to the x_tables extensions cache in
nft_compat, so we can remove them when unused and avoid a crash
if the extensions are removed with rmmod, again from Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
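Item 11 replaces a linear scan with binary search when validating jump offsets. A sketch of that idea, assuming the start offsets of all rules have already been collected into an ascending array (a single pass over a sequentially laid out ruleset yields one); `jump_offset_valid()` and the array layout are illustrative, not the x_tables helpers.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Three-way comparison of two unsigned offsets for bsearch(). */
static int cmp_offset(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;

	return (x > y) - (x < y);
}

/* offsets[] holds the start offset of every rule, in ascending order.
 * A jump is accepted only if it lands exactly on a rule boundary,
 * which costs O(log n) per jump instead of a scan over all rules. */
static bool jump_offset_valid(const unsigned int *offsets, size_t n,
			      unsigned int target)
{
	return bsearch(&target, offsets, n, sizeof(*offsets),
		       cmp_offset) != NULL;
}
```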
David S. Miller [Sat, 23 Jul 2016 23:31:37 +0000 (19:31 -0400)]
Merge git://git./linux/kernel/git/davem/net
Just several instances of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Liping Zhang [Sat, 23 Jul 2016 08:00:32 +0000 (16:00 +0800)]
netfilter: nft_compat: fix crash when related match/target module is removed
We "cache" the loaded match/target modules and reuse them, but when the
modules are removed, we still point to them. Then we may end up with
invalid memory references when using iptables-compat to add rules later.
The following commands reproduce the kernel crash:
# iptables-compat -A INPUT -j LOG
# iptables-compat -D INPUT -j LOG
# rmmod xt_LOG
# iptables-compat -A INPUT -j LOG
BUG: unable to handle kernel paging request at ffffffffa05a9010
IP: [<ffffffff813f783e>] strcmp+0xe/0x30
Call Trace:
[<ffffffffa05acc43>] nft_target_select_ops+0x83/0x1f0 [nft_compat]
[<ffffffffa058a177>] nf_tables_expr_parse+0x147/0x1f0 [nf_tables]
[<ffffffffa058e541>] nf_tables_newrule+0x301/0x810 [nf_tables]
[<ffffffff8141ca00>] ? nla_parse+0x20/0x100
[<ffffffffa057fa8f>] nfnetlink_rcv+0x33f/0x53d [nfnetlink]
[<ffffffffa057f94b>] ? nfnetlink_rcv+0x1fb/0x53d [nfnetlink]
[<ffffffff817116b8>] netlink_unicast+0x178/0x220
[<ffffffff81711a5b>] netlink_sendmsg+0x2fb/0x3a0
[<ffffffff816b7fc8>] sock_sendmsg+0x38/0x50
[<ffffffff816b8a7e>] ___sys_sendmsg+0x28e/0x2a0
[<ffffffff816bcb7e>] ? release_sock+0x1e/0xb0
[<ffffffff81804ac5>] ? _raw_spin_unlock_bh+0x35/0x40
[<ffffffff816bcbe2>] ? release_sock+0x82/0xb0
[<ffffffff816b93d4>] __sys_sendmsg+0x54/0x90
[<ffffffff816b9422>] SyS_sendmsg+0x12/0x20
[<ffffffff81805172>] entry_SYSCALL_64_fastpath+0x1a/0xa9
So when nobody uses the related match/target module, there's no need to
"cache" it. And nft_[match|target]_release are no longer needed, so remove
them.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
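A sketch of a reference-counted extension cache in the spirit of this fix: an entry leaves the cache as soon as its last user releases it, so nothing keeps pointing into a module after rmmod. `struct ext_entry`, `ext_get()` and `ext_put()` are hypothetical names; the real nft_compat code also manages module references, which this model omits.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry standing in for a cached match/target. */
struct ext_entry {
	char name[32];
	unsigned int refcnt;
	struct ext_entry *next;
};

static struct ext_entry *ext_cache;

/* Look up name; take a reference on a hit, otherwise create an entry. */
static struct ext_entry *ext_get(const char *name)
{
	struct ext_entry *e;

	for (e = ext_cache; e; e = e->next) {
		if (strcmp(e->name, name) == 0) {
			e->refcnt++;
			return e;
		}
	}
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	strncpy(e->name, name, sizeof(e->name) - 1);
	e->refcnt = 1;
	e->next = ext_cache;
	ext_cache = e;
	return e;
}

/* Drop a reference; unlink and free the entry once nobody uses it. */
static void ext_put(struct ext_entry *e)
{
	struct ext_entry **pp;

	if (--e->refcnt)
		return;
	for (pp = &ext_cache; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			break;
		}
	}
	free(e);
}
```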
Liping Zhang [Sat, 23 Jul 2016 08:00:31 +0000 (16:00 +0800)]
netfilter: nft_compat: put back match/target module if init fail
If the user specifies an invalid NFTA_MATCH_INFO/NFTA_TARGET_INFO attr
or memory allocation fails, we should call module_put on the related match
or target. Otherwise, we cannot remove the module even when nobody uses it.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
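The usual kernel shape for this kind of fix is a single error label that drops the module reference taken at the start. A userspace sketch, with a hypothetical `struct fake_module` counter standing in for try_module_get()/module_put() and an `info_valid` flag simulating attribute validation:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a module's reference count. */
struct fake_module {
	int refcnt;
};

static bool fake_try_module_get(struct fake_module *m) { m->refcnt++; return true; }
static void fake_module_put(struct fake_module *m) { m->refcnt--; }

/* Initialise an expression that holds a reference on module m. Any
 * failure after the reference was taken drops it before returning. */
static int expr_init(struct fake_module *m, bool info_valid,
		     size_t priv_size, void **priv)
{
	int err;

	if (!fake_try_module_get(m))
		return -ENOENT;

	if (!info_valid) {
		err = -EINVAL;
		goto err_put;
	}
	*priv = malloc(priv_size);
	if (!*priv) {
		err = -ENOMEM;
		goto err_put;
	}
	return 0;

err_put:
	fake_module_put(m); /* without this, the module could never be unloaded */
	return err;
}
```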
Gao Feng [Fri, 22 Jul 2016 04:59:15 +0000 (12:59 +0800)]
netfilter: h323: Use mod_timer instead of set_expect_timeout
Simplify the code without any side effect. set_expect_timeout() is
used to modify the timer's expiry time: it deletes the timer and
adds it again. So we can use mod_timer() directly.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Sat, 23 Jul 2016 06:44:31 +0000 (15:44 +0900)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix memory leak in nftables, from Liping Zhang.
2) Need to check result of vlan_insert_tag() in batman-adv otherwise we
risk NULL skb derefs, from Sven Eckelmann.
3) Check for dev_alloc_skb() failures in cfg80211, from Gregory
Greenman.
4) Properly handle ppp_unregister_channel() happening in
parallel with ppp_connect_channel(), from WANG Cong.
5) Fix DCCP deadlock, from Eric Dumazet.
6) Bail out properly in UDP if sk_filter() truncates the packet to be
smaller than even the space that the protocol headers need. From
Michal Kubecek.
7) Similarly for rose, dccp, and sctp, from Willem de Bruijn.
8) Make TCP challenge ACKs less predictable, from Eric Dumazet.
9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
packet: propagate sock_cmsg_send() error
net/mlx5e: Fix del vxlan port command buffer memset
packet: fix second argument of sock_tx_timestamp()
net: switchdev: change ageing_time type to clock_t
Update maintainer for EHEA driver.
net/mlx4_en: Add resilience in low memory systems
net/mlx4_en: Move filters cleanup to a proper location
sctp: load transport header after sk_filter
net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
net: nb8800: Fix SKB leak in nb8800_receive()
et131x: Fix logical vs bitwise check in et131x_tx_timeout()
vlan: use a valid default mtu value for vlan over macsec
net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
mlxsw: spectrum: Prevent invalid ingress buffer mapping
mlxsw: spectrum: Prevent overwrite of DCB capability fields
mlxsw: spectrum: Don't emit errors when PFC is disabled
mlxsw: spectrum: Indicate support for autonegotiation
mlxsw: spectrum: Force link training according to admin state
r8152: add MODULE_VERSION
...
Linus Torvalds [Sat, 23 Jul 2016 05:25:02 +0000 (14:25 +0900)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"This contains a fix for a potential crash/corruption issue and another
where the suid/sgid bits weren't cleared on write"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: verify upper dentry in ovl_remove_and_whiteout()
ovl: Copy up underlying inode's ->i_mode to overlay inode
ovl: handle ATTR_KILL*
Linus Torvalds [Sat, 23 Jul 2016 03:54:20 +0000 (12:54 +0900)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Five fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
pps: do not crash when failed to register
tools/vm/slabinfo: fix an unintentional printf
testing/radix-tree: fix a macro expansion bug
radix-tree: fix radix_tree_iter_retry() for tagged iterators.
mm: memcontrol: fix cgroup creation failure after many small jobs
Linus Torvalds [Sat, 23 Jul 2016 03:51:52 +0000 (12:51 +0900)]
Merge tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux
Pull intel kabylake drm fixes from Dave Airlie:
"As mentioned Intel has gathered all the Kabylake fixes from -next,
which we've enabled in 4.7 for the first time, these are pretty much
limited in scope to only affect kabylake, which is hw that isn't
shipping yet. So I'm mostly okay with it going in now.
If we don't land this, it might be a good idea to disable kabylake
support in 4.7 before we ship"
* tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux: (28 commits)
drm/i915/kbl: Introduce the first official DMC for Kabylake.
drm/i915: Introduce Kabypoint PCH for Kabylake H/DT.
drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate
drm/i915/gen9: Add WaFbcHighMemBwCorruptionAvoidance
drm/i195/fbc: Add WaFbcNukeOnHostModify
drm/i915/gen9: Add WaFbcWakeMemOn
drm/i915/gen9: Add WaFbcTurnOffFbcWatermark
drm/i915/kbl: Add WaClearSlmSpaceAtContextSwitch
drm/i915/gen9: Add WaEnableChickenDCPR
drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharing
drm/i915/kbl: Add WaDisableGafsUnitClkGating
drm/i915/kbl: Add WaForGAMHang
drm/i915: Add WaInsertDummyPushConstP for bxt and kbl
drm/i915/kbl: Add WaDisableDynamicCreditSharing
drm/i915/kbl: Add WaDisableGamClockGating
drm/i915/gen9: Enable must set chicken bits in config0 reg
drm/i915/kbl: Add WaDisableLSQCROPERFforOCL
drm/i915/kbl: Add WaDisableSDEUnitClockGating
drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0
drm/i915/kbl: Add WaEnableGapsTsvCreditFix
...
Linus Torvalds [Sat, 23 Jul 2016 03:46:42 +0000 (12:46 +0900)]
Merge tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Two i915 regression fixes.
Intel have submitted some Kabylake fixes I'll send separately, since
this is the first kernel with kabylake support and they don't go much
outside that area I think they should be fine"
* tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux:
drm/i915: add missing condition for committing planes on crtc
drm/i915: Treat eDP as always connected, again
Linus Torvalds [Sat, 23 Jul 2016 03:39:08 +0000 (12:39 +0900)]
Merge tag 'm68k-for-v4.8-tag1' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- assorted spelling fixes
- defconfig updates
* tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k/defconfig: Update defconfigs for v4.7-rc2
m68k: Assorted spelling fixes
Linus Torvalds [Sat, 23 Jul 2016 03:32:50 +0000 (12:32 +0900)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A handful of fixes before final release:
Marvell Armada:
- One to fix a typo in the devicetree specifying memory ranges for
the crypto engine
- Two to deal with marking PCI and device-memory as strongly ordered
to avoid hardware deadlocks, in particular when enabling above
crypto driver.
- Compile fix for PM
Allwinner:
- DT clock fixes to deal with u-boot-enabled framebuffer (simplefb).
- Make R8 (C.H.I.P. SoC) inherit system compatibility from A13 to
make clocks register properly.
Tegra:
- Fix SD card voltage setting on the Tegra3 Beaver dev board
Misc:
- Two maintainers updates for STM32 and STi platforms"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: tegra: beaver: Allow SD card voltage to be changed
MAINTAINERS: update STi maintainer list
MAINTAINERS: update STM32 maintainers list
ARM: mvebu: compile pm code conditionally
ARM: dts: sun7i: Fix pll3x2 and pll7x2 not having a parent clock
ARM: dts: sunxi: Add pll3 to simplefb nodes clocks lists
ARM: dts: armada-38x: fix MBUS_ID for crypto SRAM on Armada 385 Linksys
ARM: mvebu: map PCI I/O regions strongly ordered
ARM: mvebu: fix HW I/O coherency related deadlocks
ARM: sunxi/dt: make the CHIP inherit from allwinner,sun5i-a13
Linus Torvalds [Sat, 23 Jul 2016 03:20:55 +0000 (12:20 +0900)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a sporadic build failure in the qat driver as well as a
memory corruption bug in rsa-pkcs1pad"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
crypto: qat - make qat_asym_algs.o depend on asn1 headers
Linus Torvalds [Sat, 23 Jul 2016 03:15:48 +0000 (12:15 +0900)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull key handling fixes from James Morris:
"Quoting David Howells:
Here are three miscellaneous fixes:
(1) Fix a panic in some debugging code in PKCS#7. This can only
happen by explicitly inserting a #define DEBUG into the code.
(2) Fix the calculation of the digest length in the PE file parser.
This causes a failure where there should be a success.
(3) Fix the case where an X.509 cert can be added as an asymmetric key
to a trusted keyring with no trust restriction if no AKID is
supplied.
Bugs (1) and (2) aren't particularly problematic, but (3) allows a
security check to be bypassed. Happily, this is a recent regression
and never made it into a released kernel"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: Fix for erroneous trust of incorrectly signed X.509 certs
pefile: Fix the failure of calculation for digest
PKCS#7: Fix panic when referring to the empty AKID when DEBUG defined
Linus Torvalds [Sat, 23 Jul 2016 03:10:48 +0000 (12:10 +0900)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A few more fixes for the input subsystem:
- restore naming for tsc2005 touchscreens as some userspace match on it
- fix out of bound access in legacy keyboard driver
- fixup in RMI4 driver
Everything is tagged for stable as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: tsc200x - report proper input_dev name
tty/vt/keyboard: fix OOB access in do_compute_shiftstate()
Input: synaptics-rmi4 - fix maximum size check for F12 control register 8
Linus Torvalds [Sat, 23 Jul 2016 03:07:37 +0000 (12:07 +0900)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Dan Williams:
"This contains a regression fix for a problem that was introduced in
v4.7-rc6.
In 4.7-rc1 we introduced auto-probing for the ACPI DSM (device-
specific-method) format that the platform firmware implements for
nvdimm devices. We initially fixed a regression in probing the QEMU
DSM implementation by making acpi_check_dsm() tolerant of the way QEMU
reports the "0 DSMs supported" condition.
However, that broke HPE platforms since that tolerance caused the
driver to mistakenly match the 1-zero-byte response those platforms
give to "unknown" commands. Instead, we simply make the driver
tolerant of not finding any supported DSMs. This has been tested to
work with both QEMU and HPE platforms.
This commit has appeared in a -next release with no reported issues"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: make DIMM DSMs optional
Linus Torvalds [Sat, 23 Jul 2016 03:03:21 +0000 (12:03 +0900)]
Merge tag 'gpio-v4.7-6' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
"Compile problem fix for Tegra,
Sorry to send this in the last minute but Ingo says this build failure
is very prominent so I'm not going to wait for v4.7 before sending it.
It is a case of COMPILE_TEST causing more problems than it solves and
I'm already swearing about me shooting myself in the foot with that
gun :("
* tag 'gpio-v4.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: tegra: don't auto-enable for COMPILE_TEST
Linus Torvalds [Sat, 23 Jul 2016 02:55:20 +0000 (11:55 +0900)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Michael Turquette:
"Fix a bug in the at91 clk driver, two compile time warnings in sunxi
clk drivers, and one bug in a sunxi clk driver introduced in the 4.7
merge window"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: at91: fix clk_programmable_set_parent()
clk: sunxi: remove unused variable
clk: sunxi: display: Add per-clock flags
clk: sunxi: tcon-ch1: Do not return a negative error in get_parent
Linus Torvalds [Sat, 23 Jul 2016 02:46:59 +0000 (11:46 +0900)]
Merge branch 'for-4.7-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fix from Tejun Heo:
"Another fallout from max_sectors bump a couple years ago. The lite-on
optical drive times out on large requests"
* 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: LITE-ON CX1-JB256-HP needs lower max_sectors
Linus Torvalds [Sat, 23 Jul 2016 02:43:17 +0000 (11:43 +0900)]
Merge tag 'mmc-v4.7-rc7' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a few late mmc fixes intended for v4.7 final.
MMC core:
- Fix eMMC packed command header endianness
- Fix free of uninitialized buffer for mmc ioctl
MMC host:
- pxamci: Fix potential oops in ->probe()"
* tag 'mmc-v4.7-rc7' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: pxamci: fix potential oops
mmc: block: fix packed command header endianness
mmc: block: fix free of uninitialized 'idata->buf'
Linus Torvalds [Sat, 23 Jul 2016 02:28:06 +0000 (11:28 +0900)]
Merge tag 'sound-4.7-fix2' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"No surprise, just a few small fixes: a couple of changes are seen in
the core part, and both of them are rather for unusual error paths.
The rest are the regular HD-audio fixes and one USB-audio regression
fix"
* tag 'sound-4.7-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix quirks code is not called
ALSA: hda: add AMD Stoney PCI ID with proper driver caps
ALSA: hda - fix use-after-free after module unload
ALSA: pcm: Free chmap at PCM free callback, too
ALSA: ctl: Stop notification after disconnection
ALSA: hda/realtek - add new pin definition in alc225 pin quirk table
Linus Torvalds [Sat, 23 Jul 2016 02:22:37 +0000 (11:22 +0900)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull NVMe fix from Jens Axboe:
"Late addition here, it's basically a revert of a patch that was added
in this merge window, but has proven to cause problems.
This is swapping out the RCU based namespace protection with a good
old mutex instead"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: Remove RCU namespace protection
Jiri Slaby [Wed, 20 Jul 2016 22:45:08 +0000 (15:45 -0700)]
pps: do not crash when failed to register
With this command sequence:
modprobe plip
modprobe pps_parport
rmmod pps_parport
the pps_parport module causes this crash:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: parport_detach+0x1d/0x60 [pps_parport]
Oops: 0000 [#1] SMP
...
Call Trace:
parport_unregister_driver+0x65/0xc0 [parport]
SyS_delete_module+0x187/0x210
The sequence that builds up to this is:
1) plip is loaded and takes the parport device for exclusive use:
plip0: Parallel port at 0x378, using IRQ 7.
2) pps_parport then fails to grab the device:
pps_parport: parallel port PPS client
parport0: cannot grant exclusive access for device pps_parport
pps_parport: couldn't register with parport0
3) rmmod of pps_parport is then killed because it tries to access
pardev->name, but pardev (taken from port->cad) is NULL.
So add a check for NULL in the test there too.
Link: http://lkml.kernel.org/r/20160714115245.12651-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
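A reduced model of the added check, assuming simplified `struct parport` and `struct pardevice` definitions that keep only the fields named in the message (`cad` and `name`); `port_owned_by()` is a hypothetical helper, not the pps_parport function.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the parport structures. */
struct pardevice {
	const char *name;
};

struct parport {
	struct pardevice *cad; /* current owner; NULL if nobody holds the port */
};

/* Return 1 if the port's current owner is the device named modname.
 * Testing cad for NULL first avoids the dereference that crashed when
 * the client never registered (e.g. plip held the port exclusively). */
static int port_owned_by(const struct parport *port, const char *modname)
{
	const struct pardevice *pardev = port->cad;

	if (!pardev || !pardev->name)
		return 0;
	return strcmp(pardev->name, modname) == 0;
}
```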
Dan Carpenter [Wed, 20 Jul 2016 22:45:05 +0000 (15:45 -0700)]
tools/vm/slabinfo: fix an unintentional printf
The curly braces are missing here so we print stuff unintentionally.
Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling')
Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
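This bug class is a classic C pitfall: without braces, only the first statement after `if` is conditional, whatever the indentation suggests. A self-contained illustration that counts would-be prints instead of printing (this is not the slabinfo code):

```c
/* Buggy: only the first increment is guarded by the if; the second
 * runs unconditionally despite its indentation. */
static int count_prints_buggy(int verbose)
{
	int printed = 0;

	if (verbose)
		printed++;
		printed++; /* always executes */
	return printed;
}

/* Fixed: braces make both statements conditional. */
static int count_prints_fixed(int verbose)
{
	int printed = 0;

	if (verbose) {
		printed++;
		printed++;
	}
	return printed;
}
```

GCC and Clang can flag the buggy form with -Wmisleading-indentation.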
Dan Carpenter [Wed, 20 Jul 2016 22:45:03 +0000 (15:45 -0700)]
testing/radix-tree: fix a macro expansion bug
There are no parentheses around this macro and it causes a problem when
we do:
index = rand() % THRASH_SIZE;
Link: http://lkml.kernel.org/r/20160715210953.GC19522@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
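Why the missing parentheses matter: `%` and `*` have the same precedence and associate left to right, so an unparenthesized product in a macro body splits the expression. The macro value below (1000 * 1000) is an assumption for illustration, not necessarily the one in the test harness:

```c
/* Hypothetical macro bodies: same value, different grouping. */
#define THRASH_SIZE_BAD  1000 * 1000
#define THRASH_SIZE_GOOD (1000 * 1000)

/* Expands to (r % 1000) * 1000: only multiples of 1000 are produced. */
static unsigned int index_bad(unsigned int r)
{
	return r % THRASH_SIZE_BAD;
}

/* Expands to r % 1000000, as intended. */
static unsigned int index_good(unsigned int r)
{
	return r % THRASH_SIZE_GOOD;
}
```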
Andrey Ryabinin [Wed, 20 Jul 2016 22:45:00 +0000 (15:45 -0700)]
radix-tree: fix radix_tree_iter_retry() for tagged iterators.
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
A NULL slot and non-zero iter.tags are then passed to radix_tree_next_slot(),
leading to a crash:
RIP: radix_tree_next_slot include/linux/radix-tree.h:473
find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
....
Call Trace:
pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
do_writepages+0x97/0x100 mm/page-writeback.c:2364
__filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
vfs_fsync_range+0x10a/0x250 fs/sync.c:195
vfs_fsync fs/sync.c:209
do_fsync+0x42/0x70 fs/sync.c:219
SYSC_fdatasync fs/sync.c:232
SyS_fdatasync+0x19/0x20 fs/sync.c:230
entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
We must reset iterator's tags to bail out from radix_tree_next_slot()
and go to the slow-path in radix_tree_next_chunk().
Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
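A simplified model of the fix, assuming a cut-down iterator with only `index`, `next_index` and `tags` fields; `tree_iter_retry()` and `tree_next_slot()` are hypothetical names, and `__builtin_ctzl` is a GCC/Clang builtin. Clearing `tags` makes the fast path return NULL instead of computing a slot pointer from a NULL base:

```c
#include <stddef.h>

/* Simplified, hypothetical iterator (the real one has more fields). */
struct tree_iter {
	unsigned long index;
	unsigned long next_index;
	unsigned long tags; /* bit 0 = current slot; higher bits = later tagged slots */
};

/* Retry the current chunk: NULL sends the caller to the slow path, and
 * clearing tags stops the fast path from advancing from a NULL slot. */
static void **tree_iter_retry(struct tree_iter *iter)
{
	iter->next_index = iter->index;
	iter->tags = 0;
	return NULL;
}

/* Tagged fast path: jump to the next tagged slot in this chunk. */
static void **tree_next_slot(void **slot, struct tree_iter *iter)
{
	unsigned long offset;

	iter->tags >>= 1;
	if (!iter->tags)
		return NULL; /* chunk exhausted: fall back to the slow path */
	offset = (unsigned long)__builtin_ctzl(iter->tags);
	iter->tags >>= offset;
	iter->index += offset + 1;
	return slot + offset + 1; /* invalid if slot were NULL */
}
```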
Johannes Weiner [Wed, 20 Jul 2016 22:44:57 +0000 (15:44 -0700)]
mm: memcontrol: fix cgroup creation failure after many small jobs
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limit on outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no, user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[hannes@cmpxchg.org: init the IDR]
Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
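A toy version of a private, recyclable 16-bit ID space, assuming a plain bitmap with ID 0 reserved. The patch itself uses a kernel allocator (the message mentions initialising an IDR); `memcg_id_alloc()` and `memcg_id_free()` here are hypothetical:

```c
#include <stdint.h>

#define MEMCG_ID_MAX 0xffffu /* assumption: 16-bit space, ID 0 reserved */

/* One bit per possible ID. */
static uint8_t id_used[(MEMCG_ID_MAX + 1) / 8];

/* Allocate the lowest free ID in [1, MEMCG_ID_MAX]; 0 means the space
 * is exhausted (the caller would report -ENOSPC). */
static unsigned int memcg_id_alloc(void)
{
	for (unsigned int id = 1; id <= MEMCG_ID_MAX; id++) {
		if (!(id_used[id / 8] & (1u << (id % 8)))) {
			id_used[id / 8] |= (uint8_t)(1u << (id % 8));
			return id;
		}
	}
	return 0;
}

/* Release an ID at offline time, even if other objects still pin the
 * CSS, so a new cgroup can reuse it. */
static void memcg_id_free(unsigned int id)
{
	id_used[id / 8] &= (uint8_t)~(1u << (id % 8));
}
```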
Florian Westphal [Thu, 21 Jul 2016 10:51:17 +0000 (12:51 +0200)]
netfilter: connlabels: move set helper to xt_connlabel
xt_connlabel is the only user so move it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 21 Jul 2016 10:51:16 +0000 (12:51 +0200)]
netfilter: conntrack: support a fixed size of 128 distinct labels
The conntrack label extension is currently variable-sized, e.g. if
only 2 labels are used by iptables rules then the labels->bits[] array
will only contain one element.
We track size of each label storage area in the 'words' member.
But in nftables and openvswitch we always have to ask for worst-case
since we don't know what bit will be used at configuration time.
As most arches are 64bit we need to allocate 24 bytes in this case:
struct nf_conn_labels {
u8 words; /* 0 1 */
/* XXX 7 bytes hole, try to pack */
long unsigned bits[2]; /* 8 24 */
Make bits a fixed size and drop the words member; it simplifies
the code and only increases memory requirements on x86 when
fewer than 64 labels are required.
We still only allocate the extension if it's needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
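With a fixed 128-bit array, the helpers no longer need a length field. A sketch assuming a hypothetical `struct conn_labels` and bit helpers (not the nf_conntrack labels API):

```c
#include <limits.h>
#include <stdbool.h>

#define LABEL_BITS 128 /* hypothetical name for the fixed label count */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Fixed-size storage: always covers all 128 labels, so no 'words'. */
struct conn_labels {
	unsigned long bits[LABEL_BITS / (sizeof(unsigned long) * CHAR_BIT)];
};

static void label_set(struct conn_labels *l, unsigned int bit)
{
	if (bit < LABEL_BITS)
		l->bits[bit / BITS_PER_ULONG] |= 1UL << (bit % BITS_PER_ULONG);
}

static bool label_test(const struct conn_labels *l, unsigned int bit)
{
	if (bit >= LABEL_BITS)
		return false;
	return (l->bits[bit / BITS_PER_ULONG] >> (bit % BITS_PER_ULONG)) & 1UL;
}
```

The struct is 16 bytes on both 32-bit and 64-bit builds, regardless of how many labels are in use.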
Arnd Bergmann [Wed, 6 Jul 2016 12:54:03 +0000 (14:54 +0200)]
gpio: tegra: don't auto-enable for COMPILE_TEST
I stumbled over a build error with COMPILE_TEST and CONFIG_OF
disabled:
drivers/gpio/gpio-tegra.c: In function 'tegra_gpio_probe':
drivers/gpio/gpio-tegra.c:603:9: error: 'struct gpio_chip' has no member named 'of_node'
The problem is that the newly added GPIO_TEGRA Kconfig symbol
does not have a dependency on CONFIG_OF. However, there is another
problem here as the driver gets enabled unconditionally whenever
COMPILE_TEST is set.
This fixes both problems by making the symbol user-visible
when COMPILE_TEST is set and default-enabled for ARCH_TEGRA=y.
As a side-effect, it is now possible to compile-test a Tegra
kernel with GPIO support disabled, which is harmless.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 4dd4dd1d2120 ("gpio: tegra: Allow compile test")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Herbert Xu [Fri, 22 Jul 2016 09:58:21 +0000 (17:58 +0800)]
crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
To allow for a child request context, the struct akcipher_request child_req
needs to be at the end of the structure.
Cc: stable@vger.kernel.org
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
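A sketch of why the child request must be the last member, under the assumption (as the commit implies) that the child's private context is stored in memory directly after the child request and is sized at allocation time. All type and function names here are hypothetical stand-ins, not the crypto API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical child request; its private context follows it in memory. */
struct child_req {
	unsigned int src_len;
	unsigned int dst_len;
};

/* Hypothetical parent request context. If any member followed
 * child_req, writes to the child's context would overwrite it. */
struct pad_req_ctx {
	unsigned int key_size;
	struct child_req child_req;	/* must stay the last member */
};

/* Allocate the parent context plus room for the child's context. */
static struct pad_req_ctx *pad_req_ctx_alloc(size_t child_ctx_size)
{
	return calloc(1, sizeof(struct pad_req_ctx) + child_ctx_size);
}

/* Address where the child's private context begins. */
static void *child_req_ctx(struct pad_req_ctx *p)
{
	assert(p != NULL);
	return (char *)p + offsetof(struct pad_req_ctx, child_req) +
	       sizeof(struct child_req);
}
```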
Maxim Patlasov [Fri, 22 Jul 2016 01:24:26 +0000 (18:24 -0700)]
ovl: verify upper dentry in ovl_remove_and_whiteout()
The upper dentry may become stale before we call ovl_lock_rename_workdir.
For example, someone could (mistakenly or maliciously) manually unlink(2)
it directly from upperdir.
To ensure it is not stale, let's look it up after ovl_lock_rename_workdir
and check whether it matches the upper dentry.
Essentially, it is the same problem and a similar solution as in
commit 11f3710417d0 ("ovl: verify upper dentry before unlink and rename").
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Soheil Hassas Yeganeh [Wed, 20 Jul 2016 22:01:18 +0000 (18:01 -0400)]
packet: propagate sock_cmsg_send() error
sock_cmsg_send() can return different error codes and not only
-EINVAL, and we should properly propagate them.
Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
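A small sketch contrasting the old and new error handling. fake_cmsg_send() is a hypothetical stub standing in for sock_cmsg_send(); the caller names are also hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stub: returns 0 or one of several negative errnos. */
static int fake_cmsg_send(int simulated_err)
{
	assert(simulated_err <= 0);
	return simulated_err;
}

/* Before: every failure was reported to userspace as -EINVAL. */
static int send_old(int simulated_err)
{
	if (fake_cmsg_send(simulated_err))
		return -EINVAL;
	return 0;
}

/* After: the callee's actual error code is propagated. */
static int send_new(int simulated_err)
{
	int err = fake_cmsg_send(simulated_err);

	if (err)
		return err;
	return 0;
}
```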
David S. Miller [Fri, 22 Jul 2016 04:50:49 +0000 (00:50 -0400)]
Merge branch 'macsec-gro'
Paolo Abeni says:
====================
macsec: enable s/w offloads
These patches leverage the gro_cells infrastructure to enable both GRO and RPS
on macsec devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 20 Jul 2016 16:11:32 +0000 (18:11 +0200)]
macsec: enable GRO and RPS on macsec devices
Use gro_cells to trigger GRO and allow RPS on macsec traffic
after decryption.
Also, be sure to avoid clearing software offload features in
macsec_fix_features().
Overall, this increases TCP throughput by 30% on recent hardware.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 20 Jul 2016 16:11:31 +0000 (18:11 +0200)]
gro_cells: gro_cells_receive now return error code
Return an error code so that the caller can update stats accordingly, if needed.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Jul 2016 06:39:36 +0000 (23:39 -0700)]
Merge tag 'nfc-next-4.8-1' of git://git./linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.8 pull request
This is the first NFC pull request for 4.8. We have:
- A fairly large NFC digital stack patchset:
* RTOX fixes.
* Proper DEP RWT support.
* ACK and NACK PDUs handling fixes, in both initiator
and target modes.
* A few memory leak fixes.
- A conversion of the nfcsim driver to use the digital stack.
The driver supports the DEP protocol in both NFC-A and NFC-F.
- Error injection through debugfs for the nfcsim driver.
- Improvements to the port100 driver for the Sony USB chipset, in
particular to the command abort and cancellation code paths.
- A few minor fixes for the pn533, trf7970a and fdp drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Wed, 20 Jul 2016 22:48:43 +0000 (15:48 -0700)]
samples: Add an IPv6 '-6' option to the pktgen scripts
Add a '-6' option to the sample pktgen scripts for sending out
IPv6 packets.
[root@kerneldev010.prn1 ~/pktgen]# ./pktgen_sample03_burst_single_flow.sh -i eth0 -s 64 -d fe80::f652:14ff:fec2:a14c -m f4:52:14:c2:a1:4c -b 32 -6
[root@kerneldev011.prn1 ~]# tcpdump -i eth0 -nn -c3 port 9
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:38:51.815297 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16
14:38:51.815311 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16
14:38:51.815313 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Jul 2016 05:07:24 +0000 (22:07 -0700)]
Merge branch 'xdp-cleanups'
Brenden Blanco says:
====================
misc cleanups for xdp
This addresses several of the non-blocking comments left over from the
xdp patch set. See individual patches for details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Thu, 21 Jul 2016 00:22:35 +0000 (17:22 -0700)]
bpf: make xdp sample variable names more meaningful
The naming choice of index is not terribly descriptive, and dropcnt is
in fact incorrect for xdp2. Pick better names for these: ipproto and
rxcnt.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Thu, 21 Jul 2016 00:22:34 +0000 (17:22 -0700)]
rtnl: protect do_setlink from IFLA_XDP_ATTACHED
The IFLA_XDP_ATTACHED nested attribute is meant to be read-only, and while
do_setlink properly ignores it, it should be more paranoid and reject
commands that try to set it.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Thu, 21 Jul 2016 00:22:33 +0000 (17:22 -0700)]
net/mlx4_en: use READ_ONCE when freeing xdp_prog
For consistency, and in order to hint at the synchronous nature of the
xdp_prog field, use READ_ONCE in the destroy path of the ring. All
occurrences should now use either READ_ONCE or xchg.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Jul 2016 04:36:55 +0000 (21:36 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2016-07-20
This series contains updates to fm10k only.
Ngai-Mint provides a fix to clear PCIE_GMBX bits to ensure the proper
functioning of the mailbox global interrupt after a data path reset.
Jake provides most of the patches in the series, starting with an early
return from fm10k_down() if we are already down to prevent conflicts with
other threads. Fixed an issue where fm10k_update_stats() could cause
a null pointer dereference, specifically if it is called when we are going
down and the rings have been removed. Cleans up and fixes the data path
reset flow, Tx hang routine and stop_hw(). Re-worked the fm10k_reinit()
to be more maintainable and fixed several inconsistencies with the work
flow. Implemented fm10k_prepare_suspend() and fm10k_handle_resume()
which abstract around the now existing fm10k_prepare_for_reset and
fm10k_handle_reset. The new functions also handle stopping the service
task, which is something that the original re-init flow does not need.
Fixed an issue where if an FLR occurs, VF devices will be knocked out of
bus master mode, and the driver will be unable to recover from the reset
properly, so ensure bus master is enabled after every reset. Fixed an
issue where a reset would occur, seemingly for no reason, every few
minutes until the switch manager software is loaded. It was caused by
continuously requesting the lport map, so only make the request after
we have verified the switch mailbox is tx_ready.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Stancek [Thu, 30 Jun 2016 10:23:51 +0000 (12:23 +0200)]
crypto: qat - make qat_asym_algs.o depend on asn1 headers
Parallel build can sporadically fail because asn1 headers may
not be built yet by the time qat_asym_algs.o is compiled:
drivers/crypto/qat/qat_common/qat_asym_algs.c:55:32: fatal error: qat_rsapubkey-asn1.h: No such file or directory
#include "qat_rsapubkey-asn1.h"
Cc: stable@vger.kernel.org
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
David S. Miller [Thu, 21 Jul 2016 04:10:55 +0000 (21:10 -0700)]
Merge branch 'mv88r6xxx-eeprom-rework'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: rework EEPROM code
Some switches can access an optional external EEPROM via their registers.
The 88E6352 family of switches has 8-bit address / 16-bit data access.
The new 88E6390 family has 16-bit address / 8-bit data access.
This patchset cleans up the EEPROM code with 16-suffixed Global2 helpers
and makes it easy to add future support for 8-bit data EEPROM access.
It also removes unnecessary mutexes and a few locked access functions.
Changes in v2:
- add missing Signed-off-by tag
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 20 Jul 2016 22:18:36 +0000 (18:18 -0400)]
net: dsa: mv88e6xxx: kill last locked reg_read
Get rid of the last usage of the locked mv88e6xxx_reg_read function with
a new mv88e6xxx_port_read helper, useful later for chips with different
port registers base address.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 20 Jul 2016 22:18:35 +0000 (18:18 -0400)]
net: dsa: mv88e6xxx: rework EEPROM access
The 6352 family of switches and compatibles provide an 8-bit address and
16-bit data access to an optional EEPROM.
Newer chips such as the 6390 family slightly changed the access to 16-bit
address and 8-bit data.
This commit cleans up the EEPROM access code for 16-bit access and makes
it easy to eventually introduce future support for 8-bit access.
Here's a list of notable changes brought by this patch:
- provide Global2 unlocked helpers for EEPROM commands
- remove eeprom_mutex, only reg_lock is necessary for driver functions
- eeprom_len is 0 for chips without EEPROM, so return it directly
- the Running bit must be 0 before r/w, so wait for Busy *and* Running
- remove now unused mv88e6xxx_wait and mv88e6xxx_reg_write
- other than that, the logic (in _{get,set}_eeprom16) didn't change
Chips with 8-bit EEPROM access will need to implement the
8-suffixed variant of the G2 helpers and the related flag:
#define MV88E6XXX_FLAGS_EEPROM8 \
(MV88E6XXX_FLAG_G2_EEPROM_CMD | \
MV88E6XXX_FLAG_G2_EEPROM_ADDR)
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 20 Jul 2016 22:18:34 +0000 (18:18 -0400)]
net: dsa: mv88e6xxx: remove unused phy_mutex
Only reg_lock is necessary now and phy_mutex is dead. Remove it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Thu, 21 Jul 2016 01:42:54 +0000 (11:42 +1000)]
net/faraday: Disallow using reversed MAC address from hardware
The initial MAC address is retrieved from hardware if it's not
provided by the device tree. The reversed MAC address from hardware
will be used if the non-reversed MAC address is invalid. This
causes the MAC address seen by hardware and software to mismatch.
This disallows using the reversed hardware MAC address to avoid
the mismatch between the MAC address seen by hardware and software.
Fixes: 113ce107afe9 ("net/faraday: Read MAC address from chip")
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Welling [Wed, 20 Jul 2016 17:02:07 +0000 (10:02 -0700)]
Input: tsc200x - report proper input_dev name
Pass the input_id struct to the common probe function for the tsc200x drivers
instead of just the bustype.
This allows for the use of the product variable to set the input_dev->name
variable according to the type of touchscreen used. Note that when we
introduced support for TSC2004 we started calling everything TSC200X, so
let's keep this quirk.
Signed-off-by: Michael Welling <mwelling@ieee.org>
Cc: stable@vger.kernel.org
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Dmitry Torokhov [Mon, 27 Jun 2016 21:12:34 +0000 (14:12 -0700)]
tty/vt/keyboard: fix OOB access in do_compute_shiftstate()
The size of each individual keymap in drivers/tty/vt/keyboard.c is NR_KEYS,
which is currently 256, whereas the number of keys/buttons in an input device
(and therefore in key_down) is much larger, KEY_CNT (768), and that can cause
out-of-bounds access when we do
sym = U(key_maps[0][k]);
with a large 'k'.
To fix it we should not attempt to iterate beyond the smaller of NR_KEYS and
KEY_CNT.
Also, while at it, let's switch to for_each_set_bit() instead of open-coding
it.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
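A userspace sketch of the bounded iteration. The constants use the values quoted in the commit text. keymap0 is a hypothetical stand-in for key_maps[0], and the open-coded bit loop stands in for the kernel's for_each_set_bit(). A key above NR_KEYS may be set in key_down but is never used to index the keymap.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_KEYS 256	/* keymap size, per the commit text */
#define KEY_CNT 768	/* input key bitmap size, per the commit text */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long key_down[(KEY_CNT + BITS_PER_ULONG - 1) / BITS_PER_ULONG];
static unsigned short keymap0[NR_KEYS];	/* hypothetical key_maps[0] */

static void set_bit_ul(size_t nr, unsigned long *map)
{
	assert(nr < KEY_CNT);
	map[nr / BITS_PER_ULONG] |= 1UL << (nr % BITS_PER_ULONG);
}

static int test_bit_ul(size_t nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_ULONG] >> (nr % BITS_PER_ULONG)) & 1UL;
}

/* Sum keymap entries of pressed keys, stopping at min(NR_KEYS, KEY_CNT)
 * so the 256-entry keymap is never indexed out of bounds. */
static unsigned long sum_pressed(void)
{
	const size_t limit = NR_KEYS < KEY_CNT ? NR_KEYS : KEY_CNT;
	unsigned long sum = 0;

	for (size_t k = 0; k < limit; k++)
		if (test_bit_ul(k, key_down))
			sum += keymap0[k];
	return sum;
}
```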
Pablo Neira Ayuso [Tue, 19 Jul 2016 10:20:45 +0000 (12:20 +0200)]
netfilter: nf_tables: allow to filter out rules by table and chain
If the table and/or chain attributes are set in a rule dump request,
we filter out the rules based on this selection.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Mon, 18 Jul 2016 12:44:17 +0000 (20:44 +0800)]
netfilter: nft_log: fix snaplen does not truncate packets
A similar problem in xt_NFLOG was fixed by commit 7643507fe8b5
("netfilter: xt_NFLOG: nflog-range does not truncate packets"). Only setting
copy_len here does not work, so we should also enable NF_LOG_F_COPY_LEN.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Mon, 18 Jul 2016 12:44:16 +0000 (20:44 +0800)]
netfilter: nft_log: check the validity of log level
Users can specify a log level larger than 7 (the debug level) via
nfnetlink, which is invalid. In this case, we should report
EINVAL to userspace.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Liping Zhang [Mon, 18 Jul 2016 12:44:15 +0000 (20:44 +0800)]
netfilter: nft_log: fix possible memory leak if log expr init fail
Suppose that NFTA_LOG_PREFIX is specified, and then either NFTA_LOG_LEVEL
and NFTA_LOG_GROUP are both specified or the nf_logger_find_get()
call fails, i.e. expression init fails. In that case, memory is leaked.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
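A sketch of one common way to close such a leak: the goto-cleanup pattern. After the prefix has been allocated, every later failure jumps to a label that frees it. The struct, the simulating int flags, and the -ENOENT code for a failed logger lookup are assumptions of this sketch, not the nft_log code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the log expression's private data. */
struct log_expr {
	char *prefix;
};

static int log_expr_init(struct log_expr *e, const char *prefix,
			 int has_level, int has_group, int logger_found)
{
	int err;

	assert(e != NULL);
	e->prefix = NULL;
	if (prefix != NULL) {
		size_t n = strlen(prefix) + 1;

		e->prefix = malloc(n);
		if (e->prefix == NULL)
			return -ENOMEM;
		memcpy(e->prefix, prefix, n);
	}

	/* Level and group together are rejected. */
	if (has_level && has_group) {
		err = -EINVAL;
		goto err_free_prefix;
	}

	/* Simulated failure of the logger lookup. */
	if (!logger_found) {
		err = -ENOENT;
		goto err_free_prefix;
	}
	return 0;

err_free_prefix:
	free(e->prefix);	/* without this, the prefix would leak */
	e->prefix = NULL;
	return err;
}
```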
Gao Feng [Mon, 18 Jul 2016 03:39:23 +0000 (11:39 +0800)]
netfilter: Add helper array register/unregister functions
Add nf_ct_helper_init(), nf_conntrack_helpers_register() and
nf_conntrack_helpers_unregister() functions to avoid repetitive
opencoded initialization in helpers.
This patch keeps an id parameter for nf_ct_helper_init() so as not to break
helper matching by name, which has been inconsistently exposed to
userspace through ports, e.g. ftp-2121, and through an incremental id,
e.g. tftp-1.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Saeed Mahameed [Wed, 20 Jul 2016 21:39:53 +0000 (00:39 +0300)]
net/mlx5e: Fix del vxlan port command buffer memset
memset the command buffers rather than the pointers to them.
Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Tue, 7 Jun 2016 23:09:02 +0000 (16:09 -0700)]
fm10k: bump version number
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:09:01 +0000 (16:09 -0700)]
fm10k: return proper error code when pci_enable_msix_range fails
The pci_enable_msix_range() function returns the (positive) number of
allocated vectors if it succeeds. On failure it returns
a negative error code. Return this code properly so that the error
message printed by the driver will show the actual error code instead of
being masked by -ENOMEM.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:09:00 +0000 (16:09 -0700)]
fm10k: force link to remain down for at least a second on resume events
When we resume from an AER recovery with many active VFs, the PF sees
many spurious link up and link down events. Prevent this by delaying
link down for at least one second after the resume event.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:59 +0000 (16:08 -0700)]
fm10k: implement request_lport_map pointer
If the fm10k interface is brought up, but the switch manager software is
not running, the driver will continuously request the lport map every
few seconds in the base driver watchdog routine. Eventually after
several minutes the switch mailbox Tx fifo will fill up and the mailbox
will time out, resulting in a reset. This reset will appear as if for no
reason, and occurs regularly every few minutes until the switch manager
software is loaded.
Prevent this from happening by only requesting the lport map after we've
verified the switch mailbox is tx_ready. In order to simplify code logic
and reduce code duplication, implement this as a new function pointer
"mac.ops.request_lport_map" which the VF will not implement. Otherwise,
we have to duplicate the tx_ready check outside of
fm10k_get_host_state_generic, or re-implement most of
fm10k_get_host_state_generic in the pf version.
The resulting code is simpler and easier to understand, and prevents the
PF from continuously requesting lport map and filling the Tx fifo of
a switch mailbox that isn't ready.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
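A sketch of the optional-operation pattern described above. The PF fills in request_lport_map, the VF leaves it NULL, and the generic host-state check only calls it once the mailbox is tx_ready. All names here are hypothetical simplifications, not the fm10k structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hw;

/* Hypothetical ops table; a NULL entry means "not implemented". */
struct mac_ops {
	int (*request_lport_map)(struct hw *hw);
};

struct hw {
	struct mac_ops ops;
	bool mbx_tx_ready;
	int lport_requests;	/* how often the map was requested */
};

static int pf_request_lport_map(struct hw *hw)
{
	hw->lport_requests++;
	return 0;
}

/* Only ask for the lport map once the mailbox can accept messages,
 * and only if this function type implements the request. */
static int get_host_state(struct hw *hw)
{
	assert(hw != NULL);
	if (!hw->mbx_tx_ready)
		return 0;
	if (hw->ops.request_lport_map)
		return hw->ops.request_lport_map(hw);
	return 0;
}
```

Keeping the tx_ready check in the shared function avoids duplicating it in each caller.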
Jacob Keller [Tue, 7 Jun 2016 23:08:58 +0000 (16:08 -0700)]
fm10k: check if PCIe link is restored
Sometimes, a VF driver will lose PCIe address access, such as due to
a PF FLR event. In fm10k_detach_subtask, poll and check whether the
PCIe register space is active again and restore the device when it has.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:57 +0000 (16:08 -0700)]
fm10k: enable bus master after every reset
If an FLR occurs, VF devices will be knocked out of bus master mode, and
the driver will be unable to recover from the reset properly, resulting
in malicious driver events and an infinite reset loop. In the normal
case, the bus master mode will already be enabled and this call will
essentially be a no-op. Since we're doing this every reset, it is
possible we could remove the other calls to pci_set_master(), but it
seems harmless to just leave them in place.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:56 +0000 (16:08 -0700)]
fm10k: use common flow for suspend and resume
Continuing the effort to commonize the similar suspend/resume flows,
finish up by using the new fm10k_prepare_suspend and fm10k_handle_resume
functions for the standard suspend/resume flow.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:55 +0000 (16:08 -0700)]
fm10k: implement reset_notify handler for PCIe FLR events
When a function level PCI reset is triggered using sysfs, it calls the
driver's .reset_notify error handler. Implement a handler based on the
now split fm10k_prepare_for_reset and fm10k_handle_reset functions, so
that we fully reset the driver when the PCI function level reset occurs.
This also ensures the reset is handled in a clean way by first disabling
all the driver bits and then restoring them after the function
reset. Previously the stack simply performed a blind function reset and
our driver didn't take any part in the process.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:54 +0000 (16:08 -0700)]
fm10k: use common reset flow when handling io errors from PCI stack
Now that we have extracted the necessary steps for a split
suspend/resume flow, re-use these functions instead of using the current
open coded flow. This ensures that we don't miss any steps. It also
ensures that we have the correct driver states set.
Since we'll be handling all of the reset flow ourselves, we no longer
need to request a reset in the io_slot_reset() function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:53 +0000 (16:08 -0700)]
fm10k: implement prepare_suspend and handle_resume
Implement fm10k_prepare_suspend and fm10k_handle_resume functions which
abstract around the now existing fm10k_prepare_for_reset and
fm10k_handle_reset. The new functions also handle stopping the service
task, which is something that the original re-init flow does not need.
Every other location that does a suspend/resume type flow is expected to
use these functions, because otherwise they may have conflicts with the
running watchdog routines. This also has the effect of preventing
possible surprise remove events during handling of FLR events and PCIe
errors.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:52 +0000 (16:08 -0700)]
fm10k: split fm10k_reinit into two functions
There are several flows in the driver which perform a similar function
of tearing down software and restoring software to recover from certain
errors or PCIe events, including:
* fm10k_reinit
* fm10k_suspend/resume
* fm10k_io_error_detected/fm10k_io_resume
In addition, we want to implement a .reset_notify() handler, which
will also perform a similar function.
Rework how the driver codes reset and resume flows by separating out the
reinit logic into two functions "fm10k_prepare_for_reset" and
"fm10k_handle_reset". This first step will allow us to re-use this
functionality in the similar blocks of code instead of re-coding the
same sequence of events slightly differently.
The end result should be more maintainable and correct, fixing several
inconsistencies with the work flow.
The new functions expect to take the rtnl_lock() themselves, and this does
have the unfortunate side effect of having the reinit flow take, release,
and then retake the rtnl_lock. However, this minor downside is
outweighed by the benefits of code reduction and reducing needless
differences between these flows.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:51 +0000 (16:08 -0700)]
fm10k: wait for queues to drain if stop_hw() fails once
It turns out that sometimes during a reset the Tx queues will be
temporarily stuck longer than .stop_hw() expects. Work around this issue
by attempting .stop_hw() first. If it fails, wait for a number of
attempts until the Tx queues appear to be drained. After this, attempt
stop_hw() again. This ensures that we avoid waiting if we don't need to,
such as during the first initialization of a VF, and that we allow the
amount of time necessary to recover from most situations. It is possible
that the hardware is actually stuck. For PFs, this is usually fixed by
a datapath reset. Unfortunately the VF cannot request a similar reset
for itself.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:50 +0000 (16:08 -0700)]
fm10k: only warn when stop_hw fails with FM10K_ERR_REQUESTS_PENDING
When the stop_hw() routine fails with FM10K_ERR_REQUESTS_PENDING, this
indicates that the Tx or Rx queues did not shut down within the time
limit. Print a more suitable message at the dev_info level instead of
dev_err.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:49 +0000 (16:08 -0700)]
fm10k: use actual hardware registers when checking for pending Tx
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:48 +0000 (16:08 -0700)]
fm10k: perform data path reset even when switch is not ready
A while ago, an additional check for the switch being ready was added to
reset_hw. A recent refactor accidentally made this check return an error
code on failure which caused fm10k_probe to fail when the switch wasn't
brought up first. The original reasoning for the check was to prevent
additional data path reset when the fabric wasn't ready yet. However,
there isn't a compelling reason to keep the check, as the data path
reset will restore hardware to a known good state. Remove the check and
perform the data path reset regardless of the switch manager state.
An alternative fix is to return FM10K_SUCCESS instead, and bypass the
actual data path reset. This should be fine as we will perform
a reset_hw once the switch is active. However, since data path reset
will reset many parts of the hardware it seems better to just perform
the reset regardless of switch state.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:47 +0000 (16:08 -0700)]
fm10k: don't stop reset due to FM10K_ERR_REQUESTS_PENDING
Don't report FM10K_ERR_REQUESTS_PENDING when we fail to disable queues
within the timeout. This can occur due to a hardware Tx hang, or when
the switch ethernet fabric is resetting while we are transmitting
traffic. It can sometimes take up to 500ms before the Tx DMA engine
gives up. Instead, just skip the DMA engine check and perform
a data-path reset anyway. Add a statistic counter to keep track of the
number of resets occurring while we have pending DMA on the rings.
In order to prevent having to re-assign err to 0, re-order the
last few items of the reset_hw_pf function so that we don't perform
"return err" at the end.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ngai-Mint Kwan [Tue, 7 Jun 2016 23:08:46 +0000 (16:08 -0700)]
fm10k: Reset mailbox global interrupts
When a data path reset is initiated, write control to the PCIE_GMBX is
yanked from the switch manager. The switch manager writes to this
register to clear mailbox global interrupt bits as part of its mailbox
interrupt handling routine. When the device recovers from the data path
reset and these bits are not cleared, it will prevent future mailbox
global interrupts from being triggered. Upon confirming that the device
has exited from a data path reset, clear these bits to ensure the proper
functioning of the mailbox global interrupt.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 7 Jun 2016 23:08:45 +0000 (16:08 -0700)]
fm10k: prevent multiple threads updating statistics
Also prevent updating stats while the interface is down. If we're
already updating stats, just return without doing anything. When we take the
device down, block stat updates until we come back up. This ensures that
we avoid tearing down rings when we're updating statistics, and prevents
updating statistics until we're up.
We can't re-use the __FM10K_DOWN for this because it wouldn't prevent
multiple threads from accessing statistics. Neither does it prevent the
case where we start updating stats and then start going down in another
thread.
fm10k_get_stats64 is exempt from this, because it has a completely
different flow which does not suffer from the same issues as
fm10k_update_stats might.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 3 Jun 2016 22:42:12 +0000 (15:42 -0700)]
fm10k: avoid possible null pointer dereference in fm10k_update_stats
It's currently possible for fm10k_update_stats to be called during the
window when we go down and the rings are removed. This can result in
a null pointer dereference. In fm10k_get_stats64 we work around this by
using ACCESS_ONCE and a null pointer check inside the loop. Use this
same flow in the fm10k_update_stats to avoid the potential null pointer.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 3 Jun 2016 22:42:11 +0000 (15:42 -0700)]
fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set
Return early from fm10k_down() when we are already down, since that
means another thread has either already finished or has started going
down, so we shouldn't conflict with it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 20 Jul 2016 21:53:57 +0000 (14:53 -0700)]
Merge branch 'mlxsw-per-prio-tc-counters'
Jiri Pirko says:
====================
mlxsw: Add per-{Prio,TC} counters
Ido says:
Add per-priority and per-tc counters, which are very useful for debugging
purposes and fine-tuning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 19 Jul 2016 13:35:54 +0000 (15:35 +0200)]
mlxsw: spectrum: Expose per-tc counters via ethtool
Expose the transmit queue length of each traffic class and the amount of
unicast packets discarded due to insufficient room in the shared buffer.
The first counter allows us to debug user priority to traffic class
mapping, whereas the drop counter is useful when determining shared buffer
configuration.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 19 Jul 2016 13:35:53 +0000 (15:35 +0200)]
mlxsw: spectrum: Expose per-priority counters via ethtool
Expose per-priority bytes / packets / PFC packets counters via ethtool.
These counters are very useful when debugging QoS functionality and
provide a better insight into the device's forwarding plane.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 19 Jul 2016 12:37:53 +0000 (12:37 +0000)]
net: cpmac: fix error handling of cpmac_probe()
Add the missing free_netdev() before returning from cpmac_probe() in
the error-handling path.
This patch reverts commit 0465be8f4f1d ("net: cpmac: fix in releasing
resources"), which changed the code to call free_netdev() only when
register_netdev() failed.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 19 Jul 2016 11:35:46 +0000 (11:35 +0000)]
net/mlx5: Use PTR_ERR_OR_ZERO() to simplify the code
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR.
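To show what the conversion does, here is a sketch with userspace re-creations of the kernel's error-pointer helpers. These are illustrative copies that follow the include/linux/err.h convention of encoding errno values in the top MAX_ERRNO addresses. They are not the kernel headers. `check_old()` and `check_new()` are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace re-creations of the kernel's error-pointer helpers, for
 * illustration only. Error codes live in the top MAX_ERRNO addresses. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
    return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Before: the open-coded pattern the coccinelle script matches. */
int check_old(const void *obj)
{
    if (IS_ERR(obj))
        return PTR_ERR(obj);
    return 0;
}

/* After: the same logic in a single call. */
int check_new(const void *obj)
{
    return PTR_ERR_OR_ZERO(obj);
}
```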
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 19 Jul 2016 11:33:10 +0000 (11:33 +0000)]
net: ethernet: nb8800: fix error handling of nb8800_probe()
In the ops->reset() error-handling path, clk_disable_unprepare() is not
called before returning from the function.
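The usual shape of such a fix is goto-based unwinding: every resource acquired before a failure point is released on the error path. The sketch below uses hypothetical fake_* stand-ins for the clock and reset operations. It illustrates the pattern only and is not the nb8800 code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device; `clk_enabled` shows whether the clock leaked. */
struct fake_dev {
    bool clk_enabled;
    bool reset_fails;
};

static int fake_clk_prepare_enable(struct fake_dev *d)
{
    d->clk_enabled = true;
    return 0;
}

static void fake_clk_disable_unprepare(struct fake_dev *d)
{
    d->clk_enabled = false;
}

static int fake_reset(struct fake_dev *d)
{
    return d->reset_fails ? -EIO : 0;
}

/* Probe-style function with goto-based unwinding. */
int fake_probe(struct fake_dev *d)
{
    int ret;

    ret = fake_clk_prepare_enable(d);
    if (ret)
        return ret;

    ret = fake_reset(d);
    if (ret)
        goto err_disable_clk; /* the cleanup step the fix adds */

    return 0;

err_disable_clk:
    fake_clk_disable_unprepare(d);
    return ret;
}
```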
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 19 Jul 2016 11:25:16 +0000 (11:25 +0000)]
wan/fsl_ucc_hdlc: use module_platform_driver to simplify the code
module_platform_driver() makes the code simpler by eliminating
boilerplate code.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 19 Jul 2016 11:25:03 +0000 (11:25 +0000)]
wan/fsl_ucc_hdlc: remove .owner field for driver
Remove the .owner field, since the registration calls used here set it automatically.
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 19 Jul 2016 11:23:24 +0000 (11:23 +0000)]
net: axienet: Fix return value check in axienet_probe()
In case of error, the function of_parse_phandle() returns a NULL
pointer, not an ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with a NULL test.
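The sketch below shows why the original check never fires: IS_ERR(NULL) is false, so a NULL result passes straight through. IS_ERR() is a userspace copy for illustration. `lookup_node()` is a hypothetical function that, like of_parse_phandle(), reports failure with NULL. The -ENODEV return value is an assumption and is not necessarily what the axienet fix returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Userspace copy of the kernel's IS_ERR() test, for illustration. */
static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical lookup that signals failure with NULL, not ERR_PTR(). */
static void *lookup_node(bool present)
{
    static int node;
    return present ? &node : NULL;
}

/* Buggy check: IS_ERR(NULL) is false, so a missing node slips through. */
int probe_buggy(bool present)
{
    void *np = lookup_node(present);
    if (IS_ERR(np))
        return -ENODEV;
    return 0;
}

/* Fixed check: test for NULL, matching the function's contract. */
int probe_fixed(bool present)
{
    void *np = lookup_node(present);
    if (!np)
        return -ENODEV;
    return 0;
}
```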
Fixes: 46aa27df8853 ('net: axienet: Use devm_* calls')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 20 Jul 2016 21:42:28 +0000 (14:42 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2016-07-19
Here's likely the last bluetooth-next pull request for the 4.8 kernel:
- Fix for L2CAP setsockopt
- Fix for is_suspending flag handling in btmrvl driver
- Addition of Bluetooth HW & FW info fields to debugfs
- Fix to use int instead of char for callback status.
The last one (from Geert Uytterhoeven) is actually not purely a
Bluetooth (or 802.15.4) patch, but it was agreed with other maintainers
that we take it through the bluetooth-next tree.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 20 Jul 2016 18:17:47 +0000 (20:17 +0200)]
bpf, elf: add official ELF machine define for eBPF
Add the official BPF ELF e_machine value that was assigned recently [1,2]
and will be propagated to glibc, et al. LLVM is switching to it in the
3.9 release.
[1] https://github.com/llvm-mirror/llvm/commit/36b9c09330bfb5e771914cfe307588f30d5510d2
[2] http://lists.iovisor.org/pipermail/iovisor-dev/2016-June/000266.html
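As an example of consuming the new define, the sketch below checks whether a buffer holds a 64-bit ELF header whose e_machine is EM_BPF (247). It falls back to defining EM_BPF locally for C libraries that do not ship it yet. `is_bpf_elf64()` is a hypothetical helper, and it assumes the file's byte order matches the host's.

```c
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/* Older C libraries may not define EM_BPF yet; 247 is the value
 * assigned for eBPF. */
#ifndef EM_BPF
#define EM_BPF 247
#endif

/* Return true if `buf` starts with a 64-bit ELF header whose e_machine
 * is EM_BPF. Assumes host byte order matches the file's. */
bool is_bpf_elf64(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr hdr;

    if (len < sizeof(hdr))
        return false;
    memcpy(&hdr, buf, sizeof(hdr));          /* avoid unaligned access */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return false;                        /* not an ELF file */
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return false;
    return hdr.e_machine == EM_BPF;
}
```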
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Wed, 20 Jul 2016 14:55:52 +0000 (07:55 -0700)]
bpf: fix implicit declaration of bpf_prog_add
For the #ifndef CONFIG_BPF_SYSCALL case, an inline version of
bpf_prog_add() must exist; otherwise the build breaks on some configs.
drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2544:10: error: implicit declaration of function 'bpf_prog_add'
prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
The function was introduced in 59d3656d5bf50 ("bpf: add bpf_prog_add
api for bulk prog refcnt") and first used in 47f1afdba2b87
("net/mlx4_en: add support for fast rx drop bpf program").
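The general pattern is a real function when the config option is set and a static inline stub otherwise, so that callers compile in both configurations. The sketch below uses a hypothetical CONFIG_FEATURE_SYSCALL switch and a hypothetical `prog_add()`. Its stub returns an error pointer, which is one plausible choice; the ERR_PTR/IS_ERR helpers are userspace copies for illustration.

```c
#include <errno.h>

/* Hypothetical feature switch standing in for CONFIG_BPF_SYSCALL;
 * remove the define to compile the stub path instead. */
#define CONFIG_FEATURE_SYSCALL 1

#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline int IS_ERR(const void *p)
{
    return (unsigned long)p >= (unsigned long)-MAX_ERRNO;
}

struct prog {
    int refcnt;
};

#ifdef CONFIG_FEATURE_SYSCALL
/* Real implementation: take `i` extra references in one call. */
static inline struct prog *prog_add(struct prog *p, int i)
{
    p->refcnt += i;
    return p;
}
#else
/* Stub so callers still compile when the feature is disabled. */
static inline struct prog *prog_add(struct prog *p, int i)
{
    (void)p;
    (void)i;
    return ERR_PTR(-EOPNOTSUPP);
}
#endif
```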
Fixes: 47f1afdba2b87 ("net/mlx4_en: add support for fast rx drop bpf program")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Tariq Toukan <ttoukan.linux@gmail.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 20 Jul 2016 04:46:34 +0000 (21:46 -0700)]
Merge branch 'xdp'
Brenden Blanco says:
====================
Add driver bpf hook for early packet drop and forwarding
This patch set introduces new infrastructure for programmatically
processing packets in the earliest stages of rx, as part of an effort
others are calling eXpress Data Path (XDP) [1]. Start this effort by
introducing a new bpf program type for early packet filtering, before
even an skb has been allocated.
Build on this with the ability to modify packet data and send it back
out on the same port.
Patch 1 adds an API for bulk bpf prog refcnt increment.
Patch 2 introduces the new prog type and helpers for validating the bpf
program. A new userspace struct is defined containing only data and
data_end as fields, with others to follow in the future.
In patch 3, create a new ndo to pass the fd to supported drivers.
In patch 4, expose a new rtnl option to userspace.
In patch 5, enable support in mlx4 driver.
In patch 6, create a sample drop and count program. With single core,
achieved ~20 Mpps drop rate on a 40G ConnectX3-Pro. This includes
packet data access, bpf array lookup, and increment.
In patch 7, add a page recycle facility to mlx4 rx, enabled when xdp is
active.
In patch 8, add the XDP_TX type to bpf.h
In patch 9, add a helper in the tx path for writing tx_desc
In patch 10, add support in mlx4 for packet data write and forwarding
In patch 11, turn on packet write support in the bpf verifier
In patch 12, add a sample program for packet write and forwarding. With
single core, achieved ~10 Mpps rewrite and forwarding.
[1] https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf
v10:
1/12: Add bulk refcnt api.
5/12: Move prog from priv to ring. This attribute is still only set
globally, but the path to finer granularity should be clear. No lock
is taken, so some rings may operate on older programs for a time (one
napi loop). Looked into options such as napi_synchronize, but they
were deemed too slow (calls to msleep).
Rename prog to xdp_prog. Add xdp_ring_num to help with accounting,
used more heavily in later patches.
7/12: Adjust to use per-ring xdp prog. Use priv->xdp_ring_num where
before priv->prog was used to determine buffer allocations.
9/12: Add cpu_to_be16 to vlan_tag in mlx4_en_xmit(). Remove unused variable
from mlx4_en_xmit and unused params from build_inline_wqe.
v9:
4/11: Add missing newline in en_err message.
6/11: Move page_cache cleanup from mlx4_en_destroy_rx_ring to
mlx4_en_deactivate_rx_ring. Move mlx4_en_moderation_update back to
static. Remove calls to mlx4_en_alloc/free_resources in mlx4_xdp_set.
Adopt instead the approach of mlx4_en_change_mtu to use a watchdog.
9/11: Use a per-ring function pointer in tx to separate out the code
for regular and recycle paths of tx completion handling. Add a helper
function to init the recycle ring and callback, called just after
activating tx. Remove extra tx ring resource requirement, and instead
steal from the upper rings. This helps to avoid needing
mlx4_en_alloc_resources. Add some hopefully meaningful error
messages for the various error cases. Reverted some of the
hard-to-follow logic that was accounting for the extra tx rings.
v8:
1/11: Reduce WARN_ONCE to single line. Also, change act param of that
function to u32 to match return type of bpf_prog_run_xdp.
2/11: Clarify locking semantics in ndo comment.
4/11: Add en_err warning in mlx4_xdp_set on num_frags/mtu violation.
v7:
Addressing two of the major discussion points: return codes and ndo.
The rest will be taken as todo items for separate patches.
Add an XDP_ABORTED type, which explicitly falls through to DROP. The
default case must produce the same result, since this is now
well-defined API behavior.
Merge ndo_xdp_* into a single ndo. The style is similar to
ndo_setup_tc, but with less unidirectional naming convention. The IFLA
parameter names are unchanged.
TODOs:
Add ethtool per-ring stats for aborted, default cases, maybe even drop
and tx as well.
Avoid duplicate dma sync operation in XDP_PASS case as mentioned by
Saeed.
1/12: Add XDP_ABORTED enum, reword API comment, and update commit
message.
2/12: Rewrite ndo_xdp_*() into single ndo_xdp() with type/union style
calling convention.
3/12: Switch to ndo_xdp callback.
4/12: Add XDP_ABORTED case as a fall-through to XDP_DROP. Implement
ndo_xdp.
12/12: Dropped, this will need some more work.
v6:
2/12: drop unnecessary netif_device_present check
4/12, 6/12, 9/12: Reorder default case statement above drop case to
remove some copy/paste.
v5:
0/12: Rebase and remove previous 1/13 patch
1/12: Fix nits from Daniel. Left the (void *) cast as-is, to be fixed
in future. Add bpf_warn_invalid_xdp_action() helper, to be used when
out of bounds action is returned by the program. Add a comment to
bpf.h denoting the undefined nature of out of bounds returns.
2/12: Switch to using bpf_prog_get_type(). Rename ndo_xdp_get() to
ndo_xdp_attached().
3/12: Add IFLA_XDP as a nested type, and add the associated nla_policy
for the new subtypes IFLA_XDP_FD and IFLA_XDP_ATTACHED.
4/12: Fixup the use of READ_ONCE in the ndos. Add a user of
bpf_warn_invalid_xdp_action helper.
5/12: Adjust to using the nested netlink options.
6/12: kbuild complained about u16 overflow on the tile architecture, so
bump frag_stride to u32. The page_offset member that is computed from
it was already u32.
v4:
2/12: Add inline helper for calling xdp bpf prog under rcu
3/12: Add detail to ndo comments
5/12: Remove mlx4_call_xdp and use inline helper instead.
6/12: Fix checkpatch complaints
9/12: Introduce new patch 9/12 with common helper for tx_desc write
Refactor to use common tx_desc write helper
11/12: Fix checkpatch complaints
v3:
Rewrite from v2 trying to incorporate feedback from multiple sources.
Specifically, add ability to forward packets out the same port and
allow packet modification.
For packet forwarding, the driver reserves a dedicated set of tx rings
for exclusive use by xdp. Upon completion, the pages on this ring are
recycled directly back to a small per-rx-ring page cache without
being dma unmapped.
Use of the percpu skb is dropped in favor of a lightweight struct
xdp_buff. The direct packet access feature is leveraged to remove
dependence on the skb.
The mlx4 driver implementation allocates a page-per-packet and maps it
in PCI_DMA_BIDIRECTIONAL mode when the bpf program is activated.
Naming is converted to use "xdp" instead of "phys_dev".
v2:
1/5: Drop xdp from types, instead consistently use bpf_phys_dev_
Introduce enum for return values from phys_dev hook
2/5: Move prog->type check to just before invoking ndo
Change ndo to take a bpf_prog * instead of fd
Add ndo_bpf_get rather than keeping a bool in the netdev struct
3/5: Use ndo_bpf_get to fetch bool
4/5: Enforce that only 1 frag is ever given to bpf prog by disallowing
mtu to increase beyond FRAG_SZ0 when bpf prog is running, or conversely
to set a bpf prog when priv->num_frags > 1
Rename pseudo_skb to bpf_phys_dev_md
Implement ndo_bpf_get
Add dma sync just before invoking prog
Check for explicit bpf return code rather than nonzero
Remove increment of rx_dropped
5/5: Use explicit bpf return code in example
Update commit log with higher pps numbers
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:57 +0000 (12:16 -0700)]
bpf: add sample for xdp forwarding and rewrite
Add a sample that rewrites and forwards packets out on the same
interface. Observed single core forwarding performance of ~10Mpps.
Since the mlx4 driver under test recycles every single packet page, the
perf output shows almost exclusively just the ring management and bpf
program work. Slowdowns are likely occurring due to cache misses.
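The rewrite step of such a forwarding program can be sketched in plain C as an in-place swap of the Ethernet source and destination MACs, guarded by the same data/data_end bounds check an XDP program must pass. This is an assumption about what the sample rewrites, and it is not the sample's source. `swap_macs()` and `struct eth_hdr` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Minimal Ethernet header layout: destination first, then source. */
struct eth_hdr {
    uint8_t dst[ETH_ALEN];
    uint8_t src[ETH_ALEN];
    uint16_t proto;
};

/* Swap source and destination MACs in place so the frame can be sent
 * back out of the port it arrived on. Returns -1 if the frame is too
 * short to hold an Ethernet header. */
int swap_macs(uint8_t *data, const uint8_t *data_end)
{
    uint8_t tmp[ETH_ALEN];

    if (data + sizeof(struct eth_hdr) > data_end)
        return -1;
    memcpy(tmp, data, ETH_ALEN);                /* save dst */
    memcpy(data, data + ETH_ALEN, ETH_ALEN);    /* dst = src */
    memcpy(data + ETH_ALEN, tmp, ETH_ALEN);     /* src = saved dst */
    return 0;
}
```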
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:56 +0000 (12:16 -0700)]
bpf: enable direct packet data write for xdp progs
For forwarding to be effective, XDP programs should be allowed to
rewrite packet data.
This requires all drivers supporting XDP to map the packet memory as
TODEVICE or BIDIRECTIONAL before invoking the program.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:55 +0000 (12:16 -0700)]
net/mlx4_en: add xdp forwarding and data write support
A user will now be able to loop packets back out of the same port using
a bpf program attached to the xdp hook. Updates to the packet contents
from the bpf program are also supported.
For the packet write feature to work, the rx buffers are now mapped as
bidirectional when the page is allocated. This occurs only when the xdp
hook is active.
When the program returns a TX action, enqueue the packet directly to a
dedicated tx ring, so as to avoid any locking entirely. This requires
the tx ring to be allocated 1:1 for each rx ring, and the tx completion
to run in the same softirq.
Upon tx completion, this dedicated tx ring recycles pages directly back
to the original rx ring without unmapping them. In a steady-state
tx/drop workload, effectively 0 page allocs/frees will occur.
In order to separate out the paths between free and recycle, a
free_tx_desc func pointer is introduced that is optionally updated
whenever recycle_ring is activated. By default the original free
function is always initialized.
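The free/recycle split described above can be sketched as a per-ring function pointer. It is chosen once at setup, so the completion loop does not branch per descriptor on whether the ring belongs to XDP. All names below are hypothetical and stand in for the mlx4 structures.

```c
/* Hypothetical tx ring with a per-ring completion callback. */
struct tx_ring;
typedef void (*free_tx_desc_fn)(struct tx_ring *ring, int page_id);

struct tx_ring {
    free_tx_desc_fn free_tx_desc;
    int freed;     /* pages returned to the page allocator */
    int recycled;  /* pages handed back to the paired rx ring */
};

static void free_tx_desc_default(struct tx_ring *ring, int page_id)
{
    (void)page_id;
    ring->freed++;       /* stand-in for dma unmap + page free */
}

static void recycle_tx_desc(struct tx_ring *ring, int page_id)
{
    (void)page_id;
    ring->recycled++;    /* stand-in for returning the still-mapped page */
}

/* Default initialisation always installs the normal free function. */
void tx_ring_init(struct tx_ring *ring)
{
    ring->free_tx_desc = free_tx_desc_default;
    ring->freed = 0;
    ring->recycled = 0;
}

/* Called only when the ring is dedicated to XDP_TX. */
void tx_ring_enable_recycle(struct tx_ring *ring)
{
    ring->free_tx_desc = recycle_tx_desc;
}

/* Completion handler: one indirect call per finished descriptor. */
void tx_complete(struct tx_ring *ring, int npages)
{
    for (int i = 0; i < npages; i++)
        ring->free_tx_desc(ring, i);
}
```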
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:54 +0000 (12:16 -0700)]
net/mlx4_en: break out tx_desc write into separate function
In preparation for writing the tx descriptor from multiple functions,
create a helper for both normal and blueflame access.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:53 +0000 (12:16 -0700)]
bpf: add XDP_TX xdp_action for direct forwarding
XDP enabled drivers must transmit received packets back out on the same
port they were received on when a program returns this action.
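A driver-side sketch of handling the return codes follows, including the new XDP_TX. The enum mirrors the ordering of enum xdp_action in the uapi header as of this series. Out-of-range values and XDP_ABORTED are both treated as drops, as the cover letter describes. `enum rx_outcome` and `handle_xdp_act()` are hypothetical.

```c
#include <stdint.h>

/* Action codes in the order of the uapi enum xdp_action at the time. */
enum xdp_action {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
};

/* Hypothetical driver-side outcome for one received frame. */
enum rx_outcome { RX_TO_STACK, RX_DROPPED, RX_TX_SAME_PORT };

/* Map a program's return value to what the driver does with the frame.
 * A real driver would also warn on unknown values, e.g. through
 * bpf_warn_invalid_xdp_action(). */
enum rx_outcome handle_xdp_act(uint32_t act)
{
    switch (act) {
    case XDP_PASS:
        return RX_TO_STACK;
    case XDP_TX:
        return RX_TX_SAME_PORT;   /* back out of the receiving port */
    default:                      /* out-of-range action */
    case XDP_ABORTED:
    case XDP_DROP:
        return RX_DROPPED;
    }
}
```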
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:52 +0000 (12:16 -0700)]
net/mlx4_en: add page recycle to prepare rx ring for tx support
The mlx4 driver by default allocates order-3 pages for the ring to
consume in multiple fragments. When the device has an xdp program, this
behavior will prevent tx actions since the page must be re-mapped in
TODEVICE mode, which cannot be done if the page is still shared.
Start by making the allocator configurable based on whether xdp is
running, such that order-0 pages are always used and never shared.
Since this will stress the page allocator, add a simple page cache to
each rx ring. Pages in the cache are left dma-mapped, and in drop-only
stress tests the page allocator is eliminated from the perf report.
Note that setting an xdp program will now require the rings to be
reconfigured.
Before:
26.91% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq
17.88% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags
6.00% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag
4.49% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist
3.21% swapper [kernel.vmlinux] [k] intel_idle
2.73% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem
2.57% swapper [mlx4_en] [k] mlx4_en_process_rx_cq
After:
31.72% swapper [kernel.vmlinux] [k] intel_idle
8.79% swapper [mlx4_en] [k] mlx4_en_process_rx_cq
7.54% swapper [kernel.vmlinux] [k] poll_idle
6.36% swapper [mlx4_core] [k] mlx4_eq_int
4.21% swapper [kernel.vmlinux] [k] tasklet_action
4.03% swapper [kernel.vmlinux] [k] cpuidle_enter_state
3.43% swapper [mlx4_en] [k] mlx4_en_prepare_rx_desc
2.18% swapper [kernel.vmlinux] [k] native_irq_return_iret
1.37% swapper [kernel.vmlinux] [k] menu_select
1.09% swapper [kernel.vmlinux] [k] bpf_map_lookup_elem
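The per-rx-ring page cache described above can be sketched as a small fixed-size LIFO of still-mapped pages. Pages are pushed on tx completion and popped when rx descriptors are refilled. The cache size, the names, and the assumption that each ring is touched only from its own napi context (so no lock is needed) are illustrative and not taken from mlx4.

```c
#include <stdbool.h>

#define PAGE_CACHE_SIZE 8   /* hypothetical; kept small for illustration */

/* A page that stays dma-mapped while it sits in the cache. */
struct page_ent {
    void *page;
    unsigned long dma;      /* stand-in for the dma address */
};

/* Per-rx-ring LIFO cache; assumed lock-free because one ring is only
 * used from a single napi context. */
struct page_cache {
    int count;
    struct page_ent buf[PAGE_CACHE_SIZE];
};

/* Returns false if the cache is full; the caller then unmaps and frees. */
bool page_cache_put(struct page_cache *c, struct page_ent ent)
{
    if (c->count >= PAGE_CACHE_SIZE)
        return false;
    c->buf[c->count++] = ent;
    return true;
}

/* Returns false if the cache is empty; the caller then allocates and
 * maps a fresh page. */
bool page_cache_get(struct page_cache *c, struct page_ent *out)
{
    if (c->count == 0)
        return false;
    *out = c->buf[--c->count];
    return true;
}
```

A LIFO order hands back the most recently freed page first, which is likely still warm in cache.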
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:51 +0000 (12:16 -0700)]
Add sample for adding simple drop program to link
Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP
hook of a link. With the drop-only program, observed single core rate is
~20Mpps.
Other tests were run, for instance without the dropcnt increment or
without reading from the packet header; in those cases the packet rate
was mostly unchanged.
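A simplified userspace sketch of the per-protocol drop counting follows, with every header access checked against data_end as an XDP program must do. VLAN tags and IP options are ignored. The `dropcnt` array stands in for the per-CPU array map, and `count_drop()` is hypothetical; neither is the sample's actual code.

```c
#include <stdint.h>

#define ETH_HLEN 14
#define ETH_P_IP 0x0800

/* Hypothetical counter table indexed by IP protocol number. */
static uint64_t dropcnt[256];

/* Parse Ethernet + IPv4 just far enough to read the protocol byte.
 * Returns the protocol, or -1 if the frame is not complete IPv4. */
int count_drop(const uint8_t *data, const uint8_t *data_end)
{
    const uint8_t *iph;
    uint16_t h_proto;

    if (data + ETH_HLEN > data_end)
        return -1;
    h_proto = (uint16_t)((data[12] << 8) | data[13]);  /* network order */
    if (h_proto != ETH_P_IP)
        return -1;

    iph = data + ETH_HLEN;
    if (iph + 20 > data_end)        /* minimal IPv4 header */
        return -1;

    uint8_t proto = iph[9];         /* IPv4 protocol field */
    dropcnt[proto]++;
    return proto;
}
```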
$ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex)
proto 17:
20403027 drops/s
./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
Running... ctrl^C to stop
Device: eth4@0
Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags)
5056638pps 2427Mb/sec (2427186240bps) errors: 0
Device: eth4@1
Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags)
5133311pps 2463Mb/sec (2463989280bps) errors: 0
Device: eth4@2
Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags)
5077431pps 2437Mb/sec (2437166880bps) errors: 0
Device: eth4@3
Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags)
5043067pps 2420Mb/sec (2420672160bps) errors: 0
perf report --no-children:
26.05% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq
17.84% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags
5.52% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag
4.90% swapper [kernel.vmlinux] [k] poll_idle
4.14% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist
2.78% ksoftirqd/0 [kernel.vmlinux] [k] __free_pages_ok
2.57% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem
2.51% swapper [mlx4_en] [k] mlx4_en_process_rx_cq
1.94% ksoftirqd/0 [kernel.vmlinux] [k] percpu_array_map_lookup_elem
1.45% swapper [mlx4_en] [k] mlx4_en_alloc_frags
1.35% ksoftirqd/0 [kernel.vmlinux] [k] free_one_page
1.33% swapper [kernel.vmlinux] [k] intel_idle
1.04% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5c5
0.96% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c58d
0.93% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c6ee
0.92% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c6b9
0.89% ksoftirqd/0 [kernel.vmlinux] [k] __alloc_pages_nodemask
0.83% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c686
0.83% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5d5
0.78% ksoftirqd/0 [mlx4_en] [k] mlx4_alloc_pages.isra.23
0.77% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5b4
0.77% ksoftirqd/0 [kernel.vmlinux] [k] net_rx_action
machine specs:
receiver - Intel E5-1630 v3 @ 3.70GHz
sender - Intel E5645 @ 2.40GHz
Mellanox ConnectX-3 @40G
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:50 +0000 (12:16 -0700)]
net/mlx4_en: add support for fast rx drop bpf program
Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.
In tc/socket bpf programs, helpers linearize skb fragments as needed
when the program touches the packet data. However, in the pursuit of
speed, XDP programs will not be allowed to use these slower functions,
especially if it involves allocating an skb.
Therefore, disallow MTU settings that would produce a multi-fragment
packet that XDP programs would fail to access. Future enhancements could
be done to increase the allowable MTU.
The xdp program is present as a per-ring data structure, but as of yet
it is not possible to set at that granularity through any ndo.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 19 Jul 2016 19:16:49 +0000 (12:16 -0700)]
rtnl: add option for setting link xdp prog
Sets the bpf program represented by fd as an early filter in the rx path
of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP.
Providing a negative value as fd clears the program. Getting the fd back
via rtnl is not possible; reading this attribute therefore only reports,
as a bool, whether a program is attached to the link.
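The set/query semantics can be sketched with a hypothetical link object. A non-negative fd attaches, a negative fd clears, and a query returns only a bool. The kernel keeps a reference to the program rather than the fd itself, so `prog_fd` here is purely a stand-in, and `link_set_xdp()` / `link_xdp_attached()` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical link; `prog_fd` stands in for the kernel's reference
 * to the attached program and is never reported back to the caller. */
struct link {
    int prog_fd;        /* -1 when nothing is attached */
};

/* Attach the program behind `fd`, or detach when fd is negative. */
void link_set_xdp(struct link *l, int fd)
{
    l->prog_fd = fd < 0 ? -1 : fd;
}

/* Query: returns whether a program is attached, not which one. */
bool link_xdp_attached(const struct link *l)
{
    return l->prog_fd >= 0;
}
```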
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>