Eric Dumazet [Fri, 25 Sep 2015 00:16:05 +0000 (17:16 -0700)]
tcp: avoid reorders for TFO passive connections
We found that a TCP Fast Open passive connection was vulnerable
to reorders, as the exchange might look like
[1] C -> S S <FO ...> <request>
[2] S -> C S. ack request <options>
[3] S -> C . <answer>
packets [2] and [3] can be generated at almost the same time.
If C receives the 3rd packet before the 2nd, it will drop it as
the socket is in SYN_SENT state and expects a SYNACK.
S will have to retransmit the answer.
Current OOO avoidance in linux is defeated because SYNACK
packets are attached to the LISTEN socket, while DATA packets
are attached to the children. They might be sent by different cpus,
and different TX queues might be selected.
It turns out that for TFO, we created a child, which is a
full blown socket in TCP_SYN_RECV state, and we simply can attach
the SYNACK packet to this socket.
This means that at the time tcp_sendmsg() pushes DATA packet,
skb->ooo_okay will be set iff the SYNACK packet had been sent
and TX completed.
This removes the reorder source at the host level.
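The effect of moving the SYNACK to the child can be illustrated with a small
user-space model. This is only a sketch: struct model_sock and the model_*
helpers are hypothetical names, and the rule "a packet may change TX queue
only when the socket has nothing in flight" is a simplification of how
skb->ooo_okay is derived.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a socket's TX accounting. */
struct model_sock {
    size_t wmem_in_flight;  /* bytes handed to the NIC, not yet TX completed */
};

/* Charge a packet to the socket that owns it. */
static void model_xmit(struct model_sock *owner, size_t len)
{
    owner->wmem_in_flight += len;
}

/* TX completion releases the charge. */
static void model_tx_complete(struct model_sock *owner, size_t len)
{
    owner->wmem_in_flight -= len;
}

/* A new packet may pick another TX queue only if nothing is in flight. */
static bool model_ooo_okay(const struct model_sock *sk)
{
    return sk->wmem_in_flight == 0;
}
```

With the SYNACK charged to the listener, the child looks idle and its first
DATA packet may select a different queue. With the SYNACK charged to the
child, model_ooo_okay() stays false until the SYNACK has been TX completed.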
We also removed the export of tcp_try_fastopen(), as it is no
longer called from IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Sep 2015 03:56:02 +0000 (20:56 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-09-28
This series contains updates to i40e, i40evf and igb to resolve issues
seen and reported by Red Hat.
Kiran moves i40e_get_head() in preparation for the refactor of the Tx
timeout logic, so that it can be used in other areas of the driver.
Refactored the driver timeout logic by issuing a writeback request via
a software interrupt to the hardware the first time the driver detects
a hang. This was due to the driver being too aggressive in resetting a
hung queue.
Shannon adds the GRE protocol to the transmit checksum encoding.
Anjali fixes an issue of forcing writeback too often, which caused us to
not benefit from NAPI. We now disable force writeback in the clean
routine for X710 and XL710 adapters. The X722 adapters do not enable
interrupt to force a writeback and benefit from WB_ON_ITR and so force
WB is left enabled for those adapters. Fixed a possible deadlock issue
where sync_vsi_filters() can be called directly under RTNL or through
the timer subtask without RTNL. So update the flow to see if we are
already under RTNL before trying to grab it.
Stefan Assmann provides a fix for igb where SR-IOV was not getting
enabled properly and we ran into a NULL pointer if the max_vfs module
parameter is specified. This is prevented by setting the
IGB_FLAG_HAS_MSIX bit before calling igb_probe_vfs().
v2: added "i40e: Fix for recursive RTNL lock during PROMISC change" patch
to the series, as it resolves another issue seen and reported by
Red Hat.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Assmann [Thu, 17 Sep 2015 12:46:10 +0000 (14:46 +0200)]
igb: assume MSI-X interrupts during initialization
In igb_sw_init() the sequence of calls was changed from
igb_init_queue_configuration()
igb_init_interrupt_scheme()
igb_probe_vfs()
to
igb_probe_vfs()
igb_init_queue_configuration()
igb_init_interrupt_scheme()
This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
get enabled properly and we run into a NULL pointer if the max_vfs
module parameter is specified (adapter->vf_data does not get allocated,
crash on accessing the structure).
[ 7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
[ 7.419370] PGD 0
[ 7.419373] Oops: 0002 [#1] SMP
[ 7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
[ 7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
[ 7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
[...]
[ 7.419431] Call Trace:
[ 7.419442] [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
[ 7.419447] [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0
Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
igb_probe_vfs(). The real interrupt capabilities will be checked during
igb_init_interrupt_scheme() so this is safe to do.
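The ordering problem and the fix can be sketched in user space as follows.
All names here (struct model_adapter, MODEL_FLAG_HAS_MSIX, the model_*
functions) are hypothetical stand-ins, and the assumption that VF setup is
skipped entirely when the flag is clear is a simplification of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FLAG_HAS_MSIX (1u << 0)  /* stand-in for IGB_FLAG_HAS_MSIX */

struct model_adapter {
    uint32_t flags;
    bool vf_data_allocated;
};

/* Stand-in for igb_probe_vfs(): without the flag, no VF data is allocated. */
static void model_probe_vfs(struct model_adapter *a)
{
    if (a->flags & MODEL_FLAG_HAS_MSIX)
        a->vf_data_allocated = true;
}

/* Stand-in for igb_init_interrupt_scheme(): settle on the real capability. */
static void model_init_interrupt_scheme(struct model_adapter *a, bool hw_has_msix)
{
    if (hw_has_msix)
        a->flags |= MODEL_FLAG_HAS_MSIX;
    else
        a->flags &= ~MODEL_FLAG_HAS_MSIX;
}

/* New call order; assume_msix selects between old and fixed behaviour. */
static void model_sw_init(struct model_adapter *a, bool assume_msix, bool hw_has_msix)
{
    if (assume_msix)
        a->flags |= MODEL_FLAG_HAS_MSIX;  /* the fix: assume MSI-X up front */
    model_probe_vfs(a);
    model_init_interrupt_scheme(a, hw_has_msix);
}
```

In this model, calling model_sw_init() with assume_msix false leaves
vf_data_allocated false even on MSI-X capable hardware, which corresponds to
the unallocated adapter->vf_data in the oops above.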
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai [Mon, 28 Sep 2015 20:37:12 +0000 (13:37 -0700)]
i40e: Fix for recursive RTNL lock during PROMISC change
The sync_vsi_filters function can be called directly under RTNL
or through the timer subtask without one. This was causing a deadlock.
If sync_vsi_filters is called from a thread which held the lock,
and in another thread the PROMISC setting got changed we would
be executing the PROMISC change in the thread which already held
the lock alongside the other filter update. The PROMISC change
requires a reset if we are on a VEB, which requires it to be called
under RTNL.
Earlier the driver would call reset for PROMISC change without
checking if we were already under RTNL and would try to grab it
causing a deadlock. This patch changes the flow to see if we are
already under RTNL before trying to grab it.
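A minimal user-space sketch of the changed flow, assuming the caller passes
down whether it already holds RTNL. struct model_rtnl and the model_*
functions are hypothetical; a counter records where a real non-recursive
mutex would have blocked forever.

```c
#include <stdbool.h>

/* Hypothetical model of a non-recursive lock such as RTNL. */
struct model_rtnl {
    int held;       /* 1 while the lock is held */
    int deadlocks;  /* times a holder tried to take the lock again */
};

static void model_rtnl_lock(struct model_rtnl *l)
{
    if (l->held) {
        l->deadlocks++;  /* a real mutex would never return here */
        return;
    }
    l->held = 1;
}

static void model_rtnl_unlock(struct model_rtnl *l)
{
    l->held = 0;
}

/* Reset needed for a PROMISC change; only grab the lock if the caller lacks it. */
static void model_reset_for_promisc(struct model_rtnl *l, bool already_locked)
{
    if (!already_locked)
        model_rtnl_lock(l);
    /* ... reset work that must run under RTNL ... */
    if (!already_locked)
        model_rtnl_unlock(l);
}
```

The direct path would call model_reset_for_promisc() with already_locked
true, and the timer subtask path with already_locked false.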
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai [Sat, 26 Sep 2015 01:26:13 +0000 (18:26 -0700)]
i40e: Fix RS bit update in Tx path and disable force WB workaround
This patch fixes the issue of forcing WB too often causing us to not
benefit from NAPI.
Without this patch we were forcing WB/arming interrupt too often taking
away the benefits of NAPI and causing a performance impact.
With this patch we disable force WB in the clean routine for X710
and XL710 adapters. X722 adapters do not enable interrupt to force
a WB and benefit from WB_ON_ITR and hence force WB is left enabled
for those adapters.
For XL710 and X710 adapters, if we have fewer than 4 packets pending,
a software interrupt triggered from the service task will force a WB.
This patch also changes the conditions for setting RS bit as described
in code comments. This optimizes when the HW does a tail bump and when
it does a WB. It also optimizes when we do a wmb.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Fri, 25 Sep 2015 19:26:04 +0000 (19:26 +0000)]
i40e: add GRE tunnel type to csum encoding
Make sure the Tx checksum encoder knows about GRE protocol and sets the
descriptor flag appropriately.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kiran Patil [Thu, 24 Sep 2015 22:13:15 +0000 (18:13 -0400)]
i40e/i40evf: refactor tx timeout logic
This patch modifies the driver timeout logic by issuing a writeback
request via a software interrupt to the hardware the first time the
driver detects a hang. The driver was too aggressive in resetting a hung
queue, so back that off by removing logic to down the netdevice after
too many hangs, and move the function to the service task.
Change-ID: Ife100b9d124cd08cbdb81ab659008c1b9abbedea
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kiran Patil [Thu, 24 Sep 2015 19:43:02 +0000 (15:43 -0400)]
i40e: Move i40e_get_head into header file
i40e_get_head needs to be called in multiple files in a further patch;
prepare by moving the function into a header file.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ian Wilson [Thu, 24 Sep 2015 18:20:11 +0000 (11:20 -0700)]
bridge: Allow forward delay to be cfgd when STP enabled
Allow bridge forward delay to be configured when Spanning Tree is enabled.
Signed-off-by: Ian Wilson <iwilson@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 27 Sep 2015 05:40:56 +0000 (22:40 -0700)]
Merge branch 'vxlan-ipv4-ipv6'
Jiri Benc says:
====================
vxlan: support both IPv4 and IPv6 sockets
Note: this needs net merged into net-next in order to apply.
It's currently not easy enough to work with metadata based vxlan tunnels. In
particular, it's necessary to create separate network interfaces for IPv4
and IPv6 tunneling. Assigning an IPv6 address to an IPv4 interface is
allowed yet won't do what's expected. With route based tunneling, one has to
pay attention to use the vxlan interface opened with the correct family.
Other users of this (openvswitch) would need to always create two vxlan
interfaces.
Furthermore, there's no sane API for creating an IPv6 vxlan metadata based
interface.
This patchset simplifies this by opening both IPv4 and IPv6 socket if the
vxlan interface has the metadata flag (IFLA_VXLAN_COLLECT_METADATA) set.
Assignment of addresses etc. works as expected after this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 24 Sep 2015 11:50:02 +0000 (13:50 +0200)]
vxlan: support both IPv4 and IPv6 sockets in a single vxlan device
For metadata based vxlan interface, open both IPv4 and IPv6 socket. This is
much more user friendly: it's not necessary to create two vxlan interfaces
and pay attention to using the right one in routing rules.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 24 Sep 2015 11:50:01 +0000 (13:50 +0200)]
vxlan: make vxlan_sock_add and vxlan_sock_release complementary
Make vxlan_sock_add both alloc the socket and attach it to vxlan_dev. Let
vxlan_sock_release accept vxlan_dev as its parameter instead of vxlan_sock.
This makes vxlan_sock_add and vxlan_sock_release complementary. It reduces
code duplication in the next patch.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Woodhouse [Thu, 24 Sep 2015 10:38:22 +0000 (11:38 +0100)]
8139cp: Fix GSO MSS handling
When fixing the TSO support I noticed we just mask ->gso_size with the
MSSMask value and don't care about the consequences.
Provide a .ndo_features_check() method which drops the NETIF_F_TSO
feature for any skb which would exceed the maximum, and thus forces it
to be segmented by software.
Then we can stop the masking in cp_start_xmit(), and just WARN if the
maximum is exceeded, which should now never happen.
Finally, Francois Romieu noticed that we didn't even have the right
value for MSSMask anyway; it should be 0x7ff (11 bits) not 0xfff.
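The difference between masking and a features check can be shown with a
short user-space sketch. MODEL_MSS_MASK uses the 0x7ff value quoted above;
MODEL_F_TSO and both function names are hypothetical stand-ins and not the
driver's actual code.

```c
#include <stdint.h>

#define MODEL_MSS_MASK 0x7ffu     /* 11-bit MSS field, per the text above */
#define MODEL_F_TSO    (1u << 0)  /* hypothetical stand-in for NETIF_F_TSO */

/* Old behaviour: silently truncate the MSS to the width of the field. */
static unsigned int model_old_masked_mss(unsigned int gso_size)
{
    return gso_size & MODEL_MSS_MASK;
}

/* New behaviour: drop TSO for oversized MSS so software segments the skb. */
static uint32_t model_features_check(uint32_t features, unsigned int gso_size)
{
    if (gso_size > MODEL_MSS_MASK)
        features &= ~MODEL_F_TSO;
    return features;
}
```

For a gso_size of 3000, masking would have programmed an MSS of 952, whereas
the features check clears the TSO bit and leaves segmentation to software.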
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Woodhouse [Wed, 23 Sep 2015 08:46:09 +0000 (09:46 +0100)]
8139cp: Enable offload features by default
I fixed TSO. Hardware checksum and scatter/gather also appear to be
working correctly both on real hardware and in QEMU's emulation.
Let's enable them by default and see if anyone screams...
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 26 Sep 2015 23:08:27 +0000 (16:08 -0700)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
net/ipv4/arp.c
The net/ipv4/arp.c conflict was one commit adding a new
local variable while another commit was deleting one.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 26 Sep 2015 10:01:33 +0000 (06:01 -0400)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
instead of cloning them. From Daniel Borkmann.
2) When converting classical BPF into eBPF, fix the setting of the
source reg to BPF_REG_X. From Tycho Andersen.
3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
Linus Lussing.
4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.
5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
Roopa Prabhu.
6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.
7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
Florian Westphal.
8) Check DMA mapping errors in bna driver, from Ivan Vecera.
9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
misclearing of interrupt status in TX timeout handler, etc.) from
David Woodhouse.
10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
Hugne.
11) Fix autobind races et al. in netlink code, from Herbert Xu with
help from Tejun Heo and others.
12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.
13) Fix various races in timewait timer and reqsk_queue_hash_req, from
Eric Dumazet.
14) Fix array overruns in mac80211, from Johannes Berg and Dan
Carpenter.
15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.
16) Fix race between poll_one_napi and napi_disable, from Neil Horman.
17) Fix byte order in geneve tunnel port config, from John W Linville.
18) Fix handling of ARP replies over lightweight tunnels, from Jiri
Benc.
19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
Kok and Roopa Prabhu.
20) Several reference count handling bug fixes in the PHY/MDIO layer
from Russell King.
21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.
22) Fix crash in icmp_route_lookup(), from David Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: Fix panic in icmp_route_lookup
net: update docbook comment for __mdiobus_register()
ppp: fix lockdep splat in ppp_dev_uninit()
net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
phy: marvell: add link partner advertised modes
net: fix net_device refcounting
phy: add phy_device_remove()
phy: fixed-phy: properly validate phy in fixed_phy_update_state()
net: fix phy refcounting in a bunch of drivers
of_mdio: fix MDIO phy device refcounting
phy: add proper phy struct device refcounting
phy: fix mdiobus module safety
net: dsa: fix of_mdio_find_bus() device refcount leak
phy: fix of_mdio_find_bus() device refcount leak
ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
ip6_gre: Reduce log level in ip6gre_err() to debug
fib_rules: fix fib rule dumps across multiple skbs
bnx2x: byte swap rss_key to comply to Toeplitz specs
net: revert "net_sched: move tp->root allocation into fw_init()"
lwtunnel: remove source and destination UDP port config option
...
David Ahern [Thu, 24 Sep 2015 21:31:29 +0000 (15:31 -0600)]
net: Fix panic in icmp_route_lookup
Andrey reported a panic:
[ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
[ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
[ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
[ 7249.865637] Oops: 0000 [#1]
...
[ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-999-generic #201509220155
[ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014 08/02/2006
[ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
[ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
[ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
[ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
[ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
[ 7249.867077] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
[ 7249.867141] Stack:
[ 7249.867165] 320310ee 00000000 00000042 320310ee 00000000 c1aeca00 f3920240 f0c69180
[ 7249.867268] f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000 e659266c f483ba54
[ 7249.867361] 8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0 c16b0e00 00000064
[ 7249.867448] Call Trace:
[ 7249.867494] [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
[ 7249.867534] [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867576] [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867615] [<c16b0e00>] ? icmp_send+0xa0/0x380
[ 7249.867648] [<c16b102f>] icmp_send+0x2cf/0x380
[ 7249.867681] [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
[ 7249.867714] [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
[ 7249.867746] [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
[ 7249.867780] [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0 [nf_conntrack]
[ 7249.867838] [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
[ 7249.867889] [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
[ 7249.867933] [<c16776a1>] nf_iterate+0x71/0x80
[ 7249.867970] [<c1677715>] nf_hook_slow+0x65/0xc0
[ 7249.868002] [<c1681811>] __ip_local_out_sk+0xc1/0xd0
[ 7249.868034] [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
[ 7249.868066] [<c1681836>] ip_local_out_sk+0x16/0x30
[ 7249.868097] [<c1684054>] ip_send_skb+0x14/0x80
[ 7249.868129] [<c16840f4>] ip_push_pending_frames+0x34/0x40
[ 7249.868163] [<c16844a2>] ip_send_unicast_reply+0x282/0x310
[ 7249.868196] [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
[ 7249.868227] [<c16a1b63>] tcp_v4_rcv+0x323/0x990
[ 7249.868257] [<c16776a1>] ? nf_iterate+0x71/0x80
[ 7249.868289] [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
[ 7249.868322] [<c167df4c>] ip_local_deliver+0x4c/0xa0
[ 7249.868353] [<c167dba0>] ? ip_rcv_finish+0x390/0x390
[ 7249.868384] [<c167d88c>] ip_rcv_finish+0x7c/0x390
[ 7249.868415] [<c167e280>] ip_rcv+0x2e0/0x420
...
Prior to the VRF change the oif was not set in the flow struct, so the
VRF support should really have only added the vrf_master_ifindex lookup.
Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Cc: Andrey Melnikov <temnota.am@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 25 Sep 2015 10:56:56 +0000 (11:56 +0100)]
net: update docbook comment for __mdiobus_register()
Update the docbook comment for __mdiobus_register() to include the new
module owner argument. This resolves a warning found by the 0-day
builder.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 25 Sep 2015 23:20:55 +0000 (16:20 -0700)]
Merge branch 'for-4.3-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull another cgroup fix from Tejun Heo:
"The cgroup writeback support got inadvertently enabled for traditional
hierarchies revealing two regressions which are currently being worked
on. It shouldn't have been enabled on traditional hierarchies, so
disable it on them. This is enough to make the regressions go away
for people who aren't experimenting with cgroup"
* 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup, writeback: don't enable cgroup writeback on traditional hierarchies
David S. Miller [Fri, 25 Sep 2015 20:00:40 +0000 (13:00 -0700)]
Merge branch 'listener-sock-const'
Eric Dumazet says:
====================
dccp/tcp: constify listener sock
Another patch bomb to prepare lockless TCP/DCCP LISTEN handling.
SYNACK retransmits are built and sent without listener socket
being locked. Soon, initial SYNACK packets will have same property.
This series makes sure we did not do something wrong with this model,
by adding a const qualifier in all the paths taken from synack building
and transmit, for IPv4/IPv6 and TCP/dccp.
The only potential problem was the rewrite of ecn bits for connections
with DCTCP as congestion module, but this was a very minor one.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:24 +0000 (07:39 -0700)]
inet: constify inet_rtx_syn_ack() sock argument
SYNACK packets are sent on behalf of unlocked listeners
or fastopen sockets. Mark socket as const to catch future changes
that might break the assumption.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:23 +0000 (07:39 -0700)]
tcp/dccp: constify rtx_synack() and friends
This is done to make sure we do not change listener socket
while sending SYNACK packets while socket lock is not held.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:22 +0000 (07:39 -0700)]
dccp: constify dccp_make_response() socket argument
Like tcp_make_synack() the only time we might change the socket is
when calling sock_wmalloc(), which is using atomic operation to
update sk->sk_wmem_alloc
Also use MAX_DCCP_HEADER as both IPv4/IPv6 use this value for max_header.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:21 +0000 (07:39 -0700)]
tcp: constify tcp_v{4|6}_send_synack() socket argument
This documents the fact that the listener lock might not be held
at the time SYNACKs are sent.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:20 +0000 (07:39 -0700)]
ipv6: constify ip6_xmit() sock argument
This is to document that socket lock might not be held at this point.
skb_set_owner_w() and ipv6_local_error() are using proper atomic ops
or spinlocks, so we promote the socket to non const when calling them.
netfilter hooks should never assume socket lock is held,
we also promote the socket to non const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:19 +0000 (07:39 -0700)]
tcp: constify tcp_make_synack() socket argument
The listener socket is not locked when tcp_make_synack() is called.
We had better make sure no field is written.
There is one exception : Since SYNACK packets are attached to the listener
at this moment (or SYN_RECV child in case of Fast Open),
sock_wmalloc() needs to update sk->sk_wmem_alloc, but this is done using
atomic operations so this is safe.
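The pattern of a const socket argument with one atomic exception can be
sketched in user-space C11. struct model_sock, its fields and model_charge()
are hypothetical; the real code uses the kernel's own atomic helpers rather
than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, reduced socket: one plain field and one atomic counter. */
struct model_sock {
    int state;                 /* ordinary field: must not be written here */
    atomic_size_t wmem_alloc;  /* stand-in for sk->sk_wmem_alloc */
};

/*
 * Takes a const socket so the compiler rejects accidental writes to plain
 * fields. The single intended write goes through a deliberate cast and an
 * atomic add, which is safe without holding the socket lock.
 */
static void model_charge(const struct model_sock *sk, size_t len)
{
    struct model_sock *wsk = (struct model_sock *)sk;

    atomic_fetch_add(&wsk->wmem_alloc, len);
}
```

Any other assignment through sk inside model_charge(), for example to
sk->state, would fail to compile, which is the kind of check the const
qualifier is meant to provide.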
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:18 +0000 (07:39 -0700)]
tcp: remove tcp_ecn_make_synack() socket argument
SYNACK packets might be sent without holding socket lock.
For DCTCP/ECN sake, we should call INET_ECN_xmit() while
socket lock is owned, and only when we init/change congestion control.
This also fixes a bug if congestion module is changed from
dctcp to another one on a listener : we now clear ECN bits
properly.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:17 +0000 (07:39 -0700)]
tcp: remove tcp_synack_options() socket argument
We do not use the socket in this function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:16 +0000 (07:39 -0700)]
ip: constify ip_build_and_send_pkt() socket argument
This function is used to build and send SYNACK packets,
possibly on behalf of unlocked listener socket.
Make sure we did not miss a write by making this socket const.
We no longer can use ip_select_ident() and have to either
set iph->id to 0 or directly call __ip_select_ident()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:15 +0000 (07:39 -0700)]
tcp: md5: constify tcp_md5_do_lookup() socket argument
When the new TCP listener is done, these functions will be called
without socket lock being held. Make sure they don't change
anything.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:14 +0000 (07:39 -0700)]
inet: constify ip_dont_fragment() arguments
ip_dont_fragment() can accept const socket and dst
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:13 +0000 (07:39 -0700)]
ipv6: constify inet6_csk_route_req() socket argument
socket is not modified, make it const so that callers can
do the same if they need.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:12 +0000 (07:39 -0700)]
ipv6: constify ip6_dst_lookup_{flow|tail}() sock arguments
ip6_dst_lookup_flow() and ip6_dst_lookup_tail() do not touch
socket, let's add a const qualifier.
This will permit the same change in inet6_csk_route_req()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:11 +0000 (07:39 -0700)]
inet: constify inet_csk_route_req() socket argument
This is used by TCP listener core, and listener socket shall
not be modified by inet_csk_route_req().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:10 +0000 (07:39 -0700)]
inet: constify ip_route_output_flow() socket argument
Very soon, TCP stack might call inet_csk_route_req(), which
calls ip_route_output_flow() with an unlocked listener socket,
so we need to make sure ip_route_output_flow() is not trying to
change any field from its socket argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:09 +0000 (07:39 -0700)]
tcp: constify tcp_openreq_init_rwin()
Soon, listener socket won't be locked when tcp_openreq_init_rwin()
is called. We need to read socket fields once, as their value
could change under us.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 25 Sep 2015 14:39:08 +0000 (07:39 -0700)]
tcp: constify listener socket in tcp_v[46]_init_req()
Soon, listener socket spinlock will no longer be held,
add const arguments to tcp_v[46]_init_req() to make clear these
functions cannot mess with socket fields.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 24 Sep 2015 10:54:01 +0000 (12:54 +0200)]
ppp: fix lockdep splat in ppp_dev_uninit()
ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection.
ppp_create_interface() must then lock these mutexes in that same order
to avoid possible deadlock.
[ 120.880011] ======================================================
[ 120.880011] [ INFO: possible circular locking dependency detected ]
[ 120.880011] 4.2.0 #1 Not tainted
[ 120.880011] -------------------------------------------------------
[ 120.880011] ppp-apitest/15827 is trying to acquire lock:
[ 120.880011] (&pn->all_ppp_mutex){+.+.+.}, at: [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
[ 120.880011]
[ 120.880011] but task is already holding lock:
[ 120.880011] (rtnl_mutex){+.+.+.}, at: [<ffffffff812e4255>] rtnl_lock+0x12/0x14
[ 120.880011]
[ 120.880011] which lock already depends on the new lock.
[ 120.880011]
[ 120.880011]
[ 120.880011] the existing dependency chain (in reverse order) is:
[ 120.880011]
[ 120.880011] -> #1 (rtnl_mutex){+.+.+.}:
[ 120.880011] [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
[ 120.880011] [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
[ 120.880011] [<ffffffff812e4255>] rtnl_lock+0x12/0x14
[ 120.880011] [<ffffffff812d9d94>] register_netdev+0x11/0x27
[ 120.880011] [<ffffffffa0147b17>] ppp_ioctl+0x289/0xc98 [ppp_generic]
[ 120.880011] [<ffffffff8113b367>] do_vfs_ioctl+0x4ea/0x532
[ 120.880011] [<ffffffff8113b3fd>] SyS_ioctl+0x4e/0x7d
[ 120.880011] [<ffffffff813ad7d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[ 120.880011]
[ 120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}:
[ 120.880011] [<ffffffff8107334e>] __lock_acquire+0xb07/0xe76
[ 120.880011] [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
[ 120.880011] [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
[ 120.880011] [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
[ 120.880011] [<ffffffff812d5263>] rollback_registered_many+0x19e/0x252
[ 120.880011] [<ffffffff812d5381>] rollback_registered+0x29/0x38
[ 120.880011] [<ffffffff812d53fa>] unregister_netdevice_queue+0x6a/0x77
[ 120.880011] [<ffffffffa0146a94>] ppp_release+0x42/0x79 [ppp_generic]
[ 120.880011] [<ffffffff8112d9f6>] __fput+0xec/0x192
[ 120.880011] [<ffffffff8112dacc>] ____fput+0x9/0xb
[ 120.880011] [<ffffffff8105447a>] task_work_run+0x66/0x80
[ 120.880011] [<ffffffff81001801>] prepare_exit_to_usermode+0x8c/0xa7
[ 120.880011] [<ffffffff81001900>] syscall_return_slowpath+0xe4/0x104
[ 120.880011] [<ffffffff813ad931>] int_ret_from_sys_call+0x25/0x9f
[ 120.880011]
[ 120.880011] other info that might help us debug this:
[ 120.880011]
[ 120.880011] Possible unsafe locking scenario:
[ 120.880011]
[ 120.880011]        CPU0                    CPU1
[ 120.880011]        ----                    ----
[ 120.880011]   lock(rtnl_mutex);
[ 120.880011]                                lock(&pn->all_ppp_mutex);
[ 120.880011]                                lock(rtnl_mutex);
[ 120.880011]   lock(&pn->all_ppp_mutex);
[ 120.880011]
[ 120.880011] *** DEADLOCK ***
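The ordering rule behind this report can be illustrated with a tiny
lockdep-like model in user space. Everything here (enum model_lock, struct
model_lockdep, the model_* functions) is hypothetical and much simpler than
the real lockdep: it only remembers which lock was held while another was
taken, and flags the reverse order when it shows up later.

```c
#include <stdbool.h>

enum model_lock { MODEL_RTNL, MODEL_ALL_PPP, MODEL_NLOCKS };

struct model_lockdep {
    /* taken_before[a][b]: lock a was held while lock b was acquired */
    bool taken_before[MODEL_NLOCKS][MODEL_NLOCKS];
    enum model_lock held[MODEL_NLOCKS];
    int nheld;
    bool inversion;  /* set once two paths disagree on the order */
};

static void model_acquire(struct model_lockdep *d, enum model_lock l)
{
    for (int i = 0; i < d->nheld; i++) {
        enum model_lock h = d->held[i];

        if (d->taken_before[l][h])
            d->inversion = true;  /* l -> h seen earlier, now h -> l */
        d->taken_before[h][l] = true;
    }
    d->held[d->nheld++] = l;
}

static void model_release_all(struct model_lockdep *d)
{
    d->nheld = 0;
}
```

Taking MODEL_RTNL then MODEL_ALL_PPP on every path never sets inversion,
while one path that takes them in the opposite order does, which mirrors the
scenario printed above.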
Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudip Mukherjee [Thu, 24 Sep 2015 10:16:53 +0000 (15:46 +0530)]
net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
The allmodconfig build of avr32 is failing with:
drivers/net/ethernet/via/via-rhine.c:1098:2: error: implicit declaration
of function 'pci_iomap' [-Werror=implicit-function-declaration]
drivers/net/ethernet/via/via-rhine.c:1119:2: error: implicit declaration
of function 'pci_iounmap' [-Werror=implicit-function-declaration]
The generic empty pci_iomap and pci_iounmap are used only if CONFIG_PCI
is not defined and CONFIG_GENERIC_PCI_IOMAP is defined.
Add GENERIC_PCI_IOMAP to the dependency list for VIA_RHINE, as the
build fails when neither CONFIG_PCI nor CONFIG_GENERIC_PCI_IOMAP is
defined.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Thu, 24 Sep 2015 08:59:05 +0000 (10:59 +0200)]
net: remove unused argument of __netdev_find_adj()
The __netdev_find_adj() helper does not use its first argument, only the
device to find and list to walk through.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 25 Sep 2015 19:27:23 +0000 (12:27 -0700)]
Merge branch 'l2tp-module-autoloading'
Stephen Hemminger says:
====================
l2tp: module autoloading
With L2TP it was necessary to manually load modules
which is a nuisance and not required with other tunneling
protocols. This set of patches adds the aliases and module
load hook to get rid of the necessity of modprobing.
====================
Acked-By: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Thu, 24 Sep 2015 04:33:36 +0000 (21:33 -0700)]
l2tp: remove references to modprobe in documentation
Explicit modprobe calls are no longer needed. Also update the examples
to use ip instead of the deprecated ifconfig command.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Thu, 24 Sep 2015 04:33:35 +0000 (21:33 -0700)]
l2tp: auto load IP modules
When creating an IP encapsulated tunnel, the necessary l2tp module
should be loaded. It already works for UDP encapsulation; it just
doesn't work for direct IP encap.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Thu, 24 Sep 2015 04:33:34 +0000 (21:33 -0700)]
l2tp: auto load type modules
It should not be necessary to do explicit module loading when
configuring L2TP. Modules should be loaded as needed instead
(as is done already with netlink and other tunnel types).
This patch adds a new module alias type and code to load
the sub module on demand.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 24 Sep 2015 01:19:58 +0000 (18:19 -0700)]
net: dsa: Set a "dsa" device_type
Provide device_type information for slave network devices created by
DSA. This is useful for user-space applications that need to locate or
search for devices of a specific kind.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Wed, 23 Sep 2015 23:07:17 +0000 (00:07 +0100)]
phy: marvell: add link partner advertised modes
Read the standard link partner advertisement registers and store the
result in phydev->lp_advertising, so ethtool can report this information
to userspace. Zero it as per genphy if autonegotiation is disabled.
Tested with a Marvell 88E1512 PHY.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 25 Sep 2015 19:08:41 +0000 (12:08 -0700)]
Merge branch 'for-linus-4.3' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This is an assorted set I've been queuing up:
Jeff Mahoney tracked down a tricky one where we ended up starting IO
on the wrong mapping for special files in btrfs_evict_inode. A few
people reported this one on the list.
Filipe found (and provided a test for) a difficult bug in reading
compressed extents, and Josef fixed up some quota record keeping with
snapshot deletion. Chandan killed off an accounting bug during DIO
that led to WARN_ONs as we freed inodes"
* 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: keep dropped roots in cache until transaction commit
Btrfs: Direct I/O: Fix space accounting
btrfs: skip waiting on ordered range for special files
Btrfs: fix read corruption of compressed and shared extents
Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock
Btrfs: don't initialize a space info as full to prevent ENOSPC
Linus Torvalds [Fri, 25 Sep 2015 18:33:52 +0000 (11:33 -0700)]
Merge tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable patches:
- fix v4.2 SEEK on files over 2 gigs
- Fix a layout segment reference leak when pNFS I/O falls back to inband I/O.
- Fix recovery of recalled read delegations
Bugfixes:
- Fix a case where NFSv4 fails to send CLOSE after a server reboot
- Fix sunrpc to wait for connections to complete before retrying
- Fix sunrpc races between transport connect/disconnect and shutdown
- Fix an infinite loop when layoutget fails with BAD_STATEID
- nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
- Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set
- Fix layoutreturn/close ordering issues"
* tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS41: make close wait for layoutreturn
NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set
NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget
NFSv4: Recovery of recalled read delegations is broken
NFS: Fix an infinite loop when layoutget fail with BAD_STATEID
NFS: Do cleanup before resetting pageio read/write to mds
SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose
SUNRPC: Lock the transport layer on shutdown
nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
SUNRPC: Ensure that we wait for connections to complete before retrying
SUNRPC: drop null test before destroy functions
nfs: fix v4.2 SEEK on files over 2 gigs
SUNRPC: Fix races between socket connection and destroy code
nfs: fix pg_test page count calculation
Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
Linus Torvalds [Fri, 25 Sep 2015 18:25:30 +0000 (11:25 -0700)]
Merge tag 'sound-4.3-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This ended up with a larger set of fixes than wished, unfortunately.
As diffstat shows, the majority of changes are for various ASoC
drivers (Realtek, Wolfson codec drivers, etc), in addition to a couple
of HD-audio regression fixes. All these are reasonably small and
nothing to scare much"
* tag 'sound-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
ALSA: hda - Disable power_save_node for Thinkpads
ALSA: hda/tegra - async probe for avoiding module loading deadlock
ASoC: rt5645: Prevent the pop sound in case of playback and the jack is plugging
ASoC: rt5645: Increase the delay time to remove the pop sound
ASoC: rt5645: Use the type SOC_DAPM_SINGLE_AUTODISABLE to prevent the weird sound in runtime of power up
ASoC: pxa: pxa2xx-ac97: fix dma requestor lines
MAINTAINERS: Update website and git repo for Wolfson Microelectronics
ASoC: fsl_ssi: Fix checking of dai format for AC97 mode
ASoC: wm0010: fix error path
ASoC: wm0010: fix memory leak
ASoC: wm8960: correct the max register value of mic boost pga
ASoC: wm8962: remove 64k sample rate support
ASoC: davinci-mcasp: Fix devm_kasprintf format string
ASoC: fix broken pxa SoC support
ASoC: davinci-mcasp: Set .symmetric_rates = 1 in snd_soc_dai_driver
ASoC: au1x: psc-i2s: Fix unused variable 'ret' warning
ASoC: SPEAr: Make SND_SPEAR_SOC select SND_SOC_GENERIC_DMAENGINE_PCM
ASoC: mediatek: Increase periods_min in capture
ASoC: davinci-mcasp: Revise the FIFO threshold calculation
ASoC: wm8960: correct gain value for input PGA and add microphone PGA
...
Linus Torvalds [Fri, 25 Sep 2015 18:16:53 +0000 (11:16 -0700)]
Merge tag 'pci-v4.3-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These are fixes for things we merged for v4.3 (VPD, MSI, and bridge
window management), and a new Renesas R8A7794 SoC device ID.
Details:
Resource management:
- Revert pci_read_bridge_bases() unification (Bjorn Helgaas)
- Clear IORESOURCE_UNSET when clipping a bridge window (Bjorn
Helgaas)
MSI:
- Fix MSI IRQ domains for VFs on virtual buses (Alex Williamson)
Renesas R-Car host bridge driver:
- Add R8A7794 support (Sergei Shtylyov)
Miscellaneous:
- Fix devfn for VPD access through function 0 (Alex Williamson)
- Use function 0 VPD only for identical functions (Alex Williamson)"
* tag 'pci-v4.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: rcar: Add R8A7794 support
PCI: Use function 0 VPD for identical functions, regular VPD for others
PCI: Fix devfn for VPD access through function 0
PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses
PCI: Clear IORESOURCE_UNSET when clipping a bridge window
PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"
Linus Torvalds [Fri, 25 Sep 2015 17:51:40 +0000 (10:51 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC
bug fixes too"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: disable halt_poll_ns as default for s390x
KVM: x86: fix off-by-one in reserved bits check
KVM: x86: use correct page table format to check nested page table reserved bits
KVM: svm: do not call kvm_set_cr0 from init_vmcb
KVM: x86: trap AMD MSRs for the TSeg base and mask
KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
kvm: svm: reset mmu on VCPU reset
Linus Torvalds [Fri, 25 Sep 2015 17:11:26 +0000 (10:11 -0700)]
Merge tag 'powerpc-4.3-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Wire up sys_membarrier()
- cxl: Fix lockdep warning while creating afu_err_buff from Vaibhav
* tag 'powerpc-4.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
cxl: Fix lockdep warning while creating afu_err_buff attribute
powerpc: Wire up sys_membarrier()
David Hildenbrand [Fri, 18 Sep 2015 10:34:53 +0000 (12:34 +0200)]
KVM: disable halt_poll_ns as default for s390x
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.
Architectures are now free to set their own halt_poll_ns
default value.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 22 Sep 2015 08:15:59 +0000 (10:15 +0200)]
KVM: x86: fix off-by-one in reserved bits check
29ecd6601904 ("KVM: x86: avoid uninitialized variable warning",
2015-09-06) introduced a not-so-subtle problem, which probably
escaped review because it was not part of the patch context.
Before the patch, leaf was always equal to iterator.level. After,
it is equal to iterator.level - 1 in the call to is_shadow_zero_bits_set,
and when is_shadow_zero_bits_set does another "-1" the check on
reserved bits becomes incorrect. Using "iterator.level" in the call
fixes this call trace:
WARNING: CPU: 2 PID: 17000 at arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]()
Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power amd64_edac_mod k10temp edac_core amdkfd amd_iommu_v2 radeon acpi_cpufreq
[...]
Call Trace:
dump_stack+0x4e/0x84
warn_slowpath_common+0x95/0xe0
warn_slowpath_null+0x1a/0x20
handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]
tdp_page_fault+0x231/0x290 [kvm]
? emulator_pio_in_out+0x6e/0xf0 [kvm]
kvm_mmu_page_fault+0x36/0x240 [kvm]
? svm_set_cr0+0x95/0xc0 [kvm_amd]
pf_interception+0xde/0x1d0 [kvm_amd]
handle_exit+0x181/0xa70 [kvm_amd]
? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
kvm_arch_vcpu_ioctl_run+0x6f6/0x1730 [kvm]
? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
? preempt_count_sub+0x9b/0xf0
? mutex_lock_killable_nested+0x26f/0x490
? preempt_count_sub+0x9b/0xf0
kvm_vcpu_ioctl+0x358/0x710 [kvm]
? __fget+0x5/0x210
? __fget+0x101/0x210
do_vfs_ioctl+0x2f4/0x560
? __fget_light+0x29/0x90
SyS_ioctl+0x4c/0x90
entry_SYSCALL_64_fastpath+0x16/0x73
---[ end trace 37901c8686d84de6 ]---
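The double "-1" described above can be reproduced in a small userspace sketch. Everything here is a toy model: the mask table, its values, and the names rsvd_bits_set(), check_buggy() and check_fixed() are made up for illustration and are not the KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LEVEL 4

/* Hypothetical per-level reserved-bit masks; entry [n - 1] belongs to level n. */
static const uint64_t rsvd_mask[MAX_LEVEL] = {
	0x0,	/* level 1: nothing reserved in this toy model */
	0x1000,	/* level 2 */
	0x2000,	/* level 3 */
	0x4000,	/* level 4 */
};

/* The checker converts the 1-based level to a 0-based index itself. */
bool rsvd_bits_set(uint64_t spte, int level)
{
	assert(level >= 1 && level <= MAX_LEVEL);
	return (spte & rsvd_mask[level - 1]) != 0;
}

/* Buggy caller: passes an already-decremented value, so "-1" is applied twice. */
bool check_buggy(uint64_t spte, int iter_level)
{
	int leaf = iter_level - 1;

	return rsvd_bits_set(spte, leaf);
}

/* Fixed caller: passes the iterator level and lets the checker adjust it. */
bool check_fixed(uint64_t spte, int iter_level)
{
	return rsvd_bits_set(spte, iter_level);
}
```

With an entry at level 3, the buggy caller consults the level 2 mask, so it can both flag a valid entry and miss a genuinely reserved bit.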
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 22 Sep 2015 21:02:14 +0000 (23:02 +0200)]
KVM: x86: use correct page table format to check nested page table reserved bits
Intel CPUID on AMD host or vice versa is a weird case, but it can
happen. Handle it by checking the host CPU vendor instead of the
guest's in reset_tdp_shadow_zero_bits_mask. For speed, the
check uses the fact that Intel EPT has an X (executable) bit while
AMD NPT has NX.
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 21 Sep 2015 05:46:55 +0000 (07:46 +0200)]
KVM: svm: do not call kvm_set_cr0 from init_vmcb
kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the
memslots array (SRCU protected). Using a mini SRCU critical section
is ugly, and adding it to kvm_arch_vcpu_create doesn't work because
the VMX vcpu_create callback calls synchronize_srcu.
Fixes this lockdep splat:
===============================
[ INFO: suspicious RCU usage. ]
4.3.0-rc1+ #1 Not tainted
-------------------------------
include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-i38/17000:
#0: (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm]
[...]
Call Trace:
dump_stack+0x4e/0x84
lockdep_rcu_suspicious+0xfd/0x130
kvm_zap_gfn_range+0x188/0x1a0 [kvm]
kvm_set_cr0+0xde/0x1e0 [kvm]
init_vmcb+0x760/0xad0 [kvm_amd]
svm_create_vcpu+0x197/0x250 [kvm_amd]
kvm_arch_vcpu_create+0x47/0x70 [kvm]
kvm_vm_ioctl+0x302/0x7e0 [kvm]
? __lock_is_held+0x51/0x70
? __fget+0x101/0x210
do_vfs_ioctl+0x2f4/0x560
? __fget_light+0x29/0x90
SyS_ioctl+0x4c/0x90
entry_SYSCALL_64_fastpath+0x16/0x73
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David S. Miller [Fri, 25 Sep 2015 06:04:53 +0000 (23:04 -0700)]
Merge branch 'phy-mdio-refcnt'
Russell King says:
====================
Phy, mdiobus, and netdev struct device fixes
The third version of this series fixes the build error which David
identified, and drops the broken changes for the Cavium Thunder BGX
ethernet driver as this driver requires some complex changes to
resolve the leakage - and this is best done by people who can test
the driver.
Compared to v2, the only patch which has changed is patch 6
"net: fix phy refcounting in a bunch of drivers"
I _think_ I've been able to build-test all the drivers touched by
that patch to some degree now, though several of them needed the
Kconfig hacked to allow it (not all had || COMPILE_TEST clause on
their dependencies.)
Previous cover letters below:
This is the second version of the series, with the comments David had
on the first patch fixed up. Original series description with updated
diffstat below.
While looking at the DSA code, I noticed we have a
of_find_net_device_by_node(), and it looks like users of that are
similarly buggy - it looks like net/dsa/dsa.c is the only user. Fix
that too.
Hi,
While looking at the phy code, I identified a number of weaknesses
where refcounting on device structures was being leaked, where
modules could be removed while in-use, and where the fixed-phy could
end up having unintended consequences caused by incorrect calls to
fixed_phy_update_state().
This patch series resolves those issues, some of which were discovered
with testing on an Armada 388 board. Not all patches are fully tested,
particularly the one which touches several network drivers.
When resolving the struct device refcounting problems, several different
solutions were considered before settling on the implementation here -
one of the considerations was to avoid touching many network drivers.
The solution here is:
phy_attach*() - takes a refcount
phy_detach*() - drops the phy_attach refcount
Provided drivers always attach and detach their phys, which they should
already be doing, this should change nothing, even if they leak a refcount.
of_phy_find_device() and of_* functions which use that take
a refcount. Arrange for this refcount to be dropped once
the phy is attached.
This is the reason why the previous change is important - we can't drop
this refcount taken by of_phy_find_device() until something else holds
a reference on the device. This resolves the leaked refcount caused by
using of_phy_connect() or of_phy_attach().
Even without the above changes, these drivers are leaking by calling
of_phy_find_device(). These drivers are addressed by adding the
appropriate release of that refcount.
The mdiobus code also suffered from the same kind of leak, but thankfully
this only happened in one place - the mdio-mux code.
I also found that the try_module_get() in the phy layer code was utterly
useless: phydev->dev.driver was guaranteed to always be NULL, so
try_module_get() was always being called with a NULL argument. I proved
this with my SFP code, which declares its own MDIO bus - the module use
count was never incremented irrespective of how I set the MDIO bus up.
This allowed the MDIO bus code to be removed from the kernel while there
were still PHYs attached to it.
One other bug was discovered: while using in-band-status with mvneta, it
was found that if a real phy is attached with in-band-status enabled,
and another ethernet interface is using the fixed-phy infrastructure, the
interface using the fixed-phy infrastructure is configured according to
the other interface using the in-band-status - which is caused by the
fixed-phy code not verifying that the phy_device passed in is actually
a fixed-phy device, rather than a real MDIO phy.
Lastly, having mdio_bus reversing phy_device_register() internals seems
like a layering violation - it's trivial to move that code to the phy
device layer.
====================
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:36:33 +0000 (20:36 +0100)]
net: fix net_device refcounting
of_find_net_device_by_node() uses class_find_device() internally to
lookup the corresponding network device. class_find_device() returns
a reference to the embedded struct device, with its refcount
incremented.
Add a comment to the definition in net/core/net-sysfs.c indicating the
need to drop this refcount, and fix the DSA code to drop this refcount
when the OF-generated platform data is cleaned up and freed. Also
arrange for the ref to be dropped when handling errors.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:36:28 +0000 (20:36 +0100)]
phy: add phy_device_remove()
Add a phy_device_remove() function to complement phy_device_register(),
which undoes the effects of phy_device_register() by removing the phy
device from visibility, but not freeing it.
This allows these details to be moved out of the mdio bus code into
the phy code where this action belongs.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:36:23 +0000 (20:36 +0100)]
phy: fixed-phy: properly validate phy in fixed_phy_update_state()
Validate that the phy_device passed into fixed_phy_update_state() is a
fixed-phy device before walking the list of phys for a fixed phy at the
same address.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:36:18 +0000 (20:36 +0100)]
net: fix phy refcounting in a bunch of drivers
of_phy_find_device() increments the phy struct device refcount, which
we need to properly balance. Add code to network drivers using this
function to ensure that the struct device refcount is correctly
balanced.
For xgene, looking back in the history, we should be able to use
of_phy_connect() with a zero flags argument for the DT case as this is
how the driver used to operate prior to de7b5b3d790a ("net: eth: xgene:
change APM X-Gene SoC platform ethernet to support ACPI").
This leaves the Cavium Thunder BGX unfixed; fixing this driver is a
complicated task, one which the maintainers need to be involved with.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:36:13 +0000 (20:36 +0100)]
of_mdio: fix MDIO phy device refcounting
bus_find_device() is defined as:
* This is similar to the bus_for_each_dev() function above, but it
* returns a reference to a device that is 'found' for later use, as
* determined by the @match callback.
and it does indeed return a reference-counted pointer to the device:
while ((dev = next_device(&i)))
if (match(dev, data) && get_device(dev))
^^^^^^^^^^^^^^^
break;
klist_iter_exit(&i);
return dev;
What that means is that when we're done with the struct device, we must
drop that reference. Neither of_phy_connect() nor of_phy_attach() did
this when phy_connect_direct() or phy_attach_direct() failed.
With our previous patch, phy_connect_direct() and phy_attach_direct()
take a new refcount on the phy device when successful, so we can drop
our local reference immediately after these functions, whether or not
they succeeded.
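A minimal userspace sketch of that pattern, assuming a toy refcounted object in place of struct device. The toy_* names are hypothetical, and the return convention (0 or -1) is simplified compared with the real helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a refcounted phy device; not the kernel type. */
struct toy_phy {
	int refcount;
	int attached;
};

void toy_get(struct toy_phy *phy)
{
	phy->refcount++;
}

void toy_put(struct toy_phy *phy)
{
	assert(phy->refcount > 0);
	phy->refcount--;
}

/* Models a lookup helper: the returned pointer carries one reference. */
struct toy_phy *toy_find(struct toy_phy *phy)
{
	toy_get(phy);
	return phy;
}

/* Models the attach step: takes its own reference only on success. */
int toy_connect_direct(struct toy_phy *phy, int should_fail)
{
	if (should_fail)
		return -1;
	toy_get(phy);
	phy->attached = 1;
	return 0;
}

/* Models the wrapper: the lookup reference is dropped on both paths. */
int toy_connect(struct toy_phy *phy, int should_fail)
{
	struct toy_phy *found = toy_find(phy);
	int ret = toy_connect_direct(found, should_fail);

	toy_put(found);	/* on success the attach reference keeps the phy alive */
	return ret;
}
```

Starting from a refcount of 1, a failed connect leaves the count at 1 and a successful one leaves it at 2, so nothing leaks in either case.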
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:36:08 +0000 (20:36 +0100)]
phy: add proper phy struct device refcounting
Take a refcount on the phy struct device when the phy device is attached
to a network device, and drop it after it's detached. This ensures that
a refcount is held on the phy device while the device is being used by
a network device, thereby preventing the phy_device from being
unexpectedly kfree()'d by phy_device_release().
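The effect can be sketched with a toy refcount, assuming a release step that runs when the count reaches zero. The names toy_dev, toy_attach() and toy_detach() are hypothetical and only model the idea, not the phy layer API.

```c
#include <assert.h>

/* Toy device: "released" models the release callback having freed it. */
struct toy_dev {
	int refcount;
	int released;
};

void toy_dev_get(struct toy_dev *dev)
{
	dev->refcount++;
}

void toy_dev_put(struct toy_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->released = 1;	/* the real code would free the object here */
}

/* Attaching to a network device pins the phy. */
void toy_attach(struct toy_dev *dev)
{
	toy_dev_get(dev);
}

/* Detaching unpins it; this may be the final reference. */
void toy_detach(struct toy_dev *dev)
{
	toy_dev_put(dev);
}
```

If the original owner drops its reference while the phy is still attached, the object survives until toy_detach() runs.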
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:36:02 +0000 (20:36 +0100)]
phy: fix mdiobus module safety
Re-implement the mdiobus module refcounting to ensure that the mdiobus
module code does not go away while we might call into it.
The old scheme using bus->dev.driver was buggy, because bus->dev is a
class device which never has a struct device_driver associated with it,
and hence the associated code trying to obtain a refcount did nothing
useful.
Instead, take the approach that other subsystems do: pass the module
when calling mdiobus_register(), and record that in the mii_bus struct.
When we need to increment the module use count in the phy code, use
this stored pointer. When the phy is detached, drop the module
refcount, remembering that the phy device might go away at that point.
This doesn't stop the mii_bus going away while there are in-use phys -
it merely stops the underlying code vanishing.
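A rough userspace model of the stored-owner approach follows. The toy_* types and functions are hypothetical; the one behaviour borrowed from the kernel is that a module "get" on a NULL module succeeds without counting anything, which is why the old scheme never pinned the module.

```c
#include <assert.h>
#include <stddef.h>

/* Toy module: use_count models the module use count. */
struct toy_module {
	int use_count;
	int unloading;
};

/* Toy bus: records the owning module at registration time. */
struct toy_bus {
	struct toy_module *owner;
};

/* A NULL module always "succeeds" and counts nothing. */
int toy_try_module_get(struct toy_module *mod)
{
	if (!mod)
		return 1;
	if (mod->unloading)
		return 0;
	mod->use_count++;
	return 1;
}

void toy_module_put(struct toy_module *mod)
{
	if (!mod)
		return;
	assert(mod->use_count > 0);
	mod->use_count--;
}

/* The registering code passes its own module, which is stored in the bus. */
void toy_bus_register(struct toy_bus *bus, struct toy_module *owner)
{
	bus->owner = owner;
}

/* Attach pins the bus module through the stored pointer. */
int toy_phy_attach(struct toy_bus *bus)
{
	return toy_try_module_get(bus->owner) ? 0 : -1;
}

/* Detach releases the pin. */
void toy_phy_detach(struct toy_bus *bus)
{
	toy_module_put(bus->owner);
}
```

With no owner recorded, attach succeeds but the use count never moves; once the owner is stored, each attach is reflected in the count and an unloading module refuses new users.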
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:35:57 +0000 (20:35 +0100)]
net: dsa: fix of_mdio_find_bus() device refcount leak
Current users of of_mdio_find_bus() leak a struct device refcount, as
they fail to clean up the reference obtained inside class_find_device().
Fix the DSA code to properly refcount the returned MDIO bus by:
1. taking a reference on the struct device whenever we assign it to
pd->chip[x].host_dev.
2. dropping the reference when we overwrite the existing reference.
3. dropping the reference when we free the data structure.
4. dropping the initial reference we obtained after setting up the
platform data structure, or on failure.
In step 2 above, where we obtain a new MDIO bus, there is no need to
take a reference on it as we would only have to drop it immediately
after assignment again, iow:
put_device(cd->host_dev); /* drop original assignment ref */
cd->host_dev = get_device(&mdio_bus_switch->dev); /* get our ref */
put_device(&mdio_bus_switch->dev); /* drop of_mdio_find_bus ref */
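That the get/put pair on the new bus cancels out can be checked with a toy counter. This is only a sketch: toy_dev and the two toy_replace_*() helpers are hypothetical names, and "found" stands for a pointer that already carries the lookup reference.

```c
#include <assert.h>

struct toy_dev {
	int refcount;
};

struct toy_dev *toy_get(struct toy_dev *dev)
{
	dev->refcount++;
	return dev;
}

void toy_put(struct toy_dev *dev)
{
	assert(dev->refcount > 0);
	dev->refcount--;
}

/* Long form: the three refcount operations spelled out. */
void toy_replace_long(struct toy_dev **slot, struct toy_dev *found)
{
	toy_put(*slot);		/* drop original assignment ref */
	*slot = toy_get(found);	/* get our ref */
	toy_put(found);		/* drop the lookup ref */
}

/* Short form: hand the lookup reference over to the slot. */
void toy_replace_short(struct toy_dev **slot, struct toy_dev *found)
{
	toy_put(*slot);		/* drop original assignment ref */
	*slot = found;		/* the lookup ref becomes our ref */
}
```

Both helpers leave the old device one reference lighter and the new device's count unchanged.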
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 24 Sep 2015 19:35:52 +0000 (20:35 +0100)]
phy: fix of_mdio_find_bus() device refcount leak
of_mdio_find_bus() leaks a struct device refcount, caused by using
class_find_device() and not realising that the device reference has
its refcount incremented:
* Note, you will need to drop the reference with put_device() after use.
...
while ((dev = class_dev_iter_next(&iter))) {
if (match(dev, data)) {
get_device(dev);
break;
}
Update the comment, and arrange for the phy code to drop this refcount
when disposing of a reference to it.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 25 Sep 2015 05:59:23 +0000 (22:59 -0700)]
Merge branch 'switchdev-transaction-item-queue'
Jiri Pirko says:
====================
switchdev: transaction item queue and cleanup
====================
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:49 +0000 (10:02 +0200)]
switchdev: reduce transaction phase enum down to a boolean
Now, since we have only 2 values for transaction phase, just use bool.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:48 +0000 (10:02 +0200)]
dsa: use prepare/commit switchdev transaction helpers
The enum is going to disappear, use the helpers instead.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:47 +0000 (10:02 +0200)]
switchdev: remove "ABORT" transaction phase
No longer used by drivers, as transaction queue with item destructors
takes care of abort phase internally in switchdev code. So kill it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:46 +0000 (10:02 +0200)]
switchdev: remove "NONE" transaction phase
Shouldn't have been there in the first place. Now it is unused, kill it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:45 +0000 (10:02 +0200)]
rocker: use switchdev transaction queue for allocated memory
Benefit from previously introduced transaction item queue infrastructure
and remove rocker specific transaction memory management.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:44 +0000 (10:02 +0200)]
rocker: push struct switchdev_trans down through rocker code
switchdev_trans will need to be available further down the call
chain, so propagate it instead of the trans phase enum. This enum will be
removed anyway. Also, use prepare/commit phase check helpers to get
information about current phase of transaction.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:43 +0000 (10:02 +0200)]
switchdev: add switchdev_trans_ph_prepare/commit helpers
Add helpers which should be used in attr_set/obj_add switchdev ops to
check the phase of transaction.
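A sketch of how such helpers might be used by a driver op, modelled in userspace. The toy_* names, the enum, and the validation rule are assumptions made for illustration; they are not the switchdev definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy two-phase transaction; not the switchdev structure. */
enum toy_trans_ph {
	TOY_TRANS_PREPARE,
	TOY_TRANS_COMMIT,
};

struct toy_trans {
	enum toy_trans_ph ph;
};

bool toy_trans_ph_prepare(const struct toy_trans *trans)
{
	assert(trans != NULL);
	return trans->ph == TOY_TRANS_PREPARE;
}

bool toy_trans_ph_commit(const struct toy_trans *trans)
{
	assert(trans != NULL);
	return trans->ph == TOY_TRANS_COMMIT;
}

/* Hypothetical driver op: validate in prepare, apply only in commit. */
int toy_attr_set(const struct toy_trans *trans, int *hw_value, int new_value)
{
	if (toy_trans_ph_prepare(trans))
		return new_value < 0 ? -1 : 0;	/* check only, touch nothing */

	*hw_value = new_value;			/* commit phase: apply */
	return 0;
}
```

The driver code asks the helpers which phase it is in instead of comparing against the phase value directly, so the representation of the phase can change later without touching drivers.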
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:42 +0000 (10:02 +0200)]
switchdev: move transaction phase enum under transaction structure
Before it disappears completely, move transaction phase enum under
transaction structure and make attr/obj structures a bit cleaner.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:41 +0000 (10:02 +0200)]
switchdev: introduce transaction item queue for attr_set and obj_add
Now, the memory allocation in prepare/commit state is done separately
in each driver (rocker). Introduce a similar mechanism in generic
switchdev code, in the form of a queue. It can be used not only for
memory allocations, but also for other items. Item destruction on abort
is handled as well.
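The idea can be sketched in userspace as a queue of items that each carry a destructor. All toy_* names are hypothetical, and the FIFO ordering is an assumption of this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One queued resource plus the hook that cleans it up on abort. */
struct toy_item {
	struct toy_item *next;
	void *data;
	void (*destructor)(void *data);
};

struct toy_trans {
	struct toy_item *head;
	struct toy_item *tail;
};

/* Prepare phase: park a resource in the transaction. */
int toy_trans_enqueue(struct toy_trans *trans, void *data,
		      void (*destructor)(void *data))
{
	struct toy_item *item;

	assert(trans != NULL);
	item = malloc(sizeof(*item));
	if (!item)
		return -1;
	item->next = NULL;
	item->data = data;
	item->destructor = destructor;
	if (trans->tail)
		trans->tail->next = item;
	else
		trans->head = item;
	trans->tail = item;
	return 0;
}

/* Commit phase: take back the oldest parked resource. */
void *toy_trans_dequeue(struct toy_trans *trans)
{
	struct toy_item *item = trans->head;
	void *data;

	if (!item)
		return NULL;
	trans->head = item->next;
	if (!trans->head)
		trans->tail = NULL;
	data = item->data;
	free(item);
	return data;
}

/* Abort: destroy whatever is still queued, so drivers need no abort code. */
void toy_trans_abort(struct toy_trans *trans)
{
	struct toy_item *item;

	while ((item = trans->head) != NULL) {
		trans->head = item->next;
		if (item->destructor)
			item->destructor(item->data);
		free(item);
	}
	trans->tail = NULL;
}
```

A driver would enqueue its allocations during prepare and dequeue them during commit; if the transaction is aborted in between, the generic code runs the destructors.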
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Sep 2015 08:02:40 +0000 (10:02 +0200)]
switchdev: rename "trans" to "trans_ph".
This is temporary, name "trans" will be used for something else and
"trans_ph" will eventually disappear.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 25 Sep 2015 03:14:26 +0000 (20:14 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
- Power allocator governor changes to allow binding on thermal zones
with missing power estimates information. From Javi Merino.
- Add compile test flags on thermal drivers that allow it without
producing compilation errors. From Eduardo Valentin.
- Fixes around memory allocation on cpu_cooling. From Javi Merino.
- Fix on db8500 cpufreq code to allow autoload. From Luis de
Bethencourt.
- Maintainer entries for cpu cooling device
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: power_allocator: exit early if there are no cooling devices
thermal: power_allocator: don't require tzp to be present for the thermal zone
thermal: power_allocator: relax the requirement of two passive trip points
thermal: power_allocator: relax the requirement of a sustainable_power in tzp
thermal: Add a function to get the minimum power
thermal: cpu_cooling: free power table on error or when unregistering
thermal: cpu_cooling: don't call kcalloc() under rcu_read_lock
thermal: db8500_cpufreq_cooling: Fix module autoload for OF platform driver
thermal: cpu_cooling: Add MAINTAINERS entry
thermal: ti-soc: Kconfig fix to avoid menu showing wrongly
thermal: ti-soc: allow compile test
thermal: qcom_spmi: allow compile test
thermal: exynos: allow compile test
thermal: armada: allow compile test
thermal: dove: allow compile test
thermal: kirkwood: allow compile test
thermal: rockchip: allow compile test
thermal: spear: allow compile test
thermal: hisi: allow compile test
thermal: Fix thermal_zone_of_sensor_register to match documentation
Linus Torvalds [Fri, 25 Sep 2015 00:46:38 +0000 (17:46 -0700)]
Merge tag 'devicetree-fixes-for-4.3' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
- Silence bogus warning for of_irq_parse_pci
- Fix typo in ARM idle-states binding doc and dts files
- Various minor binding documentation updates
* tag 'devicetree-fixes-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
Documentation: arm: Fix typo in the idle-states bindings examples
gpio: mention in DT binding doc that <name>-gpio is deprecated
of_pci_irq: Silence bogus "of_irq_parse_pci() failed ..." messages.
devicetree: bindings: Extend the bma180 bindings with bma250 info
of: thermal: Mark cooling-*-level properties optional
of: thermal: Fix inconsitency between cooling-*-state and cooling-*-level
Docs: dt: add #msi-cells to GICv3 ITS binding
of: add vendor prefix for Socionext Inc.
Matt Bennett [Thu, 24 Sep 2015 23:01:47 +0000 (11:01 +1200)]
ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
Currently error log messages in ip6_tnl_err are printed at 'warn'
level. This is different to other tunnel types which don't print
any messages. These log messages don't provide any information that
couldn't be deduced with networking tools. Also it can be annoying
to have one end of the tunnel go down and have the logs fill with
pointless messages such as "Path to destination invalid or inactive!".
This patch reduces the log level of these messages to 'dbg' level to
bring the visible behaviour into line with other tunnel types.
Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Pieralisi [Thu, 24 Sep 2015 14:53:56 +0000 (15:53 +0100)]
Documentation: arm: Fix typo in the idle-states bindings examples
The idle-states bindings mandate that the entry-method string
in the idle-states node must be "psci" for ARM v8 64-bit systems,
but the examples in the bindings report a wrong entry-method string.
Owing to this typo, some dts in the kernel wrongly defined the
entry-method property, since they likely cut and pasted the example
definition without paying attention to the bindings definitions.
This patch fixes the typo in the DT idle states bindings examples and
respective dts in the kernel so that the bindings and related dts
files are made compliant.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Howard Chen <howard.chen@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Rob Herring <robh@kernel.org>
Javier Martinez Canillas [Mon, 21 Sep 2015 12:57:25 +0000 (14:57 +0200)]
gpio: mention in DT binding doc that <name>-gpio is deprecated
The gpiolib supports parsing DT properties of the form <name>-gpio but it
was only added for compatibility with older DT bindings that got it wrong
and should not be used in newer bindings.
The commit that added support for this was:
dd34c37aa3e8 ("gpio: of: Allow -gpio suffix for property names")
but didn't update the documentation to explain this so it's been a source
of confusion. So let's make this clear in the GPIO DT binding doc.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Rob Herring <robh@kernel.org>
David S. Miller [Thu, 24 Sep 2015 22:39:09 +0000 (15:39 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-09-23
This series contains updates to ixgbe only.
Mark provides all the changes in this series, first clears the destination
location for I2C data initially so that the received data will not be
corrupted by previous attempts. Then reduced the pauses/delays in the
PHY detection when no SFP is present by reducing the number of retries,
once an SFP is detected, the "normal" number of retries in PHY detection
will be used. Added support for X550EM_x SFP+ dual-speed, and fixed 1G and
10G link stability for X550EM_x by configuring the CS4227 correctly by
moving code to ixgbe_setup_mac_link_sfp_x550em(). Added functionality to
reset CS4227, since on some platforms the CS4227 does not initialize
properly. Next reduces the SFP polling rate, because the I2C timeouts
that result when no SFP is present are very costly: SFP polling is now
done no more than once every two seconds. Added
support for I2C bus MUX. Fixed the setting of RDRXCTL register which
should fall through X540 and 82599, not 82598. In addition, added small
packet padding support in X550 by setting RDRXCTL.PSP when the driver is
in SRIOV mode. Fixed a known hardware issue where the PCI transactions
pending bit sticks high when there are pending transactions, so
work around the issue by waiting and then continuing with our reset flow.
Added a new device ID for X550EM device with SFPs. Provided a fix with
the DCA setup, which was suggested by Alex Duyck <aduyck@mirantis.com>,
by making it so that we always set the relaxed ordering bits related to
the DCA registers even if DCA is not enabled. Then moves the
configuration out of the ixgbe_down() and into ixgbe_configure() before
enabling the transmit and receive rings. This ensures that DCA is
configured correctly before starting the processing of packets.
Fixed VM-to-VM loopback mode which requires that FCRTH be set, but
the datasheets did not specify what the value should be. It has now
been determined that the correct value should be RXPBSIZE - (24*1024).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
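The FCRTH value quoted in the series summary above is plain arithmetic. The sketch below takes the formula at face value and assumes both quantities are expressed in bytes; the helper name and the macro are hypothetical and are not taken from the ixgbe driver.

```c
#include <assert.h>
#include <stdint.h>

/* Headroom subtracted from the Rx packet buffer size: 24 * 1024 bytes. */
#define FCRTH_HEADROOM_BYTES (24u * 1024u)

/*
 * Hypothetical helper: compute the flow control high threshold from the
 * Rx packet buffer size, as FCRTH = RXPBSIZE - (24*1024).
 * Assumes rxpbsize_bytes is given in bytes and is larger than the headroom.
 */
static uint32_t fcrth_from_rxpbsize(uint32_t rxpbsize_bytes)
{
    assert(rxpbsize_bytes > FCRTH_HEADROOM_BYTES);
    return rxpbsize_bytes - FCRTH_HEADROOM_BYTES;
}
```

For an assumed 512 KiB packet buffer this yields 524288 - 24576 = 499712.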
Linus Torvalds [Thu, 24 Sep 2015 22:37:06 +0000 (15:37 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Most of the GPU drivers people were at XDC last week, so I didn't get
much to send, so I let it rollover until this week.
Also Alex was away for 3 weeks so amdgpu/radeon got a bit more stuff
than usual in one go.
I've been trying to figure out some 4.2 issues with i915 still (that
are fixed in 4.3, but bisecting ends up in a merge commit). Hopefully
next week I or i915 people can work that out"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (46 commits)
drm: Allow also control clients to check the drm version
drm/vmwgfx: Fix uninitialized return in vmw_kms_helper_dirty()
drm/vmwgfx: Fix uninitialized return in vmw_cotable_unbind()
drm/layerscape: fix handling fsl_dcu_drm_plane_index result
drm/mgag200: Fix driver_load error handling
drm/mgag200: Fix error handling paths in fbdev driver
drm/qxl: only report first monitor as connected if we have no state
drm/radeon: add quirk for MSI R7 370
drm/amdgpu: Sprinkle drm_modeset_lock_all to appease locking checks
drm/radeon: Sprinkle drm_modeset_lock_all to appease locking checks
drm/amdgpu: sync ce and me with SWITCH_BUFFER(2)
drm/amdgpu: integer overflow in amdgpu_mode_dumb_create()
drm/amdgpu: info leak in amdgpu_gem_metadata_ioctl()
drm/amdgpu: integer overflow in amdgpu_info_ioctl()
drm/amdgpu: unwind properly in amdgpu_cs_parser_init()
drm/amdgpu: Fix max_vblank_count value for current display engines
drm/amdgpu: use kmemdup rather than duplicating its implementation
drm/amdgpu: fix UVD suspend and resume for VI APU
drm/amdgpu: fix the UVD suspend sequence order
drm/amdgpu: make UVD handle checking more strict
...
David S. Miller [Thu, 24 Sep 2015 22:36:20 +0000 (15:36 -0700)]
Merge tag 'mac80211-for-davem-2015-09-22' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just two small fixes:
* VHT MCS mask array overrun, reported by Dan Carpenter
* reset CQM history to always get a notification, from Sara Sharon
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Bennett [Wed, 23 Sep 2015 04:58:31 +0000 (16:58 +1200)]
ip6_gre: Reduce log level in ip6gre_err() to debug
Currently error log messages in ip6gre_err are printed at 'warn'
level. This is different to most other tunnel types which don't
print any messages. These log messages don't provide any information
that couldn't be deduced with networking tools. Also it can be annoying
to have one end of the tunnel go down and have the logs fill with
pointless messages such as "Path to destination invalid or inactive!".
This patch reduces the log level of these messages to 'dbg' level to
bring the visible behaviour into line with other tunnel types.
Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wilson Kok [Wed, 23 Sep 2015 04:40:22 +0000 (21:40 -0700)]
fib_rules: fix fib rule dumps across multiple skbs
dump_rules returns skb length and not error.
But when family == AF_UNSPEC, the caller of dump_rules
assumes that it returns an error. Hence, when family == AF_UNSPEC,
we continue trying to dump on -EMSGSIZE errors resulting in
incorrect dump idx carried between skbs belonging to the same dump.
This results in fib rule dump always only dumping rules that fit
into the first skb.
This patch fixes dump_rules to return error so that we exit correctly
and idx is correctly maintained between skbs that are part of the
same dump.
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
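The dump contract described above (return an error when the buffer is full, keep idx pointing at the first rule not yet dumped) can be sketched in userspace C. This is only an illustration under stated assumptions: struct msg_buf, fill_rule() and dump_rules_sketch() are hypothetical stand-ins, not the kernel's fib_rules code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb with limited room. */
struct msg_buf {
    size_t capacity; /* how many rules fit in this buffer */
    size_t used;     /* how many rules have been written so far */
};

/* Write one rule; return 0 on success, -EMSGSIZE when the buffer is full. */
static int fill_rule(struct msg_buf *buf)
{
    if (buf->used >= buf->capacity)
        return -EMSGSIZE;
    buf->used++;
    return 0;
}

/*
 * Dump rules [*idx, nrules) into buf.
 * Returns 0 when every remaining rule was dumped, or a negative error when
 * the buffer filled up. In both cases *idx names the first rule that has
 * not been dumped yet, so the next call resumes at the right place.
 */
static int dump_rules_sketch(struct msg_buf *buf, size_t nrules, size_t *idx)
{
    assert(buf && idx);
    while (*idx < nrules) {
        int err = fill_rule(buf);

        if (err < 0)
            return err; /* propagate the error, do not advance idx */
        (*idx)++;
    }
    return 0;
}
```

A caller would stop iterating as soon as a negative value comes back and call again with a fresh buffer and the same idx. Returning a length instead of the error is what let the AF_UNSPEC caller carry on with a stale idx.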
Sergei Shtylyov [Fri, 11 Sep 2015 23:06:09 +0000 (02:06 +0300)]
PCI: rcar: Add R8A7794 support
Add Renesas R8A7794 SoC support to the Renesas R-Car gen2 PCI driver.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Alex Williamson [Wed, 16 Sep 2015 04:24:46 +0000 (22:24 -0600)]
PCI: Use function 0 VPD for identical functions, regular VPD for others
932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0")
added PCI_DEV_FLAGS_VPD_REF_F0. Previously, we set the flag on every
non-zero function of quirked devices. If a function turned out to be
different from function 0, i.e., it had a different class, vendor ID, or
device ID, the flag remained set but we didn't make VPD accessible at all.
Flip this around so we only set PCI_DEV_FLAGS_VPD_REF_F0 for functions that
are identical to function 0, and allow regular VPD access for any other
functions.
[bhelgaas: changelog, stable tag]
Fixes: 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
Acked-by: Mark Rustad <mark.d.rustad@intel.com>
CC: stable@vger.kernel.org
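The decision described above can be sketched as a predicate: a non-zero function borrows function 0's VPD only when its class, vendor ID and device ID all match function 0. The struct and function names are hypothetical and the real quirk works on struct pci_dev; this is a userspace illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the identifying fields of a PCI function. */
struct pci_fn_sketch {
    unsigned int devfn;  /* slot in bits 7..3, function in bits 2..0 */
    uint16_t vendor;
    uint16_t device;
    uint32_t class_code;
};

/*
 * Return true when fn should access VPD through function 0 (the case in
 * which PCI_DEV_FLAGS_VPD_REF_F0 would be set), false when it should use
 * regular VPD access.
 */
static bool use_function0_vpd(const struct pci_fn_sketch *fn,
                              const struct pci_fn_sketch *f0)
{
    assert(fn && f0);
    if ((fn->devfn & 0x07) == 0)
        return false; /* function 0 always uses its own VPD */
    return fn->vendor == f0->vendor &&
           fn->device == f0->device &&
           fn->class_code == f0->class_code;
}
```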
Alex Williamson [Tue, 15 Sep 2015 17:17:21 +0000 (11:17 -0600)]
PCI: Fix devfn for VPD access through function 0
Commit 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0")
passes PCI_SLOT(devfn) for the devfn parameter of pci_get_slot().
Generally this works because we're fairly well guaranteed that a PCIe
device is at slot address 0, but for the general case, including
conventional PCI, it's incorrect. We need to get the slot and then convert
it back into a devfn.
Fixes: 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
Acked-by: Mark Rustad <mark.d.rustad@intel.com>
CC: stable@vger.kernel.org
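The difference between a slot number and a devfn is easy to see with the standard encoding. The three macros below follow the kernel's definitions in include/uapi/linux/pci.h; function0_devfn() is a hypothetical helper added for illustration and is not the code from the patch.

```c
#include <assert.h>

/* devfn packs the slot into bits 7..3 and the function into bits 2..0. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/*
 * Hypothetical helper: devfn of function 0 in the same slot as devfn.
 * The slot is extracted first and then converted back into a devfn.
 */
static unsigned int function0_devfn(unsigned int devfn)
{
    assert(devfn <= 0xff);
    return PCI_DEVFN(PCI_SLOT(devfn), 0);
}
```

For slot 3, function 2 (devfn 0x1a), PCI_SLOT() gives 3 while the devfn of function 0 is 0x18. For slot 0 the two values coincide (both 0), which is why passing the bare slot number happened to work on typical PCIe devices.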
Alex Williamson [Fri, 18 Sep 2015 21:08:54 +0000 (15:08 -0600)]
PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses
SR-IOV creates a virtual bus where bus->self is NULL. When we add VFs and
scan for an MSI domain, pci_set_bus_msi_domain() dereferences bus->self,
which causes a kernel NULL pointer dereference oops.
Scan up to the parent bus until we find a real bridge where we can get the
MSI domain.
[bhelgaas: changelog]
Fixes: 44aa0c657e3e ("PCI/MSI: Add hooks to populate the msi_domain field")
Tested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
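The upward scan described above can be sketched as a simple parent walk. The types below are hypothetical simplifications (the kernel uses struct pci_bus and struct pci_dev and has additional root-bus handling); the point is only that a NULL self pointer is skipped instead of dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bridge device that owns an MSI domain. */
struct bridge_sketch {
    int msi_domain_id;
};

/* Hypothetical stand-in for a bus; self is NULL on an SR-IOV virtual bus. */
struct bus_sketch {
    struct bus_sketch *parent;
    struct bridge_sketch *self;
};

/*
 * Walk up from bus until a bus with a real bridge is found.
 * Returns that bridge, or NULL if no ancestor has one.
 */
static struct bridge_sketch *find_real_bridge(struct bus_sketch *bus)
{
    while (bus && !bus->self)
        bus = bus->parent;
    return bus ? bus->self : NULL;
}
```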
Eric Dumazet [Wed, 23 Sep 2015 03:44:17 +0000 (20:44 -0700)]
tcp: factorize sk_txhash init
Neal suggested to move sk_txhash init into tcp_create_openreq_child(),
called both from IPv4 and IPv6.
This opportunity was missed in commit 58d607d3e52f ("tcp: provide
skb->hash to synack packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 23 Sep 2015 00:04:58 +0000 (17:04 -0700)]
bnx2x: byte swap rss_key to comply to Toeplitz specs
After a good amount of debugging, I found bnx2x was byte swapping
the 40 bytes of rss_key.
If we byte swap the key, then bnx2x generates hashes matching
MSDN specs as documented in (Verifying the RSS Hash Calculation)
https://msdn.microsoft.com/en-us/library/windows/hardware/ff571021%28v=vs.85%29.aspx
It is mostly a non issue, unless we want to mix different NIC
in a host, and want consistent hashing among all of them, ie
if they all use the boot time generated rss key, or if some application
is choosing specific tuple(s) so that incoming traffic lands into known
rx queue(s).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 23 Sep 2015 00:01:11 +0000 (17:01 -0700)]
net: revert "net_sched: move tp->root allocation into fw_init()"
fw filter uses tp->root==NULL to check if it is the old method,
so it doesn't need allocation at all in this case. This patch
reverts the offending commit and adds some comments for old
method to make it obvious.
Fixes: 33f8b9ecdb15 ("net_sched: move tp->root allocation into fw_init()")
Reported-by: Akshat Kakkar <akshat.1984@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 24 Sep 2015 21:31:40 +0000 (14:31 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"15 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
ocfs2/dlm: fix deadlock when dispatch assert master
membarrier: clean up selftest
vmscan: fix sane_reclaim helper for legacy memcg
lib/iommu-common.c: do not try to deref a null iommu->lazy_flush() pointer when n < pool->hint
x86, efi, kasan: #undef memset/memcpy/memmove per arch
mm: migrate: hugetlb: putback destination hugepage to active list
mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified
userfaultfd: register uapi generic syscall (aarch64)
userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical
userfaultfd: selftest: return an error if BOUNCE_VERIFY fails
userfaultfd: selftest: avoid my_bcmp false positives with powerpc
userfaultfd: selftest: only warn if __NR_userfaultfd is undefined
userfaultfd: selftest: headers fixup
userfaultfd: selftests: vm: pick up sanitized kernel headers
userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"
David S. Miller [Thu, 24 Sep 2015 21:31:37 +0000 (14:31 -0700)]
Merge branch 'lwt_arp'
Jiri Benc says:
====================
lwtunnel: make it really work, for IPv4
One of the selling points of lwtunnel was the ability to specify the tunnel
destination using routes. However, this doesn't really work currently, as
ARP and ndisc replies are not handled correctly. ARP and ndisc replies won't
have tunnel metadata attached, thus they will be sent out with the default
parameters or not sent at all, either way never reaching the requester.
Most of the egress tunnel parameters can be inferred from the ingress
metadata. The only and important exception is UDP ports. This patchset infers
the egress data from the ingress data and disallows setting of UDP ports in
tunnel routes. If there's a need for different UDP ports, a new interface
needs to be created for each port combination. Note that it's still possible
to specify the UDP ports to use, it just needs to be done while creating the
vxlan/geneve interface.
This covers only ARPs. IPv6 ndisc has the same problem but is harder to
solve, as there's already dst attached to outgoing skbs. Ideas to solve this
are welcome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Tue, 22 Sep 2015 16:12:12 +0000 (18:12 +0200)]
lwtunnel: remove source and destination UDP port config option
The UDP tunnel config is asymmetric wrt. the ports used. The source and
destination ports from one direction of the tunnel are not related to the
ports of the other direction. We need to be able to respond to ARP requests
using the correct ports without involving routing.
As a consequence, UDP ports need to be a fixed property of the tunnel
interface and cannot be set per route. Remove the ability to set ports per
route. This is still okay to do, as no kernel has been released with these
attributes yet.
Note that the ability to specify source and destination ports is preserved
for other users of the lwtunnel API which don't use routes for tunnel key
specification (like openvswitch).
If in the future we rework ARP handling to allow port specification, the
attributes can be added back.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Tue, 22 Sep 2015 16:12:11 +0000 (18:12 +0200)]
ipv4: send arp replies to the correct tunnel
When using ip lwtunnels, the additional data for xmit (basically, the actual
tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
metadata dst. When replying to ARP requests, we need to send the reply to
the same tunnel the request came from. This means we need to construct
proper metadata dst for ARP replies.
We could perform another route lookup to get a dst entry with the correct
lwtstate. However, this won't always ensure that the outgoing tunnel is the
same as the incoming one, and it won't work anyway for IPv4 duplicate
address detection.
The only thing to do is to "reverse" the ip_tunnel_info.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
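What "reversing" tunnel metadata means can be sketched with a deliberately reduced structure: the reply keeps the tunnel ID of the request and swaps the outer source and destination addresses. struct tun_info_sketch and reverse_tun_info() are hypothetical; the kernel's struct ip_tunnel_info carries more fields, and UDP ports are intentionally left out here, in line with the series making them a property of the tunnel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced view of received tunnel metadata. */
struct tun_info_sketch {
    uint64_t tun_id; /* tunnel key, e.g. a VNI */
    uint32_t src;    /* outer source address of the request */
    uint32_t dst;    /* outer destination address of the request */
};

/*
 * Build transmit metadata for a reply from the metadata of the request:
 * same tunnel ID, outer addresses swapped.
 */
static struct tun_info_sketch reverse_tun_info(const struct tun_info_sketch *rx)
{
    struct tun_info_sketch tx;

    assert(rx);
    tx.tun_id = rx->tun_id;
    tx.src = rx->dst;
    tx.dst = rx->src;
    return tx;
}
```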
Tobias Klauser [Wed, 23 Sep 2015 07:20:55 +0000 (09:20 +0200)]
net: axinet: Use of_property_read_u32 instead of open-coding it
Use of_property_read_u32 instead of of_get_property with return value
checks and endianness conversion.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
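The open-coded pattern that of_property_read_u32() replaces combines a presence check, a length check and a big-endian conversion. The userspace sketch below models that combination. struct prop_sketch and read_u32_sketch() are hypothetical, and the error codes are an approximation of the helper's behaviour rather than a copy of it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device tree property: raw big-endian bytes. */
struct prop_sketch {
    const uint8_t *value;
    size_t length;
};

/*
 * Read one big-endian 32-bit cell from prop into *out.
 * Returns 0 on success, -EINVAL if the property is missing and
 * -EOVERFLOW if it is too short to hold a 32-bit value.
 */
static int read_u32_sketch(const struct prop_sketch *prop, uint32_t *out)
{
    assert(out);
    if (!prop || !prop->value)
        return -EINVAL;
    if (prop->length < sizeof(uint32_t))
        return -EOVERFLOW;

    /* Device tree cells are stored big-endian, independent of the CPU. */
    *out = ((uint32_t)prop->value[0] << 24) |
           ((uint32_t)prop->value[1] << 16) |
           ((uint32_t)prop->value[2] << 8) |
            (uint32_t)prop->value[3];
    return 0;
}
```

Keeping the checks and the conversion in one helper means each caller only has to test a single return value.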
Tobias Klauser [Wed, 23 Sep 2015 07:20:02 +0000 (09:20 +0200)]
net: ll_temac: Use of_property_read_u32 instead of open-coding it
Use of_property_read_u32 to read the "clock-frequency" property instead
of using of_get_property with return value checks.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>