David Ertman [Fri, 14 Feb 2014 07:16:41 +0000 (07:16 +0000)]
e1000e: Refactor PM flows
Refactor the system power management flows to prevent the suspend path from
being executed twice when hibernating since both the freeze and
poweroff callbacks were set to e1000_suspend() via SET_SYSTEM_SLEEP_PM_OPS.
There are HW workarounds that are performed during this flow and calling
them twice was causing erroneous behavior.
Re-arrange the code to take advantage of common code paths and explicitly
set the individual dev_pm_ops callbacks for suspend, resume, freeze,
thaw, poweroff and restore.
Add a boolean parameter (reset) to the e1000e_down function to allow
for cases when the HW should not be reset when downed during a PM event.
Now that all suspend/shutdown paths result in a call to __e1000_shutdown()
that checks Wake on Lan status, removing redundant check for WoL in
e1000_power_down_phy().
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David Ertman [Wed, 5 Feb 2014 01:09:54 +0000 (01:09 +0000)]
e1000e: Add missing branding strings in ich8lan.c
Branding strings from recently released and soon to be released
hardware configurations that are supported by e1000e.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David Ertman [Tue, 4 Feb 2014 01:56:06 +0000 (01:56 +0000)]
e1000e: Cleanup - Update GPL header and Copyright
This patch is to update the GPL header by removing the portion that
refers to the Free Software Foundation address.
Change the copyright date for 2014.
Reformat the header comments to conform to kernel networking coding norms
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David Ertman [Fri, 24 Jan 2014 23:07:48 +0000 (23:07 +0000)]
e1000e: Fix 82579 sets LPI too early.
Enabling EEE LPI sooner than one second after link up on 82579 causes link
issues with some switches.
Remove EEE enablement for 82579 parts from the link initialization flow to
avoid initializing too early. EEE initialization for 82579 will be done
in e1000e_update_phy_task.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce W Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David Ertman [Thu, 23 Jan 2014 06:29:13 +0000 (06:29 +0000)]
e1000e: Resolve issues with Management Engine (ME) briefly blocking PHY resets
On a ME enabled system with the cable out, the driver init flow would
generate an erroneous message indicating that resets were being blocked
by an active ME session. Cause was ME clearing the semaphore bit to
block further PHY resets for up to 50 msec during power-on/cycle. After
this interval, ME would re-set the bit and allow PHY resets.
To resolve this, change the flow of e1000e_phy_hw_reset_generic() to
utilize a delay and retry method. Poll the FWSM register to minimize
any extra time added to the flow. If the delay times out at 100ms
(checked in 10msec increments), then return the value E1000_BLK_PHY_RESET,
as this is the accurate state of the PHY. Attempting to alter just the
call to e1000e_phy_hw_reset_generic() in e1000_init_phy_workarounds_pchlan()
just caused the problem to move further down the flow.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce W. Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David Ertman [Wed, 22 Jan 2014 00:21:41 +0000 (00:21 +0000)]
e1000e: Cleanup unecessary references
Cleaning up some pointer references that are no longer necessary
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Todd Fujinaka [Sat, 18 Jan 2014 06:09:33 +0000 (06:09 +0000)]
e1000e: PTP lock in e1000e_phc_adjustfreq
Add lock in e1000e_phc_adjfreq to prevent concurrent changes to TIMINCA
and SYSTIMH/L.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Aring [Fri, 7 Mar 2014 10:06:54 +0000 (11:06 +0100)]
6lowpan: reassembly: fix return of init function
This patch adds a missing return after fragmentation init. Otherwise we
register a sysctl interface and deregister it afterwards which makes no
sense.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Mar 2014 22:08:57 +0000 (17:08 -0500)]
Merge tag 'linux-can-next-for-3.15-
20140307' of git://gitorious.org/linux-can/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2014-02-12
this is a pull request of twelve patches for net-next/master.
Alexander Shiyan contributes two patches for the mcp251x, one making
the driver more quiet and the other one improves the compile time
coverage by removing the #ifdef CONFIG_PM_SLEEP. Then two patches for
the flexcan driver by me, one removing the #ifdef CONFIG_PM_SLEEP, too,
the other one making use of platform_get_device_id(). Another patch by
me which converts the janz-ican3 driver to use netdev_<level>(). The
remaining 7 patches are by Oliver Hartkopp, they add CAN FD support to
the netlink configuration interface.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Mar 2014 21:24:54 +0000 (16:24 -0500)]
Merge branch 'r8152'
Hayes Wang says:
====================
r8152: tx/rx improvement
- Select the suitable spin lock for each function.
- Add additional check to reduce the spin lock.
- Up the priority of the tx to avoid interrupted by rx.
- Support rx checksum, large send, and IPv6 hw checksum.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 7 Mar 2014 03:04:40 +0000 (11:04 +0800)]
r8152: support IPv6
Support hw IPv6 checksum for TCP and UDP packets.
Note that the hw has the limitation of the range of the transport
offset. Besides, the TCP Pseudo Header of the IPv6 TSO of the hw
bases on the Microsoft document which excludes the packet length.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 7 Mar 2014 03:04:39 +0000 (11:04 +0800)]
r8152: support TSO
Support scatter gather and TSO.
Adjust the tx checksum function and set the max gso size to fix the
size of the tx aggregation buffer.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 7 Mar 2014 03:04:38 +0000 (11:04 +0800)]
r8152: support rx checksum
Support hw rx checksum for TCP and UDP packets.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 7 Mar 2014 03:04:37 +0000 (11:04 +0800)]
r8152: calculate the dropped packets for rx
Continue dealing with the remain rx packets, even though the allocation
of the skb fail. This could calculate the correct dropped packets.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 7 Mar 2014 03:04:36 +0000 (11:04 +0800)]
r8152: up the priority of the transmission
move the tx_bottom() from delayed_work to tasklet. It makes the rx
and tx balanced. If the device is in runtime suspend when getting
the tx packet, wakeup the device before trasmitting.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 7 Mar 2014 03:04:35 +0000 (11:04 +0800)]
r8152: check tx agg list before spin lock
Check tx agg list before spin lock to avoid doing spin lock every
times.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 7 Mar 2014 03:04:34 +0000 (11:04 +0800)]
r8152: replace spin_lock_irqsave and spin_unlock_irqrestore
Use spin_lock and spin_unlock in interrupt context.
The ndo_start_xmit would not be called in interrupt context, so
replace the relative spin_lock_irqsave and spin_unlock_irqrestore
with spin_lock_bh and spin_unlock_bh.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Mar 2014 21:12:43 +0000 (16:12 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e and i40evf.
Most notable are:
Joseph completes the implementation of the ethtool ntuple rule
management interface by adding the get, update and delete interface
reset.
Akeem provides a fix to prevent a possible overflow due to multiplication
of number and size by using kzalloc, so use kcalloc.
Jesse provides an implementation for skb_set_hash() and adds the L4 type
return when we know it is an L4 hash. He also adds a counter to
statistics for Tx timeouts to help users. Lastly he provides a change
to stay away from the cache line where the done bit may be getting
written back for the transmit ring since the hardware may be writing the
whole cache line for a partial update.
Shannon cleans up code comments.
Anjali removes a firmware workaround for newer firmware since the number
of MSIx vectors are being reported correctly.
v2:
- dropped patch 01 of the series based on feedback from the author
Joe Perches and Shannon Nelson.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Mar 2014 21:08:59 +0000 (16:08 -0500)]
Merge tag 'rxrpc-devel-
20140304' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
net-next: AF_RXRPC fixes and development
Here are some AF_RXRPC fixes:
(1) Fix to remove incorrect checksum calculation made during recvmsg(). It's
unnecessary to try to do this there since we check the checksum before
reading the RxRPC header from the packet.
(2) Fix to prevent the sending of an ABORT packet in response to another
ABORT packet and inducing a storm.
(3) Fix UDP MTU calculation from parsing ICMP_FRAG_NEEDED packets where we
don't handle the ICMP packet not specifying an MTU size.
And development patches:
(4) Add sysctls for configuring RxRPC parameters, specifically various delays
pertaining to ACK generation, the time before we resend a packet for
which we don't receive an ACK, the maximum time a call is permitted to
live and the amount of time transport, connection and dead call
information is cached.
(5) Improve ACK packet production by adjusting the handling of ACK_REQUESTED
packets, ignoring the MORE_PACKETS flag, delaying the production of
otherwise immediate ACK_IDLE packets and delaying all ACK_IDLE production
(barring the call termination) to half a second.
(6) Add more sysctl parameters to expose the Rx window size, the maximum
packet size that we're willing to receive and the number of jumbo rxrpc
packets we're willing to handle in a single UDP packet.
(7) Request ACKs on alternate DATA packets so that the other side doesn't
wait till we fill up the Tx window.
(8) Use a RCU hash table to look up the rxrpc_call for an incoming packet
rather than stepping through a hierarchy involving several spinlocks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Mar 2014 20:57:47 +0000 (15:57 -0500)]
Merge branch 'xen-netback-next'
Zoltan Kiss says:
====================
xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
A long known problem of the upstream netback implementation that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge perfomance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, however it seems to be very invasive on the network stack's code,
and therefore haven't progressed very well.
This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower AMD
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch)
Based on my investigations the packet get only copied if it is delivered to
Dom0 IP stack through deliver_skb, which is due to this [2] patch. This affects
DomU->Dom0 IP traffic and when Dom0 does routing/NAT for the guest. That's a bit
unfortunate, but luckily it doesn't cause a major regression for this usecase.
In the future we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
us avoiding TLB flushing
- use something else than ballooned pages
- fix grant map to use page->index properly
I've tried to broke it down to smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Use skb->cb to store pending_idx
2: Some refactoring
3: Change RX path for mapped SKB fragments (moved here to keep bisectability,
review it after #4)
4: Introduce TX grant mapping
5: Remove old TX grant copy definitons and fix indentations
6: Add stat counters for zerocopy
7: Handle guests with too many frags
8: Timeout packets in RX path
9: Aggregate TX unmap operations
v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling the important use case when an older guest
sends lots of slots. Instead of delayed copy now we timeout packets on the RX
path, based on the assumption that otherwise packets should get stucked
anywhere else. Finally some unmap batching to avoid too much TLB flush
v3: Apart from fixing a few things mentioned in responses the important change
is the use the hypercall directly for grant [un]mapping, therefore we can
avoid m2p override.
v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue
timeout logic changed also.
v5: Only minor fixes based on Wei's comments
v6: Important bugfixes for xenvif_poll exit path and zerocopy callback, see
first 2 patches. Also rework of handling packets with too many slots, and
reorder the series a bit.
v7: Small fixes in comments/log messages/error paths, and merging the frag
overflow stats patch into its parent.
[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363
====================
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:31 +0000 (21:48 +0000)]
xen-netback: Aggregate TX unmap operations
Unmapping causes TLB flushing, therefore we should make it in the largest
possible batches. However we shouldn't starve the guest for too long. So if
the guest has space for at least two big packets and we don't have at least a
quarter ring to unmap, delay it for at most 1 milisec.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:30 +0000 (21:48 +0000)]
xen-netback: Timeout packets in RX path
A malicious or buggy guest can leave its queue filled indefinitely, in which
case qdisc start to queue packets for that VIF. If those packets came from an
another guest, it can block its slots and prevent shutdown. To avoid that, we
make sure the queue is drained in every 10 seconds.
The QDisc queue in worst case takes 3 round to flush usually.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:29 +0000 (21:48 +0000)]
xen-netback: Handle guests with too many frags
Xen network protocol had implicit dependency on MAX_SKB_FRAGS. Netback has to
handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that:
- create a new skb
- map the leftover slots to its frags (no linear buffer here!)
- chain it to the previous through skb_shinfo(skb)->frag_list
- map them
- copy and coalesce the frags into a brand new one and send it to the stack
- unmap the 2 old skb's pages
It's also introduces new stat counters, which help determine how often the guest
sends a packet with more than MAX_SKB_FRAGS frags.
NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise malicious guests can block
other guests by not releasing their sent packets.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:28 +0000 (21:48 +0000)]
xen-netback: Add stat counters for zerocopy
These counters help determine how often the buffers had to be copied. Also
they help find out if packets are leaked, as if "sent != success + fail",
there are probably packets never freed up properly.
NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:27 +0000 (21:48 +0000)]
xen-netback: Remove old TX grant copy definitons and fix indentations
These became obsolete with grant mapping. I've left intentionally the
indentations in this way, to improve readability of previous patches.
NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:26 +0000 (21:48 +0000)]
xen-netback: Introduce TX grant mapping
This patch introduces grant mapping on netback TX path. It replaces grant copy
operations, ditching grant copy coalescing along the way. Another solution for
copy coalescing is introduced in "xen-netback: Handle guests with too many
frags", older guests and Windows can broke before that patch applies.
There is a callback (xenvif_zerocopy_callback) from core stack to release the
slots back to the guests when kfree_skb or skb_orphan_frags called. It feeds a
separate dealloc thread, as scheduling NAPI instance from there is inefficient,
therefore we can't do dealloc from the instance.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:25 +0000 (21:48 +0000)]
xen-netback: Handle foreign mapped pages on the guest RX path
RX path need to know if the SKB fragments are stored on pages from another
domain.
Logically this patch should be after introducing the grant mapping itself, as
it makes sense only after that. But to keep bisectability, I moved it here. It
shouldn't change any functionality here. xenvif_zerocopy_callback and
ubuf_to_vif are just stubs here, they will be introduced properly later on.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:24 +0000 (21:48 +0000)]
xen-netback: Minor refactoring of netback code
This patch contains a few bits of refactoring before introducing the grant
mapping changes:
- introducing xenvif_tx_pending_slots_available(), as this is used several
times, and will be used more often
- rename the thread to vifX.Y-guest-rx, to signify it does RX work from the
guest point of view
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zoltan Kiss [Thu, 6 Mar 2014 21:48:23 +0000 (21:48 +0000)]
xen-netback: Use skb->cb for pending_idx
Storing the pending_idx at the first byte of the linear buffer never looked
good, skb->cb is a more proper place for this. It also prevents the header to
be directly grant copied there, and we don't have the pending_idx after we
copied the header here, so it's time to change it.
It also introduces helpers for the RX side
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 6 Mar 2014 02:19:34 +0000 (18:19 -0800)]
l2tp: keep original skb ownership
There is no reason to orphan skb in l2tp.
This breaks things like per socket memory limits, TCP Small queues...
Fix this before more people copy/paste it.
This is very similar to commit
8f646c922d550
("vxlan: keep original skb ownership")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 5 Mar 2014 22:08:38 +0000 (14:08 -0800)]
tcp: do not leak non zero tstamp in output packets
Usage of skb->tstamp should remain private to TCP stack
(only set on packets on write queue, not on cloned ones)
Otherwise, packets given to loopback interface with a non null tstamp
can confuse netif_rx() / net_timestamp_check()
Other possibility would be to clear tstamp in loopback_xmit(),
as done in skb_scrub_packet()
Fixes:
740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Hartkopp [Fri, 28 Feb 2014 15:36:25 +0000 (16:36 +0100)]
can: add bittiming check at interface open for CAN FD
Additionally to have the second (data) bitrate available the data bitrate
has to be greater or equal to the arbitration bitrate in CAN FD.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Fri, 28 Feb 2014 15:36:24 +0000 (16:36 +0100)]
can: allow to change the device mtu for CAN FD capable devices
The configuration for CAN FD depends on CAN_CTRLMODE_FD enabled in the driver
specific ctrlmode_supported capabilities.
The configuration can be done either with the 'fd { on | off }' option in the
'ip' tool from iproute2 or by setting the CAN netdevice MTU to CAN_MTU (16) or
to CANFD_MTU (72).
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Fri, 28 Feb 2014 15:36:23 +0000 (16:36 +0100)]
can: introduce the data bitrate configuration for CAN FD
As CAN FD offers a second bitrate for the data section of the CAN frame the
infrastructure for storing and configuring this second bitrate is introduced.
Improved the readability of the if-statement by inserting some newlines.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Fri, 28 Feb 2014 15:36:22 +0000 (16:36 +0100)]
can: provide a separate bittiming_const parameter to bittiming functions
As the bittiming calculation functions are to be used with different
bittiming_const structures for CAN and CAN FD the direct reference to
priv->bittiming_const inside these functions has to be removed.
Also moved the check for existing bittiming const to one place.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Fri, 28 Feb 2014 15:36:21 +0000 (16:36 +0100)]
can: move sanity check for bitrate and tq into can_get_bittiming
This patch moves a sanity check in order to have a second user for CAN FD.
Also simplify the return value generation in can_get_bittiming() as only
correct return values of can_[calc|fixup]_bittiming() lead to a return value of
zero.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Fri, 28 Feb 2014 15:36:20 +0000 (16:36 +0100)]
can: only send bitrate data via netlink when available
When setting the bitrate both can_calc_bittiming() and can_fixup_bittiming()
lead to the bitrate variable to be set, when a proper bit timing is available.
Only then the bitrate configuration is stored for the device, so checking for
priv->bittiming.bitrate is always sufficient.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Fri, 28 Feb 2014 15:36:19 +0000 (16:36 +0100)]
can: preserve skbuff protocol in can_put_echo_skb
The skbuff protocol value was formerly fixed/sanitized to ETH_P_CAN in
can_put_echo_skb(). With CAN FD this value has to be preserved.
This patch changes the hard assignment of the protocol value to a check of
valid protocol values for CAN and CAN FD.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Tue, 4 Mar 2014 21:00:47 +0000 (22:00 +0100)]
can: janz-ican3: convert dev_<level> printing to netdev_<level>
This patch converts the dev_<level> printing to netdev_<level>, this makes it
possible to remove the "struct device *dev" pointer from the "struct
ican3_dev".
Cc: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Catherine Sullivan [Thu, 6 Feb 2014 05:51:14 +0000 (05:51 +0000)]
i40e/i40evf: Bump pf&vf build versions
Bump i40e to 0.3.34 and i40evf to 0.9.14.
Change-ID: I6b3fb8ccf55b128d2baa4bdc20d3911ec81d4a5b
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 6 Feb 2014 05:51:13 +0000 (05:51 +0000)]
i40e/i40evf: carefully fill tx ring
We need to make sure that we stay away from the cache line
where the DD bit (done) may be getting written back for
the transmit ring since the hardware may be writing the
whole cache line for a partial update.
Change-ID: Id0b6dfc01f654def6a2a021af185803be1915d7e
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 6 Feb 2014 05:51:12 +0000 (05:51 +0000)]
i40e: fix nvm version and remove firmware report
The driver needs to use the format that the current NVM
uses when printing the version of the NVM. It should remain
this way from now on forward.
The driver was reporting when firmware was less than
an expected version number, but this is not a requirement
for the product and we print the firmware number at
init and in ethtool -i output. Just remove the print.
Change-ID: Ide0b856cd454ebf867610ef9a0d639bb358a4a60
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Neerav Parikh [Thu, 6 Feb 2014 05:51:11 +0000 (05:51 +0000)]
i40e: Fix static checker warning
This patch fixes the following static checker warning:
drivers/net/ethernet/intel/i40e/i40e_dcb.c:342
i40e_lldp_to_dcb_config() warn: 'tlv' can't be NULL.
Exit criteria from the while loop is encountering LLDP END
LV or if the TLV length goes beyond the buffer length.
Change-ID: I7548b16db90230ec2ba0fa791b0343ca8b7dd5bb
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Tested-By: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 6 Feb 2014 05:51:10 +0000 (05:51 +0000)]
i40e: Remove a redundant filter addition
Remove a redundant filter addition to stop FW complaints about a redundant
filter removal.
Change-ID: I22bef6b682bd8d43432557e6e2b3e73ffb27b985
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 6 Feb 2014 05:51:09 +0000 (05:51 +0000)]
i40e: count timeout events
The ethtool -S statistics should have a counter for
tx timeouts in order to better help inform the masses.
Change-ID: Ice4b20ed4a151509f366719ab105be49c9e7b2b4
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 6 Feb 2014 05:51:08 +0000 (05:51 +0000)]
i40e: Remove a FW workaround for Number of MSIX vectors
The Number of MSIX vectors being reported is correct and hence
we need a check to do the right thing for FWs before and after.
Change-ID: I50902d1c848adcb960ea49ac73f7865ca871a1c3
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 6 Feb 2014 05:51:06 +0000 (05:51 +0000)]
i40e: clean up comment style
Lots of trivial changes to remove double spaces in function headers,
unnecessary periods in short comments, and adjust the English usage here
and there.
No actual code was harmed in the making of this patch.
Change-ID: I6e756c500756945e81a61ffb10221753eb7923ea
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Wed, 12 Feb 2014 01:45:33 +0000 (01:45 +0000)]
i40e/i40evf: i40e implementation for skb_set_hash
Original comment from Tom Herbert <therbert@google.com>
Drivers should call skb_set_hash to set the hash and its type
in an skbuff.
This patch builds upon Tom's original implementation and adds
the L4 type return when we know it is an L4 hash.
This requires use of the ptype decoder ring, so enable it.
Change-ID: I2f9fa86d1a6add58cff13386f7f4238b1abcc468
CC: Tom Herbert <therbert@google.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Akeem G Abodunrin [Thu, 6 Feb 2014 05:51:02 +0000 (05:51 +0000)]
i40e: Prevent overflow due to kzalloc
To prevent the possibility of overflow due multiplication of number and size
use kcalloc instead of kzalloc.
Change-ID: Ibe4d81ed7d9738d3bbe66ee4844ff9be817e8080
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Joseph Gasparakis [Wed, 12 Feb 2014 01:45:30 +0000 (01:45 +0000)]
i40e: Flow Director sideband accounting
This patch completes implementation of the ethtool ntuple
rule management interface. It adds the get, update and delete
interface reset.
Change-ID: Ida7f481d9ee4e405ed91340b858eabb18a52fdb5
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 30 Jan 2014 12:40:27 +0000 (12:40 +0000)]
i40evf: Enable the ndo_set_features netdev op
Set netdev->hw_features to enable the ndo_set_features netdev op.
Change-Id: I5a086fbfa5a089de5adba2800c4d0b3a73747b11
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
stephen hemminger [Thu, 6 Mar 2014 22:20:17 +0000 (14:20 -0800)]
bonding: fix const in options processing
This is a fixup patch to resolve issues with const from my earlier patch.
Make all the setter functions use const on input parameter.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 5 Mar 2014 18:14:34 +0000 (10:14 -0800)]
net_sched: htb: do not acquire qdisc lock in dump operations
htb_dump() and htb_dump_class() do not strictly need to acquire
qdisc lock to fetch qdisc and/or class parameters.
We hold RTNL and no changes can occur.
This reduces by 50% qdisc lock pressure while doing tc qdisc|class dump
operations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 6 Mar 2014 22:21:43 +0000 (17:21 -0500)]
Merge branch '6lowpan'
Alexander Aring says:
====================
6lowpan: header cleanup
this patch series fix a missing include of 6LoWPAN header and move it
into the include/net directory. Since we did some code sharing with
bluetooth 6LoWPAN the header turns into a generic header for 6LoWPAN.
Instead to use a relative path in bluetooth 6LoWPAN we can now use
include <net/6lowpan.h>.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Wed, 5 Mar 2014 13:29:05 +0000 (14:29 +0100)]
6lowpan: move 6lowpan header to include/net
This header is used by bluetooth and ieee802154 branch. This patch
move this header to the include/net directory to avoid a use of a
relative path in include.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Wed, 5 Mar 2014 13:29:04 +0000 (14:29 +0100)]
6lowpan: add missing include of net/ipv6.h
The 6lowpan.h file contains some static inline function which use
internal ipv6 api structs. Add a include of ipv6.h to be sure that it's
known before.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Wed, 5 Mar 2014 10:54:05 +0000 (11:54 +0100)]
be2net: do external loopback test only when it is requested
v2: remove unnecessary braces from all 'loopback' if-blocks (thx Sergei)
Cc: sathya.perla@emulex.com
Cc: subbu.seetharaman@emulex.com
Cc: ajit.khaparde@emulex.com
Cc: sergei.shtylyov@cogentembedded.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fengguang Wu [Wed, 5 Mar 2014 09:57:23 +0000 (11:57 +0200)]
net/mlx4_en: mlx4_en_verify_params() can be static
Fix static error introduced by commit:
b97b33a3df0439401f80f041eda507d4fffa0dbf [645/653] net/mlx4_en: Verify
mlx4_en module parameters
sparse warnings:
drivers/net/ethernet/mellanox/mlx4/en_main.c:335:6: sparse: symbol
'mlx4_en_verify_params' was not declared. Should it be static?
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde [Tue, 4 Mar 2014 21:04:22 +0000 (22:04 +0100)]
can: flexcan: make use of platform_get_device_id()
This patch replaces an open coded pdev->id_entry by platform_get_device_id().
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Wed, 5 Mar 2014 18:10:44 +0000 (19:10 +0100)]
can: flexcan: Remove #ifdef CONFIG_PM_SLEEP
This patch removes #ifdef CONFIG_PM_SLEEP to improve compile coverage.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Alexander Shiyan [Sat, 22 Feb 2014 05:51:14 +0000 (09:51 +0400)]
can: mcp251x: Remove #ifdef CONFIG_PM_SLEEP
This patch removes #ifdef CONFIG_PM_SLEEP to improve compile coverage.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Alexander Shiyan [Wed, 5 Mar 2014 17:31:56 +0000 (21:31 +0400)]
can: mcp251x: Make driver more quiet
This patch moves one diagnostic message used for debugging purposes
to dev_dbg() and removes one useless message.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
stephen hemminger [Wed, 5 Mar 2014 00:36:44 +0000 (16:36 -0800)]
bonding: options handling cleanup
Make local functions static (ie. only used in bond_options.c)
Make bond options parsing tables constant.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 5 Mar 2014 00:34:25 +0000 (16:34 -0800)]
bonding: remove dead code
These functions are defined but no longer used.
Compile tested only.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 6 Mar 2014 20:03:17 +0000 (15:03 -0500)]
tcp: Use NET_ADD_STATS instead of NET_ADD_STATS_BH in tcp_event_new_data_sent()
Can be invoked from non-BH context.
Based upon a patch by Eric Dumazet.
Fixes:
f19c29e3e391 ("tcp: snmp stats for Fast Open, SYN rtx, and data pkts")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 6 Mar 2014 14:33:22 +0000 (15:33 +0100)]
bonding: make slave status notifications GFP_ATOMIC
Currently we're using GFP_KERNEL, however there are some path(s) where we
can hold some spinlocks, specifically bond->curr_slave_lock:
[ 4.722916] BUG: sleeping function called from invalid context at mm/slub.c:965
[ 4.724438] in_atomic(): 1, irqs_disabled(): 0, pid: 940, name: ifup-eth
[ 4.726034] 5 locks held by ifup-eth/940:
...snip...
[ 4.734646] #4: (&bond->curr_slave_lock){+...+.}, at: [<
ffffffffa00badc6>] bond_enslave+0xda6/0xdd0 [bonding]
...snip...
[ 4.759081] [<
ffffffffa00b6f11>] bond_change_active_slave+0x191/0x3b0 [bonding]
[ 4.760917] [<
ffffffffa00b7227>] bond_select_active_slave+0xf7/0x1d0 [bonding]
[ 4.762751] [<
ffffffffa00badce>] bond_enslave+0xdae/0xdd0 [bonding]
...snip...
As it's out of hot path and is a really rare event - change the gfp_t flags
to GFP_ATOMIC to avoid sleeping under spinlock.
v2: convert new notify calls to GFP_ATOMIC.
CC: Thomas Glanzmann <thomas@glanzmann.de>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Thu, 6 Mar 2014 08:11:07 +0000 (09:11 +0100)]
inet: remove now unused flag DST_NOPEER
Commit
e688a604807647 ("net: introduce DST_NOPEER dst flag") introduced
DST_NOPEER because because of crashes in ipv6_select_ident called from
udp6_ufo_fragment.
Since commit
916e4cf46d0204 ("ipv6: reuse ip6_frag_id from
ip6_ufo_append_data") we don't call ipv6_select_ident any more from
ip6_ufo_append_data, thus this flag lost its purpose and can be removed.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 6 Mar 2014 18:15:17 +0000 (13:15 -0500)]
Merge branch 'r8152'
Hayes Wang says:
====================
r8152: cleanups
Deal with some empty lines and spaces, replace some tp->netdev with netdev,
and remove the unnecessary function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 6 Mar 2014 07:07:18 +0000 (15:07 +0800)]
r8152: remove rtl8152_get_stats
The rtl8152_get_stats() returns the point address of the struct
net_device_stats. This could be got from struct net_device directly.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 6 Mar 2014 07:07:17 +0000 (15:07 +0800)]
r8152: replace tp->netdev with netdev
Replace some tp->netdev with netdev.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 6 Mar 2014 07:07:16 +0000 (15:07 +0800)]
r8152: deal with the empty line and space
Add or remove some empty lines. Replace the spaces with the tabs.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 6 Mar 2014 01:32:02 +0000 (20:32 -0500)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c
The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.
The two wireless conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Tue, 4 Mar 2014 18:01:18 +0000 (19:01 +0100)]
ieee802154: fix whitespace issues in Kconfig
This patch fixes some whitespace issues in Kconfig files of IEEE
802.15.4 subsytem.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Tue, 4 Mar 2014 18:01:17 +0000 (19:01 +0100)]
at86rf230: add help and 212 to Kconfig menu entry
Since commit
8fad346f366a72978ea942abd06bd501ebd39c22
(ieee802154: add basic support for RF212 to at86rf230 driver)
we support at86rf212 as well.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 4 Mar 2014 06:44:38 +0000 (12:14 +0530)]
be2net: dma_sync each RX frag before passing it to the stack
The driver currently maps a page for DMA, divides the page into multiple
frags and posts them to the HW. It un-maps the page after data is received
on all the frags of the page. This scheme doesn't work when bounce buffers
are used for DMA (swiotlb=force kernel param).
This patch fixes this problem by calling dma_sync_single_for_cpu() for each
frag (excepting the last one) so that the data is copied from the bounce
buffers. The page is un-mapped only when DMA finishes on the last frag of
the page.
(Thanks Ben H. for suggesting the dma_sync API!)
This patch also renames the "last_page_user" field of be_rx_page_info{}
struct to "last_frag" to improve readability of the fixed code.
Reported-by: Li Fengmao <li.fengmao@zte.com.cn>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 4 Mar 2014 18:51:13 +0000 (13:51 -0500)]
Merge branch 'mpls_tc'
Simon Wunderlich says:
====================
this series contains a header file proposal for MPLS labels. These
labels do not seem to be properly defined in the kernel so far. We are
developing a wired/wireless 802.21/MPLS switch and need to check the
MPLS labels to use the traffic control info for transmissions over
802.11 networks.
Changes to third version:
* rename mpls_label_stack to mpls_label (thanks Neil)
* fix over-indendented closing brac (thanks Sergei)
* add Johannes' Ack
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich [Mon, 3 Mar 2014 16:23:12 +0000 (17:23 +0100)]
cfg80211: add MPLS and 802.21 classification
MPLS labels may contain traffic control information, which should be
evaluated and used by the wireless subsystem if present.
Also check for IEEE 802.21 which is always network control traffic.
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich [Mon, 3 Mar 2014 16:23:11 +0000 (17:23 +0100)]
UAPI: add MPLS label stack definition
Labels for the Multiprotocol Label Switching are defined in RFC 3032
which was superseded by RFC 5462. Add the definition to UAPI and a stub
header for include/linux.
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich [Mon, 3 Mar 2014 16:23:10 +0000 (17:23 +0100)]
if_ether.h: add IEEE 802.21 Ethertype
Add the Ethertype for IEEE Std 802.21 - Media Independent Handover
Protocol. This Ethertype is used for network control messages.
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 4 Mar 2014 16:44:32 +0000 (08:44 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix memory leak in ieee80211_prep_connection(), sta_info leaked on
error. From Eytan Lifshitz.
2) Unintentional switch case fallthrough in nft_reject_inet_eval(),
from Patrick McHardy.
3) Must check if payload lenth is a power of 2 in
nft_payload_select_ops(), from Nikolay Aleksandrov.
4) Fix mis-checksumming in xen-netfront driver, ip_hdr() is not in the
correct place when we invoke skb_checksum_setup(). From Wei Liu.
5) TUN driver should not advertise HW vlan offload features in
vlan_features. Fix from Fernando Luis Vazquez Cao.
6) IPV6_VTI needs to select NET_IPV_TUNNEL to avoid build errors, fix
from Steffen Klassert.
7) Add missing locking in xfrm_migrade_state_find(), we must hold the
per-namespace xfrm_state_lock while traversing the lists. Fix from
Steffen Klassert.
8) Missing locking in ath9k driver, access to tid->sched must be done
under ath_txq_lock(). Fix from Stanislaw Gruszka.
9) Fix two bugs in TCP fastopen. First respect the size argument given
to tcp_sendmsg() in the fastopen path, and secondly prevent
tcp_send_syn_data() from potentially using order-5 allocations.
From Eric Dumazet.
10) Fix handling of default neigh garbage collection params, from Jiri
Pirko.
11) Fix cwnd bloat and over-inflation of RTT when transmit segmentation
is in use. From Eric Dumazet.
12) Missing initialization of Realtek r8169 driver's statistics
seqlocks. Fix from Kyle McMartin.
13) Fix RTNL assertion failures in 802.3ad and AB ARP monitor of bonding
driver, from Ding Tianhong.
14) Bonding slave release race can cause divide by zero, fix from
Nikolay Aleksandrov.
15) Overzealous return from neigh_periodic_work() causes reachability
time to not be computed. Fix from Duain Jiong.
16) Fix regression in ipv6_find_hdr(), it should not return -ENOENT when
a specific target is specified and found. From Hans Schillstrom.
17) Fix VLAN tag stripping regression in BNA driver, from Ivan Vecera.
18) Tail loss probe can calculate bogus RTTs due to missing packet
marking on retransmit. Fix from Yuchung Cheng.
19) We cannot do skb_dst_drop() in iptunnel_pull_header() because
multicast loopback detection in later code paths need access to
skb_rtable(). Fix from Xin Long.
20) The macvlan driver regresses in that it propagates lower device
offload support disables into itself, causing severe slowdowns when
running over a bridge. Provide the software offloads always on
macvlan devices to deal with this and the regression is gone. From
Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
macvlan: Add support for 'always_on' offload features
net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable
ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer
net: cpsw: fix cpdma rx descriptor leak on down interface
be2net: isolate TX workarounds not applicable to Skyhawk-R
be2net: Fix skb double free in be_xmit_wrokarounds() failure path
be2net: clear promiscuous bits in adapter->flags while disabling promiscuous mode
be2net: Fix to reset transparent vlan tagging
qlcnic: dcb: a couple off by one bugs
tcp: fix bogus RTT on special retransmission
hsr: off by one sanity check in hsr_register_frame_in()
can: remove CAN FD compatibility for CAN 2.0 sockets
can: flexcan: factor out soft reset into seperate funtion
can: flexcan: flexcan_remove(): add missing netif_napi_del()
can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze
can: flexcan: factor out transceiver {en,dis}able into seperate functions
can: flexcan: fix transition from and to low power mode in chip_{en,dis}able
can: flexcan: flexcan_open(): fix error path if flexcan_chip_start() fails
can: flexcan: fix shutdown: first disable chip, then all interrupts
USB AX88179/178A: Support D-Link DUB-1312
...
Linus Torvalds [Tue, 4 Mar 2014 16:41:42 +0000 (08:41 -0800)]
Merge tag 'regulator-v3.14-rc5' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes here which ensure that regulators using the core
support for GPIO enables work in all cases by ensuring that helpers
are used consistently rather than open coding in places and hence not
having GPIO support in some of them"
* tag 'regulator-v3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Replace direct ops->disable usage
regulator: core: Replace direct ops->enable usage
Linus Torvalds [Tue, 4 Mar 2014 16:29:39 +0000 (08:29 -0800)]
Merge branch 'akpm' (patches from Andrew Morton)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton akpm@linux-foundation.org>:
mm: page_alloc: exempt GFP_THISNODE allocations from zone fairness
mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS
MAINTAINERS: add and correct types of some "T:" entries
MAINTAINERS: use tab for separator
rapidio/tsi721: fix tasklet termination in dma channel release
hfsplus: fix remount issue
zram: avoid null access when fail to alloc meta
sh: prefix sh-specific "CCR" and "CCR2" by "SH_"
ocfs2: fix quota file corruption
drivers/rtc/rtc-s3c.c: fix incorrect way of save/restore of S3C2410_TICNT for TYPE_S3C64XX
kallsyms: fix absolute addresses for kASLR
scripts/gen_initramfs_list.sh: fix flags for initramfs LZ4 compression
mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
memcg: reparent charges of children before processing parent
memcg: fix endless loop in __mem_cgroup_iter_next()
lib/radix-tree.c: swapoff tmpfs radix_tree: remember to rcu_read_unlock
dma debug: account for cachelines and read-only mappings in overlap tracking
mm: close PageTail race
MAINTAINERS: EDAC: add Mauro and Borislav as interim patch collectors
Johannes Weiner [Mon, 3 Mar 2014 23:38:41 +0000 (15:38 -0800)]
mm: page_alloc: exempt GFP_THISNODE allocations from zone fairness
Jan Stancek reports manual page migration encountering allocation
failures after some pages when there is still plenty of memory free, and
bisected the problem down to commit
81c0a2bb515f ("mm: page_alloc: fair
zone allocator policy").
The problem is that GFP_THISNODE obeys the zone fairness allocation
batches on one hand, but doesn't reset them and wake kswapd on the other
hand. After a few of those allocations, the batches are exhausted and
the allocations fail.
Fixing this means either having GFP_THISNODE wake up kswapd, or
GFP_THISNODE not participating in zone fairness at all. The latter
seems safer as an acute bugfix, we can clean up later.
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Liu Ping Fan [Mon, 3 Mar 2014 23:38:39 +0000 (15:38 -0800)]
mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS
When doing some numa tests on powerpc, I triggered an oops bug. I find
it is caused by using page->_last_cpupid. It should be initialized as
"-1 & LAST_CPUPID_MASK", but not "-1". Otherwise, in task_numa_fault(),
we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)). And
finally cause an oops bug in task_numa_group(), since the online cpu is
less than possible cpu. This happen with CONFIG_SPARSE_VMEMMAP disabled
Call trace:
SMP NR_CPUS=64 NUMA PowerNV
Modules linked in:
CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ #32
task:
c000001e2746aa80 ti:
c000001e32c50000 task.ti:
c000001e32c50000
REGS:
c000001e32c53510 TRAP: 0300 Not tainted(3.13.0-rc1+)
MSR:
9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR:
28024424 XER:
20000000
CFAR:
c000000000009324 DAR:
7265717569726857 DSISR:
40000000 SOFTE: 1
NIP .task_numa_fault+0x1470/0x2370
LR .task_numa_fault+0x1468/0x2370
Call Trace:
.task_numa_fault+0x1468/0x2370 (unreliable)
.do_numa_page+0x480/0x4a0
.handle_mm_fault+0x4ec/0xc90
.do_page_fault+0x3a8/0x890
handle_page_fault+0x10/0x30
Instruction dump:
3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb
3884b138 48d9cfd5 60000000 e93f0100 <
812902e4>
7d2907b45529063e 7d2a07b4
---[ end trace
15f2510da5ae07cf ]---
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Mon, 3 Mar 2014 23:38:38 +0000 (15:38 -0800)]
MAINTAINERS: add and correct types of some "T:" entries
Tree location entries should start with the appropriate type.
Add git to some, hg to another.
Neaten tree type description.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Mon, 3 Mar 2014 23:38:37 +0000 (15:38 -0800)]
MAINTAINERS: use tab for separator
Convert whitespace to single tab for separators.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Mon, 3 Mar 2014 23:38:36 +0000 (15:38 -0800)]
rapidio/tsi721: fix tasklet termination in dma channel release
This patch is a modification of the patch originally proposed by
Xiaotian Feng <xtfeng@gmail.com>: https://lkml.org/lkml/2012/11/5/413
This new version disables DMA channel interrupts and ensures that the
tasklet wil not be scheduled again before calling tasklet_kill().
Unfortunately the updated patch was not released at that time due to
planned rework of Tsi721 mport driver to use threaded interrupts (which
has yet to happen). Recently the issue was reported again:
https://lkml.org/lkml/2014/2/19/762.
Description from the original Xiaotian's patch:
"Some drivers use tasklet_disable in device remove/release process,
tasklet_disable will inc tasklet->count and return. If the tasklet is
not handled yet under some softirq pressure, the tasklet will be
placed on the tasklet_vec, never have a chance to be excuted. This
might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq,
but tasklet is disabled. tasklet_kill should be used in this case."
This patch is applicable to kernel versions starting from v3.5.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> [3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Mon, 3 Mar 2014 23:38:35 +0000 (15:38 -0800)]
hfsplus: fix remount issue
Current implementation of HFS+ driver has small issue with remount
option. Namely, for example, you are unable to remount from RO mode
into RW mode by means of command "mount -o remount,rw /dev/loop0
/mnt/hfsplus". Trying to execute sequence of commands results in an
error message:
mount /dev/loop0 /mnt/hfsplus
mount -o remount,ro /dev/loop0 /mnt/hfsplus
mount -o remount,rw /dev/loop0 /mnt/hfsplus
mount: you must specify the filesystem type
mount -t hfsplus -o remount,rw /dev/loop0 /mnt/hfsplus
mount: /mnt/hfsplus not mounted or bad option
The reason of such issue is failure of mount syscall:
mount("/dev/loop0", "/mnt/hfsplus", 0x2282a60, MS_MGC_VAL|MS_REMOUNT, NULL) = -1 EINVAL (Invalid argument)
Namely, hfsplus_parse_options_remount() method receives empty "input"
argument and return false in such case. As a result, hfsplus_remount()
returns -EINVAL error code.
This patch fixes the issue by means of return true for the case of empty
"input" argument in hfsplus_parse_options_remount() method.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Mon, 3 Mar 2014 23:38:34 +0000 (15:38 -0800)]
zram: avoid null access when fail to alloc meta
zram_meta_alloc could fail so caller should check it. Otherwise, your
system will hang.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Mon, 3 Mar 2014 23:38:33 +0000 (15:38 -0800)]
sh: prefix sh-specific "CCR" and "CCR2" by "SH_"
Commit
bcf24e1daa94 ("mmc: omap_hsmmc: use the generic config for
omap2plus devices"), enabled the build for other platforms for compile
testing.
sh-allmodconfig now fails with:
include/linux/omap-dma.h:171:8: error: expected identifier before numeric constant
make[4]: *** [drivers/mmc/host/omap_hsmmc.o] Error 1
This happens because SuperH #defines "CCR", which is one of the enum
values in include/linux/omap-dma.h. There's a similar issue with "CCR2"
on sh2a.
As "CCR" and "CCR2" are too generic names for global #defines, prefix
them with "SH_" to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Mon, 3 Mar 2014 23:38:32 +0000 (15:38 -0800)]
ocfs2: fix quota file corruption
Global quota files are accessed from different nodes. Thus we cannot
cache offset of quota structure in the quota file after we drop our node
reference count to it because after that moment quota structure may be
freed and reallocated elsewhere by a different node resulting in
corruption of quota file.
Fix the problem by clearing dq_off when we are releasing dquot structure.
We also remove the DB_READ_B handling because it is useless -
DQ_ACTIVE_B is set iff DQ_READ_B is set.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vikas Sajjan [Mon, 3 Mar 2014 23:38:31 +0000 (15:38 -0800)]
drivers/rtc/rtc-s3c.c: fix incorrect way of save/restore of S3C2410_TICNT for TYPE_S3C64XX
On exynos5250, exynos5420 and exynos5260 it was observed that, after 1
cycle of S2R, the rtc-tick occurs at a very fast rate as compared to the
rtc-tick occuring before S2R.
This patch fixes the above issue by correcting the wrong way of
save/restore of S3C2410_TICNT for TYPE_S3C64XX.
Signed-off-by: Vikas Sajjan <vikas.sajjan@samsung.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Honig [Mon, 3 Mar 2014 23:38:30 +0000 (15:38 -0800)]
kallsyms: fix absolute addresses for kASLR
Currently symbols that are absolute addresses are incorrectly displayed
in /proc/kallsyms if the kernel is loaded with kASLR.
The problem was that the scripts/kallsyms.c file which generates the
array of symbol names and addresses uses an relocatable value for all
symbols, even absolute symbols. This patch fixes that.
Several kallsyms output in different boot states for comparison:
$ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.nokaslr
0000000000000000 D __per_cpu_start
0000000000014280 D __per_cpu_end
ffffffff810001c8 T _stext
$ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr1
000000001f200000 D __per_cpu_start
000000001f214280 D __per_cpu_end
ffffffffa02001c8 T _stext
$ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr2
000000000d400000 D __per_cpu_start
000000000d414280 D __per_cpu_end
ffffffff8e4001c8 T _stext
$ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr-fixed
0000000000000000 D __per_cpu_start
0000000000014280 D __per_cpu_end
ffffffffadc001c8 T _stext
Signed-off-by: Andy Honig <ahonig@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel M. Weeks [Mon, 3 Mar 2014 23:38:28 +0000 (15:38 -0800)]
scripts/gen_initramfs_list.sh: fix flags for initramfs LZ4 compression
LZ4 as implemented in the kernel differs from the default method now
used by the reference implementation of LZ4. Until the in-kernel method
is updated to support the new default, passing the legacy flag (-l) to
the compressor is necessary. Without this flag the kernel-generated,
LZ4-compressed initramfs is junk.
Kyungsik said:
: It seems that lz4 supports legacy format with the same option as lz4c
: does. Just looking at the first few bytes of lz4 compressed image, we can
: see whether it is new format or not.
:
: It shows new format magic number without this patch. New format magic
: number is 0x184d2204.
:
: $ hexdump -C ./initramfs_data.cpio.lz4 |more
:
00000000 04 22 4d 18 64 70 b9 69 (Little Endian)
: ...
:
: Currently kernel supports legacy format only.
Signed-off-by: Daniel M. Weeks <dan@danweeks.net>
Cc: Michal Marek <mmarek@suse.cz>
Acked-by: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Mon, 3 Mar 2014 23:38:27 +0000 (15:38 -0800)]
mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
Daniel Borkmann reported a VM_BUG_ON assertion failing:
------------[ cut here ]------------
kernel BUG at mm/mlock.c:528!
invalid opcode: 0000 [#1] SMP
Modules linked in: ccm arc4 iwldvm [...]
video
CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
task:
ffff8801f87f9820 ti:
ffff88002cb44000 task.ti:
ffff88002cb44000
RIP: 0010:[<
ffffffff81171ad0>] [<
ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
Call Trace:
do_munmap+0x18f/0x3b0
vm_munmap+0x41/0x60
SyS_munmap+0x22/0x30
system_call_fastpath+0x1a/0x1f
RIP munlock_vma_pages_range+0x2e0/0x2f0
---[ end trace
a0088dcf07ae10f2 ]---
because munlock_vma_pages_range() thinks it's unexpectedly in the middle
of a THP page. This can be reproduced with default config since 3.11
kernels. A reproducer can be found in the kernel's selftest directory
for networking by running ./psock_tpacket.
The problem is that an order=2 compound page (allocated by
alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped
by packet_mmap()) and mistaken for a THP page and assumed to be order=9.
The checks for THP in munlock came with commit
ff6a6da60b89 ("mm:
accelerate munlock() treatment of THP pages"), i.e. since 3.9, but did
not trigger a bug. It just makes munlock_vma_pages_range() skip such
compound pages until the next 512-pages-aligned page, when it encounters
a head page. This is however not a problem for vma's where mlocking has
no effect anyway, but it can distort the accounting.
Since commit
7225522bb429 ("mm: munlock: batch non-THP page isolation
and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
PageTransHuge() check.
This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
list of flags that make vma's non-mlockable and non-mergeable. The
reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
already on the VM_SPECIAL list, and both are intended for non-LRU pages
where mlocking makes no sense anyway. Related Lkml discussion can be
found in [2].
[1] tools/testing/selftests/net/psock_tpacket
[2] https://lkml.org/lkml/2014/1/10/427
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Tested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [3.11.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Filipe Brandenburger [Mon, 3 Mar 2014 23:38:25 +0000 (15:38 -0800)]
memcg: reparent charges of children before processing parent
Sometimes the cleanup after memcg hierarchy testing gets stuck in
mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
There may turn out to be several causes, but a major cause is this: the
workitem to offline parent can get run before workitem to offline child;
parent's mem_cgroup_reparent_charges() circles around waiting for the
child's pages to be reparented to its lrus, but it's holding
cgroup_mutex which prevents the child from reaching its
mem_cgroup_reparent_charges().
Further testing showed that an ordered workqueue for cgroup_destroy_wq
is not always good enough: percpu_ref_kill_and_confirm's call_rcu_sched
stage on the way can mess up the order before reaching the workqueue.
Instead, when offlining a memcg, call mem_cgroup_reparent_charges() on
all its children (and grandchildren, in the correct order) to have their
charges reparented first.
Fixes:
e5fca243abae ("cgroup: use a dedicated workqueue for cgroup destruction")
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [v3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 3 Mar 2014 23:38:24 +0000 (15:38 -0800)]
memcg: fix endless loop in __mem_cgroup_iter_next()
Commit
0eef615665ed ("memcg: fix css reference leak and endless loop in
mem_cgroup_iter") got the interaction with the commit a few before it
d8ad30559715 ("mm/memcg: iteration skip memcgs not yet fully
initialized") slightly wrong, and we didn't notice at the time.
It's elusive, and harder to get than the original, but for a couple of
days before rc1, I several times saw a endless loop similar to that
supposedly being fixed.
This time it was a tighter loop in __mem_cgroup_iter_next(): because we
can get here when our root has already been offlined, and the ordering
of conditions was such that we then just cycled around forever.
Fixes:
0eef615665ed ("memcg: fix css reference leak and endless loop in mem_cgroup_iter").
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 3 Mar 2014 23:38:23 +0000 (15:38 -0800)]
lib/radix-tree.c: swapoff tmpfs radix_tree: remember to rcu_read_unlock
Running fsx on tmpfs with concurrent memhog-swapoff-swapon, lots of
BUG: sleeping function called from invalid context at kernel/fork.c:606
in_atomic(): 0, irqs_disabled(): 0, pid: 1394, name: swapoff
1 lock held by swapoff/1394:
#0: (rcu_read_lock){.+.+.+}, at: [<
ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6
followed by
================================================
[ BUG: lock held when returning to user space! ]
3.14.0-rc1 #3 Not tainted
------------------------------------------------
swapoff/1394 is leaving the kernel with locks still held!
1 lock held by swapoff/1394:
#0: (rcu_read_lock){.+.+.+}, at: [<
ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6
after which the system recovered nicely.
Whoops, I long ago forgot the rcu_read_unlock() on one unlikely branch.
Fixes
e504f3fdd63d ("tmpfs radix_tree: locate_item to speed up swapoff")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Mon, 3 Mar 2014 23:38:21 +0000 (15:38 -0800)]
dma debug: account for cachelines and read-only mappings in overlap tracking
While debug_dma_assert_idle() checks if a given *page* is actively
undergoing dma the valid granularity of a dma mapping is a *cacheline*.
Sander's testing shows that the warning message "DMA-API: exceeded 7
overlapping mappings of pfn..." is falsely triggering. The test is
simply mapping multiple cachelines in a given page.
Ultimately we want overlap tracking to be valid as it is a real api
violation, so we need to track active mappings by cachelines. Update
the active dma tracking to use the page-frame-relative cacheline of the
mapping as the key, and update debug_dma_assert_idle() to check for all
possible mapped cachelines for a given page.
However, the need to track active mappings is only relevant when the
dma-mapping is writable by the device. In fact it is fairly standard
for read-only mappings to have hundreds or thousands of overlapping
mappings at once. Limiting the overlap tracking to writable
(!DMA_TO_DEVICE) eliminates this class of false-positive overlap
reports.
Note, the radix gang lookup is sub-optimal. It would be best if it
stopped fetching entries once the search passed a page boundary.
Nevertheless, this implementation does not perturb the original net_dma
failing case. That is to say the extra overhead does not show up in
terms of making the failing case pass due to a timing change.
References:
http://marc.info/?l=linux-netdev&m=
139232263419315&w=2
http://marc.info/?l=linux-netdev&m=
139217088107122&w=2
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Mon, 3 Mar 2014 23:38:18 +0000 (15:38 -0800)]
mm: close PageTail race
Commit
bf6bddf1924e ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).
This results in a very rare NULL pointer dereference on the
aforementioned page_count(page). Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.
This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation. This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling. The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.
This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.
Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>