Tobias Klauser [Thu, 31 Jul 2014 10:17:08 +0000 (12:17 +0200)]
netlink: Use PAGE_ALIGNED macro
Use PAGE_ALIGNED(...) instead of IS_ALIGNED(..., PAGE_SIZE).
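To illustrate the equivalence, here is a minimal userspace sketch. The macros are stand-ins modelled on the kernel macros of the same names, and SKETCH_PAGE_SIZE is an assumed 4 KiB page size; neither is the kernel's actual definition.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The kernel takes PAGE_SIZE from the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Userspace stand-ins modelled on the kernel macros of the same names. */
#define IS_ALIGNED(x, a)   (((x) & ((a) - 1)) == 0)
#define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), SKETCH_PAGE_SIZE)

/* Spelling before the patch. */
static int aligned_old(unsigned long addr)
{
	return IS_ALIGNED(addr, SKETCH_PAGE_SIZE);
}

/* Spelling after the patch: same test, with the page size implied. */
static int aligned_new(unsigned long addr)
{
	return PAGE_ALIGNED(addr);
}
```

Both helpers return the same result for every address, so the change is purely cosmetic.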
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Duan Jiong [Thu, 31 Jul 2014 09:54:32 +0000 (17:54 +0800)]
net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS
When dealing with ICMPv[46] error messages, icmp_socket_deliver() and
icmpv6_notify() perform validity checks on the packet's length, but some
protocols then check the packet's length again redundantly. So remove those
duplicated statements, and increment the counters
ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in icmp_socket_deliver() and
icmpv6_notify() respectively.
In addition, add the missing counter increment in udp6/udplite6 when the socket is NULL.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe [Wed, 30 Jul 2014 18:40:53 +0000 (12:40 -0600)]
sctp: Fixup v4mapped behaviour to comply with Sock API
The SCTP socket extensions API document describes the v4mapping option as
follows:
8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
This socket option is a Boolean flag which turns on or off the
mapping of IPv4 addresses. If this option is turned on, then IPv4
addresses will be mapped to V6 representation. If this option is
turned off, then no mapping will be done of V4 addresses and a user
will receive both PF_INET6 and PF_INET type addresses on the socket.
See [RFC3542] for more details on mapped V6 addresses.
This description isn't really in line with what the code does though.
Introduce addr_to_user (renamed addr_v4map), which should be called
before any sockaddr is passed back to user space. The new function
places the sockaddr into the correct format depending on the
SCTP_I_WANT_MAPPED_V4_ADDR option.
Audit all places that touched v4mapped and either sanely construct
a v4 or v6 address then call addr_to_user, or drop the
unnecessary v4mapped check entirely.
Audit all places that call addr_to_user and verify they are on a syscall
return path.
Add a custom getname that formats the address properly.
Several bugs are addressed:
- SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
addresses to user space
- The addr_len returned from recvmsg was not correct when
returning AF_INET on a v6 socket
- flowlabel and scope_id were not zeroed when promoting
a v4 to v6
- Some syscalls like bind and connect behaved differently
depending on v4mapped
Tested bind, getpeername, getsockname, connect, and recvmsg for proper
behaviour in v4mapped = 1 and 0 cases.
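The conversion described above can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel implementation: union sketch_addr and the sketch_* functions are hypothetical names, and only the behaviour stated in this message (format chosen by the v4mapped option, flow label and scope id zeroed on promotion, length matching the returned family) is modelled.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical stand-in for the kernel's address union. */
union sketch_addr {
	struct sockaddr     sa;
	struct sockaddr_in  v4;
	struct sockaddr_in6 v6;
};

/* Promote an AF_INET address to its ::ffff:a.b.c.d form. */
static void sketch_v4_to_v6(union sketch_addr *addr)
{
	struct sockaddr_in v4 = addr->v4;   /* copy first: union members overlap */

	memset(addr, 0, sizeof(*addr));     /* zeroes flowinfo and scope_id */
	addr->v6.sin6_family = AF_INET6;
	addr->v6.sin6_port = v4.sin_port;
	addr->v6.sin6_addr.s6_addr[10] = 0xff;
	addr->v6.sin6_addr.s6_addr[11] = 0xff;
	memcpy(&addr->v6.sin6_addr.s6_addr[12], &v4.sin_addr, 4);
}

/* Demote a v4-mapped AF_INET6 address back to AF_INET. */
static void sketch_v6_to_v4(union sketch_addr *addr)
{
	struct sockaddr_in6 v6 = addr->v6;

	memset(addr, 0, sizeof(*addr));
	addr->v4.sin_family = AF_INET;
	addr->v4.sin_port = v6.sin6_port;
	memcpy(&addr->v4.sin_addr, &v6.sin6_addr.s6_addr[12], 4);
}

/* Put addr into the format the socket option asks for; return its length. */
static socklen_t sketch_addr_to_user(int v4mapped, union sketch_addr *addr)
{
	if (v4mapped) {
		if (addr->sa.sa_family == AF_INET)
			sketch_v4_to_v6(addr);
	} else if (addr->sa.sa_family == AF_INET6 &&
		   IN6_IS_ADDR_V4MAPPED(&addr->v6.sin6_addr)) {
		sketch_v6_to_v4(addr);
	}

	return addr->sa.sa_family == AF_INET ? sizeof(struct sockaddr_in)
					     : sizeof(struct sockaddr_in6);
}
```

Returning the length together with the converted address is what keeps addr_len consistent with the family handed back to user space.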
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karoly Kemeny [Wed, 30 Jul 2014 18:27:36 +0000 (20:27 +0200)]
net: kernel-doc compliant documentation for net_device
struct net_device is a vast and important structure, but it has no kernel-doc
compliant documentation. This patch extracts the comments from the structure
to clean it up and lets the scripts extract documentation from it. I know that
the patch is big, but it just reorders the comments into the appropriate
form and adds a few more for the missing members.
Signed-off-by: Karoly Kemeny <karoly.kemeny@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 31 Jul 2014 21:13:38 +0000 (14:13 -0700)]
Merge branch 'stmmac-next'
Vince Bridgers says:
====================
net: stmmac: Improve mcast/ucast filter for snps
This patch series adds Synopsys specific bindings for the Synopsys EMAC
filter characteristics since those are implementation dependent. The
multicast and unicast filtering code was improved to handle different
configuration variations based on device tree settings.
I verified the operation of the multicast and unicast filters through
Synopsys support as requested during the V1 review, and tested the GMAC
configuration on an Altera Cyclone 5 SOC (which supports 256 multicast
bins and 128 Unicast addresses). The 10/100 variant of this driver
modification was not tested, although it was compile tested. I shared
the email thread results of the investigation through Synopsys with the
stmmac maintainer.
V4: Remove patch from series that addressed a sparse issue from a
down rev'd version of sparse that does not show up in the
latest version of sparse.
V3: Break up the patch into interface and functional change patches
per review comments
V2: Confirm with Synopsys methods to determine number of Multicast bins
and Unicast address filter entries per first round review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Thu, 31 Jul 2014 20:49:17 +0000 (15:49 -0500)]
net: stmmac: Support devicetree configs for mcast and ucast filter entries
This patch adds and modifies code to support multiple Multicast and Unicast
Synopsys MAC filter configurations. The default configuration is defined to
support legacy driver behavior, which is 64 Multicast bins. The Unicast
filter code previously assumed all controllers support 32 or 16 Unicast
addresses based on controller version number, but this has been corrected
to support a default of 1 Unicast address. The filter configuration may
be specified through the devicetree using a Synopsys specific device tree
entry. This information was verified with Synopsys through
Synopsys Support Case #8000684337 and shared with the maintainer.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Thu, 31 Jul 2014 20:49:16 +0000 (15:49 -0500)]
ARM: socfpga: Add socfpga Ethernet filter attributes entries
This patch adds socfpga Ethernet filter attributes for multicast
and unicast filters per Synopsys Ethernet IP configuration chosen
by Altera for the Cyclone 5 and Arria SOC FPGAs.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Thu, 31 Jul 2014 20:49:15 +0000 (15:49 -0500)]
dts: Add bindings for multicast hash bins and perfect filter entries
This change adds bindings for the number of multicast hash bins and perfect
filter entries supported by the Synopsys EMAC. The Synopsys EMAC core is
configurable at device creation time, and can be configured for a different
number of multicast hash bins and a different number of perfect filter
entries. The device does not provide a way to query these parameters,
therefore parameters are required. The Altera Cyclone V SOC has support for
256 multicast hash bins and 128 perfect filter entries, and is different
than what's currently provided in the stmmac driver.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Thu, 31 Jul 2014 20:49:14 +0000 (15:49 -0500)]
net: stmmac: Correct set_filter for multicast and unicast cases
This patch removes the check for the number of multicast addresses
when using hash based filtering since it's not necessary. If the number
of multicast addresses in the list exceeds the number of multicast hash
bins, the bins will "fold" over into one of the bins configured and
enabled for the particular component instance.
The default number of maximum unicast addresses was changed from 32 to 1
since this number is not dependent on the component revision. The maximum
number of multicast and unicast addresses is dependent on the configuration
of the Synopsys EMAC configured by the SOC architect at the time the
features were selected and configured for a particular component. Sadly,
Synopsys does not provide a way to query the precise number supported
by a particular component, so we must fall back on a devicetree entry.
This configuration could vary from vendor to vendor (such as STMicro,
Altera, etc).
The multicast bins are set for every possible filtering case (including
no entries) - previously the bits were set only if multicast filter entries
were present.
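The "fold" behaviour of hash based filtering can be sketched as follows. This is a userspace illustration assuming a CRC-32 based hash whose top bits select the bin; the sketch_* names are hypothetical and the driver's exact computation is not quoted here. The point is that every address maps to some bin regardless of how many addresses are in the list, so no count check is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise little-endian CRC-32 (polynomial 0xEDB88320), no final inversion. */
static uint32_t sketch_crc32_le(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
	}
	return crc;
}

/* Reverse the bit order of a 32-bit word. */
static uint32_t sketch_bitrev32(uint32_t x)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++, x >>= 1)
		r = (r << 1) | (x & 1);
	return r;
}

/* Bin index for a MAC address when the core has 2^log2_bins hash bins. */
static unsigned int sketch_hash_bin(const uint8_t mac[6], unsigned int log2_bins)
{
	uint32_t h = sketch_bitrev32(~sketch_crc32_le(~0u, mac, 6));

	return h >> (32 - log2_bins);
}

/* Mark the bin for one address in a filter made of 32-bit registers. */
static void sketch_filter_add(uint32_t *filter, const uint8_t mac[6],
			      unsigned int log2_bins)
{
	unsigned int bin = sketch_hash_bin(mac, log2_bins);

	filter[bin >> 5] |= 1u << (bin & 0x1f);
}
```

With log2_bins set to 6, 7 or 8 the same code covers 64, 128 or 256 bins, which is why the bin count has to come from the device tree rather than from the controller revision.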
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Thu, 31 Jul 2014 20:49:13 +0000 (15:49 -0500)]
net: stmmac: Change MAC interface to support multiple filter configurations
The Synopsys EMAC can be configured for different numbers of multicast hash
bins and perfect filter entries at device creation time and there's no way
to query this configuration information at runtime. As a result, a devicetree
parameter is required in order for the driver to program these filters
correctly for a particular device instance. This patch modifies the
10/100/1000 MAC software interface such that these configuration parameters
can be set at initialization time.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 31 Jul 2014 21:09:14 +0000 (14:09 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains netfilter updates for net-next, they are:
1) Add the reject expression for the nf_tables bridge family, this
allows us to send explicit reject (TCP RST / ICMP dest unrech) to
the packets matching a rule.
2) Simplify and consolidate the nf_tables set dumping logic. This uses
netlink control->data to filter out depending on the request.
3) Perform garbage collection in xt_hashlimit using a workqueue instead
of a timer, which is problematic when many entries are in place in
the tables, from Eric Dumazet.
4) Remove leftover code from the removed ulog target support, from
Paul Bolle.
5) Dump unmodified flags in the netfilter packet accounting when resetting
counters, so userspace knows that a counter was in overquota situation,
from Alexey Perevalov.
6) Fix wrong usage of the bitwise functions in nfnetlink_acct, also from
Alexey.
7) Fix a crash when adding new set element with an empty NFTA_SET_ELEM_LIST
attribute.
This patchset also includes a couple of cleanups for xt_LED from
Duan Jiong and for nf_conntrack_ipv4 (using coccinelle) from
Himangi Saraogi.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Banerjee, Debabrata [Wed, 30 Jul 2014 17:50:17 +0000 (13:50 -0400)]
tcp: don't require root to read tcp_metrics
Commit d23ff7016 (tcp: add generic netlink support for tcp_metrics) introduced
netlink support for the new tcp_metrics, however it restricted getting of
tcp_metrics to root user only. This is a change from how these values could
have been fetched when in the old route cache. Unless there's a legitimate
reason to restrict the reading of these values it would be better if normal
users could fetch them.
Cc: Julian Anastasov <ja@ssi.bg>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 31 Jul 2014 19:48:59 +0000 (21:48 +0200)]
team: fix releasing uninitialized pointer to BPF prog
Commit 34c5bd66e5ed introduced the possibility that an
uninitialized pointer on the stack (orig_fp) can call into
sk_unattached_filter_destroy() when its value is non NULL.
Before that commit orig_fp was only destroyed in the same
block where it was assigned a valid BPF prog before. Fix it
up by initializing it to NULL.
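The pattern behind the fix can be sketched in userspace as follows. struct sketch_prog and the sketch_* functions are hypothetical stand-ins, not the team driver's code; the sketch only shows why a pointer that is destroyed outside the block that assigns it must start out as NULL.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_prog { int id; };

static int sketch_destroyed;   /* counts destroy calls, for demonstration */

static void sketch_prog_destroy(struct sketch_prog *p)
{
	sketch_destroyed++;
	free(p);
}

/* Replace *slot with new_prog; release the old program only if there was one. */
static void sketch_replace_prog(struct sketch_prog **slot,
				struct sketch_prog *new_prog)
{
	struct sketch_prog *orig = NULL;   /* the fix: never left uninitialized */

	if (*slot)
		orig = *slot;              /* only assigned on this path */
	*slot = new_prog;

	if (orig)                          /* safe: NULL when nothing was attached */
		sketch_prog_destroy(orig);
}
```

Without the initializer, the final check would read an indeterminate stack value whenever no program was attached before.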
Fixes: 34c5bd66e5ed ("net: filter: don't release unattached filter through call_rcu()")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pablo Neira <pablo@netfilter.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Fri, 25 Jul 2014 11:15:36 +0000 (13:15 +0200)]
netfilter: nf_tables: check for unset NFTA_SET_ELEM_LIST_ELEMENTS attribute
Otherwise, the kernel oopses in nla_for_each_nested when iterating over
the unset attribute NFTA_SET_ELEM_LIST_ELEMENTS in the
nf_tables_{new,del}setelem() path.
netlink: 65524 bytes leftover after parsing attributes in process `nft'.
[...]
Oops: 0000 [#1] SMP
[...]
CPU: 2 PID: 6287 Comm: nft Not tainted 3.16.0-rc2+ #169
RIP: 0010:[<ffffffffa0526e61>] [<ffffffffa0526e61>] nf_tables_newsetelem+0x82/0xec [nf_tables]
[...]
Call Trace:
[<ffffffffa05178c4>] nfnetlink_rcv+0x2e7/0x3d7 [nfnetlink]
[<ffffffffa0517939>] ? nfnetlink_rcv+0x35c/0x3d7 [nfnetlink]
[<ffffffff8137d300>] netlink_unicast+0xf8/0x17a
[<ffffffff8137d6a5>] netlink_sendmsg+0x323/0x351
[...]
Fix this by returning -EINVAL if this attribute is not set. A request
without it doesn't make sense at all, since those commands are there to
add and to delete elements from the set.
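The shape of the check can be sketched as follows. The attribute table, struct sketch_attr and the enum values are hypothetical stand-ins for the parsed netlink attributes, not the nf_tables code; only the "reject an unset attribute with -EINVAL before iterating" step comes from this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed nested attribute. */
struct sketch_attr { int n_elems; };

enum {
	SKETCH_ELEM_LIST_ELEMENTS = 3,   /* hypothetical attribute index */
	SKETCH_ATTR_MAX = 3,
};

/* Returns the number of elements handled, or -EINVAL if the list is unset. */
static int sketch_newsetelem(const struct sketch_attr *attrs[])
{
	const struct sketch_attr *list = attrs[SKETCH_ELEM_LIST_ELEMENTS];

	if (list == NULL)
		return -EINVAL;          /* nothing to add: refuse early */

	return list->n_elems;            /* safe to walk the list from here on */
}
```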
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Alexey Perevalov [Thu, 31 Jul 2014 13:14:05 +0000 (17:14 +0400)]
netfilter: nfnetlink_acct: avoid using NFACCT_F_OVERQUOTA with bit helper functions
Bit helper functions were used to manipulate NFACCT_F_OVERQUOTA,
but they accept a bit position, not a bit mask. As a result,
not the third bit for NFACCT_F_OVERQUOTA was set, but the fourth. Such
behaviour was dangerous and could lead to an unexpected overquota report
result.
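The mask versus position confusion can be sketched as follows. The helpers are non-atomic userspace stand-ins for the kernel bit helpers, and the SKETCH_* constants assume the flag is the mask (1 << 2); both are illustrative assumptions rather than the nfnetlink_acct definitions.

```c
#include <assert.h>

#define SKETCH_OVERQUOTA_BIT 2                              /* bit position */
#define SKETCH_F_OVERQUOTA   (1UL << SKETCH_OVERQUOTA_BIT)  /* bit mask, 0x4 */

/* Non-atomic stand-in for a helper that takes a bit position. */
static void sketch_set_bit(unsigned int nr, unsigned long *word)
{
	*word |= 1UL << nr;
}

/* Buggy call: the mask (4) is passed where a position is expected. */
static unsigned long sketch_mark_buggy(void)
{
	unsigned long flags = 0;

	sketch_set_bit(SKETCH_F_OVERQUOTA, &flags);
	return flags;
}

/* Fixed call: the position (2) is passed. */
static unsigned long sketch_mark_fixed(void)
{
	unsigned long flags = 0;

	sketch_set_bit(SKETCH_OVERQUOTA_BIT, &flags);
	return flags;
}
```

In this sketch the buggy call sets bit number 4 (value 0x10), so a later test against the mask 0x4 never sees the overquota state.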
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Thu, 31 Jul 2014 03:05:54 +0000 (20:05 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2014-07-30
This is the last pull request for ipsec-next before I'll be
off for two weeks starting on Friday. David, can you please
take urgent ipsec patches directly into net/net-next during
this time?
1) Error handling simplifications for vti and vti6.
From Mathias Krause.
2) Remove a duplicate semicolon after a return statement.
From Christoph Paasch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian Norris [Tue, 29 Jul 2014 21:34:14 +0000 (14:34 -0700)]
net: bcmgenet: correct spelling
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 31 Jul 2014 03:00:27 +0000 (20:00 -0700)]
Merge branch 'libphy_mmd'
Vince Bridgers says:
====================
net: libphy: Add phy specific functions to access mmd regs
This set of patches addresses a problem found with the Micrel ksz9021 phy and
libphy, where the ksz9021 phy does not support mmd extended register access
per the IEEE specification as assumed by libphy. The first patch adds a
framework for phy specific support to specify their own function to access
extended phy registers, return a failure code if not supported, or to default
to libphy's IEEE defined method for accessing the mmd extended phy registers.
This issue was found by using the Synopsys EMAC and a Micrel ksz9021 phy on the
Altera Cyclone 5 SOC development kit. This patch was tested on the same system
in both positive and negative test cases.
V5: Revert name of mmd register access functions, check for phy specific
driver override functions in mmd register access functions per
Florian's comments to minimize source code changes
V4: Correct error when formatting V3 patch - erroneous text cut from code
V3: Correct formatting of function arguments, remove return statement from
NULL functions, and add patch for PHY driver documentation per review
comments.
V2: Split the original patch submission into separate patches for the libphy
framework required for the modification and for the Micrel Phy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Tue, 29 Jul 2014 20:19:59 +0000 (15:19 -0500)]
Documentation: networking: phy.txt: Update text for indirect MMD access
Update the PHY library documentation to describe how a specific PHY
driver can use the PAL MMD register access routines or override those
routines with its own in the event the PHY does not support the IEEE
standard for reading and writing MMD phy registers.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Tue, 29 Jul 2014 20:19:58 +0000 (15:19 -0500)]
net: libphy: Add stubs to hook IEEE MMD Register reads and writes
The Micrel ksz9021 PHY does not support IEEE standard MMD
extended register access, and therefore requires stubs that fail the read
register method and do nothing for the write register method when
libphy attempts to read and/or configure Energy Efficient Ethernet
features in PHYs that do support those features. This problem
was observed on an Altera Cyclone V SOC development kit that
uses the Synopsys EMAC and the Micrel ksz9021 PHY. This patch
was tested on the same board, and Energy Efficient Ethernet is
now disabled as expected since the Micrel PHY does not support that
feature.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Tue, 29 Jul 2014 20:19:57 +0000 (15:19 -0500)]
net: libphy: Add phy specific function to access mmd phy registers
libphy was originally written assuming all phy devices support clause 45
access extensions to the mmd registers through the indirection registers
located within the first 16 phy registers. This assumption is not true
in all cases, and one specific example is the Micrel ksz9021 10/100/1000
Mbps phy. Using the stmmac driver, accessing the mmd registers to query
and configure energy efficient Ethernet (EEE) features yielded unexpected
behavior.
This patch adds mmd access functions to the phy driver that can be
overridden by the phy specific driver if the phy does not support this
mechanism or uses its own non-standard access mechanism. By default,
the IEEE Compatible clause 45 access mechanism described in clause 22
is used. With this patch, EEE query/configure functions as expected
using the stmmac and the Micrel ksz9021 phy.
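The override mechanism can be sketched in userspace as follows. The struct layout, the toy MDIO emulation and all sketch_* names are hypothetical; the register numbers 13 and 14 and the 0x4000 "data, no post increment" function code are the standard clause 22 indirection values. A driver that leaves the callback NULL gets the default indirect access, while a PHY without MMD support installs a stub that fails the read.

```c
#include <assert.h>
#include <stdint.h>

#define MII_MMD_CTRL        0x0d    /* MMD access control register */
#define MII_MMD_DATA        0x0e    /* MMD access address/data register */
#define MII_MMD_CTRL_NOINCR 0x4000  /* function: data, no post increment */

struct sketch_phy;

struct sketch_phy_ops {
	/* NULL means: use the default indirect access method. */
	int (*read_mmd)(struct sketch_phy *phy, int devad, int reg);
};

struct sketch_phy {
	const struct sketch_phy_ops *ops;
	uint16_t ctrl;          /* last value written to register 13 */
	uint16_t addr;          /* address latched through register 14 */
	uint16_t mmd[32][8];    /* toy MMD register space: [devad][reg] */
};

/* Toy MDIO write: emulates the address/data indirection. */
static void sketch_mdio_write(struct sketch_phy *phy, int reg, uint16_t val)
{
	if (reg == MII_MMD_CTRL)
		phy->ctrl = val;
	else if (reg == MII_MMD_DATA && !(phy->ctrl & MII_MMD_CTRL_NOINCR))
		phy->addr = val;        /* address function selected */
}

/* Toy MDIO read of the data register. */
static int sketch_mdio_read(struct sketch_phy *phy, int reg)
{
	if (reg == MII_MMD_DATA && (phy->ctrl & MII_MMD_CTRL_NOINCR))
		return phy->mmd[phy->ctrl & 0x1f][phy->addr & 7];
	return 0;
}

/* Default path: select device, latch address, switch to data, read. */
static int sketch_read_mmd_default(struct sketch_phy *phy, int devad, int reg)
{
	sketch_mdio_write(phy, MII_MMD_CTRL, (uint16_t)devad);
	sketch_mdio_write(phy, MII_MMD_DATA, (uint16_t)reg);
	sketch_mdio_write(phy, MII_MMD_CTRL,
			  (uint16_t)(devad | MII_MMD_CTRL_NOINCR));
	return sketch_mdio_read(phy, MII_MMD_DATA);
}

/* Stub for a PHY without MMD support: always fail the read. */
static int sketch_read_mmd_unsupported(struct sketch_phy *phy, int devad, int reg)
{
	(void)phy; (void)devad; (void)reg;
	return -1;
}

static const struct sketch_phy_ops sketch_no_mmd_ops = {
	.read_mmd = sketch_read_mmd_unsupported,
};

/* Entry point: prefer the driver override, else fall back to the default. */
static int sketch_phy_read_mmd(struct sketch_phy *phy, int devad, int reg)
{
	if (phy->ops && phy->ops->read_mmd)
		return phy->ops->read_mmd(phy, devad, reg);
	return sketch_read_mmd_default(phy, devad, reg);
}
```

Keeping the fallback inside the common entry point means callers such as the EEE query code do not need to know which kind of PHY they are talking to.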
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shruti Kanetkar [Tue, 29 Jul 2014 19:53:03 +0000 (14:53 -0500)]
net/fsl: Add format length modifier to avoid negative values
Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 29 Jul 2014 19:47:25 +0000 (14:47 -0500)]
net/fsl: fix misspelled word
Fix one misspelled word reported by codespell.
Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: Shruti Kanetkar <Shruti@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira [Tue, 29 Jul 2014 15:36:28 +0000 (17:36 +0200)]
net: filter: don't release unattached filter through call_rcu()
sk_unattached_filter_destroy() does not always need to release the
filter object via rcu. Since this filter is never attached to the
socket, the caller should be responsible for releasing the filter
in a safe way, which may not necessarily imply rcu.
This is a short summary of clients of this function:
1) xt_bpf.c and cls_bpf.c use the bpf matchers from rules, these rules
are removed from the packet path before the filter is released. Thus,
the framework makes sure the filter is safely removed.
2) In the ppp driver, the ppp_lock ensures serialization between the
xmit and filter attachment/detachment path. This doesn't use rcu
so deferred release via rcu makes no sense.
3) In the isdn/ppp driver, it is called from isdn_ppp_release()
and isdn_ppp_ioctl(). This driver uses mutex and spinlocks, no rcu.
Thus, deferred rcu makes no sense to me either, the deferred releases
may be just masking the effects of wrong locking strategy, which
should be fixed in the driver itself.
4) In the team driver, this is the only place where the rcu
synchronization with unattached filter is used. Therefore, this
patch introduces synchronize_rcu() which is called from the
genetlink path to make sure the filter doesn't go away while packets
are still walking over it. I think we can revisit this once struct
bpf_prog (that only wraps specific bpf code bits) is in place, then
add some specific struct rcu_head in the scope of the team driver if
Jiri thinks this is needed.
Deferred rcu release for unattached filters was originally introduced
in 302d663 ("filter: Allow to create sk-unattached filters").
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira [Tue, 29 Jul 2014 16:12:15 +0000 (18:12 +0200)]
netfilter: xt_bpf: add mising opaque struct sk_filter definition
This structure is not exposed to userspace, so fix this by defining
struct sk_filter; so we skip the casting in kernelspace. This is safe
since userspace has no way to lurk with that internal pointer.
Fixes: e6f30c7 ("netfilter: x_tables: add xt_bpf match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Tue, 29 Jul 2014 15:16:44 +0000 (17:16 +0200)]
CAPI: use correct structure type name in sizeof
Correct typo in the name of the type given to sizeof. Because it is the
size of a pointer that is wanted, the typo has no impact on compilation or
execution.
This problem was found using Coccinelle (http://coccinelle.lip6.fr/). The
semantic patch used can be found in message 0 of this patch series.
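Why the typo is harmless can be seen from a small sketch. The two struct types are hypothetical; the only fact relied on is that C gives all pointers to structure types the same representation, so sizeof on either pointer type yields the same value even though the structures themselves differ in size.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_right { int a; };          /* the type that was meant */
struct sketch_wrong { char big[256]; };  /* the type named by mistake */

/* Size of one slot in an array of pointers, spelled as intended. */
static size_t sketch_slot_size_intended(void)
{
	return sizeof(struct sketch_right *);
}

/* Same computation with the wrong type name: still the size of a pointer. */
static size_t sketch_slot_size_typo(void)
{
	return sizeof(struct sketch_wrong *);
}
```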
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 31 Jul 2014 01:47:51 +0000 (18:47 -0700)]
Merge branch 'amd-xgbe-next'
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver update 2014-07-25
This patch series is dependent on the following patch that was
applied to the net tree and needs to be applied to the net-next
tree:
332cfc823d18 - amd-xgbe: Fix error return code in xgbe_probe()
The following series of patches includes fixes and new support in the
driver.
- Device bindings documentation update
- Hardware timestamp support
- 2.5GbE support changes
- Fifo sizes based on active queues/rings
- Phylib driver updates for:
- Rate change completion check
- KR training initiation
- Auto-negotiation results
- Traffic class support, including DCB support
This patch series is based on net-next.
Changes in V2:
- Remove DBGPR(...., __func__) calls
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 29 Jul 2014 13:57:55 +0000 (08:57 -0500)]
amd-xgbe: Add traffic class support
This patch adds support for traffic classes as well as support
for Data Center Bridging interfaces related to traffic classes
and priority flow control.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 29 Jul 2014 13:57:49 +0000 (08:57 -0500)]
amd-xgbe-phy: Print out the auto-negotiation method used
Add a netdev_info statement detailing whether auto-negotiation was
completed through parallel detection or through the auto-negotiation
protocol.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 29 Jul 2014 13:57:43 +0000 (08:57 -0500)]
amd-xgbe-phy: Updates to KR training initiation
As part of changing rates to KR mode, KR training is initiated. If
the KR training is restarted it is possible to enter an invalid logic
state. This can be avoided by asserting a training reset bit before
initiating the KR training and then clearing the training reset bit.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 29 Jul 2014 13:57:37 +0000 (08:57 -0500)]
amd-xgbe-phy: Updates to rate change complete check
Currently, the logic will loop endlessly waiting for a rate change
to complete. Add a counter so that if the rate change signals
never indicate complete the loop will eventually exit.
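A bounded polling loop of this kind can be sketched as follows. The retry limit, the callback type and the return values are assumptions made for the illustration, not the driver's actual constants or signature.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RATECHANGE_TRIES 500   /* hypothetical upper bound */

/* Poll done() until it reports completion or the retry budget runs out. */
static int sketch_wait_rate_change(bool (*done)(void *ctx), void *ctx)
{
	for (unsigned int tries = 0; tries < SKETCH_RATECHANGE_TRIES; tries++) {
		if (done(ctx))
			return 0;     /* rate change completed */
		/* a driver would sleep briefly here between polls */
	}
	return -1;                    /* gave up instead of looping forever */
}

/* Demo status source: reports completion after *ctx further polls. */
static bool sketch_done_after(void *ctx)
{
	int *remaining = ctx;

	return (*remaining)-- <= 0;
}

/* Demo status source: hardware that never signals completion. */
static bool sketch_never_done(void *ctx)
{
	(void)ctx;
	return false;
}
```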
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 29 Jul 2014 13:57:31 +0000 (08:57 -0500)]
amd-xgbe: Base queue fifo size and enablement on ring count
When setting the fifo sizes for the queues and enabling the queues,
use the number of active Tx and Rx queues that have been enabled,
not the maximum number available.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 29 Jul 2014 13:57:25 +0000 (08:57 -0500)]
amd-xgbe: Update/fix 2.5GbE support
Update the amd-xgbe driver and phylib driver to better support
the 2.5GbE mode for the hardware. In order to be able to establish
2.5GbE using clause 73 auto negotiation the device will support
speed sets of 1GbE/10GbE and 2.5GbE/10GbE.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 29 Jul 2014 13:57:19 +0000 (08:57 -0500)]
amd-xgbe: Add hardware timestamp support
This patch adds support for Tx and Rx hardware timestamping.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 29 Jul 2014 13:57:14 +0000 (08:57 -0500)]
amd-xgbe: Add dma-coherent to device bindings documentation
An earlier patch added support for the "dma-coherent" device property.
This patch adds this optional property to the amd-xgbe device bindings
documentation.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Wed, 30 Jul 2014 00:31:08 +0000 (02:31 +0200)]
net: Remove unlikely() for WARN_ON() conditions
No need for the unlikely(): WARN_ON() and BUG_ON() internally use
unlikely() on the condition.
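The redundancy can be sketched in userspace as follows. sketch_unlikely() and sketch_warn_on() are simplified stand-ins that only mimic the shape of the kernel macros (a branch hint applied inside the warning helper); they are not the kernel definitions, and the branch hint assumes a GCC or Clang style builtin.

```c
#include <assert.h>

#if defined(__GNUC__)
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define sketch_unlikely(x) (!!(x))
#endif

static int sketch_warnings;   /* counts triggered warnings, for demonstration */

/* Stand-in for a warning helper: already hints that the condition is rare. */
static int sketch_warn_on(int condition)
{
	int ret = !!condition;

	if (sketch_unlikely(ret))
		sketch_warnings++;
	return ret;
}

/* Before: the outer hint repeats what the helper already does. */
static int sketch_before(int bad)
{
	if (sketch_unlikely(sketch_warn_on(bad)))
		return -1;
	return 0;
}

/* After: same behaviour without the extra wrapper. */
static int sketch_after(int bad)
{
	if (sketch_warn_on(bad))
		return -1;
	return 0;
}
```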
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anish Bhatt [Tue, 29 Jul 2014 03:57:07 +0000 (20:57 -0700)]
dcbnl : Fix misleading dcb_app->priority explanation
The current explanation of dcb_app->priority is wrong. It says priority is
expected to be a 3-bit unsigned integer, which is only true when working with
DCBx-IEEE. DCBx-CEE expects dcb_app->priority to be an 802.1p user
priority bitmap. Update the explanation accordingly.
This affects the cxgb4 driver, but I will post those changes as part of a
larger changeset shortly.
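The two interpretations can be contrasted with a small sketch. The helper names are hypothetical, and the mapping shown (bit n of the bitmap stands for user priority n) is an assumption about how a priority bitmap is read, not code from dcbnl.

```c
#include <assert.h>
#include <stdint.h>

/* IEEE style: a single priority value in the range 0..7. */
/* CEE style: a bitmap in which bit n stands for user priority n. */
static uint8_t sketch_prio_to_bitmap(uint8_t prio)
{
	assert(prio < 8);                 /* 3-bit value */
	return (uint8_t)(1u << prio);
}

/* Highest priority present in a bitmap, or -1 if the bitmap is empty. */
static int sketch_bitmap_to_prio(uint8_t bitmap)
{
	for (int p = 7; p >= 0; p--)
		if (bitmap & (1u << p))
			return p;
	return -1;
}
```

A driver that treats a CEE bitmap such as 0x08 as the integer 8 would be out of range for a 3-bit priority, which is the kind of mismatch the corrected comment warns about.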
Fixes: 3e29027af4372 ("dcbnl: add support for ieee8021Qaz attributes")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Mon, 28 Jul 2014 19:07:58 +0000 (14:07 -0500)]
net: stmmac: add platform init/exit for Altera's ARM socfpga
This patch adds platform init/exit functions and modifications to support
suspend/resume for the Altera Cyclone 5 SOC Ethernet controller. The platform
exit function puts the controller into reset using the socfpga reset
controller driver. The platform init function sets up the Synopsys mac by
first making sure the Ethernet controller is held in reset, programming the
phy mode through external support logic, then deasserts reset through
the socfpga reset manager driver.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Jul 2014 21:00:11 +0000 (14:00 -0700)]
Merge branch 'mlx5-next'
Eli Cohen says:
====================
mlx5 driver changes related to PCI handling
The first of these patches changes the pci device driver from mlx5_ib to
mlx5_core, in a manner similar to how it is done in mlx4. This lays the
groundwork for us to introduce an Ethernet driver for HW which uses mlx5.
The other two patches contain minor fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Mon, 28 Jul 2014 20:30:24 +0000 (23:30 +0300)]
mlx5: Adjust events to use unsigned long param instead of void *
In the event flow, we currently pass only a port number in the
void *data argument. Rather than pass a pointer to the event handlers,
we should use an "unsigned long" parameter, and pass the port number
value directly.
In the future, if necessary for some events, we can use the unsigned long
parameter to pass a pointer.
Based on a patch by Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Mon, 28 Jul 2014 20:30:23 +0000 (23:30 +0300)]
mlx5: minor fixes (mainly avoidance of hidden casts)
There were many places where parameters which should be u8/u16 were
integer type.
Additionally, in 2 places, a check for a non-null pointer was added
before dereferencing the pointer (this is actually a bug fix).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Mon, 28 Jul 2014 20:30:22 +0000 (23:30 +0300)]
mlx5: Move pci device handling from mlx5_ib to mlx5_core
In preparation for a new mlx5 device which is VPI (i.e., ports can be
either IB or ETH), move the pci device functionality from mlx5_ib
to mlx5_core.
This involves the following changes:
1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev
is now an independent structure maintained by mlx5_core.
mlx5_ib_dev now has a pointer to that struct.
This requires changing a lot of places where the core_dev
struct was accessed via mlx5_ib_dev (now, this needs to
be a pointer dereference).
2. All PCI initializations are now done in mlx5_core. Thus,
it is now mlx5_core which does pci_register_driver (and not
mlx5_ib, as was the case previously).
3. mlx5_ib now registers itself with mlx5_core as an "interface"
driver. This is very similar to the mechanism employed for
the mlx4 (ConnectX) driver. Once the HCA is initialized
(by mlx5_core), it invokes the interface drivers to do
their initializations.
4. There is a new event handler which the core registers:
mlx5_core_event(). This event handler invokes the
event handlers registered by the interfaces.
Based on a patch by Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Mon, 28 Jul 2014 12:01:38 +0000 (14:01 +0200)]
random32: mix in entropy from core to late initcall
Currently, we have a 3-stage seeding process in prandom():
Phase 1 is from the early actual initialization of prandom()
subsystem which happens during core_initcall() and remains
most likely until the beginning of late_initcall() phase.
Here, the system might not have enough entropy available
for seeding with strong randomness from the random driver.
That means, we currently have a 32bit weak LCG() seeding
the PRNG status register 1 and mixing that successively
into the other 3 registers just to get it up and running.
Phase 2 starts with late_initcall() phase resp. when the
random driver has initialized its non-blocking pool with
enough entropy. At that time, we throw away *all* inner
state from its 4 registers and do a full reseed with strong
randomness.
Phase 3 starts right after that and does a periodic reseed
with random slack of status register 1 by a strong random
source again.
A problem in phase 1 is that during bootup data structures
can be initialized, e.g. on module load time, and thus access
a weakly seeded prandom and are never changed for the rest
of their lifetime, thus carrying along the results from a
weak seed. Let's make sure that current but also future users
access a possibly better early seeded prandom.
This patch therefore improves phase 1 by trying to make it
more 'unpredictable' through mixing in seed from a possible
hardware source. Now, the mix-in xors inner state with the
outcome of either of the two functions arch_get_random_{,seed}_int(),
preferably arch_get_random_seed_int() as it likely represents
a non-deterministic random bit generator in hw rather than
a cryptographically secure PRNG in hw. However, not all might
have the first one, so we use the PRNG as a fallback if
available. As we xor the seed into the current state, the
worst case would be that a hardware source could be unverifiably
compromised or backdoored. In that case nevertheless it
would be as good as our original early seeding function
prandom_seed_very_weak() since we mix through xor which is
entropy preserving.
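A minimal user-space sketch of the mix-in idea follows. It assumes a four-word state and two optional hardware sources passed as function pointers; mix_in_hw_seed(), hw_source_fn and the demo_* sources are hypothetical stand-ins, not the kernel's arch_get_random_{,seed}_int() or the actual prandom code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero and fills *out on success, 0 if the source is unavailable. */
typedef int (*hw_source_fn)(uint32_t *out);

/*
 * XOR a hardware-provided value into each state word, preferring the
 * seed source and falling back to the hardware PRNG. If neither source
 * is available the weakly seeded state is left untouched.
 */
void mix_in_hw_seed(uint32_t state[4], hw_source_fn seed_src,
		    hw_source_fn rng_src)
{
	assert(state != NULL);
	for (size_t i = 0; i < 4; i++) {
		uint32_t v;

		if (seed_src && seed_src(&v))
			state[i] ^= v;
		else if (rng_src && rng_src(&v))
			state[i] ^= v;
	}
}

/* Demo sources used only to exercise the sketch. */
int demo_unavailable(uint32_t *out) { (void)out; return 0; }
int demo_constant(uint32_t *out) { *out = 0xdeadbeefu; return 1; }
```

Because XOR with a value chosen independently of the state is a bijection on the state, two distinct weak seeds remain distinct after mixing, which is the sense in which a bad hardware source cannot make things worse than the weak seed alone.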
Joint work with Daniel Borkmann.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Jul 2014 20:25:49 +0000 (13:25 -0700)]
Merge git://git./linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Perevalov [Wed, 30 Jul 2014 15:17:55 +0000 (19:17 +0400)]
netfilter: nfnetlink_acct: dump unmodified nfacct flags
NFNL_MSG_ACCT_GET_CTRZERO modifies the dumped flags. In this case the
client sees the unmodified (uncleared) counter value together with a cleared
overquota state, so the end user knows nothing about the overquota state
unless subscribed to the overquota report.
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Wed, 30 Jul 2014 16:01:04 +0000 (09:01 -0700)]
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull Exynos platform DT fix from Grant Likely:
"Device tree Exynos bug fix for v3.16-rc7
This bug fix has been brewing for a while. I hate sending it to you
so late, but I only got confirmation that it solves the problem this
past weekend. The diff looks big for a bug fix, but the majority of
it is only executed in the Exynos quirk case. Unfortunately it
required splitting early_init_dt_scan() in two and adding quirk
handling in the middle of it on ARM.
Exynos has buggy firmware that puts bad data into the memory node.
Commit 1c2f87c22566 ("ARM: Get rid of meminfo") exposed the bug by
dropping the artificial upper bound on the number of memory banks that
can be added. Exynos fails to boot after that commit. This branch
fixes it by splitting the early DT parse function and inserting a
fixup hook. Exynos uses the hook to correct the DT before parsing
memory regions"
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
arm: Add devicetree fixup machine function
of: Add memory limiting function for flattened devicetrees
of: Split early_init_dt_scan into two parts
Linus Torvalds [Wed, 30 Jul 2014 16:00:20 +0000 (09:00 -0700)]
Merge tag 'stable/for-linus-3.16-rc7-tag' of git://git./linux/kernel/git/xen/tip
Pull Xen fix from David Vrabel:
"Fix BUG when trying to expand the grant table. This seems to occur
often during boot with Ubuntu 14.04 PV guests"
* tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: safely map and unmap grant frames when in atomic context
Linus Torvalds [Wed, 30 Jul 2014 15:59:15 +0000 (08:59 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fix from Paolo Bonzini:
"Fix a bug which allows KVM guests to bring down the entire system on
some 64K enabled ARM64 hosts"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
Linus Torvalds [Wed, 30 Jul 2014 15:56:23 +0000 (08:56 -0700)]
Revert "cdc_subset: deal with a device that needs reset for timeout"
This reverts commit 20fbe3ae990fd54fc7d1f889d61958bc8b38f254.
As reported by Stephen Rothwell, it causes compile failures in certain
configurations:
drivers/net/usb/cdc_subset.c:360:15: error: 'dummy_prereset' undeclared here (not in a function)
.pre_reset = dummy_prereset,
^
drivers/net/usb/cdc_subset.c:361:16: error: 'dummy_postreset' undeclared here (not in a function)
.post_reset = dummy_postreset,
^
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David Miller <davem@davemloft.net>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 30 Jul 2014 15:54:17 +0000 (08:54 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Make fragmentation IDs less predictable, from Eric Dumazet.
2) TSO tunneling can crash in bnx2x driver, fix from Dmitry Kravkov.
3) Don't allow NULL msg->msg_name just because msg->msg_namelen is
non-zero, from Andrey Ryabinin.
4) ndm->ndm_type set using wrong macros, from Jun Zhao.
5) cdc-ether devices can come up with entries in their address filter,
so explicitly clear the filter after the device initializes. From
Oliver Neukum.
6) Forgotten refcount bump in xfrm_lookup(), from Steffen Klassert.
7) Short packets not padded properly, exposing random data, in bcmgenet
driver. Fix from Florian Fainelli.
8) xgbe_probe() doesn't return an error code, but rather zero, when
netif_set_real_num_tx_queues() fails. Fix from Wei Yongjun.
9) USB speed not probed properly in r8152 driver, from Hayes Wang.
10) Transmit logic choosing the outgoing port in the sunvnet driver
needs to consider a) is the port actually up and b) whether it is a
switch port. Fix from David L Stevens.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
net: phy: re-apply PHY fixups during phy_register_device
cdc-ether: clean packet filter upon probe
cdc_subset: deal with a device that needs reset for timeout
net: sendmsg: fix NULL pointer dereference
isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()
ip: make IP identifiers less predictable
neighbour : fix ndm_type type error issue
sunvnet: only use connected ports when sending
can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource()
bnx2x: fix crash during TSO tunneling
r8152: fix the checking of the usb speed
net: phy: Ensure the MDIO bus module is held
net: phy: Set the driver when registering an MDIO bus device
bnx2x: fix set_setting for some PHYs
hyperv: Fix error return code in netvsc_init_buf()
amd-xgbe: Fix error return code in xgbe_probe()
ath9k: fix aggregation session lockup
net: bcmgenet: correctly pad short packets
net: sctp: inherit auth_capable on INIT collisions
mac80211: fix crash on getting sta info with uninitialized rate control
...
David Vrabel [Fri, 11 Jul 2014 15:42:34 +0000 (16:42 +0100)]
x86/xen: safely map and unmap grant frames when in atomic context
arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in
atomic context but were calling alloc_vm_area() which might sleep.
Also, if a driver attempts to allocate a grant ref from an interrupt
and the table needs expanding, then the CPU may already be in lazy MMU
mode and apply_to_page_range() will BUG when it tries to re-enable
lazy MMU mode.
These two functions are only used in PV guests.
Introduce arch_gnttab_init() to allocate the virtual address space in
advance.
Avoid the use of apply_to_page_range() by saving and using the
array of PTE addresses from the alloc_vm_area() call (which ensures
that the required page tables are pre-allocated).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Will Deacon [Fri, 25 Jul 2014 15:29:12 +0000 (16:29 +0100)]
kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
If the physical address of GICV isn't page-aligned, then we end up
creating a stage-2 mapping of the page containing it, which causes us to
map neighbouring memory locations directly into the guest.
As an example, consider a platform with GICV at physical 0x2c02f000
running a 64k-page host kernel. If qemu maps this into the guest at
0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
physical regions may cause UNPREDICTABLE behaviour, for example, on the
Juno platform this will cause an SError exception to EL3, which brings
down the entire physical CPU resulting in RCU stalls / HYP panics / host
crashing / wasted weeks of debugging.
SBSA recommends that systems alias the 4k GICV across the bounding 64k
region, in which case GICV physical could be described as 0x2c020000 in
the above scenario.
This patch fixes the problem by failing the vgic probe if the physical
base address or the size of GICV aren't page-aligned. Note that this
generated a warning in dmesg about freeing enabled IRQs, so I had to
move the IRQ enabling later in the probe.
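The arithmetic in the example above, and the kind of check the fix performs, can be sketched in plain C. The helper names are hypothetical and the GICV size of 0x2000 used in the usage note is an assumption made for illustration; this is not the vgic probe code.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if value is a multiple of page_size (a power of two). */
int is_aligned_to(uint64_t value, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (value & (page_size - 1)) == 0;
}

/* Base address of the page that contains addr. */
uint64_t containing_page(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/*
 * Returns 0 if the region can be mapped without exposing neighbouring
 * memory, -1 if the probe should fail.
 */
int gicv_probe_check(uint64_t base, uint64_t size, uint64_t page_size)
{
	if (!is_aligned_to(base, page_size) || !is_aligned_to(size, page_size))
		return -1;
	return 0;
}
```

With a 64k page size, containing_page(0x2c02f000, 0x10000) is 0x2c020000, which is why the stage-2 mapping in the example starts 0xf000 bytes below GICV. gicv_probe_check(0x2c02f000, 0x2000, 0x10000) rejects that layout, while the same base passes with 4k pages.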
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joel Schopp <joel.schopp@amd.com>
Cc: Don Dutile <ddutile@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Laura Abbott [Tue, 15 Jul 2014 17:03:36 +0000 (10:03 -0700)]
arm: Add devicetree fixup machine function
Commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76
(ARM: 8025/1: Get rid of meminfo) dropped the upper bound on
the number of memory banks that can be added as there was no
technical need in the kernel. It turns out though, some bootloaders
(specifically the arndale-octa exynos boards) may pass invalid memory
information and rely on the kernel to not parse this data. This is a
bug in the bootloader but we still need to work around this.
Work around this by introducing a dt_fixup function. This function
gets called before the flattened devicetree is scanned for memory
and the like. In this fixup function for exynos, limit the maximum
number of memory regions in the devicetree.
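The resulting boot flow (verify, optional machine fixup, then scan) might look roughly like the sketch below. Everything here is a simplified user-space model: struct dt_sketch, dt_verify(), dt_limit_memory(), dt_scan_memory(), setup_from_dt() and demo_board_fixup() are hypothetical names, and the limit of two regions is an arbitrary value chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 8

struct mem_region { unsigned long base, size; };

struct dt_sketch {
	struct mem_region regions[MAX_REGIONS];
	size_t n_regions;
	int verified;
};

typedef void (*dt_fixup_fn)(struct dt_sketch *dt);

/* Step 1: validate the blob (stubbed as a bounds check). */
int dt_verify(struct dt_sketch *dt)
{
	assert(dt != NULL);
	if (dt->n_regions > MAX_REGIONS)
		return 0;
	dt->verified = 1;
	return 1;
}

/* Cap the number of memory entries that later stages will see. */
void dt_limit_memory(struct dt_sketch *dt, size_t limit)
{
	if (dt->n_regions > limit)
		dt->n_regions = limit;
}

/* Step 2: scan the memory entries; returns the total size found. */
unsigned long dt_scan_memory(const struct dt_sketch *dt)
{
	unsigned long total = 0;

	for (size_t i = 0; i < dt->n_regions; i++)
		total += dt->regions[i].size;
	return total;
}

/* Example board fixup: trust only the first two entries. */
void demo_board_fixup(struct dt_sketch *dt)
{
	dt_limit_memory(dt, 2);
}

/* The fixup hook runs between verification and the memory scan. */
unsigned long setup_from_dt(struct dt_sketch *dt, dt_fixup_fn fixup)
{
	if (!dt_verify(dt))
		return 0;
	if (fixup)
		fixup(dt);
	return dt_scan_memory(dt);
}
```

Running the fixup between the two steps is what lets a board correct bogus entries before any of them are registered as memory.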
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: Added a comment and fixed up function name]
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Laura Abbott [Tue, 15 Jul 2014 17:03:35 +0000 (10:03 -0700)]
of: Add memory limiting function for flattened devicetrees
Buggy bootloaders may pass bogus memory entries in the devicetree.
Add of_fdt_limit_memory to add an upper bound on the number of
entries that can be present in the devicetree.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Laura Abbott [Tue, 15 Jul 2014 17:03:34 +0000 (10:03 -0700)]
of: Split early_init_dt_scan into two parts
Currently, early_init_dt_scan validates the header, sets the
boot params, and scans for chosen/memory all in one function.
Split this up into two separate functions (validation/setting
boot params in one, scanning in another) to allow for
additional setup between boot params and scanning the memory.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: s/early_init_dt_scan_all/early_init_dt_scan_nodes/]
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Thomas Petazzoni [Sun, 27 Jul 2014 21:21:36 +0000 (23:21 +0200)]
net: mvpp2: implement ioctl() operation for PHY ioctls
This commit implements the ->ndo_do_ioctl() operation so that the
PHY-related ioctl() calls can work from userspace, which allows
applications like mii-tool or mii-diag to do their job.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Sun, 27 Jul 2014 21:21:35 +0000 (23:21 +0200)]
net: mvpp2: fix 10 Mbit/s usage
This commit is similar to commit 4d12bc63ab5e ("net: mvneta: fix
operation in 10 Mbit/s mode"), but this time for the mvpp2 driver. The
driver was properly taking into account the 1 Gbit/s and 100 Mbit/s
speeds, but not the 10 Mbit/s, which was handled as 100
Mbit/s. However, the MVPP2_GMAC_CONFIG_MII_SPEED bit in the
MVPP2_GMAC_AUTONEG_CONFIG register must remain cleared to allow 10
Mbit/s operation. This commit therefore fixes 10 Mbit/s operation.
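The speed selection described above can be illustrated with a small sketch. The bit positions below are assumptions made for the example, not values taken from the mvpp2 driver or its datasheet, and speed_bits() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_MII_SPEED  (1u << 5) /* selects 100 Mbit/s */
#define SKETCH_GMII_SPEED (1u << 6) /* selects 1 Gbit/s */

/*
 * Return the autoneg config value for the given link speed.
 * For 10 Mbit/s both speed bits must stay cleared.
 */
uint32_t speed_bits(uint32_t val, int speed_mbps)
{
	assert(speed_mbps == 10 || speed_mbps == 100 || speed_mbps == 1000);

	val &= ~(SKETCH_MII_SPEED | SKETCH_GMII_SPEED);
	if (speed_mbps == 1000)
		val |= SKETCH_GMII_SPEED;
	else if (speed_mbps == 100)
		val |= SKETCH_MII_SPEED;
	/* 10 Mbit/s: leave both bits cleared. */
	return val;
}
```

The bug class being fixed is the missing third case: treating "not 1 Gbit/s" as "100 Mbit/s" sets the MII speed bit even when the link runs at 10 Mbit/s.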
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karoly Kemeny [Sun, 27 Jul 2014 10:29:07 +0000 (12:29 +0200)]
ipv4: clean up cast warning in do_ip_getsockopt
Sparse warns because of implicit pointer cast.
v2: subject line correction, space between "void" and "*"
Signed-off-by: Karoly Kemeny <karoly.kemeny@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 28 Jul 2014 13:30:14 +0000 (21:30 +0800)]
tipc: remove duplicated include from socket.c
Remove duplicated include.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Himangi Saraogi [Sun, 27 Jul 2014 07:08:38 +0000 (12:38 +0530)]
net/udp_offload: Use IS_ERR_OR_NULL
This patch introduces the use of the macro IS_ERR_OR_NULL in place of
tests for NULL and IS_ERR.
The following Coccinelle semantic patch was used for making the change:
@@
expression e;
@@
- e == NULL || IS_ERR(e)
+ IS_ERR_OR_NULL(e)
|| ...
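For readers unfamiliar with the error-pointer convention behind this transformation, here is a user-space approximation. It assumes that the top 4095 values of the address space encode negative errno codes; err_ptr(), is_err() and is_err_or_null() are stand-ins written for this sketch, not the kernel's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
	assert(error < 0 && error >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Nonzero if ptr lies in the range reserved for encoded errors. */
int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* The combined test that replaces "e == NULL || IS_ERR(e)". */
int is_err_or_null(const void *ptr)
{
	return ptr == NULL || is_err(ptr);
}
```

The combined helper behaves exactly like the two-part test it replaces, so the semantic patch is a pure readability change.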
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Himangi Saraogi [Sun, 27 Jul 2014 07:07:46 +0000 (12:37 +0530)]
openvswitch: Use IS_ERR_OR_NULL
This patch introduces the use of the macro IS_ERR_OR_NULL in place of
tests for NULL and IS_ERR.
The following Coccinelle semantic patch was used for making the change:
@@
expression e;
@@
- e == NULL || IS_ERR(e)
+ IS_ERR_OR_NULL(e)
|| ...
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Himangi Saraogi [Sun, 27 Jul 2014 07:06:51 +0000 (12:36 +0530)]
net/ipv4: Use IS_ERR_OR_NULL
This patch introduces the use of the macro IS_ERR_OR_NULL in place of
tests for NULL and IS_ERR.
The following Coccinelle semantic patch was used for making the change:
@@
expression e;
@@
- e == NULL || IS_ERR(e)
+ IS_ERR_OR_NULL(e)
|| ...
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 27 Jul 2014 02:14:39 +0000 (03:14 +0100)]
sfc: Use __iowrite64_copy instead of a slightly different local function
__iowrite64_copy() isn't quite the same as efx_memcpy_64(), but
it looks close enough:
- The length is in units of qwords not bytes
- It never byte-swaps, but that doesn't make a difference now as PIO
is only enabled for x86_64
- It doesn't include any memory barriers, but that's OK as there is a
barrier just before pushing the doorbell
- mlx4_en uses it for the same purpose
Compile-tested only.
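The first point, that the length is given in qwords rather than bytes, is the easiest one to get wrong at a call site. A user-space analogue is sketched below; copy_qwords() and bytes_to_qwords() are hypothetical helpers and do not model I/O memory semantics or barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy count 64-bit words, one store per word, with no byte swapping. */
void copy_qwords(volatile uint64_t *to, const uint64_t *from, size_t count)
{
	for (size_t i = 0; i < count; i++)
		to[i] = from[i];
}

/* Convert a byte length to a qword count; the length must be a multiple of 8. */
size_t bytes_to_qwords(size_t len_bytes)
{
	assert(len_bytes % sizeof(uint64_t) == 0);
	return len_bytes / sizeof(uint64_t);
}
```

A caller that previously passed a byte length would now pass bytes_to_qwords(len) instead, for example 4 for a 32-byte buffer.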
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 28 Jul 2014 23:28:07 +0000 (16:28 -0700)]
net: phy: re-apply PHY fixups during phy_register_device
Commit 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()")
moved the call to phy_scan_fixups() in phy_init_hw() after a software
reset is performed.
By the time phy_init_hw() is called in phy_device_register(), no driver
has been bound to this PHY yet, so all the checks in phy_init_hw()
against the PHY driver and the PHY driver's config_init function will
return 0. We will therefore never call phy_scan_fixups() as we should.
Fix this by calling phy_scan_fixups() and checking its return value to
restore the intended functionality.
This broke PHY drivers which do register an early PHY fixup callback to
intercept the PHY probing and do things like changing the 32-bit unique
PHY identifier when a pseudo-PHY address has been used, as well as
board-specific PHY fixups that need to be applied during driver probe
time.
Reported-by: Hauke Merthens <hauke-m@hauke-m.de>
Reported-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Mon, 28 Jul 2014 08:56:36 +0000 (10:56 +0200)]
cdc-ether: clean packet filter upon probe
There are devices that don't do reset all the way. So the packet filter should
be set to a sane initial value. Failure to do so leads to intermittent failures
of DHCP on some systems under some conditions.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Mon, 28 Jul 2014 08:12:34 +0000 (10:12 +0200)]
cdc_subset: deal with a device that needs reset for timeout
This device needs to be reset to recover from a timeout.
Unfortunately this can be handled only at the level of
the subdrivers.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Ryabinin [Sat, 26 Jul 2014 17:26:58 +0000 (21:26 +0400)]
net: sendmsg: fix NULL pointer dereference
Sasha's report:
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel with the KASAN patchset, I've stumbled on the following spew:
>
> [ 4448.949424] ==================================================================
> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
> [ 4448.952988] Read of size 2 by thread T19638:
> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
> [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
> [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
> [ 4448.961266] Call Trace:
> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
> [ 4448.970103] sock_sendmsg (net/socket.c:654)
> [ 4448.971584] ? might_fault (mm/memory.c:3741)
> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
> [ 4448.988929] ==================================================================
This report means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.
After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains a valid socket address structure.
This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address."
But in fact this affects sendto when address 0 is mapped and contains
a socket address structure. In such a case copying in the address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.
This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.
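The invariant being restored is small enough to show in isolation. In this sketch struct msg_sketch and sanitize_msg_name() are hypothetical simplifications; the real structure is struct msghdr and the real fix lives in the iovec verification path.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the name fields of a message header. */
struct msg_sketch {
	void *name;
	int namelen;
};

/* Enforce: a NULL name always comes with a zero name length. */
void sanitize_msg_name(struct msg_sketch *msg)
{
	assert(msg != NULL);
	if (msg->name == NULL)
		msg->namelen = 0;
}
```

Protocol handlers that test only the length before dereferencing the name pointer are then safe again.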
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Khoroshilov [Fri, 25 Jul 2014 22:34:31 +0000 (02:34 +0400)]
isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()
There is a lack of usb_put_dev(udev) on failure path in gigaset_probe().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jul 2014 18:43:58 +0000 (11:43 -0700)]
Merge branch 'netdev-name'
Cong Wang says:
====================
net: forbid net devices named "all" "default" or "config"
/proc/sys/net/ipv[46]/conf/<dev> could conflict with
/proc/sys/net/ipv[46]/conf/(all|default). And /proc/net/vlan/<dev>
could conflict with /proc/net/vlan/config. Besides kernel warnings,
undefined behavior such as duplicated proc files also appears, therefore
we should forbid these names.
v2: introduce a helper function, suggested by Florian
fix error handling for ipv6_add_dev() in addrconf_init()
====================
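A name check of the kind this series introduces could look like the sketch below. The two helper names are hypothetical and are not the helper actually added by the patches.

```c
#include <assert.h>
#include <string.h>

/* Names that would collide with /proc/sys/net/ipv[46]/conf/(all|default). */
int conf_name_is_allowed(const char *name)
{
	assert(name != NULL);
	return strcmp(name, "all") != 0 && strcmp(name, "default") != 0;
}

/* Name that would collide with /proc/net/vlan/config. */
int vlan_name_is_allowed(const char *name)
{
	assert(name != NULL);
	return strcmp(name, "config") != 0;
}
```

Checking the name before creating any proc entries lets device registration fail with an error instead of triggering a warning later.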
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 25 Jul 2014 22:25:10 +0000 (15:25 -0700)]
vlan: fail early when creating netdev named config
Similarly, vlan will create /proc/net/vlan/<dev>, so when we
create dev with name "config", it will conflict with
/proc/net/vlan/config.
Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 25 Jul 2014 22:25:09 +0000 (15:25 -0700)]
ipv6: fail early when creating netdev named all or default
We create a proc dir for each network device; this will cause
conflicts when the devices are named "all" or "default".
Rather than emitting an ugly kernel warning, we could just
fail earlier by checking the device name.
Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 25 Jul 2014 22:25:08 +0000 (15:25 -0700)]
ipv4: fail early when creating netdev named all or default
We create a proc dir for each network device; this will cause
conflicts when the devices are named "all" or "default".
Rather than emitting an ugly kernel warning, we could just
fail earlier by checking the device name.
Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jul 2014 18:40:07 +0000 (11:40 -0700)]
Merge branch 'syststamp-removal'
Willem de Bruijn says:
====================
net: remove deprecated syststamp
The network stack can generate two kinds of hardware timestamps:
- hwtstamp stores a hw timestamp in device-specific raw format
- syststamp converts the raw format to system time
The second is deprecated and only implemented by a single device
driver. The suggested alternative is to communicate hwtstamp +
directly expose the NIC PTP clock device through ptp_clock_info.
The remaining driver (octeon) does not expose such a standard
interface as of now. It does have its own PTP library that depends
on its own shared memory PTP clock interface.
This patchset
1. reverts the syststamp code in the one driver (octeon)
2. reverts an unnecessary zero initialization in another (vxge)
3. modifies PF_PACKET to no longer test for syststamp != 0 (because it is always == 0)
4. modifies SCM_TIMESTAMPING in the same way
For backwards compatibility, the interfaces are not removed.
Applications can still request SOF_TIMESTAMPING_SYS_HARDWARE. The
response field in scm_timestamping also remains. As was the case
for hardware/drivers that did not implement the feature, the
setsockopt succeeds, but the response field is always zero.
====================
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 25 Jul 2014 22:01:32 +0000 (18:01 -0400)]
net: remove deprecated syststamp timestamp
The SO_TIMESTAMPING API defines three types of timestamps: software,
hardware in raw format (hwtstamp) and hardware converted to system
format (syststamp). The last has been deprecated in favor of combining
hwtstamp with a PTP clock driver. There are no active users in the
kernel.
The option was device driver dependent. If set, but without hardware
support, the correct behavior is to return zero in the relevant field
in the SCM_TIMESTAMPING ancillary message. Without device drivers
implementing the option, this field is effectively always zero.
Remove the internal plumbing to dissuade new drivers from implementing
the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to
avoid breaking existing applications that request the timestamp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 25 Jul 2014 22:01:31 +0000 (18:01 -0400)]
packet: remove deprecated syststamp timestamp
No device driver will ever return an skb_shared_info structure with
syststamp non-zero, so remove the branch that tests for this and
optionally marks the packet timestamp as TP_STATUS_TS_SYS_HARDWARE.
Do not remove the definition TP_STATUS_TS_SYS_HARDWARE, as processes
may refer to it.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 25 Jul 2014 22:01:30 +0000 (18:01 -0400)]
vxge: remove deprecated syststamp timestamp
This driver explicitly clears a field that is unused and about to be
removed. Remove the initialization.
All fields in skb_shared_info before dataref are cleared in
__alloc_skb, so the removal is safe even while syststamp exists.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 25 Jul 2014 22:01:29 +0000 (18:01 -0400)]
octeon: remove deprecated syststamp timestamp
Hardware timestamps can be exposed to userspace in raw hardware format
(hwtstamp) as well as converted to system time (syststamp). The second
variant is deprecated and only implemented by this driver.
The preferred method of hardware timestamp generation is to combine
hwtstamp with a device PTP clock. Octeon has its own PTP library
that relies on a shared memory interface to the PTP clock device.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 29 Jul 2014 17:28:38 +0000 (10:28 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A nice small set of bug fixes for arm-soc:
- two incorrect register addresses in DT files on shmobile and hisilicon
- one revert for a regression on omap
- one bug fix for a newly introduced pin controller binding
- one regression fix for the memory controller on omap
- one patch to avoid a harmless WARN_ON"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: Revert enabling of twl configuration for n900
ARM: dts: fix L2 address in Hi3620
ARM: OMAP2+: gpmc: fix gpmc_hwecc_bch_capable()
pinctrl: dra: dt-bindings: Fix pull enable/disable
ARM: shmobile: r8a7791: Fix SD2CKCR register address
ARM: OMAP2+: l2c: squelch warning dump on power control setting
David Howells [Tue, 29 Jul 2014 16:53:23 +0000 (17:53 +0100)]
AFS: Correctly assemble the client UUID
Correctly assemble the client UUID by OR'ing in the flags rather than
assigning them over the other components.
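The difference between the two forms is easy to see in a reduced example. The mask and variant values below follow the RFC 4122 layout of the clock_seq_hi_and_reserved octet; the macro and function names are hypothetical, and the field shown is assumed to be representative of the one the fix touches.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CLOCKHI_MASK 0x3fu /* low 6 bits: high part of the clock sequence */
#define SKETCH_VARIANT_STD  0x80u /* top bits: RFC 4122 variant */

/* Buggy form: the second assignment discards the clock sequence bits. */
uint8_t clock_seq_hi_buggy(uint16_t clockseq)
{
	uint8_t v = (clockseq >> 8) & SKETCH_CLOCKHI_MASK;

	v = SKETCH_VARIANT_STD;
	return v;
}

/* Fixed form: OR the flag into the masked value. */
uint8_t clock_seq_hi_fixed(uint16_t clockseq)
{
	/* Flag bits and value bits must not overlap for OR to be lossless. */
	assert((SKETCH_CLOCKHI_MASK & SKETCH_VARIANT_STD) == 0);
	return (uint8_t)(((clockseq >> 8) & SKETCH_CLOCKHI_MASK) | SKETCH_VARIANT_STD);
}
```

For clockseq 0x2a55 the buggy form yields 0x80 regardless of input, while the fixed form yields 0xaa and so keeps the clock sequence bits.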
Reported-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Sun, 27 Jul 2014 21:15:33 +0000 (14:15 -0700)]
mm: fix page_alloc.c kernel-doc warnings
Fix kernel-doc warnings and function name in mm/page_alloc.c:
Warning(..//mm/page_alloc.c:6074): No description found for parameter 'pfn'
Warning(..//mm/page_alloc.c:6074): No description found for parameter 'mask'
Warning(..//mm/page_alloc.c:6074): Excess function parameter 'start_bitidx' description in 'get_pfnblock_flags_mask'
Warning(..//mm/page_alloc.c:6102): No description found for parameter 'pfn'
Warning(..//mm/page_alloc.c:6102): No description found for parameter 'mask'
Warning(..//mm/page_alloc.c:6102): Excess function parameter 'start_bitidx' description in 'set_pfnblock_flags_mask'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 29 Jul 2014 11:04:27 +0000 (13:04 +0200)]
Merge tag 'omap-for-v3.16/n900-regression' of git://git./linux/kernel/git/tmlind/linux-omap into fixes
Merge "omap n900 regression fix for v3.16 rc series" from Tony Lindgren:
Minimal regression fix for n900 display that got broken with
enabling of twl4030 PM features. Turns out more work is needed
before we can enable twl4030 PM on n900.
I did not notice this earlier as I have my n900 in a rack
and the display did not get enabled for device tree based booting
until v3.16.
* tag 'omap-for-v3.16/n900-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Revert enabling of twl configuration for n900
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tony Lindgren [Fri, 25 Jul 2014 11:41:25 +0000 (04:41 -0700)]
ARM: dts: Revert enabling of twl configuration for n900
Commit 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration
for selected omaps) allowed n900 to cut off core voltages during
off-idle. This however caused a regression where twl regulator
vaux1 was not getting enabled for the LCD panel as we are not
requesting it for the panel.
Turns out quite a few devices on n900 are using vaux1, and we need
to either stop idling it, or add proper regulator_get calls for all
users. But until we have a proper solution implemented and tested,
let's just disable the twl off-idle configuration for now for n900.
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration for selected omaps)
Signed-off-by: Tony Lindgren <tony@atomide.com>
Eric Dumazet [Sat, 26 Jul 2014 06:58:10 +0000 (08:58 +0200)]
ip: make IP identifiers less predictable
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways of exploiting Linux IP identifier generation to
infer whether two machines are exchanging packets.
With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.
This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.
Note that prandom_u32_max(1) returns 0, so if the generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and does not
increase collision probability.
This is jiffies based, so in the worst case (HZ=1000), the id can
roll over after ~65 seconds of idle time, which should be fine.
We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.
For IPv6, this adds saddr into the hash as well, but not nexthdr.
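The perturbation described above can be modelled in user space. The following is a minimal Python sketch, not the kernel code: the names IdentBucket, reserve_ident and bucket_index, the table size, and the use of Python's built-in hash() are assumptions made for illustration only. It shows why a generator used at most once per jiffy sees no hole (a random value below 1 is always 0), while an idle period adds a random gap bounded by the number of elapsed jiffies.

```python
import random

IP_IDENTS_SZ = 2048   # illustrative table size (assumption)
U32 = 0xFFFFFFFF


class IdentBucket:
    """One generator slot: a timestamp in jiffies and a running counter."""

    def __init__(self, stamp=0, ident=0):
        self.stamp = stamp
        self.ident = ident


def reserve_ident(bucket, now, segs=1, rand_below=random.randrange):
    """Return the first of `segs` IDs, skipping a random gap after idle time."""
    delta = 0
    elapsed = (now - bucket.stamp) & U32
    if elapsed:
        bucket.stamp = now
        # rand_below(n) yields a value in [0, n), so rand_below(1) is 0.
        delta = rand_below(elapsed)
    bucket.ident = (bucket.ident + segs + delta) & U32
    return (bucket.ident - segs) & U32


def bucket_index(daddr, saddr, protocol, secret):
    """Hypothetical stand-in for the hash: mixes daddr, saddr and protocol."""
    return hash((daddr, saddr, protocol, secret)) % IP_IDENTS_SZ
```

Only the low 16 bits of the returned value would end up in the IPv4 header, which is where the ~65 second rollover figure at HZ=1000 comes from.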
If I ping the patched target, we can see that IDs are now hard to predict.
21:57:11.008086 IP (...)
A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
target > A: ICMP echo reply, seq 1, length 64
21:57:12.013133 IP (...)
A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
target > A: ICMP echo reply, seq 2, length 64
21:57:13.016580 IP (...)
A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
target > A: ICMP echo reply, seq 3, length 64
[1] TCP sessions use a per-flow ID generator not changed by this patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu>
Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 25 Jul 2014 18:48:09 +0000 (14:48 -0400)]
tipc: make tipc_buf_append() more robust
As per comment from David Miller, we try to make the buffer reassembly
function more resilient to user errors than it is today.
- We check that the "*buf" parameter always is set, since this is
mandatory input.
- We ensure that *buf->next always is set to NULL before linking in
the buffer, instead of relying on the caller to have done this.
- We ensure that the "tail" pointer in the head buffer's control
block is initialized to NULL when the first fragment arrives.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jun Zhao [Fri, 25 Jul 2014 16:38:59 +0000 (00:38 +0800)]
neighbour : fix ndm_type type error issue
ndm_type is the L3 address type; in neighbour proxy and vxlan it is RTN_UNICAST.
NDA_DST is a netlink TLV type, so it is not the right value in this context.
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jul 2014 00:36:25 +0000 (17:36 -0700)]
Merge tag 'master-2014-07-25' of git://git./linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
pull request: wireless-next 2014-07-25
Please pull this batch of updates intended for the 3.17 stream!
For the mac80211 bits, Johannes says:
"We have a lot of TDLS patches, among them a fix that should make hwsim
tests happy again. The rest, this time, is mostly small fixes."
For the Bluetooth bits, Gustavo says:
"Some more patches for 3.17. The most important change here is the move of
the 6lowpan code to net/6lowpan. It has been agreed with Davem that this
change will go through the bluetooth tree. The rest are mostly clean up and
fixes."
and,
"Here follows some more patches for 3.17. These are mostly fixes to what
we've sent to you before for next merge window."
For the iwlwifi bits, Emmanuel says:
"I have the usual amount of BT Coex stuff. Arik continues to work
on TDLS and Ariej contributes a few things for HS2.0. I added a few
more things to the firmware debugging infrastructure. Eran fixes a
small bug - pretty normal content."
And for the Atheros bits, Kalle says:
"For ath6kl me and Jessica added support for ar6004 hw3.0, our latest
version of ar6004.
For ath10k Janusz added a printout so that it's easier to check what
ath10k kconfig options are enabled. He also added a debugfs file to
configure maximum amsdu and ampdu values. Also we had few fixes as
usual."
On top of that is the usual large batch of various driver updates --
brcmfmac, mwifiex, the TI drivers, and wil6210 all get some action.
Rafał has also been very busy with b43 and related updates.
Also, I pulled the wireless tree into this in order to resolve a
merge conflict...
P.S. The change to fs/compat_ioctl.c reflects a name change in a
Bluetooth header file...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David L Stevens [Fri, 25 Jul 2014 14:30:11 +0000 (10:30 -0400)]
sunvnet: only use connected ports when sending
The sunvnet driver doesn't check whether or not a port is connected when
transmitting packets, which results in failures if a port fails to connect
(e.g., due to a version mismatch). The original code also assumes
unnecessarily that the first port is up and a switch, even though there is
a flag for switch ports.
This patch only matches a port if it is connected, and otherwise uses the
switch_port flag to send the packet to a switch port that is up.
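A minimal Python sketch of the selection rule described above. The Port type and the tx_port_find name are hypothetical and only model the logic; this is not the driver code.

```python
from dataclasses import dataclass


@dataclass
class Port:
    mac: str
    connected: bool
    switch_port: bool = False


def tx_port_find(ports, dst_mac):
    """Pick a transmit port: a connected exact match, else a connected switch."""
    # First pass: only ports that are actually connected may match.
    for port in ports:
        if port.connected and port.mac == dst_mac:
            return port
    # Fallback: any switch port that is up, wherever it sits in the list.
    for port in ports:
        if port.connected and port.switch_port:
            return port
    return None
```

Returning None when nothing is connected leaves the decision to drop the packet to the caller.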
Signed-off-by: David L Stevens <david.stevens@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 25 Jul 2014 12:21:21 +0000 (15:21 +0300)]
bonding: fix a memory leak in bond_arp_send_all()
This test is reversed so the memory is always leaked. It's better style
to remove the test anyway.
Fixes:
3e403a77779f ('bonding: make it possible to have unlimited nested upper vlans')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Fri, 25 Jul 2014 09:18:17 +0000 (02:18 -0700)]
netlink: Fix shadow warning on jiffies
Change formal parameter name to not shadow the global jiffies.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jul 2014 00:01:01 +0000 (17:01 -0700)]
Merge tag 'linux-can-fixes-for-3.16-20140725' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2014-07-25
this is a pull request of one patch for the net tree, hoping to get into the
3.16 release.
The patch by George Cherian fixes a regression in the c_can platform driver.
When using two interfaces the regression leads to a non-functional second
interface.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Jul 2014 19:19:11 +0000 (12:19 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-07-25
This series contains updates to e1000e, ixgbe and ixgbevf.
Mark provides all the changes for ixgbe and ixgbevf. Converts some udelay()
calls to the preferred usleep_range(). Fixes a spurious release of the
semaphore in several functions when there was a failure to acquire the
semaphore in the first place. Fixes an X540 semaphore error where an
incorrect check was treating success as failure and vice-versa. Fixed
ixgbe_write_mbx() error when it was being called and there was no
mbx->ops.write method defined, so no error code was returned. The
corresponding read function would explicitly return an error in such a
case as do other functions. Cleans up unused (dead) code by removing it.
Finally make return values more direct, eliminating some gotos and
otherwise unneeded conditionals, which allows the removal of some local
variables.
David provides all the changes for e1000e. Fix CRC errors with jumbo
traffic for 82579, i217 and i218 client parts to increase the gap
between the read and write pointers in the transmit FIFO. Added code
to check and respond to previously ignored return values from NVM
access functions. Added support for EEE in Sx states and fixed EEE in
S5 with runtime PM enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 28 Jul 2014 18:35:30 +0000 (11:35 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull ARM AES crypto fixes from Herbert Xu:
"This push fixes a regression on ARM where odd-sized blocks supplied to
AES may cause crashes"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm-aes - fix encryption of unaligned data
crypto: arm64-aes - fix encryption of unaligned data
Linus Torvalds [Mon, 28 Jul 2014 18:34:31 +0000 (11:34 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here are 3 more small powerpc fixes that should still go into .16.
One is a recent regression (MMCR2 business), the other is a trivial
endian fix without which FW updates won't work on LE in IBM machines,
and the 3rd one turns a BUG_ON into a WARN_ON which is definitely a
LOT more friendly especially when the whole thing is about retrieving
error logs ..."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix endianness of flash_block_list in rtas_flash
powerpc/powernv: Change BUG_ON to WARN_ON in elog code
powerpc/perf: Fix MMCR2 handling for EBB
Mikulas Patocka [Fri, 25 Jul 2014 23:42:30 +0000 (19:42 -0400)]
crypto: arm-aes - fix encryption of unaligned data
Fix the same alignment bug as in arm64 - we need to pass the residual
unprocessed bytes as the last argument to blkcipher_walk_done.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # 3.13+
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Mikulas Patocka [Fri, 25 Jul 2014 23:40:20 +0000 (19:40 -0400)]
crypto: arm64-aes - fix encryption of unaligned data
cryptsetup fails on arm64 when using kernel encryption via an AF_ALG socket.
See https://bugzilla.redhat.com/show_bug.cgi?id=1122937
The bug is caused by incorrect handling of unaligned data in
arch/arm64/crypto/aes-glue.c. Cryptsetup creates a buffer that is aligned
on 8 bytes, but not on 16 bytes. It opens an AF_ALG socket and uses the
socket to encrypt data in the buffer. The arm64 crypto accelerator causes
data corruption or crashes in scatterwalk_pagedone.
This patch fixes the bug by passing the residue bytes that were not
processed as the last parameter to blkcipher_walk_done.
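To illustrate why the residue matters, here is a toy Python model of a chunked walk. The Walk class and encrypt_all are hypothetical and do not mirror the kernel blkcipher API; the sketch only assumes that the walker hands out chunks that need not be a multiple of the block size and advances by however many bytes the caller reports as consumed.

```python
AES_BLOCK_SIZE = 16


class Walk:
    """Toy walker: hands out up to `step` bytes of `data` at a time."""

    def __init__(self, data, step):
        self.data, self.step, self.pos = data, step, 0
        self.nbytes = min(step, len(data))

    def done(self, residue):
        # Advance only past the bytes that were actually processed.
        self.pos += self.nbytes - residue
        self.nbytes = min(self.step, len(self.data) - self.pos)


def encrypt_all(data, step, xform, report_residue=True):
    """Run `xform` over whole blocks only, chunk by chunk."""
    walk = Walk(data, step)
    out = bytearray()
    while walk.nbytes // AES_BLOCK_SIZE:
        whole = (walk.nbytes // AES_BLOCK_SIZE) * AES_BLOCK_SIZE
        out += xform(data[walk.pos:walk.pos + whole])
        # Reporting 0 here claims the trailing partial block was consumed.
        walk.done(walk.nbytes % AES_BLOCK_SIZE if report_residue else 0)
    return bytes(out)
```

With a 24-byte step over 64 bytes of input, reporting the residue processes every block exactly once, while reporting 0 silently skips the 8 trailing bytes of each chunk. In this model that shows up as missing data; the commit message reports corruption or crashes in the real driver.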
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
David S. Miller [Mon, 28 Jul 2014 05:34:49 +0000 (22:34 -0700)]
Merge branch 'inet_frag_kill_lru_list'
Nikolay Aleksandrov says:
====================
inet: frag: cleanup and update
The end goal of this patchset is to remove the LRU list and to move the
frag eviction to a work queue. It also does a couple of necessary cleanups
and fixes. Brief patch descriptions:
Patches 1 - 3 inclusive: necessary clean ups
Patch 4 moves the eviction from the softirqs to a workqueue.
Patch 5 removes the nqueues counter which was protected by the LRU lock
Patch 6 removes the, by now unused, lru list.
Patch 7 moves the rebuild timer to the workqueue and schedules the rebuilds
only if we've hit the maximum queue length on some of the chains.
Patch 8 migrates the rwlock to a seqlock since the rehash is usually a rare
operation.
Patch 9 introduces an artificial global memory limit based on the value of
init_net's high_thresh which is used to cap the high_thresh of the
other namespaces. Also introduces some sane limits on the other
tunables, and makes it impossible to have low_thresh > high_thresh.
Here are some numbers from running netperf before and after the patchset:
Each test consists of the following setting: -I 95,5 -i 15,10
1. Bound test (-T 4,4)
1.1 Virtio before the patchset -
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.177 () port 0 AF_INET : +/-2.500% @ 95% conf. : cpu bind
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
212992   64000   30.00      722177      0    12325.1     34.55    2.025
212992           30.00      368020            6280.9     34.05    0.752
1.2 Virtio after the patchset -
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.177 () port 0 AF_INET : +/-2.500% @ 95% conf. : cpu bind
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
212992   64000   30.00      727030      0    12407.9     35.45    1.876
212992           30.00      505405            8625.5     34.92    0.693
2. Virtio unbound test
2.1 Before the patchset
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.177 () port 0 AF_INET : +/-2.500% @ 95% conf.
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
212992   64000   30.00      730008      0    12458.77
212992           30.00      416721            7112.02
2.2 After the patchset
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.177 () port 0 AF_INET : +/-2.500% @ 95% conf.
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
212992   64000   30.00      731129      0    12477.89
212992           30.00      487707            8323.50
3. 10 gig unbound tests
3.1 Before the patchset
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.133.1 () port 0 AF_INET : +/-2.500% @ 95% conf.
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
212992   64000   30.00      417209      0    7120.33
212992           30.00      416740           7112.33
3.2 After the patchset
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.133.1 () port 0 AF_INET : +/-2.500% @ 95% conf.
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
212992   64000   30.00      438009      0    7475.33
212992           30.00      437630           7468.87
Given the options, each netperf ran between 10 and 15 times for 30 seconds
to get the necessary confidence; the tests themselves also ran 3 times and
were consistent.
Another set of tests that I ran were parallel stress tests which consisted
of flooding the machine with fragmented packets from different sources with
frag timeout set to 0 (so there're lots of timeouts) and low_thresh set to
1 byte (so evictions are happening all the time) and on top of that running
a namespace create/destroy endless loop with network interfaces and
addresses that got flooded (for the brief periods they were up) in parallel.
This test ran for an hour without any issues.
====================
Nikolay Aleksandrov [Thu, 24 Jul 2014 14:50:37 +0000 (16:50 +0200)]
inet: frag: set limits and make init_net's high_thresh limit global
This patch makes init_net's high_thresh limit the maximum for all
namespaces, thus introducing a global memory limit threshold equal to the
sum of the individual high_thresh limits which are capped.
It also introduces some sane minimums for low_thresh as it shouldn't be
able to drop below 0 (or > high_thresh in the unsigned case), and
overall low_thresh should not ever be above high_thresh, so we make the
following relations for a namespace:
init_net:
high_thresh - max(not capped), min(init_net low_thresh)
low_thresh - max(init_net high_thresh), min (0)
all other namespaces:
high_thresh = max(init_net high_thresh), min(namespace's low_thresh)
low_thresh = max(namespace's high_thresh), min(0)
The major issue with having low_thresh > high_thresh is that we'll
schedule eviction but never evict anything and thus rely only on the
timers.
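The relations above can be sketched as a pair of range checks. This is an illustrative Python model with hypothetical names (FragLimits, set_high_thresh, set_low_thresh) and made-up numbers, not the sysctl handlers themselves.

```python
from dataclasses import dataclass


@dataclass
class FragLimits:
    low_thresh: int
    high_thresh: int


def set_high_thresh(ns, init_net, value):
    """high_thresh: at least ns.low_thresh; capped by init_net for other ns."""
    upper = None if ns is init_net else init_net.high_thresh
    if value < ns.low_thresh or (upper is not None and value > upper):
        raise ValueError("high_thresh out of range")
    ns.high_thresh = value


def set_low_thresh(ns, value):
    """low_thresh: between 0 and the namespace's own high_thresh."""
    if value < 0 or value > ns.high_thresh:
        raise ValueError("low_thresh out of range")
    ns.low_thresh = value
```

Rejecting the write is one possible policy for this sketch; the point is only that low_thresh > high_thresh can never be reached through either setter.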
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 24 Jul 2014 14:50:36 +0000 (16:50 +0200)]
inet: frag: use seqlock for hash rebuild
Rehash is a rare operation, so don't force readers to take
the read-side rwlock.
Instead, we only have to detect the (rare) case where
the secret was altered while we are trying to insert
a new inetfrag queue into the table.
If it was changed, drop the bucket lock and recompute
the hash to get the 'new' chain bucket that we have to
insert into.
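The retry pattern described above can be modelled as follows. This is a single-threaded Python sketch under stated assumptions: FragTable, its seq counter, and the `between` hook that simulates a racing rebuild are all hypothetical, and real locking is reduced to comments.

```python
class FragTable:
    """Toy hash table whose bucket choice depends on a changeable secret."""

    def __init__(self, secret, nbuckets=8):
        self.seq = 0                     # bumped around every rebuild
        self.secret = secret
        self.buckets = [[] for _ in range(nbuckets)]

    def bucket_of(self, key):
        return hash((key, self.secret)) % len(self.buckets)

    def rebuild(self, new_secret):
        self.seq += 1                    # writer enters
        entries = [k for chain in self.buckets for k in chain]
        self.secret = new_secret
        self.buckets = [[] for _ in self.buckets]
        for k in entries:
            self.buckets[self.bucket_of(k)].append(k)
        self.seq += 1                    # writer leaves

    def insert(self, key, between=None):
        while True:
            start = self.seq             # begin read-side section
            idx = self.bucket_of(key)
            if between:
                between()                # hook: a rebuild may race in here
            # (the bucket lock would be taken here)
            if self.seq == start:
                self.buckets[idx].append(key)
                return idx
            # Secret changed: drop the bucket lock and recompute the hash.
```

The common path costs one counter read and one comparison, and only an insert that overlaps a rebuild pays for a second hash computation.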
Joint work with Nikolay Aleksandrov.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 24 Jul 2014 14:50:35 +0000 (16:50 +0200)]
inet: frag: remove periodic secret rebuild timer
merge functionality into the eviction workqueue.
Instead of rebuilding every n seconds, take advantage of the upper
hash chain length limit.
If we hit it, mark table for rebuild and schedule workqueue.
To prevent frequent rebuilds when we're completely overloaded,
don't rebuild more than once every 5 seconds.
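A minimal Python sketch of this policy, assuming a hypothetical RebuildPolicy helper; the depth limit of 128 is an illustrative value, and time is passed in explicitly so the logic stays deterministic.

```python
class RebuildPolicy:
    """Mark the table for rebuild on long chains; rate-limit actual rebuilds."""

    def __init__(self, max_depth=128, min_interval=5.0):
        self.max_depth = max_depth
        self.min_interval = min_interval    # seconds between rebuilds
        self.last_rebuild = None
        self.rebuild_pending = False

    def note_chain_depth(self, depth):
        """Called on insert; True means the worker should be scheduled."""
        if depth > self.max_depth:
            self.rebuild_pending = True
        return self.rebuild_pending

    def worker(self, now):
        """Called from the work queue; True means a rebuild happened."""
        if not self.rebuild_pending:
            return False
        recent = (self.last_rebuild is not None
                  and now - self.last_rebuild < self.min_interval)
        if recent:
            return False                    # too soon, stay pending
        self.last_rebuild = now
        self.rebuild_pending = False
        return True
```

Under sustained overload the pending flag stays set, so the next worker run after the interval expires picks the rebuild up without a new trigger.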
The ipfrag_secret_interval sysctl is now obsolete and has been marked as
deprecated; it can still be changed so scripts won't break, but it
has no effect. A comment is left above each unused secret_timer
variable to avoid confusion.
Joint work with Nikolay Aleksandrov.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 24 Jul 2014 14:50:34 +0000 (16:50 +0200)]
inet: frag: remove lru list
no longer used.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>