Eric Dumazet [Sat, 4 Oct 2014 03:59:19 +0000 (20:59 -0700)]
net: skb_segment() provides list head and tail
Its unfortunate we have to walk again skb list to find the tail
after segmentation, even if data is probably hot in cpu caches.
skb_segment() can store the tail of the list into segs->prev,
and validate_xmit_skb_list() can immediately get the tail.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Oct 2014 04:32:25 +0000 (00:32 -0400)]
Merge branch 'geneve'
Andy Zhou says:
====================
Add Geneve tunnel protocol support
This patch series adds kernel support for Geneve (Generic Network
Virtualization Encapsulation) based on Geneve IETF draft:
http://www.ietf.org/id/draft-gross-geneve-01.txt
Patch 1 implements Geneve tunneling protocol driver
Patch 2-6 adds openvswitch support for creating and using
Geneve tunnels by OVS user space.
v1->v2: Style fixes: use tab instead space for Kconfig
Patch 2-6 are reviewed by Pravin Shetty, add him to acked-by
Patch 6 was reviewed by Thomas Graf when commiting
to openvswitch.org, add him to acked-by.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Fri, 3 Oct 2014 22:35:33 +0000 (15:35 -0700)]
openvswitch: Add support for Geneve tunneling.
The Openvswitch implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Singed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Fri, 3 Oct 2014 22:35:32 +0000 (15:35 -0700)]
openvswitch: Factor out allocation and verification of actions.
As the size of the flow key grows, it can put some pressure on the
stack. This is particularly true in ovs_flow_cmd_set(), which needs several
copies of the key on the stack. One of those uses is logically separate,
so this factors it out to reduce stack pressure and improve readibility.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Fri, 3 Oct 2014 22:35:31 +0000 (15:35 -0700)]
openvswitch: Wrap struct ovs_key_ipv4_tunnel in a new structure.
Currently, the flow information that is matched for tunnels and
the tunnel data passed around with packets is the same. However,
as additional information is added this is not necessarily desirable,
as in the case of pointers.
This adds a new structure for tunnel metadata which currently contains
only the existing struct. This change is purely internal to the kernel
since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version
of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Fri, 3 Oct 2014 22:35:30 +0000 (15:35 -0700)]
openvswitch: Add support for matching on OAM packets.
Some tunnel formats have mechanisms for indicating that packets are
OAM frames that should be handled specially (either as high priority or
not forwarded beyond an endpoint). This provides support for allowing
those types of packets to be matched.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Fri, 3 Oct 2014 22:35:29 +0000 (15:35 -0700)]
openvswitch: Eliminate memset() from flow_extract.
As new protocols are added, the size of the flow key tends to
increase although few protocols care about all of the fields. In
order to optimize this for hashing and matching, OVS uses a variable
length portion of the key. However, when fields are extracted from
the packet we must still zero out the entire key.
This is no longer necessary now that OVS implements masking. Any
fields (or holes in the structure) which are not part of a given
protocol will be by definition not part of the mask and zeroed out
during lookup. Furthermore, since masking already uses variable
length keys this zeroing operation automatically benefits as well.
In principle, the only thing that needs to be done at this point
is remove the memset() at the beginning of flow. However, some
fields assume that they are initialized to zero, which now must be
done explicitly. In addition, in the event of an error we must also
zero out corresponding fields to signal that there is no valid data
present. These increase the total amount of code but very little of
it is executed in non-error situations.
Removing the memset() reduces the profile of ovs_flow_extract()
from 0.64% to 0.56% when tested with large packets on a 10G link.
Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Zhou [Fri, 3 Oct 2014 22:35:28 +0000 (15:35 -0700)]
net: Add Geneve tunneling protocol driver
This adds a device level support for Geneve -- Generic Network
Virtualization Encapsulation. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-01
Only protocol layer Geneve support is provided by this driver.
Openvswitch can be used for configuring, set up and tear down
functional Geneve tunnels.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Li [Fri, 3 Oct 2014 21:29:14 +0000 (14:29 -0700)]
net: fec: fix build error at m68k platform
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout
1b7bde6d659d30f171259cc2dfba8e5dab34e735
make.cross ARCH=m68k m5275evb_defconfig
make.cross ARCH=m68k
All error/warnings:
drivers/net/ethernet/freescale/fec_main.c: In function 'fec_enet_rx_queue':
>> drivers/net/ethernet/freescale/fec_main.c:1470:3: error: implicit declaration of function 'prefetch' [-Werror=implicit-function-declaration]
prefetch(skb->data - NET_IP_ALIGN);
^
cc1: some warnings being treated as errors
missed included prefetch.h
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petri Gynther [Fri, 3 Oct 2014 19:25:01 +0000 (12:25 -0700)]
net: bcmgenet: improve bcmgenet_mii_setup()
bcmgenet_mii_setup() is called from the PHY state machine every 1-2 seconds
when the PHYs are in PHY_POLL mode.
Improve bcmgenet_mii_setup() so that it touches the MAC registers only when
the link is up and there was a change to link, speed, duplex, or pause status.
Signed-off-by: Petri Gynther <pgynther@google.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Oct 2014 01:39:58 +0000 (21:39 -0400)]
Merge branch 'altera_tse'
Walter Lozano says:
====================
Altera TSE with no PHY
In some scenarios there is no PHY chip present, for example in optical links.
This serie of patches moves PHY get addr and MDIO create to a new function and
avoids PHY and MDIO probing in these cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Walter Lozano [Fri, 3 Oct 2014 18:09:01 +0000 (15:09 -0300)]
Altera TSE: Add support for no PHY
This patch avoids PHY and MDIO probing if no PHY chip is present.
This is the case mainly in optical links where there is no need for
PHY chip, and therefore no need of MDIO. In this scenario Ethernet
MAC is directly connected to an optical module through an external
SFP transceiver.
Signed-off-by: Walter Lozano <walter@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
Walter Lozano [Fri, 3 Oct 2014 18:09:00 +0000 (15:09 -0300)]
Altera TSE: Move PHY get addr and MDIO create
Move PHY get addr and MDIO create to a new function to improve readability
and make it easier to avoid its usage. This will be useful for example in
the case where there is no PHY chip.
Signed-off-by: Walter Lozano <walter@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Oct 2014 01:34:39 +0000 (21:34 -0400)]
Merge tag 'master-2014-10-02' of git://git./linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
pull request: wireless-next 2014-10-03
Please pull tihs batch of updates intended for the 3.18 stream!
For the iwlwifi bits, Emmanuel says:
"I have here a few things that depend on the latest mac80211's changes:
RRM, TPC, Quiet Period etc... Eyal keeps improving our rate control
and we have a new device ID. This last patch should probably have
gone to wireless.git, but at that stage, I preferred to send it to
-next and CC stable."
For (most of) the Atheros bits, Kalle says:
"The only new feature is testmode support from me. Ben added a new method
to crash the firmware with an assert for debug purposes. As usual, we
have lots of smaller fixes from Michal. Matteo fixed a Kconfig
dependency with debugfs. I fixed some warnings recently added to
checkpatch."
For the NFC bits, Samuel says:
"We've had major updates for TI and ST Microelectronics drivers, and a
few NCI related changes.
For TI's trf7970a driver:
- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues
For ST Microelectronics' ST21NFCA and ST21NFCB drivers:
- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes
Finally, Marvell added ISO15693 support to the NCI stack, together with a
couple of NCI fixes."
For the Bluetooth bits, Johan says:
"This 3.18 pull request replaces the one I did on Monday ("bluetooth-next
2014-09-22", which hasn't been pulled yet). The additions since the last
request are:
- SCO connection fix for devices not supporting eSCO
- Cleanups regarding the SCO establishment logic
- Remove unnecessary return value from logging functions
- Header compression fix for 6lowpan
- Cleanups to the ieee802154/mrf24j40 driver
Here's a copy from previous request that this one replaces:
'
Here are some more patches for 3.18. They include various fixes to the
btusb HCI driver, a fix for LE SMP, as well as adding Jukka to the
MAINTAINERS file for generic 6LoWPAN (as requested by Alexander Aring).
I've held on to this pull request a bit since we were waiting for a SCO
related fix to get sorted out first. However, since the merge window is
getting closer I decided not to wait for it. If we do get the fix sorted
out there'll probably be a second small pull request later this week.
'"
And,
"Unless 3.17 gets delayed this will probably be our last -next pull request for
3.18. We've got:
- New Marvell hardware supportr
- Multicast support for 6lowpan
- Several of 6lowpan fixes & cleanups
- Fix for a (false-positive) lockdep warning in L2CAP
- Minor btusb cleanup"
On top of all that comes the usual sort of updates to ath5k, ath9k,
ath10k, brcmfmac, mwifiex, and wil6210. This time around there are
also a number of rtlwifi updates to enable some new hardware and
to reconcile the in-kernel drivers with some newer releases of the
Realtek vendor drivers. Also of note is some device tree work for
the bcma bus.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Oct 2014 01:32:37 +0000 (21:32 -0400)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains another batch with Netfilter/IPVS updates
for net-next, they are:
1) Add abstracted ICMP codes to the nf_tables reject expression. We
introduce four reasons to reject using ICMP that overlap in IPv4
and IPv6 from the semantic point of view. This should simplify the
maintainance of dual stack rule-sets through the inet table.
2) Move nf_send_reset() functions from header files to per-family
nf_reject modules, suggested by Patrick McHardy.
3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the
code now that br_netfilter can be modularized. Convert remaining spots
in the network stack code.
4) Use rcu_barrier() in the nf_tables module removal path to ensure that
we don't leave object that are still pending to be released via
call_rcu (that may likely result in a crash).
5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad)
idea was to probe the word size based on the xtables match/target info
size, but this assumption is wrong when you have to dump the information
back to userspace.
6) Allow to filter from prerouting and postrouting in the nf_tables bridge.
In order to emulate the ebtables NAT chains (which are actually simple
filter chains with no special semantics), we have support filtering from
this hooks too.
7) Add explicit module dependency between xt_physdev and br_netfilter.
This provides a way to detect if the user needs br_netfilter from
the configuration path. This should reduce the breakage of the
br_netfilter modularization.
8) Cleanup coding style in ip_vs.h, from Simon Horman.
9) Fix crash in the recently added nf_tables masq expression. We have
to register/unregister the notifiers to clean up the conntrack table
entries from the module init/exit path, not from the rule addition /
deletion path. From Arturo Borrero.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Oct 2014 01:21:44 +0000 (21:21 -0400)]
Merge branch 'bridge_default_pvid'
Vladislav Yasevich says:
====================
bridge: Add vlan filtering support for default pvid
This series adds default pvid support to vlan filtering in the bridge.
VLAN 1 (as recommended by 802.1q spec) is used as default pvid on ports.
Then the user can over-ride this configuration by configuring their
own vlan information.
The user can additionally change the default value through the
sysfs interface (netlink coming shortly).
The user can turn off default pvid functionality by setting default
pvid to 0.
This series changes the default behavior of the bridge when
vlan filtering is turned on. Currently, ports without any vlan
filtering configured will not recevie any traffic at all. This patch
changes the behavior of the above ports to receive only untagged traffic.
Since v3:
- allocated 'changed' bitmap on the heap and re-arrange code to clean it up.
- remove extra blank lines.
- Fix patch1 to build by itself.
- Fix error recover to not add vlan 0.
- Restructure nbp_vlan_init to remove uneeded variable.
Since v2:
- Fix handling of invalid values in sysfs interface.
- Add some additional log messages.
- Fix default_pvid handling when vlan filtering is compiled out.
- Fix sparse issues with new code.
- Fix how we located the old default pvid (added a helper function).
Since v1:
- Add ability to turn off default_pvid settings.
- Drop the automiatic filtering support based on configured vlan devices (will
be its own series)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 3 Oct 2014 15:29:18 +0000 (11:29 -0400)]
bridge: Add filtering support for default_pvid
Currently when vlan filtering is turned on on the bridge, the bridge
will drop all traffic untill the user configures the filter. This
isn't very nice for ports that don't care about vlans and just
want untagged traffic.
A concept of a default_pvid was recently introduced. This patch
adds filtering support for default_pvid. Now, ports that don't
care about vlans and don't define there own filter will belong
to the VLAN of the default_pvid and continue to receive untagged
traffic.
This filtering can be disabled by setting default_pvid to 0.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 3 Oct 2014 15:29:17 +0000 (11:29 -0400)]
bridge: Simplify pvid checks.
Currently, if the pvid is not set, we return an illegal vlan value
even though the pvid value is set to 0. Since pvid of 0 is currently
invalid, just return 0 instead. This makes the current and future
checks simpler.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 3 Oct 2014 15:29:16 +0000 (11:29 -0400)]
bridge: Add a default_pvid sysfs attribute
This patch allows the user to set and retrieve default_pvid
value. A new value can only be stored when vlan filtering
is disabled.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Ténart [Fri, 3 Oct 2014 15:08:19 +0000 (17:08 +0200)]
net: pxa168_eth: avoid using signed char for bitops
Signedness bugs may occur when using signed char for bitops,
depending on if the highest bit is ever used.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Oct 2014 01:17:56 +0000 (21:17 -0400)]
Merge branch 'isdn-next'
Tilman Schmidt says:
====================
ISDN patches for net-next
Here's a series of patches for the ISDN CAPI subsystem and the
Gigaset ISDN driver. Please merge via net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Fri, 3 Oct 2014 15:03:32 +0000 (17:03 +0200)]
isdn/gigaset: use USB API function usb_endpoint_num()
Use function usb_endpoint_num() for the bulk endpoint and store
the endpoint number in the cardstate structure instead of the raw
endpoint address value.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Fri, 3 Oct 2014 15:03:32 +0000 (17:03 +0200)]
isdn/gigaset: drop unused cardstate structure member
Field int_in_endpointAddr was set but never used. Drop it.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Fri, 3 Oct 2014 15:03:32 +0000 (17:03 +0200)]
isdn/gigaset: improve error handling when leaving DLE mode
Avoid cascading warnings when leaving DLE mode fails by clearing
the DLE flag before entering recovery.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Fri, 3 Oct 2014 15:03:32 +0000 (17:03 +0200)]
isdn/capi: drop two dead if branches
The last branch in command_2_index() cannot be reached since
c==0xff is already caught by the first "if".
The empty second branch makes no difference since no other branch
will be taken for c<0x0f.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Fri, 3 Oct 2014 05:43:09 +0000 (22:43 -0700)]
net: sched: suspicious RCU usage in qdisc_watchdog
Suspicious RCU usage in qdisc_watchdog call needs to be done inside
rcu_read_lock/rcu_read_unlock. And then Qdisc destroy operations
need to ensure timer is cancelled before removing qdisc structure.
[ 3992.191339] ===============================
[ 3992.191340] [ INFO: suspicious RCU usage. ]
[ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted
[ 3992.191345] -------------------------------
[ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage!
[ 3992.191348]
[ 3992.191348] other info that might help us debug this:
[ 3992.191348]
[ 3992.191351]
[ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1
[ 3992.191353] no locks held by swapper/1/0.
[ 3992.191355]
[ 3992.191355] stack backtrace:
[ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72
[ 3992.191360] Hardware name: /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012
[ 3992.191362]
0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000
[ 3992.191366]
ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000
[ 3992.191370]
ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98
[ 3992.191374] Call Trace:
[ 3992.191376] <IRQ> [<
ffffffff8178f92c>] dump_stack+0x4e/0x68
[ 3992.191387] [<
ffffffff810c9966>] lockdep_rcu_suspicious+0xe6/0x130
[ 3992.191392] [<
ffffffff8167213a>] qdisc_watchdog+0x8a/0xb0
[ 3992.191396] [<
ffffffff810f93f2>] __run_hrtimer+0x72/0x420
[ 3992.191399] [<
ffffffff810f9bcd>] ? hrtimer_interrupt+0x7d/0x240
[ 3992.191403] [<
ffffffff816720b0>] ? tc_classify+0xc0/0xc0
[ 3992.191406] [<
ffffffff810f9c4f>] hrtimer_interrupt+0xff/0x240
[ 3992.191410] [<
ffffffff8109e4a5>] ? __atomic_notifier_call_chain+0x5/0x140
[ 3992.191415] [<
ffffffff8103577b>] local_apic_timer_interrupt+0x3b/0x60
[ 3992.191419] [<
ffffffff8179c2b5>] smp_apic_timer_interrupt+0x45/0x60
[ 3992.191422] [<
ffffffff8179a6bf>] apic_timer_interrupt+0x6f/0x80
[ 3992.191424] <EOI> [<
ffffffff815ed233>] ? cpuidle_enter_state+0x73/0x2e0
[ 3992.191432] [<
ffffffff815ed22e>] ? cpuidle_enter_state+0x6e/0x2e0
[ 3992.191437] [<
ffffffff815ed567>] cpuidle_enter+0x17/0x20
[ 3992.191441] [<
ffffffff810c0741>] cpu_startup_entry+0x3d1/0x4a0
[ 3992.191445] [<
ffffffff81106fc6>] ? clockevents_config_and_register+0x26/0x30
[ 3992.191448] [<
ffffffff81033c16>] start_secondary+0x1b6/0x260
Fixes:
b26b0d1e8b1 ("net: qdisc: use rcu prefix and silence sparse warnings")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 3 Oct 2014 01:56:03 +0000 (18:56 -0700)]
net: dsa: do not call phy_start_aneg
Commit
f7f1de51edbd ("net: dsa: start and stop the PHY state machine")
add calls to phy_start() in dsa_slave_open() respectively phy_stop() in
dsa_slave_close().
We also call phy_start_aneg() in dsa_slave_create(), and this call is
messing up with the PHY state machine, since we basically start the
auto-negotiation, and later on restart it when calling phy_start().
phy_start() does not currently handle the PHY_FORCING or PHY_AN states
properly, but such a fix would be too invasive for this window.
Fixes:
f7f1de51edbd ("net: dsa: start and stop the PHY state machine")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sébastien Barré [Thu, 2 Oct 2014 19:15:22 +0000 (21:15 +0200)]
Removed unused inet6 address state
the inet6 state INET6_IFADDR_STATE_UP only appeared in its definition.
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vijay Subramanian [Thu, 2 Oct 2014 17:00:43 +0000 (10:00 -0700)]
net: Cleanup skb cloning by adding SKB_FCLONE_FREE
SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb.
1: If skb is allocated from head_cache, it indicates fclone is not available.
2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates
it is available to be used.
To avoid confusion for case 2 above, this patch replaces
SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone
companion skbs, this indicates it is free for use.
SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and
cannot / will not have a companion fclone.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Oct 2014 15:24:21 +0000 (08:24 -0700)]
mlx4: add a new xmit_more counter
ethtool -S reports a new counter, tracking number of time doorbell
was not triggered, because skb->xmit_more was set.
$ ethtool -S eth0 | egrep "tx_packet|xmit_more"
tx_packets:
2413288400
xmit_more:
666121277
I merged the tso_packet false sharing avoidance in this patch as well.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Oct 2014 23:53:36 +0000 (16:53 -0700)]
Merge branch 'gudp'
Tom Herbert says:
====================
net: Generic UDP Encapsulation
Generic UDP Encapsulation (GUE) is UDP encapsulation protocol which
encapsulates packets of various IP protocols. The GUE protocol is
described in http://tools.ietf.org/html/draft-herbert-gue-01.
The receive path of GUE is implemented in the FOU over UDP module (FOU).
This includes a UDP encap receive function for GUE as well as GUE
specific GRO functions. Management and configuration of GUE ports shares
most of the same code with FOU.
For the transmit path, the previous FOU support for IPIP, sit, and GRE
was simply extended for GUE (when GUE is enabled insert the GUE
header on transmit in addition to UDP header inserted for FOU).
Semantically GUE is the same as FOU in that the encapsulation (UDP
and GUE headers) that are inserted on transmission and removed on
reception so that IP packet is processed with the inner header.
This patch set includes:
- Some fixes to FOU, removal of IPv4,v6 specific GRO functions
- Support to configure a GUE receive port
- Implementation of GUE receive path (normal and GRO)
- Additions to ip_tunnel netlink to configure GUE
- GUE header inserion in ip_tunnel transmit path
v2:
- Include net/gue.h in patch set
Testing:
I ran performance numbers using netperf TCP_RR with 200 streams,
comparing encapsulation without GUE, encapsulation with GUE, and
encapsulation with FOU.
GRE
TCP_STREAM
IPv4, FOU, UDP checksum enabled
14.04% TX CPU utilization
13.17% RX CPU utilization
9211 Mbps
IPv4, GUE, UDP checksum enabled
14.99% TX CPU utilization
13.79% RX CPU utilization
9185 Mbps
IPv4, FOU, UDP checksum disabled
13.14% TX CPU utilization
23.18% RX CPU utilization
9277 Mbps
IPv4, GUE, UDP checksum disabled
13.66% TX CPU utilization
23.57% RX CPU utilization
9184 Mbps
TCP_RR
IPv4, FOU, UDP checksum enabled
94.2% CPU utilization
155/249/460 90/95/99% latencies
1.17018e+06 tps
IPv4, GUE, UDP checksum enabled
93.9% CPU utilization
158/253/472 90/95/99% latencies
1.15045e+06 tps
IPIP
TCP_STREAM
FOU, UDP checksum enabled
15.28% TX CPU utilization
13.92% RX CPU utilization
9342 Mbps
GUE, UDP checksum enabled
13.99% TX CPU utilization
13.34% RX CPU utilization
9210 Mbps
FOU, UDP checksum disabled
15.08% TX CPU utilization
24.64% RX CPU utilization
9226 Mbps
GUE, UDP checksum disabled
15.90% TX CPU utilization
24.77% RX CPU utilization
9197 Mbps
TCP_RR
FOU, UDP checksum enabled
94.23% CPU utilization
149/237/429 90/95/99% latencies
1.19553e+06 tps
GUE, UDP checksum enabled
93.75% CPU utilization
152/243/442 90/95/99% latencies
1.17027e+06 tps
SIT
TCP_STREAM
FOU, UDP checksum enabled
14.47% TX CPU utilization
14.58% RX CPU utilization
9106 Mbps
GUE, UDP checksum enabled
15.09% TX CPU utilization
14.84% RX CPU utilization
9080 Mbps
FOU, UDP checksum disabled
15.70% TX CPU utilization
27.93% RX CPU utilization
9097 Mbps
GUE, UDP checksum disabled
15.04% TX CPU utilization
27.54% RX CPU utilization
9073 Mbps
TCP_RR
FOU, UDP checksum enabled
96.9% CPU utilization
170/281/581 90/95/99% latencies
1.03372e+06 tps
GUE, UDP checksum enabled
97.16% CPU utilization
172/286/576 90/95/99% latencies
1.00469e+06 tps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 3 Oct 2014 22:48:10 +0000 (15:48 -0700)]
ip_tunnel: Add GUE support
This patch allows configuring IPIP, sit, and GRE tunnels to use GUE.
This is very similar to fou excpet that we need to insert the GUE header
in addition to the UDP header on transmit.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 3 Oct 2014 22:48:09 +0000 (15:48 -0700)]
gue: Receive side for Generic UDP Encapsulation
This patch adds support receiving for GUE packets in the fou module. The
fou module now supports direct foo-over-udp (no encapsulation header)
and GUE. To support this a type parameter is added to the fou netlink
parameters.
For a GUE socket we define gue_udp_recv, gue_gro_receive, and
gue_gro_complete to handle the specifics of the GUE protocol. Most
of the code to manage and configure sockets is common with the fou.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 3 Oct 2014 22:48:08 +0000 (15:48 -0700)]
fou: eliminate IPv4,v6 specific GRO functions
This patch removes fou[46]_gro_receive and fou[46]_gro_complete
functions. The v4 or v6 variants were chosen for the UDP offloads
based on the address family of the socket this is not necessary
or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb.
This is set in udp6_gro_receive and unset in udp4_gro_receive. In
fou_gro_receive the value is used to select the correct inet_offloads
for the protocol of the outer IP header.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 3 Oct 2014 22:48:07 +0000 (15:48 -0700)]
ip_tunnel: Account for secondary encapsulation header in max_headroom
When adjusting max_header for the tunnel interface based on egress
device we need to account for any extra bytes in secondary encapsulation
(e.g. FOU).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Oct 2014 14:38:46 +0000 (07:38 -0700)]
net: do not export skb_gro_receive()
skb_gro_receive() is only called from tcp_gro_receive() which is
not in a module.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen Gang [Thu, 2 Oct 2014 14:32:56 +0000 (22:32 +0800)]
drivers/net/irda/Kconfig: Let SH_IRDA depend on HAS_IOMEM
SH_IRDA needs HAS_IOMEM, so depend on it. The related error(with
allmodconfig under um):
CC [M] drivers/net/irda/sh_irda.o
drivers/net/irda/sh_irda.c: In function ‘sh_irda_probe’:
drivers/net/irda/sh_irda.c:776:2: error: implicit declaration of function ‘ioremap_nocache’ [-Werror=implicit-function-declaration]
self->membase = ioremap_nocache(res->start, resource_size(res));
^
drivers/net/irda/sh_irda.c:776:16: warning: assignment makes pointer from integer without a cast [enabled by default]
self->membase = ioremap_nocache(res->start, resource_size(res));
^
drivers/net/irda/sh_irda.c:821:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
iounmap(self->membase);
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen Gang [Thu, 2 Oct 2014 14:23:33 +0000 (22:23 +0800)]
drivers/net/ethernet/marvell/Kconfig: Let PXA168_ETH depend on HAS_IOMEM
PXA168_ETH need HAS_IOMEM, so depend on it, the related error (with
allmodconfig under um):
CC [M] drivers/net/ethernet/marvell/pxa168_eth.o
drivers/net/ethernet/marvell/pxa168_eth.c: In function ‘pxa168_eth_probe’:
drivers/net/ethernet/marvell/pxa168_eth.c:1605:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
iounmap(pep->base);
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen Gang [Thu, 2 Oct 2014 14:14:04 +0000 (22:14 +0800)]
drivers/net/dsa/Kconfig: Let NET_DSA_BCM_SF2 depend on HAS_IOMEM
NET_DSA_BCM_SF2 need HAS_IOMEM, so depend on it, the related error (with
allmodconfig under um):
CC [M] drivers/net/dsa/bcm_sf2.o
drivers/net/dsa/bcm_sf2.c: In function ‘bcm_sf2_sw_setup’:
drivers/net/dsa/bcm_sf2.c:487:3: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
iounmap(*base);
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen Gang [Thu, 2 Oct 2014 14:01:42 +0000 (22:01 +0800)]
drivers/net/can/Kconfig: Let CAN_AT91 depend on HAS_IOMEM
CAN_AT91 needs HAS_IOMEM, so depends on it. The related error (with
allmodconfig under um):
CC [M] drivers/net/can/at91_can.o
drivers/net/can/at91_can.c: In function ‘at91_can_probe’:
drivers/net/can/at91_can.c:1329:2: error: implicit declaration of function ‘ioremap_nocache’ [-Werror=implicit-function-declaration]
addr = ioremap_nocache(res->start, resource_size(res));
^
drivers/net/can/at91_can.c:1329:7: warning: assignment makes pointer from integer without a cast [enabled by default]
addr = ioremap_nocache(res->start, resource_size(res));
^
drivers/net/can/at91_can.c:1384:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
iounmap(addr);
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Oct 2014 22:43:50 +0000 (15:43 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-10-02
This series contains updates to fm10k, igb, ixgbe and i40e.
Alex provides two updates to the fm10k driver. First reduces the buffer
size to 2k for all page sizes, since most frames only have a 1500 MTU
so supporting a buffer size larger than this is somewhat wasteful.
Second fixes an issue where the number of transmit queues was not being
updated, so added the lines necessary to update the number of transmit
queues.
Rick Jones provides two patches to convert ixgbe, igb and i40e to use
dev_consume_skb_any().
Emil provides two patches for ixgbe, first cleans up a couple of wait
loops on auto-negotiation that were not needed. Second fixes an issue
reported by Fujitsu/Red Hat, which consolidates the logic behind the
dynamically setting of TXDCTL.WTHRESH depending on interrupt throttle
rate (ITR) setting regardless of BQL.
Ethan Zhao provides a cleanup patch for ixgbe where he noticed a
duplicate define.
Bernhard Kaindl provides a patch for igb to remove a source of latency
spikes by not calling code that uses mdelay() for feeding a PHY stat
while being called with a spinlock held.
Todd bumps the igb version based on the recent changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Oct 2014 22:42:37 +0000 (15:42 -0700)]
Merge branch 'mlx5-next'
Eli Cohen says:
====================
mlx5 update for 3.18
This series integrates a new mechanism for populating and extracting field values
used in the driver/firmware interaction around command mailboxes.
Changes from V1:
- Remove unused definition of memcpy_cpu_to_be32()
- Remove definitions of non_existent_*() and use BUILD_BUG_ON() instead.
- Added a patch one line patch to add support for ConnectX-4 devices.
Changes from V0:
- trimmed the auto-generated file to a minimum, as required by the reviewers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 2 Oct 2014 09:19:46 +0000 (12:19 +0300)]
net/mlx5_core: Add ConnectX-4 to list of supported devices
Add the upcoming ConnectX-4 device to the list of supported devices by then
mlx5 driver.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 2 Oct 2014 09:19:45 +0000 (12:19 +0300)]
net/mlx5_core: Identify resources by their type
This patch puts a common part as the first field of mlx5_core_qp. This field is
used to identify which resource generated an event. This is required since upcoming
new resource types such as DC targets are allocated for the same numerical space
as regular QPs and may generate the same events. By searching the resource in the
same table we can then look at the common field to identify the resource.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 2 Oct 2014 09:19:44 +0000 (12:19 +0300)]
net/mlx5_core: use set/get macros in device caps
Transform device capabilities related commands to use set/get macros to
manipulate command mailboxes.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 2 Oct 2014 09:19:43 +0000 (12:19 +0300)]
net/mlx5_core: Use hardware registers description header file
Add an auto generated header file that describes hardware registers along with
set of macros that set/get values. The macros do static checks to avoid
overflow, handle endianess, and overall provide a clean way to code commands.
Currently the header file is small and we will add structs as we make use of
the macros.
A few commands were removed from the commands enum since they are not supported
currently and will be added when support is available.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 2 Oct 2014 09:19:42 +0000 (12:19 +0300)]
net/mlx5_core: Update device capabilities handling
Rearrange struct mlx5_caps so it has a "gen" field to represent the current
capabilities configured for the device. Max capabilities can also be queried
from the device. Also update capabilities struct to contain more fields as per
the latest revision if firmware specification.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Oct 2014 22:31:07 +0000 (15:31 -0700)]
qdisc: validate skb without holding lock
Validation of skb can be pretty expensive :
GSO segmentation and/or checksum computations.
We can do this without holding qdisc lock, so that other cpus
can queue additional packets.
Trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one.
Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.
Turning TSO on or off had no effect on throughput, only few more cpu
cycles. Lock contention on qdisc lock disappeared.
Same if disabling TX checksum offload.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Thu, 2 Oct 2014 08:15:30 +0000 (10:15 +0200)]
net: ethernet: Remove superfluous ether_setup after alloc_etherdev
There is no need to call ether_setup after alloc_ethdev since it was
already called there.
Follow commits
c706471b2601 ("net: axienet: remove unnecessary
ether_setup after alloc_etherdev") and
3c87dcbfb36c ("net: ll_temac:
Remove unnecessary ether_setup after alloc_etherdev") and fix the
pattern in all remaining ethernet drivers.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Oct 2014 19:37:23 +0000 (12:37 -0700)]
Merge branch 'qdisc_bulk_dequeue'
Jesper Dangaard Brouer says:
====================
qdisc: bulk dequeue support
This patchset uses DaveM's recent API changes to dev_hard_start_xmit(),
from the qdisc layer, to implement dequeue bulking.
Patch01: "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"
- Implement basic qdisc dequeue bulking
- This time, 100% relying on BQL limits, no magic safe-guard constants
Patch02: "qdisc: dequeue bulking also pickup GSO/TSO packets"
- Extend bulking to bulk several GSO/TSO packets
- Seperate patch, as it introduce a small regression, see test section.
We do have a patch03, which exports a userspace tunable as a BQL
tunable, that can byte-cap or disable the bulking/bursting. But we
could not agree on it internally, thus not sending it now. We
basically strive to avoid adding any new userspace tunable.
Testing patch01:
================
Demonstrating the performance improvement of qdisc dequeue bulking, is
tricky because the effect only "kicks-in" once the qdisc system have a
backlog. Thus, for a backlog to form, we need either 1) to exceed wirespeed
of the link or 2) exceed the capability of the device driver.
For practical use-cases, the measureable effect of this will be a
reduction in CPU usage
01-TCP_STREAM:
--------------
Testing effect for TCP involves disabling TSO and GSO, because TCP
already benefit from bulking, via TSO and especially for GSO segmented
packets. This patch view TSO/GSO as a seperate kind of bulking, and
avoid further bulking of these packet types.
The measured perf diff benefit (at 10Gbit/s) for a single netperf
TCP_STREAM were 9.24% less CPU used on calls to _raw_spin_lock()
(mostly from sch_direct_xmit).
If my E5-2695v2(ES) CPU is tuned according to:
http://netoptimizer.blogspot.dk/2014/04/basic-tuning-for-network-overload.html
Then it is possible that a single netperf TCP_STREAM, with GSO and TSO
disabled, can utilize all bandwidth on a 10Gbit/s link. This will
then cause a standing backlog queue at the qdisc layer.
Trying to pressure the system some more CPU util wise, I'm starting
24x TCP_STREAMs and monitoring the overall CPU utilization. This
confirms bulking saves CPU cycles when it "kicks-in".
Tool mpstat, while stressing the system with netperf 24x TCP_STREAM, shows:
* Disabled bulking: sys:2.58% soft:8.50% idle:88.78%
* Enabled bulking: sys:2.43% soft:7.66% idle:89.79%
02-UDP_STREAM
-------------
The measured perf diff benefit for UDP_STREAM were 6.41% less CPU used
on calls to _raw_spin_lock(). 24x UDP_STREAM with packet size -m 1472 (to
avoid sending UDP/IP fragments).
03-trafgen driver test
----------------------
The performance of the 10Gbit/s ixgbe driver is limited due to
updating the HW ring-queue tail-pointer on every packet. As
previously demonstrated with pktgen.
Using trafgen to send RAW frames from userspace (via AF_PACKET), and
forcing it through qdisc path (with option --qdisc-path and -t0),
sending with 12 CPUs.
I can demonstrate this driver layer limitation:
* 12.8 Mpps with no qdisc bulking
* 14.8 Mpps with qdisc bulking (full 10G-wirespeed)
Testing patch02:
================
Testing Bulking several GSO/TSO packets:
Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec for 10G while transmitting
approx 813Kpps).
Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec, for 10G
while transmitting approx 813Kpps).
Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms. Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.
Corrosponding to:
(10000*10^6)*((0.50-0.47)/10^3)/8 = 37500 bytes
(10000*10^6)*((0.50-0.38)/10^3)/8 = 150000 bytes
37500/1500 = 25 pkts
150000/1500 = 100 pkts
Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms
diff of 0.12ms corrosponding to 1500 bytes at 100Mbit/s. Bulking
several GSOs together, result in a stable high-prio queue delay of
2.23ms.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Wed, 1 Oct 2014 20:36:09 +0000 (22:36 +0200)]
qdisc: dequeue bulking also pickup GSO/TSO packets
The TSO and GSO segmented packets already benefit from bulking
on their own.
The TSO packets have always taken advantage of the only updating
the tailptr once for a large packet.
The GSO segmented packets have recently taken advantage of
bulking xmit_more API, via merge commit
53fda7f7f9e8 ("Merge
branch 'xmit_list'"), specifically via commit
7f2e870f2a4 ("net:
Move main gso loop out of dev_hard_start_xmit() into helper.")
allowing qdisc requeue of remaining list. And via commit
ce93718fb7cd ("net: Don't keep around original SKB when we
software segment GSO frames.").
This patch allow further bulking of TSO/GSO packets together,
when dequeueing from the qdisc.
Testing:
Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec).
Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec).
Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms. Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.
Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Wed, 1 Oct 2014 20:35:59 +0000 (22:35 +0200)]
qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.
This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().
The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
(1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet. The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
(2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.
One restriction of the new API is that every SKB must belong to the
same TXQ. This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.
Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()). In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit(). In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits. In-effect bulking will only happen for BQL enabled drivers.
Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.
For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets. They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.
Jointed work with Hannes, Daniel and Florian.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Einon [Tue, 30 Sep 2014 21:29:46 +0000 (22:29 +0100)]
et131x: Add PCIe gigabit ethernet driver et131x to drivers/net
This adds the ethernet driver for Agere et131x devices to
drivers/net/ethernet.
The driver being added has been in the staging tree for some time, and will be
removed from there in a seperate patch. This one merely disables the staging
version to prevent two instances being built.
Signed-off-by: Mark Einon <mark.einon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arturo Borrero [Fri, 3 Oct 2014 12:13:36 +0000 (14:13 +0200)]
netfilter: nft_masq: register/unregister notifiers on module init/exit
We have to register the notifiers in the masquerade expression from
the the module _init and _exit path.
This fixes crashes when removing the masquerade rule with no
ipt_MASQUERADE support in place (which was masking the problem).
Fixes:
9ba1f72 ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Larry Finger [Thu, 2 Oct 2014 17:00:54 +0000 (12:00 -0500)]
rtlwifi: Fix static checker warnings for various drivers
Indenting errors yielded the following static checker warnings:
drivers/net/wireless/rtlwifi/rtl8192ee/hw.c:533 rtl92ee_set_hw_reg() warn: add curly braces? (if)
drivers/net/wireless/rtlwifi/rtl8192ee/hw.c:539 rtl92ee_set_hw_reg() warn: add curly braces? (if)
An unreleased version of the static checker also reported:
drivers/net/wireless/rtlwifi/rtl8723be/trx.c:550 rtl8723be_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8188ee/trx.c:621 rtl88ee_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8192ee/trx.c:567 rtl92ee_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8821ae/trx.c:758 rtl8821ae_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8723ae/trx.c:494 rtl8723e_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8192se/trx.c:315 rtl92se_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8192ce/trx.c:392 rtl92ce_rx_query_desc() warn: 'hdr' can't be NULL.
All of these are fixed.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Larry Finger [Thu, 2 Oct 2014 17:00:53 +0000 (12:00 -0500)]
rtlwifi: Fix Kconfig for RTL8192EE
The driver needs btcoexist, but Kconfig fails to select it. This omission
could cause build errors for some configurations.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:20 +0000 (06:33 +0530)]
ath9k: Fix flushing in MCC mode
When we are attempting to switch to a new
channel context, the TX queues are flushed, but
the mac80211 queues are not stopped and traffic
can still come down to the driver.
This patch fixes it by stopping the queues
assigned to the current context/vif before
trying to flush.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:19 +0000 (06:33 +0530)]
ath9k: Fix queue handling for channel contexts
When a full chip reset is done, all the queues
across all VIFs are stopped, but if MCC is enabled,
only the queues of the current context is awakened,
when we complete the reset.
This results in unfairness for the inactive context.
Since frames are queued internally in the driver if
there is a context mismatch, we can awaken all the
queues when coming out of a reset.
The VIF-specific queues are still used in flow control,
to ensure fairness when traffic is high.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:18 +0000 (06:33 +0530)]
ath9k: Add ath9k_chanctx_stop_queues()
This can be used when the queues of a context
needs to be stopped.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:17 +0000 (06:33 +0530)]
ath9k: Pass context to ath9k_chanctx_wake_queues()
Change the ath9k_chanctx_wake_queues() API so
that we can pass the channel context that needs its
queues to be stopped.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:16 +0000 (06:33 +0530)]
ath9k: Fix queue handling in flush()
When draining of the TX queues fails, a
full HW reset is done. ath_reset() makes sure
that the queues in mac80211 are restarted,
so there is no need to wake them up again.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:15 +0000 (06:33 +0530)]
ath9k: Remove duplicate code
ath9k_has_tx_pending() can be used to
check if there are pending frames instead
of having duplicate code.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:14 +0000 (06:33 +0530)]
ath9k: Fix pending frame check
Checking for the queue depth outside of
the TX queue lock is incorrect and in this
case, is not required since it is done inside
ath9k_has_pending_frames().
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:13 +0000 (06:33 +0530)]
ath9k: Check pending frames properly
There is no need to check if the current
channel context has active ACs queued up
if the TX queue is not empty.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Thu, 2 Oct 2014 01:03:12 +0000 (06:33 +0530)]
ath9k: Print RoC expiration
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
David S. Miller [Thu, 2 Oct 2014 18:25:43 +0000 (11:25 -0700)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/usb/r8152.c
net/netfilter/nfnetlink.c
Both r8152 and nfnetlink conflicts were simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Avinash Patil [Wed, 1 Oct 2014 17:55:26 +0000 (10:55 -0700)]
mwifiex: add support for SD8887 chipset
This patch adds SD8887 support to mwifiex.
SD8887 is Marvell's 1x1 11ac solution.
The corresponding firmware image file is located at:
"mrvl/sd8887_uapsta.bin"
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Frank Huang <frankh@marvell.com>
Signed-off-by: Nishant Sarmukadam <nishants@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Avinash Patil [Wed, 1 Oct 2014 17:55:25 +0000 (10:55 -0700)]
mwifiex: few more register offset entries for sdio card structure
This patch adds some more defitions to card specific register structure
and removes static defines for these registers.
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Vladimir Kondratiev [Wed, 1 Oct 2014 12:05:25 +0000 (15:05 +0300)]
wil6210: atomic I/O for the card memory
Introduce netdev IOCTLs, to be used by the debug tools.
Allows to read/write single dword value or
memory block, aligned to dword
Different address modes supported:
- BAR offset
- Firmware "linker" address
- target's AHB bus
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Vladimir Kondratiev [Wed, 1 Oct 2014 12:05:24 +0000 (15:05 +0300)]
wil6210: manual FW error recovery mode
Introduce manual FW recovery mode. It is activated if module parameter
@no_fw_recovery set to true. May be changed at runtime.
Recovery information provided by new "recovery" debugfs file. It prints:
mode = [auto|manual]
state = [idle|pending|running]
In manual mode, after FW error, recovery won't start automatically. Instead,
after notification to user space, recovery waits in "pending" state, as indicated by the
"recovery" debugfs file. User space tools may perform data collection and allow to
continue recovery by writing "run" to the "recovery" debugfs file.
Alternatively, recovery pending may be canceled by stopping network interface
i.e. 'ifconfig wlan0 down'
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith Manoharan [Sat, 27 Sep 2014 07:57:45 +0000 (13:27 +0530)]
ath: Add support for tracing
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
John W. Linville [Thu, 2 Oct 2014 17:56:19 +0000 (13:56 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Simon Horman [Tue, 30 Sep 2014 01:50:06 +0000 (10:50 +0900)]
ipvs: Clean up comment style in ip_vs.h
* Consistently use the multi-line comment style for networking code:
/* This
* That
* The other thing
*/
* Use single-line comment style for comments with only one line of text.
* In general follow the leading '*' of each line of a comment with a
single space and then text.
* Add missing line break between functions, remove double line break,
align comments to previous lines whenever possible.
Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Thu, 2 Oct 2014 09:13:21 +0000 (11:13 +0200)]
netfilter: explicit module dependency between br_netfilter and physdev
You can use physdev to match the physical interface enslaved to the
bridge device. This information is stored in skb->nf_bridge and it is
set up by br_netfilter. So, this is only available when iptables is
used from the bridge netfilter path.
Since
34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the br_netfilter code is modular. To reduce the impact of this change,
we can autoload the br_netfilter if the physdev match is used since
we assume that the users need br_netfilter in place.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Wed, 1 Oct 2014 18:34:37 +0000 (20:34 +0200)]
netfilter: nf_tables: allow to filter from prerouting and postrouting
This allows us to emulate the NAT table in ebtables, which is actually
a plain filter chain that hooks at prerouting, output and postrouting.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Tue, 17 Jun 2014 19:18:44 +0000 (21:18 +0200)]
netfilter: nft_compat: remove incomplete 32/64 bits arch compat code
This code was based on the wrong asumption that you can probe based
on the match/target private size that we get from userspace. This
doesn't work at all when you have to dump the info back to userspace
since you don't know what word size the userspace utility is using.
Currently, the extensions that require arch compat are limit match
and the ebt_mark match/target. The standard targets are not used by
the nft-xt compat layer, so they are not affected. We can work around
this limitation with a new revision that uses arch agnostic types.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Wed, 1 Oct 2014 11:53:20 +0000 (13:53 +0200)]
netfilter: nf_tables: wait for call_rcu completion on module removal
Make sure the objects have been released before the nf_tables modules
is removed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Wed, 1 Oct 2014 09:19:17 +0000 (11:19 +0200)]
netfilter: use IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
In
34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the bridge netfilter code has been modularized.
Use IS_ENABLED instead of ifdef to cover the module case.
Fixes:
34666d4 ("netfilter: bridge: move br_netfilter out of the core")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 26 Sep 2014 12:35:15 +0000 (14:35 +0200)]
netfilter: move nf_send_resetX() code to nf_reject_ipvX modules
Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and
nf_reject_ipv6 respectively. This code is shared by x_tables and
nf_tables.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 26 Sep 2014 12:35:14 +0000 (14:35 +0200)]
netfilter: nft_reject: introduce icmp code abstraction for inet and bridge
This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides
an abstraction to the ICMP and ICMPv6 codes that you can use from the
inet and bridge tables, they are:
* NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable
* NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable
* NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable
* NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratevely prohibited
You can still use the specific codes when restricting the rule to match
the corresponding layer 3 protocol.
I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have
different semantics depending on the table family and to allow the user
to specify ICMP family specific codes if they restrict it to the
corresponding family.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jukka Rissanen [Wed, 1 Oct 2014 12:59:15 +0000 (15:59 +0300)]
Bluetooth: 6lowpan: Check transmit errors for multicast packets
We did not return error if multicast packet transmit failed.
This might not be desired so return error also in this case.
If there are multiple 6lowpan devices where the multicast packet
is sent, then return error even if sending to only one of them fails.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Jukka Rissanen [Wed, 1 Oct 2014 12:59:14 +0000 (15:59 +0300)]
Bluetooth: 6lowpan: Return EAGAIN error also for multicast packets
Make sure that we are able to return EAGAIN from l2cap_chan_send()
even for multicast packets. The error code was ignored unncessarily.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Jukka Rissanen [Wed, 1 Oct 2014 08:30:57 +0000 (11:30 +0300)]
Bluetooth: 6lowpan: Avoid memory leak if memory allocation fails
If skb_unshare() returns NULL, then we leak the original skb.
Solution is to use temp variable to hold the new skb.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Jukka Rissanen [Wed, 1 Oct 2014 08:30:26 +0000 (11:30 +0300)]
Bluetooth: 6lowpan: Memory leak as the skb is not freed
The earlier multicast commit
36b3dd250dde ("Bluetooth: 6lowpan:
Ensure header compression does not corrupt IPv6 header") lost one
skb free which then caused memory leak.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Todd Fujinaka [Sat, 20 Sep 2014 04:46:25 +0000 (04:46 +0000)]
igb: bump version to 5.2.15
Bump version
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Rick Jones [Wed, 17 Sep 2014 03:56:20 +0000 (03:56 +0000)]
i40e/igb: Convert to dev_consume_skb_any()
Convert two more Intel NIC drivers to dev_consume_skb_any() to help
make dropped packet profiling sane.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bernhard Kaindl [Wed, 17 Sep 2014 19:11:16 +0000 (19:11 +0000)]
igb: remove blocking phy read from inside spinlock
Remove a source of latency spikes (in my case up to 10ms) by not calling
code that uses mdelay() for feeding a phy statistic (rx errors for idle
symbols - not data -> idle_errors) while being called with a spinlock held.
As idle_errors isn't read, this patch only removes unused code and data.
Later, more complicated changes may be applied to address the spinlock and
allow for some PHY diagnostics by harvesting this PHY stats register fully.
This patch is designed to fix the issue and be safe for longterm/stable.
For the Intel e1000e driver, the same change was applied in 2008 with
commit
23033fad5be0 ("e1000e: remove phy read from inside spinlock").
The mdelay is triggered by HW/SW semaphores, thus it depends on the HW.
I've HW that triggers it even when idle. Others may trigger it only e.g.
when Ethernet ports aquire or loose the link or on ifconfig up / down.
We've noticed this first from delays in frame rx/tx due to the mdelay().
Example command for checking if the issue is triggered: cyclictest -Smp1
(Look for occasional "Max:" values > 4000 or use -b 4000 to stop if greater)
It was observed with I350 ports connected to other I350 ports, but not
if driver and EEPROM was modified to run the I350 in EEPROM-less mode.
phy_stats.idle_errors and .receive_errors (isn't touched) occupy 64 not
used bits in the adapter struct: Their allocation may be removed as well.
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Todd Fujinaka <todd.fujinaka@intel.com>
Fixes:
12dcd86b75d5 ("igb: fix stats handling") (this added the spin_lock)
Signed-off-by: Bernhard Kaindl <bk-linux@use.startmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ethan Zhao [Tue, 23 Sep 2014 18:11:44 +0000 (18:11 +0000)]
ixgbe: delete one duplicate marcro definition of IXGBE_MAX_L2A_QUEUES
There is typo in ixgbe.h, two marcro definition of IXGBE_MAX_L2A_QUEUES to 4,
delete one, clear the compiler warning.
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Thu, 18 Sep 2014 08:05:02 +0000 (08:05 +0000)]
ixgbe: fix setting of TXDCTL.WTRHESH when ITR is set to 0 and no BQL
This patch consolidates the logic behind dynamically setting TXDCTL.WTHRESH
depending on interrupt throttle rate (ITR) setting regardless of BQL.
Previously TXDCTL.WTHRESH was dynamically being set only with BQL being
enabled, but we have to set it regardless of BQL when ITR is low to avoid
Tx stalls/hangs.
CC: John Greene <jogreene@redhat.com>
Reported by: Masayuki Gouji <gouji.masayuki@jp.fujitsu.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 6 Sep 2014 07:50:27 +0000 (07:50 +0000)]
ixgbe: remove wait loop on autoneg for copper devices
This patch removes couple of wait loops on autoneg that are not needed.
During validation we noticed that the loops always time out, so there
should be no user impact.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Rick Jones [Fri, 12 Sep 2014 17:44:06 +0000 (17:44 +0000)]
ixgbe: Convert the normal transmit complete path to dev_consume_skb_any()
Convert the normal packet completion path to dev_consume_skb_any() so
packet drop profiling via dropwatch or perf top -G -e skb_kfree_skb
is not cluttered with false hits.
Compile tested only. There is a dev_kfree_skb_any() in the routine
ixgbe_ptp_tx_hwtstamp() in ixgbe_ptp.c that looks like a conversion
candidate but I wasn't familiar enough with the code to pull the
trigger.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Johan Hedberg [Thu, 2 Oct 2014 07:16:22 +0000 (10:16 +0300)]
Bluetooth: Fix lockdep warning with l2cap_chan_connect
The L2CAP connection's channel list lock (conn->chan_lock) must never be
taken while already holding a channel lock (chan->lock) in order to
avoid lock-inversion and lockdep warnings. So far the l2cap_chan_connect
function has acquired the chan->lock early in the function and then
later called l2cap_chan_add(conn, chan) which will try to take the
conn->chan_lock. This violates the correct order of taking the locks and
may lead to the following type of lockdep warnings:
-> #1 (&conn->chan_lock){+.+...}:
[<
c109324d>] lock_acquire+0x9d/0x140
[<
c188459c>] mutex_lock_nested+0x6c/0x420
[<
d0aab48e>] l2cap_chan_add+0x1e/0x40 [bluetooth]
[<
d0aac618>] l2cap_chan_connect+0x348/0x8f0 [bluetooth]
[<
d0cc9a91>] lowpan_control_write+0x221/0x2d0 [bluetooth_6lowpan]
-> #0 (&chan->lock){+.+.+.}:
[<
c10928d8>] __lock_acquire+0x1a18/0x1d20
[<
c109324d>] lock_acquire+0x9d/0x140
[<
c188459c>] mutex_lock_nested+0x6c/0x420
[<
d0ab05fd>] l2cap_connect_cfm+0x1dd/0x3f0 [bluetooth]
[<
d0a909c4>] hci_le_meta_evt+0x11a4/0x1260 [bluetooth]
[<
d0a910eb>] hci_event_packet+0x3ab/0x3120 [bluetooth]
[<
d0a7cb08>] hci_rx_work+0x208/0x4a0 [bluetooth]
CPU0 CPU1
---- ----
lock(&conn->chan_lock);
lock(&chan->lock);
lock(&conn->chan_lock);
lock(&chan->lock);
Before calling l2cap_chan_add() the channel is not part of the
conn->chan_l list, and can therefore only be accessed by the L2CAP user
(such as l2cap_sock.c). We can therefore assume that it is the
responsibility of the user to handle mutual exclusion until this point
(which we can see is already true in l2cap_sock.c by it in many places
touching chan members without holding chan->lock).
Since the hci_conn and by exctension l2cap_conn creation in the
l2cap_chan_connect() function depend on chan details we cannot simply
add a mutex_lock(&conn->chan_lock) in the beginning of the function
(since the conn object doesn't yet exist there). What we can do however
is move the chan->lock taking later into the function where we already
have the conn object and can that way take conn->chan_lock first.
This patch implements the above strategy and does some other necessary
changes such as using __l2cap_chan_add() which assumes conn->chan_lock
is held, as well as adding a second needed label so the unlocking
happens as it should.
Reported-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Alexander Duyck [Tue, 30 Sep 2014 22:49:22 +0000 (22:49 +0000)]
fm10k: Correctly set the number of Tx queues
The number of Tx queues was not being updated due to some issues when
generating the patches. This change makes sure to add the lines necessary
to update the number of Tx queues correctly.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Fri, 26 Sep 2014 06:33:49 +0000 (06:33 +0000)]
fm10k: Reduce buffer size when pages are larger than 4K
This change reduces the buffer size to 2K for all page sizes. The basic
idea is that since most frames only have a 1500 MTU supporting a buffer
size larger than this is somewhat wasteful. As such I have reduced the
size to 2K for all page sizes which will allow for more uses per page.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Thu, 2 Oct 2014 04:29:06 +0000 (21:29 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Don't halt the firmware in r8152 driver, from Hayes Wang.
2) Handle full sized 802.1ad frames in bnx2 and tg3 drivers properly,
from Vlad Yasevich.
3) Don't sleep while holding tx_clean_lock in netxen driver, fix from
Manish Chopra.
4) Certain kinds of ipv6 routes can end up endlessly failing the route
validation test, causing it to be re-looked up over and over again.
This particularly kills input route caching in TCP sockets. Fix
from Hannes Frederic Sowa.
5) netvsc_start_xmit() has a use-after-free access to skb->len, fix
from K Y Srinivasan.
6) Fix matching of inverted containers in ematch module, from Ignacy
Gawędzki.
7) Aggregation of GRO frames via SKB ->frag_list for linear skbs isn't
handled properly, regression fix from Eric Dumazet.
8) Don't test return value of ipv4_neigh_lookup(), which returns an
error pointer, against NULL. From WANG Cong.
9) Fix an old regression where we mistakenly allow a double add of the
same tunnel. Fixes from Steffen Klassert.
10) macvtap device delete and open can run in parallel and corrupt lists
etc., fix from Vlad Yasevich.
11) Fix build error with IPV6=m NETFILTER_XT_TARGET_TPROXY=y, from Pablo
Neira Ayuso.
12) rhashtable_destroy() triggers lockdep splats, fix also from Pablo.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
bna: Update Maintainer Email
r8152: disable power cut for RTL8153
r8152: remove clearing bp
bnx2: Correctly receive full sized 802.1ad fragmes
tg3: Allow for recieve of full-size 8021AD frames
r8152: fix setting RTL8152_UNPLUG
netxen: Fix bug in Tx completion path.
netxen: Fix BUG "sleeping function called from invalid context"
ipv6: remove rt6i_genid
hyperv: Fix a bug in netvsc_start_xmit()
net: stmmac: fix stmmac_pci_probe failed when CONFIG_HAVE_CLK is selected
ematch: Fix matching of inverted containers.
gro: fix aggregation for skb using frag_list
neigh: check error pointer instead of NULL for ipv4_neigh_lookup()
ip6_gre: Return an error when adding an existing tunnel.
ip6_vti: Return an error when adding an existing tunnel.
ip6_tunnel: Return an error when adding an existing tunnel.
ip6gre: add a rtnl link alias for ip6gretap
net/mlx4_core: Allow not to specify probe_vf in SRIOV IB mode
r8152: fix the carrier off when autoresuming
...
Rasesh Mody [Wed, 1 Oct 2014 21:20:41 +0000 (17:20 -0400)]
bna: Update Maintainer Email
Update the maintainer email for BNA driver.
Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petri Gynther [Wed, 1 Oct 2014 18:58:02 +0000 (11:58 -0700)]
net: phy: add BCM7425 and BCM7429 PHYs
Signed-off-by: Petri Gynther <pgynther@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petri Gynther [Wed, 1 Oct 2014 18:30:01 +0000 (11:30 -0700)]
net: bcmgenet: fix bcmgenet_put_tx_csum()
bcmgenet_put_tx_csum() needs to return skb pointer back to the caller
because it reallocates a new one in case of lack of skb headroom.
Signed-off-by: Petri Gynther <pgynther@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Wed, 1 Oct 2014 00:53:21 +0000 (17:53 -0700)]
net: pktgen: packet bursting via skb->xmit_more
This patch demonstrates the effect of delaying update of HW tailptr.
(based on earlier patch by Jesper)
burst=1 is the default. It sends one packet with xmit_more=false
burst=2 sends one packet with xmit_more=true and
2nd copy of the same packet with xmit_more=false
burst=3 sends two copies of the same packet with xmit_more=true and
3rd copy with xmit_more=false
Performance with ixgbe (usec 30):
burst=1 tx:9.2 Mpps
burst=2 tx:13.5 Mpps
burst=3 tx:14.5 Mpps full 10G line rate
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 30 Sep 2014 23:13:19 +0000 (16:13 -0700)]
net: bridge: add a br_set_state helper function
In preparation for being able to propagate port states to e.g: notifiers
or other kernel parts, do not manipulate the port state directly, but
instead use a helper function which will allow us to do a bit more than
just setting the state.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>