Eric Dumazet [Mon, 13 Oct 2014 13:27:47 +0000 (06:27 -0700)]
tcp: TCP Small Queues and strange attractors
TCP Small queues tries to keep number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
Choice of tasklet was driven by latencies requirements.
Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.
What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.
This became particularly visible with latest skb->xmit_more support
Strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow being drained
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue :
Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burden of feeding
qdisc with future packets.
Tested:
lpaa23:~# ./super_netperf 1400 --google-pacing-rate
3028000 -H lpaa24 -l 3600 &
lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM eth1 595448.00
1190564.00 38381.09
1760253.12 0.00 0.00 1.00
06:16:19 AM eth1 594858.00
1189686.00 38340.76
1758952.72 0.00 0.00 0.00
06:16:20 AM eth1 597017.00
1194019.00 38480.79
1765370.29 0.00 0.00 1.00
06:16:21 AM eth1 595450.00
1190936.00 38380.19
1760805.05 0.00 0.00 0.00
06:16:22 AM eth1 596385.00
1193096.00 38442.56
1763976.29 0.00 0.00 1.00
06:16:23 AM eth1 598155.00
1195978.00 38552.97
1768264.60 0.00 0.00 0.00
06:16:24 AM eth1 594405.00
1188643.00 38312.57
1757414.89 0.00 0.00 1.00
06:16:25 AM eth1 593366.00
1187154.00 38252.16
1755195.83 0.00 0.00 0.00
06:16:26 AM eth1 593188.00
1186118.00 38232.88
1753682.57 0.00 0.00 1.00
06:16:27 AM eth1 596301.00
1192241.00 38440.94
1762733.09 0.00 0.00 0.00
Average: eth1 595457.30
1190843.50 38381.69
1760664.84 0.00 0.00 0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
backlog
7606336b 2513p requeues 167982
backlog
224072b 74p requeues 566
backlog
581376b 192p requeues 5598
backlog
181680b 60p requeues 1070
backlog
5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows
backlog
157456b 52p requeues 1758
backlog
672216b 222p requeues 3025
backlog 60560b 20p requeues 24541
backlog
448144b 148p requeues 21258
lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect
Immediate jump to full bandwidth, and traffic is properly
shard on all tx queues.
lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM eth1
1397632.00
2795397.00 90081.87
4133031.26 0.00 0.00 1.00
06:16:47 AM eth1
1396874.00
2793614.00 90032.99
4130385.46 0.00 0.00 0.00
06:16:48 AM eth1
1395842.00
2791600.00 89966.46
4127409.67 0.00 0.00 1.00
06:16:49 AM eth1
1395528.00
2791017.00 89946.17
4126551.24 0.00 0.00 0.00
06:16:50 AM eth1
1397891.00
2795716.00 90098.74
4133497.39 0.00 0.00 1.00
06:16:51 AM eth1
1394951.00
2789984.00 89908.96
4125022.51 0.00 0.00 0.00
06:16:52 AM eth1
1394608.00
2789190.00 89886.90
4123851.36 0.00 0.00 1.00
06:16:53 AM eth1
1395314.00
2790653.00 89934.33
4125983.09 0.00 0.00 0.00
06:16:54 AM eth1
1396115.00
2792276.00 89984.25
4128411.21 0.00 0.00 1.00
06:16:55 AM eth1
1396829.00
2793523.00 90030.19
4130250.28 0.00 0.00 0.00
Average: eth1
1396158.40
2792297.00 89987.09
4128439.35 0.00 0.00 0.50
lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
backlog
7900052b 2609p requeues 173287
backlog
878120b 290p requeues 589
backlog
1068884b 354p requeues 5621
backlog
996212b 329p requeues 1088
backlog
984100b 325p requeues 115316
backlog
956848b 316p requeues 1781
backlog
1080996b 357p requeues 3047
backlog
975016b 322p requeues 24571
backlog
990156b 327p requeues 21274
(All 8 TX queues get a fair share of the traffic)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Oct 2014 21:05:23 +0000 (17:05 -0400)]
Merge branch 'qlcnic'
Rajesh Borundia says:
====================
qlcnic: Bug fixes
This series fixes following issues.
* We were programming maximum number of arguments supported by
adapter instead of required in a command.
* Destroy tx command requires three arguments instead of two.
Please apply these patches to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Tue, 14 Oct 2014 11:41:46 +0000 (07:41 -0400)]
qlcnic: Fix number of arguments in destroy tx context command
o Number of arguments taken by destroy tx command is three
instead of two.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Tue, 14 Oct 2014 11:41:45 +0000 (07:41 -0400)]
qlcnic: Fix programming number of arguments in a command.
o Initially we were programming maximum number of arguments.
Instead we should program number of arguments required in
a command.
o Maximum number of arguments for 82xx adapter is four. Fix it
for GET_ESWITCH_STATS command.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Tue, 14 Oct 2014 13:28:38 +0000 (06:28 -0700)]
genl_magic: Resolve logical-op warnings
Resolve "logical 'and' applied to non-boolean constant" warnings"
that appear in W=2 builds by adding !! to a bit test.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Oct 2014 21:02:37 +0000 (17:02 -0400)]
net: Trap attempts to call sock_kfree_s() with a NULL pointer.
Unlike normal kfree() it is never right to call sock_kfree_s() with
a NULL pointer, because sock_kfree_s() also has the side effect of
discharging the memory from the sockets quota.
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 14 Oct 2014 19:35:08 +0000 (12:35 -0700)]
rds: avoid calling sock_kfree_s() on allocation failure
It is okay to free a NULL pointer but not okay to mischarge the socket optmem
accounting. Compile test only.
Reported-by: rucsoftsec@gmail.com
Cc: Chien Yen <chien.yen@oracle.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Tue, 14 Oct 2014 20:24:14 +0000 (01:54 +0530)]
cxgb4: Fix FW flash logic using ethtool
Use t4_fw_upgrade instead of t4_load_fw to write firmware into FLASH, since
t4_load_fw doesn't co-ordinate with the firmware and the adapter can get hosed
enough to require a power cycle of the system.
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Oct 2014 20:40:49 +0000 (16:40 -0400)]
Merge branch 'stmmac'
Giuseppe Cavallaro says:
====================
stmmac: review and fix the dwmac-sti glue-logic
This patch is to review the whole glue logic adopted on STi SoCs that
was bugged.
In the old glue-logic there was a lot of confusion when setup the
retiming especially for STiD127 where, for example, the bits 6 and 7
(in the GMAC control register) have a different meaning of what is
used for STiH4xx SoCs. So we cannot adopt the same glue for all these
SoCs.
Moreover, GiGa on STiD127 didn't work and, for all the SoCs, the RGMII
couldn't run when the speed was 10Mbps (because the clock was not properly
managed).
Note that the phy clock needs to be provided by the platform as well as
documented in the related binding file (updated as consequence).
The old code supported too many configurations never adopted and validated.
This made the code very complex to maintain and debug in case of issues.
The patch simplifies all the configurations as commented in the tables
inside the file and obviously it has been tested on all the boards
based on the SoCs mentioned.
With this patch, the dwmac-sti is also ready to support new configurations that
will be available on next SoC generations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 14 Oct 2014 06:12:56 +0000 (08:12 +0200)]
stmmac: dwmac-sti: review the glue-logic for STi4xx and STiD127 SoCs
This patch is to review the whole glue logic adopted on STi SoCs that
was bugged.
In the old glue-logic there was a lot of confusion when setup the
retiming especially for STiD127 where, for example, the bits 6 and 7
(in the GMAC control register) have a different meaning of what is
used for STiH4xx SoCs. So we cannot adopt the same glue for all these
SoCs.
Moreover, GiGa on STiD127 didn't work and, for all the SoCs, the RGMII
couldn't run when the speed was 10Mbps (because the clock was not properly
managed).
Note that the phy clock needs to be provided by the platform as well as
documented in the related binding file (updated as consequence).
The old code supported too many configurations never adopted and validated.
This made the code very complex to maintain and debug in case of issues.
The patch simplifies all the configurations as commented in the tables
inside the file and obviously it has been tested on all the boards
based on the SoCs mentioned.
With this patch, the dwmac-sti is also ready to support new configurations that
will be available on next SoC generations.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 14 Oct 2014 06:12:55 +0000 (08:12 +0200)]
stmmac: make the STi Layer compatible to STiH407
This adds the missing compatibility to the STiH407 SoC.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 14 Oct 2014 06:11:54 +0000 (08:11 +0200)]
stmmac: platform: fix FIXED_PHY support.
On several STi platforms: e.g. stihxxx-b2120 an Ethernet switch is
embedded and connected to the stmmac via RGMII mode. So this is managed
by using the FIXED_PHY. In that case, the support in the platform needs
to be fixed to allow the stmmac to dialog with the switch via fixed-link
by using phy_bus_name property.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Tue, 14 Oct 2014 18:21:04 +0000 (11:21 -0700)]
dsa: mv88e6171: Fix tag_protocol check
tag_protocol is now an enum, so drivers have to check against it.
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Oct 2014 20:09:38 +0000 (16:09 -0400)]
Merge branch 'xgene'
Iyappan Subramanian says:
====================
Adding SGMII based 1GbE basic support to APM X-Gene SoC ethernet driver.
v2: Address comments from v1
* Split the patchset into two, the first one being preparatory patch
* Added link_state function pointer to the xgene_mac_ops structure
* Added xgene_indirect_ctl structure for indirect read/write arguments
v1:
* Initial version
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Tue, 14 Oct 2014 00:05:35 +0000 (17:05 -0700)]
drivers: net: xgene: Add SGMII based 1GbE ethtool support
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Tue, 14 Oct 2014 00:05:34 +0000 (17:05 -0700)]
drivers: net: xgene: Add SGMII based 1GbE support
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Tue, 14 Oct 2014 00:05:33 +0000 (17:05 -0700)]
drivers: net: xgene: Preparing for adding SGMII based 1GbE
- Added link_state function pointer to the xgene__mac_ops structure
- Moved ring manager (pdata->rm) assignment to xgene_enet_setup_ops
- Removed unused variable (pdata->phy_addr) and macro (FULL_DUPLEX)
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Tue, 14 Oct 2014 00:05:32 +0000 (17:05 -0700)]
dtb: Add SGMII based 1GbE node to APM X-Gene SoC device tree
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 14 Oct 2014 09:08:54 +0000 (02:08 -0700)]
net: filter: move common defines into bpf_common.h
userspace programs that use eBPF instruction macros need to include two files:
uapi/linux/filter.h and uapi/linux/bpf.h
Move common macro definitions that are shared between classic BPF and eBPF
into uapi/linux/bpf_common.h, so that user app can include only one bpf.h file
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 14 Oct 2014 17:01:14 +0000 (19:01 +0200)]
caif_usb: use target structure member in memset
parent cfusbl was used instead of first structure member 'layer'
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 14 Oct 2014 17:00:55 +0000 (19:00 +0200)]
caif_usb: remove redundant memory message
Let MM subsystem display out of memory messages.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Mon, 13 Oct 2014 20:21:46 +0000 (22:21 +0200)]
caif: replace kmalloc/memset 0 by kzalloc
Also add blank line after declaration
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Mon, 13 Oct 2014 16:51:07 +0000 (22:21 +0530)]
drivers: net: cpsw: remove child devices while driver detach
remove all the child devices from the system to make sure that re-insert of
cpsw module doesn't fail on child device populated by of_platform_populate().
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Mon, 13 Oct 2014 16:51:06 +0000 (22:21 +0530)]
drivers: net: davinci_cpdma: remove spinlock as SOFTIRQ-unsafe lock order detected
remove spinlock in cpdma_desc_pool_destroy() as there is no active cpdma
channel and iounmap should be called without auquiring lock.
root@dra7xx-evm:~# modprobe -r ti_cpsw
[ 50.539743]
[ 50.541312] ======================================================
[ 50.547796] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[ 50.554826]
3.14.19-02124-g95c5b7b #308 Not tainted
[ 50.559939] ------------------------------------------------------
[ 50.566416] modprobe/1921 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 50.573347] (vmap_area_lock){+.+...}, at: [<
c01127fc>] find_vmap_area+0x10/0x6c
[ 50.581132]
[ 50.581132] and this task is already holding:
[ 50.587249] (&(&pool->lock)->rlock#2){..-...}, at: [<
bf017c74>] cpdma_ctlr_destroy+0x5c/0x114 [davinci_cpdma]
[ 50.597766] which would create a new lock dependency:
[ 50.603048] (&(&pool->lock)->rlock#2){..-...} -> (vmap_area_lock){+.+...}
[ 50.610296]
[ 50.610296] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 50.618601] (&(&pool->lock)->rlock#2){..-...}
... which became SOFTIRQ-irq-safe at:
[ 50.626829] [<
c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c
[ 50.632677] [<
bf01773c>] cpdma_desc_free.constprop.7+0x28/0x58 [davinci_cpdma]
[ 50.640437] [<
bf0177e8>] __cpdma_chan_free+0x7c/0xa8 [davinci_cpdma]
[ 50.647289] [<
bf017908>] __cpdma_chan_process+0xf4/0x134 [davinci_cpdma]
[ 50.654512] [<
bf017984>] cpdma_chan_process+0x3c/0x54 [davinci_cpdma]
[ 50.661455] [<
bf0277e8>] cpsw_poll+0x14/0xa8 [ti_cpsw]
[ 50.667038] [<
c05844f4>] net_rx_action+0xc0/0x1e8
[ 50.672150] [<
c0048234>] __do_softirq+0xcc/0x304
[ 50.677183] [<
c004873c>] irq_exit+0xa8/0xfc
[ 50.681751] [<
c000eeac>] handle_IRQ+0x50/0xb0
[ 50.686513] [<
c0008638>] gic_handle_irq+0x28/0x5c
[ 50.691628] [<
c06590a4>] __irq_svc+0x44/0x5c
[ 50.696289] [<
c0658ab4>] _raw_spin_unlock_irqrestore+0x34/0x44
[ 50.702591] [<
c065a9c4>] do_page_fault.part.9+0x144/0x3c4
[ 50.708433] [<
c065acb8>] do_page_fault+0x74/0x84
[ 50.713453] [<
c00083dc>] do_DataAbort+0x34/0x98
[ 50.718391] [<
c065923c>] __dabt_usr+0x3c/0x40
[ 50.723148]
[ 50.723148] to a SOFTIRQ-irq-unsafe lock:
[ 50.728893] (vmap_area_lock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
[ 50.736476] ... [<
c06584e8>] _raw_spin_lock+0x28/0x38
[ 50.741876] [<
c011376c>] alloc_vmap_area.isra.28+0xb8/0x300
[ 50.747908] [<
c0113a44>] __get_vm_area_node.isra.29+0x90/0x134
[ 50.754210] [<
c011486c>] get_vm_area_caller+0x3c/0x48
[ 50.759692] [<
c0114be0>] vmap+0x40/0x78
[ 50.763900] [<
c09442f0>] check_writebuffer_bugs+0x54/0x1a0
[ 50.769835] [<
c093eac0>] start_kernel+0x320/0x388
[ 50.774952] [<
80008074>] 0x80008074
[ 50.778793]
[ 50.778793] other info that might help us debug this:
[ 50.778793]
[ 50.787181] Possible interrupt unsafe locking scenario:
[ 50.787181]
[ 50.794295] CPU0 CPU1
[ 50.799042] ---- ----
[ 50.803785] lock(vmap_area_lock);
[ 50.807446] local_irq_disable();
[ 50.813652] lock(&(&pool->lock)->rlock#2);
[ 50.820782] lock(vmap_area_lock);
[ 50.827086] <Interrupt>
[ 50.829823] lock(&(&pool->lock)->rlock#2);
[ 50.834490]
[ 50.834490] *** DEADLOCK ***
[ 50.834490]
[ 50.840695] 4 locks held by modprobe/1921:
[ 50.844981] #0: (&__lockdep_no_validate__){......}, at: [<
c03e53e8>] driver_detach+0x44/0xb8
[ 50.854038] #1: (&__lockdep_no_validate__){......}, at: [<
c03e53f4>] driver_detach+0x50/0xb8
[ 50.863102] #2: (&(&ctlr->lock)->rlock){......}, at: [<
bf017c34>] cpdma_ctlr_destroy+0x1c/0x114 [davinci_cpdma]
[ 50.873890] #3: (&(&pool->lock)->rlock#2){..-...}, at: [<
bf017c74>] cpdma_ctlr_destroy+0x5c/0x114 [davinci_cpdma]
[ 50.884871]
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
[ 50.892827] -> (&(&pool->lock)->rlock#2){..-...} ops: 167 {
[ 50.898703] IN-SOFTIRQ-W at:
[ 50.901995] [<
c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c
[ 50.909476] [<
bf01773c>] cpdma_desc_free.constprop.7+0x28/0x58 [davinci_cpdma]
[ 50.918878] [<
bf0177e8>] __cpdma_chan_free+0x7c/0xa8 [davinci_cpdma]
[ 50.927366] [<
bf017908>] __cpdma_chan_process+0xf4/0x134 [davinci_cpdma]
[ 50.936218] [<
bf017984>] cpdma_chan_process+0x3c/0x54 [davinci_cpdma]
[ 50.944794] [<
bf0277e8>] cpsw_poll+0x14/0xa8 [ti_cpsw]
[ 50.952009] [<
c05844f4>] net_rx_action+0xc0/0x1e8
[ 50.958765] [<
c0048234>] __do_softirq+0xcc/0x304
[ 50.965432] [<
c004873c>] irq_exit+0xa8/0xfc
[ 50.971635] [<
c000eeac>] handle_IRQ+0x50/0xb0
[ 50.978035] [<
c0008638>] gic_handle_irq+0x28/0x5c
[ 50.984788] [<
c06590a4>] __irq_svc+0x44/0x5c
[ 50.991085] [<
c0658ab4>] _raw_spin_unlock_irqrestore+0x34/0x44
[ 50.999023] [<
c065a9c4>] do_page_fault.part.9+0x144/0x3c4
[ 51.006510] [<
c065acb8>] do_page_fault+0x74/0x84
[ 51.013171] [<
c00083dc>] do_DataAbort+0x34/0x98
[ 51.019738] [<
c065923c>] __dabt_usr+0x3c/0x40
[ 51.026129] INITIAL USE at:
[ 51.029335] [<
c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c
[ 51.036729] [<
bf017d78>] cpdma_chan_submit+0x4c/0x2f0 [davinci_cpdma]
[ 51.045225] [<
bf02863c>] cpsw_ndo_open+0x378/0x6bc [ti_cpsw]
[ 51.052897] [<
c058747c>] __dev_open+0x9c/0x104
[ 51.059287] [<
c05876ec>] __dev_change_flags+0x88/0x160
[ 51.066420] [<
c05877e4>] dev_change_flags+0x18/0x48
[ 51.073270] [<
c05ed51c>] devinet_ioctl+0x61c/0x6e0
[ 51.080029] [<
c056ee54>] sock_ioctl+0x5c/0x298
[ 51.086418] [<
c01350a4>] do_vfs_ioctl+0x78/0x61c
[ 51.092993] [<
c01356ac>] SyS_ioctl+0x64/0x74
[ 51.099200] [<
c000e580>] ret_fast_syscall+0x0/0x48
[ 51.105956] }
[ 51.107696] ... key at: [<
bf019000>] __key.21312+0x0/0xfffff650 [davinci_cpdma]
[ 51.115912] ... acquired at:
[ 51.119019] [<
c00899ac>] lock_acquire+0x9c/0x104
[ 51.124138] [<
c06584e8>] _raw_spin_lock+0x28/0x38
[ 51.129341] [<
c01127fc>] find_vmap_area+0x10/0x6c
[ 51.134547] [<
c0114960>] remove_vm_area+0x8/0x6c
[ 51.139659] [<
c0114a7c>] __vunmap+0x20/0xf8
[ 51.144318] [<
c001c350>] __arm_iounmap+0x10/0x18
[ 51.149440] [<
bf017d08>] cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma]
[ 51.156560] [<
bf026294>] cpsw_remove+0x48/0x8c [ti_cpsw]
[ 51.162407] [<
c03e62c8>] platform_drv_remove+0x18/0x1c
[ 51.168063] [<
c03e4c44>] __device_release_driver+0x70/0xc8
[ 51.174094] [<
c03e5458>] driver_detach+0xb4/0xb8
[ 51.179212] [<
c03e4a6c>] bus_remove_driver+0x4c/0x90
[ 51.184693] [<
c00b024c>] SyS_delete_module+0x10c/0x198
[ 51.190355] [<
c000e580>] ret_fast_syscall+0x0/0x48
[ 51.195661]
[ 51.197217]
the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[ 51.205986] -> (vmap_area_lock){+.+...} ops: 520 {
[ 51.211032] HARDIRQ-ON-W at:
[ 51.214321] [<
c06584e8>] _raw_spin_lock+0x28/0x38
[ 51.221090] [<
c011376c>] alloc_vmap_area.isra.28+0xb8/0x300
[ 51.228750] [<
c0113a44>] __get_vm_area_node.isra.29+0x90/0x134
[ 51.236690] [<
c011486c>] get_vm_area_caller+0x3c/0x48
[ 51.243811] [<
c0114be0>] vmap+0x40/0x78
[ 51.249654] [<
c09442f0>] check_writebuffer_bugs+0x54/0x1a0
[ 51.257239] [<
c093eac0>] start_kernel+0x320/0x388
[ 51.263994] [<
80008074>] 0x80008074
[ 51.269474] SOFTIRQ-ON-W at:
[ 51.272769] [<
c06584e8>] _raw_spin_lock+0x28/0x38
[ 51.279525] [<
c011376c>] alloc_vmap_area.isra.28+0xb8/0x300
[ 51.287190] [<
c0113a44>] __get_vm_area_node.isra.29+0x90/0x134
[ 51.295126] [<
c011486c>] get_vm_area_caller+0x3c/0x48
[ 51.302245] [<
c0114be0>] vmap+0x40/0x78
[ 51.308094] [<
c09442f0>] check_writebuffer_bugs+0x54/0x1a0
[ 51.315669] [<
c093eac0>] start_kernel+0x320/0x388
[ 51.322423] [<
80008074>] 0x80008074
[ 51.327906] INITIAL USE at:
[ 51.331112] [<
c06584e8>] _raw_spin_lock+0x28/0x38
[ 51.337775] [<
c011376c>] alloc_vmap_area.isra.28+0xb8/0x300
[ 51.345352] [<
c0113a44>] __get_vm_area_node.isra.29+0x90/0x134
[ 51.353197] [<
c011486c>] get_vm_area_caller+0x3c/0x48
[ 51.360224] [<
c0114be0>] vmap+0x40/0x78
[ 51.365977] [<
c09442f0>] check_writebuffer_bugs+0x54/0x1a0
[ 51.373464] [<
c093eac0>] start_kernel+0x320/0x388
[ 51.380131] [<
80008074>] 0x80008074
[ 51.385517] }
[ 51.387260] ... key at: [<
c0a66948>] vmap_area_lock+0x10/0x20
[ 51.393841] ... acquired at:
[ 51.396945] [<
c00899ac>] lock_acquire+0x9c/0x104
[ 51.402060] [<
c06584e8>] _raw_spin_lock+0x28/0x38
[ 51.407266] [<
c01127fc>] find_vmap_area+0x10/0x6c
[ 51.412478] [<
c0114960>] remove_vm_area+0x8/0x6c
[ 51.417592] [<
c0114a7c>] __vunmap+0x20/0xf8
[ 51.422252] [<
c001c350>] __arm_iounmap+0x10/0x18
[ 51.427369] [<
bf017d08>] cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma]
[ 51.434487] [<
bf026294>] cpsw_remove+0x48/0x8c [ti_cpsw]
[ 51.440336] [<
c03e62c8>] platform_drv_remove+0x18/0x1c
[ 51.446000] [<
c03e4c44>] __device_release_driver+0x70/0xc8
[ 51.452031] [<
c03e5458>] driver_detach+0xb4/0xb8
[ 51.457147] [<
c03e4a6c>] bus_remove_driver+0x4c/0x90
[ 51.462628] [<
c00b024c>] SyS_delete_module+0x10c/0x198
[ 51.468289] [<
c000e580>] ret_fast_syscall+0x0/0x48
[ 51.473584]
[ 51.475140]
[ 51.475140] stack backtrace:
[ 51.479703] CPU: 0 PID: 1921 Comm: modprobe Not tainted
3.14.19-02124-g95c5b7b #308
[ 51.487744] [<
c0016090>] (unwind_backtrace) from [<
c0012060>] (show_stack+0x10/0x14)
[ 51.495865] [<
c0012060>] (show_stack) from [<
c0652a20>] (dump_stack+0x78/0x94)
[ 51.503444] [<
c0652a20>] (dump_stack) from [<
c0086f18>] (check_usage+0x408/0x594)
[ 51.511293] [<
c0086f18>] (check_usage) from [<
c00870f8>] (check_irq_usage+0x54/0xb0)
[ 51.519416] [<
c00870f8>] (check_irq_usage) from [<
c0088724>] (__lock_acquire+0xe54/0x1b90)
[ 51.528077] [<
c0088724>] (__lock_acquire) from [<
c00899ac>] (lock_acquire+0x9c/0x104)
[ 51.536291] [<
c00899ac>] (lock_acquire) from [<
c06584e8>] (_raw_spin_lock+0x28/0x38)
[ 51.544417] [<
c06584e8>] (_raw_spin_lock) from [<
c01127fc>] (find_vmap_area+0x10/0x6c)
[ 51.552726] [<
c01127fc>] (find_vmap_area) from [<
c0114960>] (remove_vm_area+0x8/0x6c)
[ 51.560935] [<
c0114960>] (remove_vm_area) from [<
c0114a7c>] (__vunmap+0x20/0xf8)
[ 51.568693] [<
c0114a7c>] (__vunmap) from [<
c001c350>] (__arm_iounmap+0x10/0x18)
[ 51.576362] [<
c001c350>] (__arm_iounmap) from [<
bf017d08>] (cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma])
[ 51.586494] [<
bf017d08>] (cpdma_ctlr_destroy [davinci_cpdma]) from [<
bf026294>] (cpsw_remove+0x48/0x8c [ti_cpsw])
[ 51.597261] [<
bf026294>] (cpsw_remove [ti_cpsw]) from [<
c03e62c8>] (platform_drv_remove+0x18/0x1c)
[ 51.606659] [<
c03e62c8>] (platform_drv_remove) from [<
c03e4c44>] (__device_release_driver+0x70/0xc8)
[ 51.616237] [<
c03e4c44>] (__device_release_driver) from [<
c03e5458>] (driver_detach+0xb4/0xb8)
[ 51.625264] [<
c03e5458>] (driver_detach) from [<
c03e4a6c>] (bus_remove_driver+0x4c/0x90)
[ 51.633749] [<
c03e4a6c>] (bus_remove_driver) from [<
c00b024c>] (SyS_delete_module+0x10c/0x198)
[ 51.642781] [<
c00b024c>] (SyS_delete_module) from [<
c000e580>] (ret_fast_syscall+0x0/0x48)
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Mon, 13 Oct 2014 16:51:05 +0000 (22:21 +0530)]
drivers: net: davinci_cpdma: remove kfree on objects allocated with devm_* apis
memories allocated with devm_* apis must not be freed with kfree apis,
so removing the kfree calls
Fixes:
e194312854ed ('drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().')
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Mon, 13 Oct 2014 16:21:42 +0000 (09:21 -0700)]
tg3: Add skb->xmit_more support
Ring TX doorbell only if xmit_more is not set or the queue is stopped.
Suggested-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 13 Oct 2014 14:34:10 +0000 (16:34 +0200)]
ipv4: fix nexthop attlen check in fib_nh_match
fib_nh_match does not match nexthops correctly. Example:
ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
nexthop via 192.168.122.15 dev eth0
Del command is successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process
Please consider this for stable trees as well.
Fixes:
4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 11 Oct 2014 22:17:29 +0000 (15:17 -0700)]
tcp: fix tcp_ack() performance problem
We worked hard to improve tcp_ack() performance, by not accessing
skb_shinfo() in fast path (
cd7d8498c9a5 tcp: change tcp_skb_pcount()
location)
We still have one spurious access because of ACK timestamping,
added in commit
e1c8a607b281 ("net-timestamp: ACK timestamp for
bytestreams")
By checking if sk_tsflags has SOF_TIMESTAMPING_TX_ACK set,
we can avoid two cache line misses for the common case.
While we are at it, add two prefetchw() :
One in tcp_ack() to bring skb at the head of write queue.
One in tcp_clean_rtx_queue() loop to bring following skb,
as we will delete skb from the write queue and dirty skb->next->prev.
Add a couple of [un]likely() clauses.
After this patch, tcp_ack() is no longer the most consuming
function in tcp stack.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Oct 2014 19:05:39 +0000 (15:05 -0400)]
Merge branch 'isdn'
Tilman Schmidt says:
====================
Coverity patches for drivers/isdn
Here's a series of patches for the ISDN CAPI subsystem and the
Gigaset ISDN driver.
Patches 1 to 7 are specific fixes for Coverity warnings.
Patches 8 to 11 fix related problems with the handling of invalid
CAPI command codes I noticed while working on this.
Patch 12 fixes an unrelated problem I noticed during the subsequent
regression tests.
It would be great if these could still be merged.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:30 +0000 (13:46 +0200)]
isdn/gigaset: fix usb_gigaset write_cmd result race
In usb_gigaset function gigaset_write_cmd(), the length field of
the command buffer structure could be cleared by the transmit
tasklet before it was used for the function's return value.
Fix by copying to a local variable before scheduling the tasklet.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:30 +0000 (13:46 +0200)]
isdn/capi: don't return NULL from capi_cmd2str()
capi_cmd2str() is used in many places to build log messages.
None of them is prepared to handle NULL as a result.
Change the function to return printable string "INVALID_COMMAND"
instead.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:30 +0000 (13:46 +0200)]
isdn/capi: handle CAPI 2.0 message parser failures
Have callers of capi_cmsg2message and capi_message2cmsg handle
non-zero return values indicating failure.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:30 +0000 (13:46 +0200)]
isdn/capi: prevent NULL pointer dereference on invalid CAPI command
An invalid CAPI 2.0 command/subcommand combination may retrieve a
NULL pointer from the cpars[] array which will later be dereferenced
by the parser routines.
Fix by adding NULL pointer checks in strategic places.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:30 +0000 (13:46 +0200)]
isdn/capi: refactor command/subcommand table accesses
Encapsulate accesses to the CAPI 2.0 command/subcommand name and
parameter tables in a single place in preparation for redesign.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:30 +0000 (13:46 +0200)]
isdn/capi: prevent index overrun from command_2_index()
The result of the function command_2_index() is used to index two
arrays mnames[] and cpars[] with max. index 0x4e but in its current
form that function can produce results up to 3*(0x9+0x9)+0x7f =
0xb5.
Fix by clamping all result values potentially overrunning the arrays
to zero which is already handled as an invalid value.
Re-spotted with Coverity.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:30 +0000 (13:46 +0200)]
isdn/capi: correct capi20_manufacturer argument type mismatch
Function capi20_manufacturer() is declared with unsigned int cmd
argument but called with unsigned long.
Fix by correcting the function prototype since the actual argument
is part of the user visible API.
Spotted with Coverity.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:30 +0000 (13:46 +0200)]
isdn/gigaset: fix non-heap pointer deallocation
at_state structures may be allocated individually or as part of a
cardstate or bc_state structure. The disconnect() function handled
both cases, creating a risk that it might try to deallocate an
at_state structure that had not been allocated individually.
Fix by splitting disconnect() into two variants handling cases
with and without an associated B channel separately, and adding
an explicit check.
Spotted with Coverity.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:29 +0000 (13:46 +0200)]
isdn/gigaset: fix NULL pointer dereference
In do_action, a NULL pointer might be passed to function start_dial
which will dereference it.
Fix by adding a check for NULL before the call.
Spotted with Coverity.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:29 +0000 (13:46 +0200)]
isdn/gigaset: limit raw CAPI message dump length
In dump_rawmsg, the length field from a received data package was
used unscrutinized, allowing an attacker to control the size of the
allocated buffer and the number of times the output loop iterates.
Fix by limiting to a reasonable value.
Spotted with Coverity.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:29 +0000 (13:46 +0200)]
isdn/gigaset: make sure controller name is null terminated
In gigaset_isdn_regdev, the name field may not have a null terminator
if the source string's length is equal to the buffer size.
Fix by zero filling the structure and excluding the last byte of the
name field from the copy.
Spotted with Coverity.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 11 Oct 2014 11:46:29 +0000 (13:46 +0200)]
isdn/gigaset: missing break in do_facility_req
If we take the unsupported supplementary service notification mask
path, we end up falling through and overwriting the error code.
Insert a break statement to skip the remainder of the switch case
and proceed to sending the reply message.
Spotted with Coverity.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Oct 2014 18:45:17 +0000 (14:45 -0400)]
Merge branch 'fec-ptp'
Luwei Zhou says:
====================
Enable FEC pps feather
Change from v2 to v3:
-Using the default channel 0 to be PPS channel not PTP_PIN_SET/GETFUNC interface.
-Using the linux definition of NSEC_PER_SEC.
Change from v1 to v2:
- Fix the potential 32-bit multiplication overflow issue.
- Optimize the hareware adjustment code to improve efficiency as Richard suggested
- Use ptp PTP_PIN_SET/GETFUNC interface to set PPS channel not device tree
and add PTP_PF_PPS enumeration
- Modify comments style
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Luwei Zhou [Fri, 10 Oct 2014 05:15:30 +0000 (13:15 +0800)]
net: fec: ptp: Enable PPS output based on ptp clock
FEC ptp timer has 4 channel compare/trigger function. It can be used to
enable pps output.
The pulse would be ouput high exactly on N second. The pulse ouput high
on compare event mode is used to produce pulse per second. The pulse
width would be one cycle based on ptp timer clock source.Since 31-bit
ptp hardware timer is used, the timer will wrap more than 2 seconds. We
need to reload the compare compare event about every 1 second.
Signed-off-by: Luwei Zhou <b45643@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luwei Zhou [Fri, 10 Oct 2014 05:15:29 +0000 (13:15 +0800)]
net: fec: ptp: Use hardware algorithm to adjust PTP counter.
The FEC IP supports hardware adjustment for ptp timer. Refer to the description of
ENET_ATCOR and ENET_ATINC registers in the spec about the hardware adjustment. This
patch uses hardware support to adjust the ptp offset and frequency on the slave side.
Signed-off-by: Luwei Zhou <b45643@freescale.com>
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: Fugang Duan <b38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luwei Zhou [Fri, 10 Oct 2014 05:15:28 +0000 (13:15 +0800)]
net: fec: ptp: Use the 31-bit ptp timer.
When ptp switches from software adjustment to hardware ajustment, linux ptp can't converge.
It is caused by the IP limit. Hardware adjustment logcial have issue when ptp counter
runs over 0x80000000(31 bit counter). The internal IP reference manual already remove 32bit
free-running count support. This patch replace the 32-bit PTP timer with 31-bit.
Signed-off-by: Luwei Zhou <b45643@freescale.com>
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Sat, 11 Oct 2014 05:03:34 +0000 (13:03 +0800)]
ipv6: remove aca_lock spinlock from struct ifacaddr6
no user uses this lock.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Sat, 11 Oct 2014 03:30:23 +0000 (20:30 -0700)]
x86: bpf_jit: fix two bugs in eBPF JIT compiler
1.
JIT compiler using multi-pass approach to converge to final image size,
since x86 instructions are variable length. It starts with large
gaps between instructions (so some jumps may use imm32 instead of imm8)
and iterates until total program size is the same as in previous pass.
This algorithm works only if program size is strictly decreasing.
Programs that use LD_ABS insn need additional code in prologue, but it
was not emitted during 1st pass, so there was a chance that 2nd pass would
adjust imm32->imm8 jump offsets to the same number of bytes as increase in
prologue, which may cause algorithm to erroneously decide that size converged.
Fix it by always emitting largest prologue in the first pass which
is detected by oldproglen==0 check.
Also change error check condition 'proglen != oldproglen' to fail gracefully.
2.
while staring at the code realized that 64-byte buffer may not be enough
when 1st insn is large, so increase it to 128 to avoid buffer overflow
(theoretical maximum size of prologue+div is 109) and add runtime check.
Fixes:
622582786c9e ("net: filter: x86: internal BPF JIT")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 11 Oct 2014 01:06:35 +0000 (18:06 -0700)]
tcp: fix ooo_okay setting vs Small Queues
TCP Small Queues (tcp_tsq_handler()) can hold one reference on
sk->sk_wmem_alloc, preventing skb->ooo_okay being set.
We should relax test done to set skb->ooo_okay to take care
of this extra reference.
Minimal truesize of skb containing one byte of payload is
SKB_TRUESIZE(1)
Without this fix, we have more chance locking flows into the wrong
transmit queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 10 Oct 2014 21:10:47 +0000 (23:10 +0200)]
skbuff: fix ftrace handling in skb_unshare
If the skb is not dropped afterwards we should run consume_skb instead
kfree_skb. Inside of function skb_unshare we do always a kfree_skb,
doesn't depend if skb_copy failed or was successful.
This patch switch this behaviour like skb_share_check, if allocation of
sk_buff failed we use kfree_skb otherwise consume_skb.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 10 Oct 2014 21:30:52 +0000 (14:30 -0700)]
fm10k: Add skb->xmit_more support
This change adds support for skb->xmit_more based on the changes that were
made to igb to support the feature. The main changes are moving up the
check for maybe_stop_tx so that we can check netif_xmit_stopped to determine
if we must write the tail because we can add no further buffers.
Acked-by: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nimrod Andy [Mon, 13 Oct 2014 02:53:48 +0000 (10:53 +0800)]
net: fec: Fix sparse warnings with different lock contexts for basic block
reproduce:
make ARCH=arm C=1 2>fec.txt drivers/net/ethernet/freescale/fec_main.o
cat fec.txt
sparse warnings:
drivers/net/ethernet/freescale/fec_main.c:2916:12: warning: context imbalance
in 'fec_set_features' - different lock contexts for basic block
Christopher Li suggest to change as below:
if (need_lock) {
lock();
do_something_real();
unlock();
} else {
do_something_real();
}
Reported-by: Fabio Estevam <festevam@gmail.com>
Suggested-by: Christopher Li <sparse@chrisli.org>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Thu, 9 Oct 2014 15:08:42 +0000 (10:08 -0500)]
MAINTAINERS: Update contact information for Vince Bridgers
Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Oct 2014 16:46:29 +0000 (12:46 -0400)]
Merge branch 'sctp'
Daniel Borkmann says:
====================
Here are some SCTP fixes.
[ Note, immediate workaround would be to disable ASCONF (it
is sysctl disabled by default). It is actually only used
together with chunk authentication. ]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 9 Oct 2014 20:55:33 +0000 (22:55 +0200)]
net: sctp: fix remote memory pressure from excessive queueing
This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
[...]
---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>
... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.
We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].
The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.
In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.
One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.
Joint work with Vlad Yasevich.
Fixes:
2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 9 Oct 2014 20:55:32 +0000 (22:55 +0200)]
net: sctp: fix panic on duplicate ASCONF chunks
When receiving a e.g. semi-good formed connection scan in the
form of ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---------------- ASCONF_a; ASCONF_b ----------------->
... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!
The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.
Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():
While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit
2e3216cd54b1
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.
Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.
Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.
Joint work with Vlad Yasevich.
Fixes:
2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 9 Oct 2014 20:55:31 +0000 (22:55 +0200)]
net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
Commit
6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:
skb_over_panic: text:
ffffffffa01ea1c3 len:31056 put:30768
head:
ffff88011bd81800 data:
ffff88011bd81800 tail:0x7950
end:0x440 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<
ffffffff8144fb1c>] skb_put+0x5c/0x70
[<
ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
[<
ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
[<
ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
[<
ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
[<
ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<
ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
[<
ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
[<
ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
[<
ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<
ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<
ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<
ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<
ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<
ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<
ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<
ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
[<
ffffffff81497078>] ip_local_deliver+0x98/0xa0
[<
ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
[<
ffffffff81496ac5>] ip_rcv+0x275/0x350
[<
ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
[<
ffffffff81460588>] netif_receive_skb+0x58/0x60
This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
------------------ ASCONF; UNKNOWN ------------------>
... where ASCONF chunk of length 280 contains 2 parameters ...
1) Add IP address parameter (param length: 16)
2) Add/del IP address parameter (param length: 255)
... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.
The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.
In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.
When walking to the next parameter this time, we proceed
with ...
length = ntohs(asconf_param->param_hdr.length);
asconf_param = (void *)asconf_param + length;
... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.
Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.
Joint work with Vlad Yasevich.
Fixes:
b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruno Thomsen [Thu, 9 Oct 2014 14:48:14 +0000 (16:48 +0200)]
phy/micrel: KSZ8031RNL RMII clock reconfiguration bug
Bug: Unable to send and receive Ethernet packets with Micrel PHY.
Affected devices:
KSZ8031RNL (commercial temp)
KSZ8031RNLI (industrial temp)
Description:
PHY device is correctly detected during probe.
PHY power-up default is 25MHz crystal clock input
and output 50MHz RMII clock to MAC.
Reconfiguration of PHY to input 50MHz RMII clock from MAC
causes PHY to become unresponsive if clock source is changed
after Operation Mode Strap Override (OMSO) register setup.
Cause:
Long lead times on parts where clock setup match circuit design
forces the usage of similar parts with wrong default setup.
Solution:
Swapped KSZ8031 register setup and added phy_write return code validation.
Tested with Freescale i.MX28 Fast Ethernet Controler (fec).
Signed-off-by: Bruno Thomsen <bth@kamstrup.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Oct 2014 19:39:22 +0000 (15:39 -0400)]
Merge branch 'bcmgenet_systemport'
Florian Fainelli says:
====================
net: bcmgenet & systemport fixes
This patch series fixes an off-by-one error introduced during a previous
change, and the two other fixes fix a wake depth imbalance situation for
the Wake-on-LAN interrupt line.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Oct 2014 17:51:54 +0000 (10:51 -0700)]
net: systemport: avoid unbalanced enable_irq_wake calls
Multiple enable_irq_wake() calls will keep increasing the IRQ
wake_depth, which ultimately leads to the following types of
situation:
1) enable Wake-on-LAN interrupt w/o password
2) enable Wake-on-LAN interrupt w/ password
3) enable Wake-on-LAN interrupt w/o password
4) disable Wake-on-LAN interrupt
After step 4), SYSTEMPORT would always wake-up the system no matter what
wake-up device we use, which is not what we want. Fix this by making
sure there are no unbalanced enable_irq_wake() calls.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Oct 2014 17:51:53 +0000 (10:51 -0700)]
net: bcmgenet: avoid unbalanced enable_irq_wake calls
Multiple enable_irq_wake() calls will keep increasing the IRQ
wake_depth, which ultimately leads to the following types of
situation:
1) enable Wake-on-LAN interrupt w/o password
2) enable Wake-on-LAN interrupt w/ password
3) enable Wake-on-LAN interrupt w/o password
4) disable Wake-on-LAN interrupt
After step 4), GENET would always wake-up the system no matter what
wake-up device we use, which is not what we want. Fix this by making
sure there are no unbalanced enable_irq_wake() calls.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Oct 2014 17:51:52 +0000 (10:51 -0700)]
net: bcmgenet: fix off-by-one in incrementing read pointer
Commit
b629be5c8399d7c423b92135eb43a86c924d1cbc ("net: bcmgenet: check
harder for out of memory conditions") moved the increment of the local
read pointer *before* reading from the hardware descriptor using
dmadesc_get_length_status(), which creates an off-by-one situation.
Fix this by moving again the read_ptr increment after we have read the
hardware descriptor to get both the control block and the read pointer
back in sync.
Fixes:
b629be5c8399 ("net: bcmgenet: check harder for out of memory conditions")
Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Oct 2014 19:37:36 +0000 (15:37 -0400)]
Merge branch 'net-drivers-pgcnt'
Eric Dumazet says:
====================
net: fix races accessing page->_count
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
The only case it is valid is when page->_count is 0, we can use this in
__netdev_alloc_frag()
Note that I never seen crashes caused by these races, the issue was reported
by Andres Lagar-Cavilla and Hugh Dickins.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Oct 2014 11:48:18 +0000 (04:48 -0700)]
net: fix races in page->_count manipulation
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
The only case it is valid is when page->_count is 0
Fixes:
540eb7bf0bbed ("net: Update alloc frag to reduce get/put page usage and recycle pages")
Signed-off-by: Eric Dumaze <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Oct 2014 11:48:17 +0000 (04:48 -0700)]
mlx4: fix race accessing page->_count
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Oct 2014 11:48:16 +0000 (04:48 -0700)]
ixgbe: fix race accessing page->_count
This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Oct 2014 11:48:15 +0000 (04:48 -0700)]
igb: fix race accessing page->_count
This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Oct 2014 11:48:14 +0000 (04:48 -0700)]
fm10k: fix race accessing page->_count
This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sascha Hauer [Fri, 10 Oct 2014 07:48:05 +0000 (09:48 +0200)]
net/phy: micrel: Add clock support for KSZ8021/KSZ8031
The KSZ8021 and KSZ8031 support RMII reference input clocks of 25MHz
and 50MHz. Both PHYs differ in the default frequency they expect
after reset. If this differs from the actual input clock, then
register 0x1f bit 7 must be changed.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 10 Oct 2014 19:09:12 +0000 (12:09 -0700)]
flow-dissector: Fix alignment issue in __skb_flow_get_ports
This patch addresses a kernel unaligned access bug seen on a sparc64 system
with an igb adapter. Specifically the __skb_flow_get_ports was returning a
be32 pointer which was then having the value directly returned.
In order to prevent this it is actually easier to simply not populate the
ports or address values when an skb is not present. In this case the
assumption is that the data isn't needed and rather than slow down the
faster aligned accesses by making them have to assume the unaligned path on
architectures that don't support efficent unaligned access it makes more
sense to simply switch off the bits that were copying the source and
destination address/port for the case where we only care about the protocol
types and lengths which are normally 16 bit fields anyway.
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 10 Oct 2014 05:56:51 +0000 (13:56 +0800)]
net: filter: fix the comments
1. sk_run_filter has been renamed, sk_filter() is using SK_RUN_FILTER.
2. Remove wrong comments about storing intermediate value.
3. replace sk_run_filter with __bpf_prog_run for check_load_and_stores's
comments
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 10 Oct 2014 03:36:54 +0000 (11:36 +0800)]
Documentation: replace __sk_run_filter with __bpf_prog_run
__sk_run_filter has been renamed as __bpf_prog_run, so replace them in comments
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Oct 2014 19:09:51 +0000 (15:09 -0400)]
Merge branch 'macvlan'
Jason Baron says:
====================
macvlan: optimize receive path
So after porting this optimization to net-next, I found that the netperf
results of TCP_RR regress right at the maximum peak of transactions/sec. That
is as I increase the number of threads via the first argument to super_netperf,
the number of transactions/sec keep increasing, peak, and then start
decreasing. It is right at the peak, that I see a small regression with this
patch (see results in patch 2/2).
Without the patch, the ksoftirqd threads are the top cpu consumers threads on
the system, since the extra 'netif_rx()', is queuing more softirq work, whereas
with the patch, the ksoftirqd threads are below all of the 'netserver' threads
in terms of their cpu usage. So there appears to be some interaction between how
softirqs are serviced at the peak here and this patch. I think the test results
are still supportive of this approach, but I wanted to be clear on my findings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
jbaron@akamai.com [Fri, 10 Oct 2014 03:13:31 +0000 (03:13 +0000)]
macvlan: optimize the receive path
The netif_rx() call on the fast path of macvlan_handle_frame() appears to
be there to ensure that we properly throttle incoming packets. However, it
would appear as though the proper throttling is already in place for all
possible ingress paths, and that the call is redundant. If packets are arriving
from the physical NIC, we've already throttled them by this point. Otherwise,
if they are coming via macvlan_queue_xmit(), it calls either
'dev_forward_skb()', which ends up calling netif_rx_internal(), or else in
the broadcast case, we are throttling via macvlan_broadcast_enqueue().
The test results below are from off the box to an lxc instance running macvlan.
Once the tranactions/sec stop increasing, the cpu idle time has gone to 0.
Results are from a quad core Intel E3-1270 V2@3.50GHz box with bnx2x 10G card.
for i in {10,100,200,300,400,500};
do super_netperf $i -H $ip -t TCP_RR; done
Average of 5 runs.
trans/sec trans/sec
(3.17-rc7-net-next) (3.17-rc7-net-next + this patch)
---------- ----------
208101 211534 (+1.6%)
839493 850162 (+1.3%)
845071 844053 (-.12%)
816330 819623 (+.4%)
778700 789938 (+1.4%)
735984 754408 (+2.5%)
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
jbaron@akamai.com [Fri, 10 Oct 2014 03:13:27 +0000 (03:13 +0000)]
macvlan: pass 'bool' type to macvlan_count_rx()
Pass last argument to macvlan_count_rx() as the correct bool type.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Oct 2014 19:07:02 +0000 (15:07 -0400)]
Merge branch 'xgene'
Iyappan Subramanian says:
====================
Add 10GbE support to APM X-Gene SoC ethernet driver
Adding 10GbE support to APM X-Gene SoC ethernet driver.
v4: Address comments from v3
* dtb: resolved merge conflict for the net tree
v3: Address comments from v2
* dtb: changed to use all-zeros for the mac address
v2: Address comments from v1
* created preparatory patch to review before adding new functionality
* dtb: updated to use tabs consistently
v1:
* Initial version
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Fri, 10 Oct 2014 01:32:07 +0000 (18:32 -0700)]
drivers: net: xgene: Add 10GbE ethtool support
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Fri, 10 Oct 2014 01:32:06 +0000 (18:32 -0700)]
drivers: net: xgene: Add 10GbE support
- Added 10GbE support
- Removed unused macros/variables
- Moved mac_init call to the end of hardware init
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Fri, 10 Oct 2014 01:32:05 +0000 (18:32 -0700)]
drivers: net: xgene: Preparing for adding 10GbE support
- Rearranged code to pave the way for adding 10GbE support
- Added mac_ops structure containing function pointers for mac specific functions
- Added port_ops structure containing function pointers for port specific functions
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Fri, 10 Oct 2014 01:32:04 +0000 (18:32 -0700)]
dtb: Add 10GbE node to APM X-Gene SoC device tree
Added 10GbE interface and clock nodes.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Fri, 10 Oct 2014 01:32:03 +0000 (18:32 -0700)]
Documentation: dts: Update section header for APM X-Gene
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Fri, 10 Oct 2014 01:32:02 +0000 (18:32 -0700)]
MAINTAINERS: Update APM X-Gene section
Updated APM X-Gene ethernet driver maintainers list.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 9 Oct 2014 22:16:41 +0000 (15:16 -0700)]
net: bpf: fix bpf syscall dependence on anon_inodes
minimal configurations where EPOLL, PERF_EVENTS, etc are disabled,
but NET is enabled, are failing to build with link error:
kernel/built-in.o: In function `bpf_prog_load':
syscall.c:(.text+0x3b728): undefined reference to `anon_inode_getfd'
fix it by selecting ANON_INODES when NET is enabled
Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Oct 2014 19:01:09 +0000 (15:01 -0400)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter fixes for net-next
This batch contains two fixes for what you have in your net-next,
they are:
1) Remove nf_send_reset6() from header file. This function now resides
in the nf_reject_ipv6 module. Reported by Eric Dumazet.
2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix
errors reported by Dan Carpenter's static analysis tools.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Oct 2014 18:49:55 +0000 (14:49 -0400)]
Merge tag 'master-2014-10-08' of git://git./linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
pull request: wireless-next 2014-10-09
Please pull this batch of fixes intended for the 3.18 stream!
Andrea Merello makes rtl818x_pci use a more reasonable transmission
rate for HW generated frames.
Fabian Frederick tweaks some kernel-doc bits to avoid warnings.
Larry Finger corrects a possible unaligned access in the rtlwifi code.
Marek Puzyniak avoids a kernel panic in ath9k_hw_reset.
Sujith Manoharan goes for the hat trick -- he fixes a smatch warning
in the shared ath code, he fixes a crash in ath9k, and he corrects
a sequence number assignment problem in ath9k too.
For ease of merging, I pulled the last bits of the wireless tree as well...
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vince Bridgers [Thu, 9 Oct 2014 15:10:36 +0000 (10:10 -0500)]
stmmac: correct mc_filter local variable in set_filter and set_mac_addr call
Testing revealed that the local variable mc_filter was dimensioned
incorrectly for all possible configurations and get_mac_addr should
have been set_mac_addr (a typo). Make sure mc_filter is dimensioned
to 8 32-bit unsigned longs - the largest size of the Synopsys
multicast filter register set.
Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Thu, 9 Oct 2014 14:15:42 +0000 (16:15 +0200)]
net: pxa168_eth: PXA168_ETH should depend on HAS_DMA
If NO_DMA=y:
drivers/built-in.o: In function `rxq_deinit':
pxa168_eth.c:(.text+0x2a2f2e): undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `txq_reclaim':
pxa168_eth.c:(.text+0x2a3044): undefined reference to `dma_unmap_single'
drivers/built-in.o: In function `txq_deinit':
pxa168_eth.c:(.text+0x2a310a): undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `txq_init':
pxa168_eth.c:(.text+0x2a3226): undefined reference to `dma_alloc_coherent'
drivers/built-in.o: In function `rxq_init':
pxa168_eth.c:(.text+0x2a32d4): undefined reference to `dma_alloc_coherent'
drivers/built-in.o: In function `init_hash_table':
pxa168_eth.c:(.text+0x2a3354): undefined reference to `dma_alloc_coherent'
drivers/built-in.o: In function `rxq_refill':
pxa168_eth.c:(.text+0x2a345a): undefined reference to `dma_map_single'
drivers/built-in.o: In function `rxq_process':
pxa168_eth.c:(.text+0x2a39cc): undefined reference to `dma_unmap_single'
drivers/built-in.o: In function `pxa168_eth_remove':
pxa168_eth.c:(.text+0x2a3b84): undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `pxa168_eth_start_xmit':
pxa168_eth.c:(.text+0x2a3e8a): undefined reference to `dma_map_single'
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pranith Kumar [Fri, 10 Oct 2014 05:19:06 +0000 (01:19 -0400)]
networking: fm10k: Fix build failure
The latest linus git tip (3.18-rc1) fails with the following build failure. Fix
this by making PTP support explicit for fm10k driver.
rivers/built-in.o: In function `fm10k_ptp_register':
(.text+0x12e760): undefined reference to `ptp_clock_registER'
drivers/built-in.o: In function `fm10k_ptp_unregister':
(.text+0x12e7dc): undefined reference to `ptp_clock_unregister'
Makefile:930: recipe for target 'vmlinux' failed
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LEROY Christophe [Thu, 9 Oct 2014 14:54:43 +0000 (16:54 +0200)]
net: fs_enet: error: 'SCCE_ENET_TXF' undeclared
[linux-devel:devel-hourly-
2014100909 3763/3915] drivers/net/ethernet/freescale/fs_enet/mac-scc.c:119:32: error: 'SCCE_ENET_TXF' undeclared
Due to patch
d43a396 net: fs_enet: Add NAPI TX, it appears that some target
compilations are broken.
This is due to the fact that unlike the FEC, the SCC and FCC don't have a TXF
event (complete Frame transmitted) but only TXB (buffer transmitted).
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Thu, 9 Oct 2014 10:18:10 +0000 (12:18 +0200)]
net_sched: restore qdisc quota fairness limits after bulk dequeue
Restore the quota fairness between qdisc's, that we broke with commit
5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE").
Before that commit, the quota in __qdisc_run() were in packets as
dequeue_skb() would only dequeue a single packet, that assumption
broke with bulk dequeue.
We choose not to account for the number of packets inside the TSO/GSO
packets (accessable via "skb_gso_segs"). As the previous fairness
also had this "defect". Thus, GSO/TSO packets counts as a single
packet.
Further more, we choose to slack on accuracy, by allowing a bulk
dequeue try_bulk_dequeue_skb() to exceed the "packets" limit, only
limited by the BQL bytelimit. This is done because BQL prefers to get
its full budget for appropriate feedback from TX completion.
In future, we might consider reworking this further and, if it allows,
switch to a time-based model, as suggested by Eric. Right now, we only
restore old semantics.
Joint work with Eric, Hannes, Daniel and Jesper. Hannes wrote the
first patch in cooperation with Daniel and Jesper. Eric rewrote the
patch.
Fixes:
5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Oct 2014 23:06:15 +0000 (19:06 -0400)]
Merge branch 'r8152'
Hayes Wang says:
====================
r8152: use mutex for hw settings
v2:
Make sure the autoresume wouldn't occur inside the mutex, otherwise
the dead lock would happen. For the purpose, adjust some code about
autosuspend/autoresume.
v1:
Use mutex to avoid that the serial hw settings would be interrupted
by other settings. Although there is no problem now, it makes the
driver more safe.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 9 Oct 2014 10:00:26 +0000 (18:00 +0800)]
r8152: add mutex for hw settings
Use the mutex to avoid the settings are interrupted by other ones.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 9 Oct 2014 10:00:25 +0000 (18:00 +0800)]
r8152: adjust usb_autopm_xxx
Add usb_autopm_xxx for rtl8152_get_settings() ,and remove
usb_autopm_xxx from read_mii_word() and write_mii_word().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 9 Oct 2014 10:00:24 +0000 (18:00 +0800)]
r8152: autoresume before setting feature
Resume the device before setting the feature.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masanari Iida [Thu, 9 Oct 2014 03:58:08 +0000 (12:58 +0900)]
net: Missing @ before descriptions cause make xmldocs warning
This patch fix following warning.
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'header_len'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'data_len'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'max_page_order'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'errcode'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'gfp_mask'
Acutually the descriptions exist, but missing "@" in front.
This problem start to happen when following commit was merged
into Linus's tree during 3.18-rc1 merge period.
commit
2e4e44107176d552f8bb1bb76053e850e3809841
net: add alloc_skb_with_frags() helper
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Oct 2014 22:53:59 +0000 (18:53 -0400)]
Merge branch 'cxgb4'
Hariprasad Shenai says:
====================
cxgb4/cxgb4vf: Misc fixes and 40G support for cxgb4vf
This patch series adds 40G support for cxgb4vf driver. Update the LSO length for
cxgb4vf, fix macro. Wait for device to get ready before reading PL_WHOAMI
register.
The patches series is created against 'net-next' tree.
And includes patches on cxgb4 and cxgb4vf driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Thu, 9 Oct 2014 00:18:47 +0000 (05:48 +0530)]
cxgb4: Wait for device to get ready before reading any register
Call t4_wait_dev_ready() before attempting to read the PL_WHOAMI register
(to determine which function we have been attached to). This prevents us from
failing on that read if it comes right after a RESET.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Thu, 9 Oct 2014 00:18:46 +0000 (05:48 +0530)]
cxgb4vf: Add 40G support for cxgb4vf driver
Add 40G support for cxgb4vf driver. ethtool speed values are just numbers of
megabits and there is no SPEED_40000 in ethtool speed values. To be consistent,
use integer constants directly for all speeds.
Use is_x_10g_port()("is 10Gb/s or higher") in cfg_queues() instead of
is_10g_port() ("is exactly 10Gb/s"). Else we will end up using a single
"Queue Set" on 40Gb/s adapters.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Thu, 9 Oct 2014 00:18:45 +0000 (05:48 +0530)]
cxgb4/cxgb4vf: Updated the LSO transfer length in CPL_TX_PKT_LSO for T5
Update the lso length for T5 adapter and fix PIDX_T5 macro
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Oct 2014 05:40:56 +0000 (01:40 -0400)]
Merge branch 'gianfar'
Claudiu Manoil says:
====================
gianfar: ARM port driver updates (1/2)
This is the first round of driver portability fixes and clean-up
with the main purpose to make gianfar portable on ARM, for the ARM
based SoC that integrates the eTSEC ethernet controller - "ls1021a".
The patches primarily address compile time errors, when compiling
gianfar on ARM. They replace PPC specific functions and macros
with architecture independent ones, solve arch specific header
inclusions, guard code that relates to PPC only, and even address
some simple endianess issues (see MAC address setup patch).
The patches addressing the bulk of remaining endianess issues,
like handling DMA fields (BD and FCB), will follow with the second
round.
====================
Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Tue, 7 Oct 2014 07:44:35 +0000 (10:44 +0300)]
gianfar: Replace eieio with wmb for non-PPC archs
Replace PPC specific eieio() with arch independent wmb()
for other architectures, i.e. ARM.
The eieio() macro is not defined on ARM and generates
build error.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>