Rafał Miłecki [Sun, 20 Apr 2014 11:05:43 +0000 (13:05 +0200)]
b43: N-PHY: drop second noise variance table
New Broadcom drivers don't upload it anymore. It was probably a copy & paste
mistake in early N-PHY rev 3+ days.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Rafał Miłecki [Sat, 19 Apr 2014 21:10:05 +0000 (23:10 +0200)]
b43: G-PHY: fix random mistakes to match specs
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Thu, 17 Apr 2014 18:47:03 +0000 (11:47 -0700)]
mwifiex: enable aggregation for TID 6 and 7 streams
Currently AMSDU and AMPDU aggregation is enabled for TID 0 to
TID 5 streams. Let's enable it for the remaining two streams as well.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Thu, 17 Apr 2014 18:47:02 +0000 (11:47 -0700)]
mwifiex: increase tx/rx AMPDU window sizes for STA 11ac mode
This will help to aggregate more packets which yields better
throughput results for 11ac chipsets.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Thu, 17 Apr 2014 18:47:01 +0000 (11:47 -0700)]
mwifiex: increase tx/rx AMPDU window sizes for STA 11n mode
This will help to aggregate more packets which yields better
throughput results.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Thu, 17 Apr 2014 18:47:00 +0000 (11:47 -0700)]
mwifiex: add firmware dump feature for PCIe
Add a firmware dump feature for PCIe based chipsets.
A separate file will be created at /var/log/fw_dump_*
for each memory segment.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Thu, 17 Apr 2014 18:46:59 +0000 (11:46 -0700)]
mwifiex: add fw_dump debugfs file
This option is useful for dumping firmware memory for debugging
purposes. The actual code to dump firmware memory for SDIO and PCIe
chipsets will be added later.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Stanislaw Gruszka [Thu, 17 Apr 2014 09:08:48 +0000 (11:08 +0200)]
rt2x00: restore original beaconing state
After changing local per interface beacon setting restore original
global beaconing state.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Thu, 17 Apr 2014 05:01:54 +0000 (22:01 -0700)]
mwifiex: don't clear cmd_sent flag in timeout handler
When command timeout occurs due to a firmware/hardware bug,
there is no chance of next command being successful. We will
keep cmd_sent flag on so that next command won't be sent to
firmware.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Thu, 17 Apr 2014 05:01:53 +0000 (22:01 -0700)]
mwifiex: fix IE parsing issues
IEs are parsed from the beacon buffer and stored locally by the
mwifiex_update_bss_desc_with_ie() function.
Sometimes the local pointers point to the data inside an IE, but
the code using them assumes that they point to the IE
itself.
These issues are fixed in this patch.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
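The pointer confusion described in the IE parsing fix above can be illustrated in plain C. This is a minimal userspace sketch, not the mwifiex code: the ie_hdr struct and the ie_body(), ie_len() and ie_from_body() helpers are hypothetical names, and the only assumption carried over is the usual two-byte element header (ID, then payload length).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-byte information element header: ID, then payload length. */
struct ie_hdr {
	uint8_t id;
	uint8_t len;
};

/* Return a pointer to the payload that follows the element header. */
const uint8_t *ie_body(const uint8_t *ie)
{
	assert(ie != NULL);
	return ie + sizeof(struct ie_hdr);
}

/* Return the payload length stored in the element header. */
uint8_t ie_len(const uint8_t *ie)
{
	assert(ie != NULL);
	return ie[1];
}

/* Step back from a payload pointer to the start of the element. */
const uint8_t *ie_from_body(const uint8_t *body)
{
	assert(body != NULL);
	return body - sizeof(struct ie_hdr);
}
```

Code that stores the result of ie_body() but later treats it as the element start would read the ID and length from payload bytes, which is the kind of mix-up the commit message describes.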
Amitkumar Karwar [Mon, 14 Apr 2014 22:32:55 +0000 (15:32 -0700)]
mwifiex: remove redundant 'fw_load' completion structure
'add_remove_card_sem' semaphore already takes care of
synchronization for driver load and unload threads.
Hence there won't be a case when unload thread is waiting on
'wait_for_completion(fw_load)'.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Mon, 14 Apr 2014 22:32:54 +0000 (15:32 -0700)]
mwifiex: use USB core's soft_unbind option
This option allows driver to finish pending operations in
disconnect handler by not killing URBs after usb_deregister
call.
We will get rid of global pointer 'usb_card' by moving code
from cleanup_module() to disconnect(). This will help to match
with our handling for SDIO and PCIe interfaces.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Avinash Patil [Mon, 14 Apr 2014 22:32:53 +0000 (15:32 -0700)]
mwifiex: update timestamp information for aggregation packets
New skbs are allocated at the time of AMSDU aggregation. Timestamps
were not set for such skbs, which resulted in wrong queue delays
being passed to FW. Fix this by setting the timestamp of skbs
created for AMSDU aggregation.
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Mon, 14 Apr 2014 22:32:52 +0000 (15:32 -0700)]
mwifiex: increase the number of nodes in command pool
Command nodes are increased from 20 to 50. Now we can always
scan 1 channel per scan command to avoid traffic delay/loss in
connected state. We will get rid of *CHANNEL_PER_SCAN_CMD macros
used due to command node constraints.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Mon, 14 Apr 2014 22:32:51 +0000 (15:32 -0700)]
mwifiex: improve AMSDU packet aggregation for PCIe and SDIO
For PCIe, aggregate more AMSDU packets until the PCIe TXBD is full.
For SDIO, aggregation was disabled for AMSDU packets: an AMSDU
aggregated packet is already 4K or 8K in size, so the SDIO Multiport
Aggregation feature did not gain much previously.
Now, with the increased multiport aggregation buffer, we can enable
it for AMSDU packets.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Mon, 14 Apr 2014 22:32:50 +0000 (15:32 -0700)]
mwifiex: increase SDIO multiport aggregation buffer sizes
Currently Tx and Rx buffer sizes are 8K and 16K respectively for
all chipsets. We will change them to 32K for SD8897 and 16K for
older chipsets. SD8897 chipset has more SDIO data ports than
older chipsets.
This patch will help to improve throughput numbers.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Maithili Hinge [Mon, 14 Apr 2014 22:32:49 +0000 (15:32 -0700)]
mwifiex: change memset to simple assignment for ht_cap.mcs.rx_mask
WARNING: single byte memset is suspicious.
Swapped 2nd/3rd argument?
This code happens to work because rx_mask is the first member
of the mcs structure. We should use 'mcs.rx_mask' here anyway.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Maithili Hinge <maithili@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
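The single-byte memset warning quoted in the commit above can be reproduced with a small userspace sketch. The mcs_info struct below is a hypothetical stand-in (only "a byte array as the first member" is taken from the commit text), and both helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an MCS info structure; the mask comes first. */
struct mcs_info {
	uint8_t rx_mask[10];
	uint16_t rx_highest;
};

/* Pattern a checker flags: a one-byte memset aimed at the whole struct. */
void set_first_mask_memset(struct mcs_info *mcs)
{
	assert(mcs != NULL);
	memset(mcs, 0xff, 1);
}

/* Same effect, but the intent is visible: assign the field directly. */
void set_first_mask_assign(struct mcs_info *mcs)
{
	assert(mcs != NULL);
	mcs->rx_mask[0] = 0xff;
}
```

Both helpers set the same byte only because the mask is the first member; the assignment keeps working if the struct layout ever changes.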
Vladimir Kondratiev [Tue, 8 Apr 2014 08:36:19 +0000 (11:36 +0300)]
wil6210: Use larger Tx rings
When using scatter-gather, more descriptor entries get used.
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Vladimir Kondratiev [Tue, 8 Apr 2014 08:36:18 +0000 (11:36 +0300)]
wil6210: relaxed check for BACK start sequence
Sometimes, due to a race between the Rx path and the WMI_BA_STATUS_EVENTID WMI event,
a few frames may be passed to the stack before the reorder buffer is allocated.
Then, after BACK establishment, the buffer starts getting frames with sequence numbers
ahead of the SSN, which is interpreted as frames missing. The BACK mechanism then waits
for the missing frames and data traffic stops. If the interface is configured
for DHCP, this data delay causes DHCP failure.
Relax the sequence number check; use the sequence number of the first frame handled by
the buffer as the SSN for this buffer.
This is a workaround; the real fix should be done when a proper BACK mechanism is implemented.
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Vladimir Kondratiev [Tue, 8 Apr 2014 08:36:17 +0000 (11:36 +0300)]
wil6210: sync with the latest FW API
- add pcp_max_assoc_sta to the struct wmi_pcp_start_cmd
- enum for the scan status
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Vladimir Kondratiev [Tue, 8 Apr 2014 08:36:16 +0000 (11:36 +0300)]
wil6210: fix printouts for better readability
Reshuffle prints to consolidate firmware/hardware information
report upon card init
Convert print for unhandled MISC ISR bits to "debug" - it is
normal situation and not an "error"
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Felix Fietkau [Sat, 5 Apr 2014 22:37:03 +0000 (00:37 +0200)]
ath9k: implement p2p client powersave support
Use generic TSF timers to trigger powersave state changes based
on information from the P2P NoA attribute.
Opportunistic Powersave is not handled, because the driver does not
support powersave at the moment.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Felix Fietkau [Sat, 5 Apr 2014 22:37:02 +0000 (00:37 +0200)]
ath9k: support only one P2P interface
Preparation for adding P2P powersave and multi-channel support.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Julia Lawall [Tue, 1 Apr 2014 13:49:18 +0000 (15:49 +0200)]
ray_cs: replace del_timer by del_timer_sync
Use del_timer_sync to ensure that the timer is stopped on all CPUs before
the driver exits.
This change was suggested by Thomas Gleixner.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
identifier i,t,ex;
@@
struct t i = { .remove = ex, };
@@
identifier r.ex;
@@
ex(...) {
<...
- del_timer
+ del_timer_sync
(...)
...>
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Andrea Merello [Fri, 4 Apr 2014 16:25:51 +0000 (18:25 +0200)]
rtl8180: be paranoid in stopping unused queues.
HW should never attempt to perform DMA for unused queues.
For rtl8187se this is ensured by setting a dedicated register at
init time, before enabling TX.
In rtl8180/5 the register is only written at the first TX (because
in rtl8180/5 it serves also to kick DMA for used queues).
This should be enough, but it is worth adding a register write at
init time, before enabling TX.
Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Andrea Merello [Fri, 4 Apr 2014 16:24:36 +0000 (18:24 +0200)]
rtl8180: add parentheses to REG_ADDR macros
Parentheses are missing around the macro argument, so the
macro may not work when certain expressions are passed as
arguments.
This should not cause any issues with current code, but it is
worth adding them as good practice and to avoid
future bugs.
Suggested-by: Dave Kilroy <kilroyd@googlemail.com>
Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
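The macro hazard described in the REG_ADDR commit above is easy to show in isolation. The sketch below does not use the driver's macros: ADDR_UNSAFE and ADDR_SAFE are hypothetical, as is the "offset times 4" arithmetic, and they only mimic the shape of an address helper with and without parenthesized arguments.

```c
#include <assert.h>

/* Hypothetical macros; only the missing/present parentheses matter here. */
#define ADDR_UNSAFE(base, off) (base + off * 4)
#define ADDR_SAFE(base, off)   ((base) + (off) * 4)

unsigned int addr_unsafe(unsigned int base, unsigned int i)
{
	/* Expands to (base + i + 1 * 4): only the 1 is multiplied. */
	return ADDR_UNSAFE(base, i + 1);
}

unsigned int addr_safe(unsigned int base, unsigned int i)
{
	/* Expands to ((base) + (i + 1) * 4), as intended. */
	return ADDR_SAFE(base, i + 1);
}

void macro_demo(void)
{
	assert(addr_safe(0x100, 1) == 0x108);   /* 0x100 + 2 * 4 */
	assert(addr_unsafe(0x100, 1) == 0x105); /* 0x100 + 1 + 1 * 4 */
}
```

As the commit notes, callers that only pass simple identifiers or constants never trigger the difference, which is why such a bug can stay latent.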
Andrea Merello [Fri, 4 Apr 2014 16:21:14 +0000 (18:21 +0200)]
rtl8180: fix enabled interrupt mask for rtl8187se
When preparing the bitfield to write to HW register, the high-priority
queue error interrupt bit is set two times, and the beacon queue
TX-OK interrupt is not enabled.
Currently this has no functional impact because the high-priority
queue is not used at all, and the beacon queue is not used yet.
This patch removes the high-priority queue bits and adds the
missing beacon queue bit.
It also removes the management queue bits because that queue is not used.
This was found by static code analyzer.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Larry Finger [ original patch ] [Tue, 8 Apr 2014 18:25:25 +0000 (20:25 +0200)]
rtl8180: change module name in rtl818x_pci
The rtl8180 driver can also handle rtl8185 and rtl8187SE cards,
however in userspace tools (network manager) it still appears
as "rtl8180".
This might lead the user to think the wrong driver is in use.
This patch changes the module name to "rtl818x_pci", which should be
more explanatory.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> [ original patch ]
Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
John W. Linville [Tue, 22 Apr 2014 19:02:03 +0000 (15:02 -0400)]
Merge branch 'for-linville' of git://github.com/kvalo/ath
John W. Linville [Tue, 22 Apr 2014 19:01:24 +0000 (15:01 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/iwlwifi/iwlwifi-next
John W. Linville [Thu, 17 Apr 2014 14:34:22 +0000 (10:34 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem
Kees Cook [Wed, 16 Apr 2014 17:54:34 +0000 (10:54 -0700)]
seccomp: fix memory leak on filter attach
This sets the correct error code when final filter memory is unavailable,
and frees the raw filter no matter what.
unreferenced object 0xffff8800d6ea4000 (size 512):
comm "sshd", pid 278, jiffies 4294898315 (age 46.653s)
hex dump (first 32 bytes):
21 00 00 00 04 00 00 00 15 00 01 00 3e 00 00 c0 !...........>...
06 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
backtrace:
[<ffffffff8151414e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811a3a40>] __kmalloc+0x280/0x320
[<ffffffff8110842e>] prctl_set_seccomp+0x11e/0x3b0
[<ffffffff8107bb6b>] SyS_prctl+0x3bb/0x4a0
[<ffffffff8152ef2d>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
Reported-by: Masami Ichikawa <masami256@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Masami Ichikawa <masami256@gmail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
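The error-path shape described in the seccomp commit above (report the right error when the final allocation fails, and free the raw copy no matter what) can be sketched in userspace C. This is not the kernel code: attach_copy(), alloc_fn and failing_alloc() are hypothetical names, and malloc() stands in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook so a failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/*
 * Copy src into a temporary "raw" buffer, then build the final copy with
 * final_alloc. The raw buffer is freed on every path, and a failed final
 * allocation is reported as -ENOMEM.
 */
int attach_copy(const void *src, size_t len, alloc_fn final_alloc, void **result)
{
	void *raw, *final;
	int ret = 0;

	assert(src != NULL && final_alloc != NULL && result != NULL && len > 0);
	*result = NULL;

	raw = malloc(len);
	if (!raw)
		return -ENOMEM;
	memcpy(raw, src, len);

	final = final_alloc(len);
	if (!final) {
		ret = -ENOMEM;
		goto free_raw;
	}
	memcpy(final, raw, len);
	*result = final;

free_raw:
	free(raw);	/* released whether or not the final copy succeeded */
	return ret;
}
```

Routing both the success and failure paths through one cleanup label is a common way to make "free it no matter what" hard to get wrong.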
Dan Carpenter [Wed, 16 Apr 2014 11:25:16 +0000 (14:25 +0300)]
isdn: icn: buffer overflow in icn_command()
This buffer overflow was detected using static analysis:
drivers/isdn/icn/icn.c:1325 icn_command()
error: format string overflow. buf_size: 60 length: 98
The calculation for the length of the string is off because it assumes
that the dial[] buffer holds a 50 character string, but actually it is
at most 31 characters and NUL. I have removed the dial[] buffer because
it isn't needed.
The maximum length of the string is actually 79 characters and a NUL. I
have made the cbuf[] array large enough to hold it and changed the
sprintf() to an snprintf() as a further safety enhancement.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
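The sprintf() to snprintf() change described in the icn commit above follows a general pattern that can be sketched on its own. The format string, the "DIAL" prefix and the format_dial_cmd() name below are invented for illustration and are not taken from the driver; only the idea of bounding the write by the destination size comes from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a command string into buf without writing past its end.
 * Returns 0 on success, -1 if the output did not fit and was truncated.
 */
int format_dial_cmd(char *buf, size_t size, int channel, const char *number)
{
	int n;

	assert(buf != NULL && number != NULL && size > 0);
	n = snprintf(buf, size, "%02d;DIAL_%s\n", channel, number);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Unlike sprintf(), snprintf() never writes more than size bytes and always NUL-terminates when size is non-zero, so an undersized buffer produces a truncated string instead of an overflow.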
Nicolas Dichtel [Wed, 16 Apr 2014 09:19:34 +0000 (11:19 +0200)]
ip6_tunnel: use the right netns in ioctl handler
Because the netdevice may be in another netns than the i/o netns, we should
use the i/o netns instead of dev_net(dev).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 16 Apr 2014 09:19:33 +0000 (11:19 +0200)]
sit: use the right netns in ioctl handler
Because the netdevice may be in another netns than the i/o netns, we should
use the i/o netns instead of dev_net(dev).
Note that netdev_priv(dev) cannot be NULL, hence we can remove these useless
checks.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 16 Apr 2014 09:19:32 +0000 (11:19 +0200)]
ip_tunnel: use the right netns in ioctl handler
Because the netdevice may be in another netns than the i/o netns, we should
use the i/o netns instead of dev_net(dev).
The variable 'tunnel' was used only to get 'itn', hence to simplify code I
remove it and use 't' instead.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Glauber [Wed, 16 Apr 2014 07:32:48 +0000 (09:32 +0200)]
net: use SYSCALL_DEFINEx for sys_recv
Make sys_recv a first class citizen by using the SYSCALL_DEFINEx
macro. Besides being cleaner this will also generate meta data
for the system call so tracing tools like ftrace or LTTng can
resolve this system call.
Signed-off-by: Jan Glauber <jan.glauber@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Apr 2014 19:10:00 +0000 (15:10 -0400)]
Merge branch 'mdio-gpio'
Guenter Roeck says:
====================
net: mdio-gpio enhancements
The following series of patches adds support for active-low gpio pins
as well as for systems with separate MDI and MDO pins to the mdio-gpio
driver.
A board using those features is based on a COM Express CPU board.
The COM Express standard supports GPIO pins on its connector,
with one caveat: The pins on the connector have fixed direction
and are hard configured either as input or output pins.
The COM Express Design Guide [1] provides additional details.
The hardware uses three of the GPO/GPI pins from the COM Express board
to drive an MDIO bus. Connectivity between GPI/GPO pins and the MDIO bus
is as follows.
GPI2 --------------------+------------ MDIO
                         |
            +--------+   |
GPO2 ---+---G        |   |
        |   |        |   |
      4.7k  | 2N7002 D---+
        |   |        |
        +---S        |
        |   +--------+
       GND
GPO1 --------------------------------- MDC
To support this hardware, two extensions to the driver were necessary.
- Due to the FET in the MDO path (GPO2), the MDO signal is inverted.
The driver therefore has to support active-low GPIO pins.
- The MDIO signal must be separated into MDI and MDO.
Those changes are implemented in patch 2/3 and 3/3.
Patch 1/3 simplifies the error path and thus the subsequent
patches.
[1] http://www.picmg.org/pdf/picmg_comdg_100.pdf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Wed, 16 Apr 2014 02:16:42 +0000 (19:16 -0700)]
net: mdio-gpio: Add support for separate MDI and MDO gpio pins
This is for a system with fixed assignments of input and output pins
(various variants of Kontron COMe).
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Wed, 16 Apr 2014 02:16:41 +0000 (19:16 -0700)]
net: mdio-gpio: Add support for active low gpio pins
Some systems using mdio-gpio may use active-low gpio pins
(eg with inverters or FETs connected to all or some of the
gpio pins).
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
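The active-low handling described in the mdio-gpio commit above amounts to inverting the logical value on the way to and from the physical pin. The following is a minimal userspace sketch under that assumption; struct pin, pin_set() and pin_get() are hypothetical and do not reflect the driver's data structures or the GPIO API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pin description: active_low marks an inverted line. */
struct pin {
	bool active_low;
	int raw;	/* physical level last driven or sampled, 0 or 1 */
};

/* Drive a logical value; invert it for active-low pins. */
void pin_set(struct pin *p, int value)
{
	assert(p != NULL);
	p->raw = (value ? 1 : 0) ^ (p->active_low ? 1 : 0);
}

/* Read back the logical value, undoing the inversion. */
int pin_get(const struct pin *p)
{
	assert(p != NULL);
	return p->raw ^ (p->active_low ? 1 : 0);
}
```

With an inverting FET in the output path, a logical 1 is driven as a physical 0, while an uninverted pin passes values through unchanged.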
Guenter Roeck [Wed, 16 Apr 2014 02:16:40 +0000 (19:16 -0700)]
net: mdio-gpio: Use devm_ functions where possible
This simplifies error path and deinit/removal functions.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Apr 2014 19:05:39 +0000 (15:05 -0400)]
Merge branch 'fib_validate_loopback'
Cong Wang says:
====================
ipv4: fix flowi4_iif for input routing
This patchset fixes ->flowi4_iif for input routing and rp filter,
based on suggestion from Julian. See per patch for details.
v1 -> v2:
* merge the first two patches into one
* fix fib_check_nh() too
* add this cover letter
====================
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 15 Apr 2014 23:25:35 +0000 (16:25 -0700)]
ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()
In my special case, when a packet is redirected from veth0 to lo,
its skb->dev->ifindex would be LOOPBACK_IFINDEX. Meanwhile we
pass the hard-coded LOOPBACK_IFINDEX to fib_validate_source()
in ip_route_input_slow(). This would cause the following check
in fib_validate_source() to fail:
(dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))
when rp_filter is disabled on loopback. As suggested by Julian,
the caller should pass 0 here so that we will not end up
calling __fib_validate_source().
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 15 Apr 2014 23:25:34 +0000 (16:25 -0700)]
ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
As suggested by Julian:
Simply, flowi4_iif must not contain 0, it does not
look logical to ignore all ip rules with specified iif.
because in fib_rule_match() we do:
if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
goto out;
flowi4_iif should be LOOPBACK_IFINDEX by default.
We need to move LOOPBACK_IFINDEX to include/net/flow.h:
1) It is mostly used by flowi_iif
2) Fix the following compile error if we use it in flow.h
in later patches:
In file included from include/linux/netfilter.h:277:0,
from include/net/netns/netfilter.h:5,
from include/net/net_namespace.h:21,
from include/linux/netdevice.h:43,
from include/linux/icmpv6.h:12,
from include/linux/ipv6.h:61,
from include/net/ipv6.h:16,
from include/linux/sunrpc/clnt.h:27,
from include/linux/nfs_fs.h:30,
from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
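The effect of the fib_rule_match() check quoted in the commit above can be shown with a small userspace sketch. The rule and flow structs and rule_matches_iif() are simplified, hypothetical stand-ins; the value 1 for LOOPBACK_IFINDEX is assumed to match the kernel definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOOPBACK_IFINDEX 1	/* assumed to match the kernel's value */

struct rule { int iifindex; };	/* 0 means "any input interface" */
struct flow { int iif; };

/* Mirrors the quoted check: a rule with an iif matches only that iif. */
bool rule_matches_iif(const struct rule *r, const struct flow *fl)
{
	assert(r != NULL && fl != NULL);
	if (r->iifindex && r->iifindex != fl->iif)
		return false;
	return true;
}
```

A flow whose iif is left at 0 can never match a rule that names an input interface, which is the behaviour the commit calls illogical and avoids by defaulting to LOOPBACK_IFINDEX.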
Chris Mason [Tue, 15 Apr 2014 22:09:24 +0000 (18:09 -0400)]
mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll
The mlx4 driver is triggering schedules while atomic inside
mlx4_en_netpoll:
spin_lock_irqsave(&cq->lock, flags);
napi_synchronize(&cq->napi);
^^^^^ msleep here
mlx4_en_process_rx_cq(dev, cq, 0);
spin_unlock_irqrestore(&cq->lock, flags);
This was part of a patch by Alexander Guller from Mellanox in 2011,
but it still isn't upstream.
Signed-off-by: Chris Mason <clm@fb.com>
cc: stable@vger.kernel.org
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Apr 2014 18:37:13 +0000 (14:37 -0400)]
Merge branch 'mvneta_qsgmii'
Thomas Petazzoni says:
====================
net: mvneta: fix usage as a module, and support QSGMII properly
This set of patches is a new attempt at fixing the operation of the
mvneta driver when built as a module. For the record, the previous
attempt, merged in commit e3a8786c10e75903f1269474e21fe8cb49c3a670
('net: mvneta: fix usage as a module on RGMII configurations') caused
problems for all RGMII configurations.
In fact, it turned out that the MAC to PHY connection on the Armada XP
GP, which was described as using RGMII-ID according to its Device
Tree, is in fact a QSGMII connection. And the RGMII and QSGMII
configurations have to be handled in a different way in the driver,
because the SERDES configuration is different in those two cases.
So, this patch series fixes that by:
* Adding minimal handling of a "qsgmii" connection type in the PHY
layer. Mainly to make sure that a "qsgmii" phy-mode in the Device
Tree is recognized, and handed over to the driver as
PHY_INTERFACE_QSGMII.
* Changing the mvneta driver to properly configure the RGMIIEn and
PCSEn bits in the GMAC_CTRL_2 register, and configure the SERDES
register, in the three possible cases: RGMII, SGMII and QSGMII.
* Updating the Device Tree of the Armada XP GP board to reflect the
fact that it uses a QSGMII MAC/PHY connection.
PATCH 1 and 2 would be merged by David Miller, through the net tree,
while PATCH 3 would be merged by the mach-mvebu maintainers, through
their tree and arm-soc.
This set of patches has been tested on:
* Armada XP GP (four QSGMII interfaces)
* Armada XP DB (two RGMII interfaces and two SGMII interfaces)
* Armada 370 Mirabox (two RGMII interfaces)
I've tested both the driver built-in, and compiled as a module.
Since the last attempt at fixing this was quite a fiasco, I'd like
this new attempt to be tested more widely before being applied. I'll
try to do some testing on other Armada boards I have, but independent
testing from other persons would also be appreciated.
Note that these patches apply after reverting the previous attempt,
obviously.
====================
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Tue, 15 Apr 2014 13:50:20 +0000 (15:50 +0200)]
net: mvneta: properly configure the MAC <-> PHY connection in all situations
Commit 5445eaf309ff ('mvneta: Try to fix mvneta when compiled as
module') fixed the mvneta driver to make it work properly when loaded
as a module in SGMII configuration, which was tested successfully by the
author on the Armada XP OpenBlocks AX3, which uses SGMII.
However, some other platforms, namely the Armada XP GP don't use
SGMII, but a QSGMII connection between the MAC and the PHY, and this
case was not supported by the mvneta driver, which was relying on
configuration put in place by the bootloader. While this works when
the mvneta driver is built-in (because clocks are not gated), it
breaks when mvneta is built as a module, because the clock is gated
(all configuration is lost) and then re-enabled when the mvneta driver
is loaded.
In order to support all of RGMII, SGMII and QSGMII, this commit
reworks how the PHY interface configuration is done, and simplifies
it: it removes the mvneta_port_sgmii_config() and
mvneta_gmac_rgmii_set() functions, which were strange because
mvneta_gmac_rgmii_set() was called in all cases, even for SGMII
configurations. Also, the mvneta_gmac_rgmii_set() function was taking
a boolean as argument, which was always true.
Instead, all the PHY interface configuration logic is moved into the
mvneta_port_power_up() function, in a much simpler 'switch' construct,
with four cases:
- QSGMII: the RGMIIEn bit, the PCSEn bit in GMAC_CTRL_2 are set, and
the SERDES is configured in QSGMII. Technically speaking,
configuring the SERDES of the first port would be sufficient, but
it is simpler to do it on all ports.
- SGMII: the RGMIIEn bit, the PCSEn bit in GMAC_CTRL_2 are set, and
the SERDES is configured as SGMII.
- RGMII: the RGMIIEn bit in GMAC_CTRL_2 is set. The PCSEn bit is kept
cleared, and no SERDES configuration is done, because RGMII is not
using SERDES lanes.
- other: an error is returned. For this reason, the
mvneta_port_power_up() now returns an int instead of nothing, and
the return value is checked by mvneta_probe().
This has been successfully tested on:
* Armada XP DB, which has two RGMII and two SGMII connections
* Armada XP GP, which uses QSGMII for its four interfaces
* Armada 370 Mirabox, which has two RGMII connections
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
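The four-way 'switch' described in the mvneta commit above can be sketched as follows. This is a userspace illustration only: the enum names, the port_cfg struct, the bit positions of CTRL2_PCS_EN and CTRL2_RGMII_EN, and the port_power_up() signature are all hypothetical, and no real register is touched. Only the per-mode decisions (which bits are set, which SERDES mode is chosen, error on anything else) follow the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real register layout is not reproduced here. */
enum phy_mode { MODE_RGMII, MODE_SGMII, MODE_QSGMII, MODE_OTHER };
enum serdes_cfg { SERDES_NONE, SERDES_SGMII, SERDES_QSGMII };

#define CTRL2_PCS_EN   (1u << 0)
#define CTRL2_RGMII_EN (1u << 1)

struct port_cfg {
	uint32_t ctrl2;
	enum serdes_cfg serdes;
};

/* Fill in cfg for the given mode; return -EINVAL for unsupported modes. */
int port_power_up(enum phy_mode mode, struct port_cfg *cfg)
{
	assert(cfg != NULL);
	cfg->ctrl2 = 0;
	cfg->serdes = SERDES_NONE;

	switch (mode) {
	case MODE_QSGMII:
		cfg->ctrl2 = CTRL2_RGMII_EN | CTRL2_PCS_EN;
		cfg->serdes = SERDES_QSGMII;
		break;
	case MODE_SGMII:
		cfg->ctrl2 = CTRL2_RGMII_EN | CTRL2_PCS_EN;
		cfg->serdes = SERDES_SGMII;
		break;
	case MODE_RGMII:
		/* PCS stays disabled and no SERDES lane is configured. */
		cfg->ctrl2 = CTRL2_RGMII_EN;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}
```

Returning an error from the default case is what lets a probe routine reject an unsupported connection type instead of silently relying on bootloader state.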
Thomas Petazzoni [Tue, 15 Apr 2014 13:50:19 +0000 (15:50 +0200)]
net: phy: add minimal support for QSGMII PHY
This commit adds the necessary definitions for the PHY layer to
recognize "qsgmii" as a valid PHY interface. A QSGMII interface, as
defined at
http://en.wikipedia.org/wiki/Media_Independent_Interface#Quad_Serial_Gigabit_Media_Independent_Interface,
"is a method of combining four SGMII lines into a 5Gbit/s
interface. QSGMII, like SGMII, uses LVDS signalling for the TX and RX
data and a single LVDS clock signal. QSGMII uses significantly fewer
signal lines than four SGMII busses."
This type of MAC <-> PHY connection might require special handling on
the MAC driver side, so it should be possible to express this type of
MAC <-> PHY connection, for example in the Device Tree.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: devicetree@vger.kernel.org
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Wed, 16 Apr 2014 18:27:48 +0000 (19:27 +0100)]
sfc: On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
When an MCDI command times out (whether or not we find it
completed when we poll), call efx_mcdi_abandon(), which tells
all subsequent MCDI calls to fail-fast, and queues up an FLR.
Because an FLR doesn't lead to receiving any reboot event from
the MC (unlike most other types of reset), we have to call
efx_ef10_reset_mc_allocations.
In efx_start_all(), if a reset (of any kind) is pending, we
bail out.
Without this, attempts to reconfigure (e.g. change mtu) can
cause driver/mc state inconsistency if the first MCDI call
triggers an FLR.
For similar reasons, on EF10, in
efx_reset_down(method=RESET_TYPE_MCDI_TIMEOUT), set the number
of active queues to zero before calling efx_stop_all().
And, on farch, in efx_reset_up(method=RESET_TYPE_MCDI_TIMEOUT),
set active_queues and flushes pending & outstanding to zero.
efx_mcdi_mode_{poll,event}() should not take us out of fail-fast
mode. Instead, this is done by efx_mcdi_reset() after the FLR
completes.
The new FLR reset_type RESET_TYPE_MCDI_TIMEOUT doesn't really
fit into the hierarchy of reset 'scopes' whereby efx_reset()
decides some resets subsume others. Thus, it uses separate logic.
Also, fixed up some inconsistency around RESET_TYPE_MC_BIST,
which was in the wrong place in that hierarchy.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 16 Apr 2014 03:30:30 +0000 (20:30 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix BPF filter validation of netlink attribute accesses, from
Mathias Krause.
2) Netfilter conntrack generation seqcount not initialized properly,
from Andrey Vagin.
3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
from Patrick McHardy.
4) Properly limit MTU over ipv6, from Eric Dumazet.
5) Fix seccomp system call argument population on 32-bit, from Daniel
Borkmann.
6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
skb->mac_len needs to be used. From Vlad Yasevich.
7) We have several cases of using socket based communications to
implement a tunnel. For example, some tunnels are encapsulations
over UDP so we use an internal kernel UDP socket to do the
transmits.
These tunnels should behave just like other software devices and
pass the packets on down to the next layer.
Most importantly we want the top-level socket (eg TCP) that created
the traffic to be charged for the SKB memory.
However, once you get into the IP output path, we have code that
assumed that whatever was attached to skb->sk is an IP socket.
To keep the top-level socket being charged for the SKB memory,
whilst satisfying the needs of the IP output path, we now pass in an
explicit 'sk' argument.
From Eric Dumazet.
8) ping_init_sock() leaks group info, from Xiaoming Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
cxgb4: use the correct max size for firmware flash
qlcnic: Fix MSI-X initialization code
ip6_gre: don't allow to remove the fb_tunnel_dev
ipv4: add a sock pointer to dst->output() path.
ipv4: add a sock pointer to ip_queue_xmit()
driver/net: cosa driver uses udelay incorrectly
at86rf230: fix __at86rf230_read_subreg function
at86rf230: remove check if AVDD settled
net: cadence: Add architecture dependencies
net: Start with correct mac_len in skb_network_protocol
Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
qlcnic: Fix PVID configuration on eSwitch port.
qlcnic: Fix max ring count calculation
qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
...
Steve Wise [Tue, 15 Apr 2014 19:22:34 +0000 (14:22 -0500)]
cxgb4: use the correct max size for firmware flash
The wrong max fw size was being used and causing false
"too big" errors running ethtool -f.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Gordeev [Tue, 15 Apr 2014 09:37:14 +0000 (11:37 +0200)]
qlcnic: Fix MSI-X initialization code
Function qlcnic_setup_tss_rss_intr() might enter endless
loop in case pci_enable_msix() continuously returns a
positive number of MSI-Xs that could have been allocated.
Besides, the function contains 'err = -EIO;' assignment
that never could be reached. This update fixes the
aforementioned issues.
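The retry problem described above can be sketched in user space. This is only an illustration, not the driver's code: fake_enable_msix() is a hypothetical stand-in that mimics the legacy pci_enable_msix() contract (0 on success, a negative errno on failure, a positive count of vectors that could have been allocated), and setup_msix() is an invented helper that stops as soon as a retry makes no progress.

```c
#include <errno.h>

/* Hypothetical device model: 'available' vectors can really be granted,
 * 'hint' is what the fake call reports when the request is too large. */
struct fake_dev {
    int available;
    int hint;
};

/* Stand-in for the legacy contract: 0 = success, <0 = error,
 * >0 = number of vectors that could have been allocated. */
static int fake_enable_msix(const struct fake_dev *dev, int nvec)
{
    if (nvec <= dev->available)
        return 0;
    return dev->hint > 0 ? dev->hint : -ENOSPC;
}

/* Returns the number of vectors enabled, or a negative errno. */
static int setup_msix(const struct fake_dev *dev, int want)
{
    int nvec = want;

    while (nvec > 0) {
        int ret = fake_enable_msix(dev, nvec);

        if (ret == 0)
            return nvec;    /* all requested vectors enabled */
        if (ret < 0)
            return ret;     /* hard failure */
        if (ret >= nvec)
            return -EIO;    /* no progress: stop instead of spinning */
        nvec = ret;         /* retry with the smaller count */
    }
    return -EINVAL;         /* nothing was requested */
}
```

The "no progress" check is what bounds the loop: every retry must ask for strictly fewer vectors than the previous one.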
Cc: Shahed Shaikh <shahed.shaikh@qlogic.com>
Cc: Dept-HSGLinuxNICDev@qlogic.com
Cc: netdev@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Mon, 14 Apr 2014 15:11:38 +0000 (17:11 +0200)]
ip6_gre: don't allow to remove the fb_tunnel_dev
It's possible to remove the FB tunnel with the command 'ip link del ip6gre0', but
this is unsafe: the module always assumes that this device exists. For example,
ip6gre_tunnel_lookup() may use it unconditionally.
Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
let ip6gre_destroy_tunnels() do the job).
Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6").
CC: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Apr 2014 17:47:15 +0000 (13:47 -0400)]
ipv4: add a sock pointer to dst->output() path.
In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.
The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.
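A minimal sketch of the idea, assuming simplified stand-in types: the structs, the family constants and both function names below are hypothetical and only model the ownership split. The skb keeps pointing at the socket that is charged for its memory, while the output step is handed the socket it should actually consult.

```c
#include <stddef.h>

enum { FAM_INET_SKETCH = 2, FAM_PACKET_SKETCH = 17 };

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct sock_sketch {
    int family;
};

struct skb_sketch {
    struct sock_sketch *sk;     /* originator, charged for the memory */
};

/* Output step that needs an inet socket. It only looks at the explicit
 * 'sk' argument and never reinterprets skb->sk. */
static int ip_output_sketch(struct sock_sketch *sk, struct skb_sketch *skb)
{
    (void)skb;
    return (sk != NULL && sk->family == FAM_INET_SKETCH) ? 0 : -1;
}

/* Tunnel transmit: skb->sk is left untouched, the tunnel's own socket
 * is passed down explicitly. */
static int tunnel_xmit_sketch(struct sock_sketch *tunnel_sk,
                              struct skb_sketch *skb)
{
    return ip_output_sketch(tunnel_sk, skb);
}
```

Passing skb->sk of an AF_PACKET originator straight into the output step is the case the explicit argument is meant to avoid.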
Fixes: 8f646c922d55 ("vxlan: keep original skb ownership")
Reported-by: lucien xin <lucien.xin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amitkumar Karwar [Mon, 14 Apr 2014 22:31:06 +0000 (15:31 -0700)]
mwifiex: fix hung task on command timeout
Sometimes when command timeout occurs due to a firmware or
hardware bug, there may be some synchronous commands in command
queue. These commands are never downloaded to firmware causing
hung task warnings. This patch replaces wait_event_interruptible
call with wait_event_interruptible_timeout to fix the issue.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Amitkumar Karwar [Mon, 14 Apr 2014 22:31:05 +0000 (15:31 -0700)]
mwifiex: process event before command response
During extended scan, SCAN report event is always followed by
command response. Sometimes it is observed that command response
is processed before SCAN report which leads to a crash, because
current command node is cleared while handling the response.
This patch makes sure that driver's main thread gives priority
to events over command responses.
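The ordering can be illustrated with a small sketch. The struct, its fields and main_process_sketch() are hypothetical names and not the driver's main loop; the only point is that a pending event is always handled before a pending command response.

```c
enum { HANDLED_EVENT = 1, HANDLED_CMD_RESP = 2 };

#define ORDER_MAX 8

/* Hypothetical adapter state: two pending flags plus a record of the
 * order in which work was handled. */
struct adapter_sketch {
    int event_pending;
    int cmd_resp_pending;
    int order[ORDER_MAX];
    int handled;
};

static void record(struct adapter_sketch *a, int what)
{
    if (a->handled < ORDER_MAX)
        a->order[a->handled++] = what;
}

/* One pass of the main thread: events first, then command responses. */
static void main_process_sketch(struct adapter_sketch *a)
{
    if (a->event_pending) {
        a->event_pending = 0;
        record(a, HANDLED_EVENT);
    }
    if (a->cmd_resp_pending) {
        a->cmd_resp_pending = 0;
        record(a, HANDLED_CMD_RESP);
    }
}
```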
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Maithili Hinge <maithili@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Eric Dumazet [Tue, 15 Apr 2014 16:58:34 +0000 (12:58 -0400)]
ipv4: add a sock pointer to ip_queue_xmit()
ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.
One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.
Fixes: 31c70d5956fc ("l2tp: keep original skb ownership")
Reported-by: Zhan Jianyu <nasa4836@gmail.com>
Tested-by: Zhan Jianyu <nasa4836@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li, Zhen-Hua [Tue, 15 Apr 2014 01:53:11 +0000 (09:53 +0800)]
driver/net: cosa driver uses udelay incorrectly
In the cosa driver, udelay with a value of more than 20000 may cause
__bad_udelay. Use msleep instead.
Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 14 Apr 2014 16:48:02 +0000 (18:48 +0200)]
at86rf230: fix __at86rf230_read_subreg function
The __at86rf230_read_subreg function doesn't mask and shift the register
contents, which it should do. This patch adds the necessary masks and
shift operations in this function.
Since we have csma support this can make some trouble on state changes.
Since CSMA support turned on some bits in the TRX_STATUS register that
used to be zero, not masking broke checking of the TRX_STATUS field
after commanding a state change.
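The mask-and-shift step amounts to the following. read_subreg_sketch() is a hypothetical helper, and the masks and register values used with it are illustrative and not taken from the datasheet.

```c
#include <stdint.h>

/* Extract a sub-register field: keep only the masked bits, then shift
 * the field down so that its lowest bit lands at bit 0. */
static uint8_t read_subreg_sketch(uint8_t reg_value, uint8_t mask,
                                  unsigned int shift)
{
    return (uint8_t)((reg_value & mask) >> shift);
}
```

With a 5-bit status field (mask 0x1f, shift 0), a raw value of 0x88 reads back as 0x08. Without the mask, the extra high bit would make a comparison against 0x08 fail, which is the kind of breakage described above.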
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 14 Apr 2014 16:48:01 +0000 (18:48 +0200)]
at86rf230: remove check if AVDD settled
The AVDD regulator is only enabled when the RF section is active, i.e.
in the TX_ON (PLL_ON) state. Since commit
7dcbd22a97eb0689e6c583ad630ae0e7341e34c1 ("ieee802154: ensure that first
RF212 state comes from TRX_OFF"), we are in TRX_OFF state at the time
at86rf230_hw_init is run.
Note that this test would only fail in case of a severe hardware
malfunction (faulty/shorted power supply, etc.) so it wasn't all that
useful in the first place.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Delvare [Mon, 14 Apr 2014 13:38:49 +0000 (15:38 +0200)]
net: cadence: Add architecture dependencies
The Cadence ethernet chipsets are only used on specific ARM
architectures. Add Kconfig dependencies so that drivers for these
chipsets are only buildable on the relevant architectures.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 14 Apr 2014 23:21:28 +0000 (16:21 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Marcelo Tosatti:
- Fix for guest triggerable BUG_ON (CVE-2014-0155)
- CR4.SMAP support
- Spurious WARN_ON() fix
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: remove WARN_ON from get_kernel_ns()
KVM: Rename variable smep to cr4_smep
KVM: expose SMAP feature to guest
KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
KVM: Add SMAP support when setting CR4
KVM: Remove SMAP bit from CR4_RESERVED_BITS
KVM: ioapic: try to recover if pending_eoi goes out of range
KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
Linus Torvalds [Mon, 14 Apr 2014 23:04:14 +0000 (16:04 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull bcm2835 crypto fix from Herbert Xu:
"This fixes a potential boot crash on bcm2835 due to the recent change
that now causes hardware RNGs to be accessed on registration"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: bcm2835 - fix oops when rng h/w is accessed during registration
Mikulas Patocka [Mon, 14 Apr 2014 20:58:55 +0000 (16:58 -0400)]
user namespace: fix incorrect memory barriers
smp_read_barrier_depends() can be used if there is data dependency between
the readers - i.e. if the read operation after the barrier uses address
that was obtained from the read operation before the barrier.
In this file, there is only control dependency, no data dependency, so the
use of smp_read_barrier_depends() is incorrect. The code could fail in the
following way:
* the cpu predicts that idx < extents is true and starts executing the
body of the for loop
* the cpu fetches map->extent[0].first and map->extent[0].count
* the cpu fetches map->nr_extents
* the cpu verifies that idx < extents is true, so it commits the
instructions in the body of the for loop
The problem is that in this scenario, the cpu read map->extent[0].first
and map->nr_extents in the wrong order. We need a full read memory barrier
to prevent it.
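A user-space analogue of the required ordering, written with C11 atomics. This is a sketch under assumptions: the struct layout, MAX_EXTENTS_SKETCH and both function names are hypothetical, and an acquire load stands in for the read barrier the message calls for. The reader must observe the extent count before it reads any extent contents.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_EXTENTS_SKETCH 5

struct extent_sketch {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

struct id_map_sketch {
    atomic_uint nr_extents;
    struct extent_sketch extent[MAX_EXTENTS_SKETCH];
};

/* Writer: fill in the extent first, then publish the new count with
 * release semantics. */
static int map_add_sketch(struct id_map_sketch *map, struct extent_sketch e)
{
    unsigned int n = atomic_load_explicit(&map->nr_extents,
                                          memory_order_relaxed);

    if (n >= MAX_EXTENTS_SKETCH)
        return -1;
    map->extent[n] = e;
    atomic_store_explicit(&map->nr_extents, n + 1, memory_order_release);
    return 0;
}

/* Reader: the acquire load orders the count read before the extent
 * reads, so a speculated loop body cannot use stale extent data. */
static uint32_t map_id_down_sketch(struct id_map_sketch *map, uint32_t id)
{
    unsigned int extents = atomic_load_explicit(&map->nr_extents,
                                                memory_order_acquire);

    for (unsigned int idx = 0; idx < extents; idx++) {
        const struct extent_sketch *e = &map->extent[idx];

        if (id >= e->first && id - e->first < e->count)
            return e->lower_first + (id - e->first);
    }
    return (uint32_t)-1;    /* no mapping found */
}
```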
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Mon, 14 Apr 2014 23:00:10 +0000 (19:00 -0400)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains three Netfilter fixes for your net tree,
they are:
* Fix missing generation sequence initialization which results in a splat
if lockdep is enabled, it was introduced in the recent works to improve
nf_conntrack scalability, from Andrey Vagin.
* Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
disabled otherwise this crashes due to a double release, from Andrey
Vagin.
* Fix nf_tables cmp fast in big endian, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Mon, 14 Apr 2014 21:37:26 +0000 (17:37 -0400)]
net: Start with correct mac_len in skb_network_protocol
Sometimes, when the packet arrives at skb_mac_gso_segment()
its skb->mac_len already accounts for some of the mac length
headers in the packet. This seems to happen when forwarding
through an OpenSSL tunnel.
When we start looking for any vlan headers in skb_network_protocol()
we seem to ignore any of the already known mac headers and start
with an ETH_HLEN. This results in an incorrect offset, dropped
TSO frames and general slowness of the connection.
We can start counting from the known skb->mac_len
and return at least that much if all mac level headers
are known and accounted for.
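The approach can be sketched as a user-space parser over a raw frame. The function name, the *_SKETCH constants and the convention of returning 0 for a malformed frame are assumptions made for this illustration. Parsing starts at the already known mac_len instead of a fixed 14-byte Ethernet header and only walks the VLAN tags that are still unaccounted for.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SKETCH     14
#define VLAN_HLEN_SKETCH    4
#define ETHERTYPE_8021Q     0x8100
#define ETHERTYPE_8021AD    0x88A8

static uint16_t be16_at(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns the network-layer ethertype and stores the total length of
 * all mac-level headers in *depth. Returns 0 on a malformed frame. */
static uint16_t network_protocol_sketch(const uint8_t *frame, size_t len,
                                        size_t mac_len, size_t *depth)
{
    size_t off = mac_len;
    uint16_t type;

    if (off < ETH_HLEN_SKETCH || off > len)
        return 0;

    /* The type field that ends the headers parsed so far. */
    type = be16_at(frame + off - 2);

    while (type == ETHERTYPE_8021Q || type == ETHERTYPE_8021AD) {
        if (off + VLAN_HLEN_SKETCH > len)
            return 0;
        /* VLAN tag: 2 bytes TCI, then the encapsulated type. */
        type = be16_at(frame + off + 2);
        off += VLAN_HLEN_SKETCH;
    }

    *depth = off;
    return type;
}
```

For a single-tagged frame, starting from mac_len 14 or from mac_len 18 gives the same result: the inner ethertype and a header depth of 18.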
Fixes: 53d6471cef17262d3ad1c7ce8982a234244f68ec (net: Account for all vlan headers in skb_mac_gso_segment)
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Borkman <dborkman@redhat.com>
Tested-by: Martin Filip <nexus+kernel@smoula.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Tosatti [Thu, 10 Apr 2014 21:19:12 +0000 (18:19 -0300)]
KVM: x86: remove WARN_ON from get_kernel_ns()
Function and callers can be preempted.
https://bugzilla.kernel.org/show_bug.cgi?id=73721
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Feng Wu [Tue, 1 Apr 2014 09:56:48 +0000 (17:56 +0800)]
KVM: Rename variable smep to cr4_smep
Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Feng Wu [Tue, 1 Apr 2014 09:46:36 +0000 (17:46 +0800)]
KVM: expose SMAP feature to guest
This patch exposes the SMAP feature to the guest.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Feng Wu [Tue, 1 Apr 2014 09:46:35 +0000 (17:46 +0800)]
KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.
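A minimal sketch of that adjustment, assuming a hypothetical helper name. The bit positions are the architectural ones (CR0.PG is bit 31, CR4.SMAP is bit 21); everything else is illustrative and not KVM's code.

```c
#include <stdint.h>

#define CR0_PG_BIT      ((uint64_t)1 << 31)
#define CR4_SMAP_BIT    ((uint64_t)1 << 21)

/* Compute the CR4 value to load into hardware: when the guest has
 * paging disabled, SMAP is masked out even though the hardware itself
 * keeps running with paging enabled. */
static uint64_t hw_cr4_sketch(uint64_t guest_cr0, uint64_t guest_cr4)
{
    uint64_t hw_cr4 = guest_cr4;

    if (!(guest_cr0 & CR0_PG_BIT))
        hw_cr4 &= ~CR4_SMAP_BIT;
    return hw_cr4;
}
```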
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Feng Wu [Tue, 1 Apr 2014 09:46:34 +0000 (17:46 +0800)]
KVM: Add SMAP support when setting CR4
This patch adds SMAP handling logic when setting CR4 for guests.
Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
way to detect SMAP violation.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Feng Wu [Tue, 1 Apr 2014 09:46:33 +0000 (17:46 +0800)]
KVM: Remove SMAP bit from CR4_RESERVED_BITS
This patch removes SMAP bit from CR4_RESERVED_BITS.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Daniel Borkmann [Mon, 14 Apr 2014 19:45:17 +0000 (21:45 +0200)]
Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
This reverts commit ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management
to reflect real state of the receiver's buffer") as it introduced a
serious performance regression on SCTP over IPv4 and IPv6, though not
as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.
Current state:
[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 17:56:21 GMT
Connecting to host 192.168.241.3, port 5201
Cookie: Lab200slot2.1397238981.812898.548918
[ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
[ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
[ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
[ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
[ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
[ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
[ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
[ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
[ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
[ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
[ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
[ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
[ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
[ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
[ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
[ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
[ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
(etc)
[root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 19:08:41 GMT
Connecting to host 2001:db8:0:f101::1, port 5201
Cookie: Lab200slot2.1397243321.714295.2b3f7c
[ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec
[ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec
[ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec
[ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec
[ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec
[ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec
[ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec
[ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec
[ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec
[ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec
[ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec
[ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec
[ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec
(etc)
After patch:
[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
Time: Mon, 14 Apr 2014 16:40:48 GMT
Connecting to host 192.168.240.3, port 5201
Cookie: Lab200slot2.1397493648.413274.65e131
[ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec
[ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec
[ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec
[ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec
[ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec
[ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec
[ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec
[ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec
With this revert applied, SCTP performance is back to normal on latest
upstream for both IPv4 and IPv6 and matches the throughput of the 3.4.2
test kernel; throughput is steady and interval reports are smooth again.
Fixes: ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
Reported-by: Peter Butler <pbutler@sonusnet.com>
Reported-by: Dongsheng Song <dongsheng.song@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Peter Butler <pbutler@sonusnet.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Wise [Mon, 14 Apr 2014 19:22:43 +0000 (14:22 -0500)]
cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
Hardware needs the local device mac address to support hw loopback for
rdma loopback connections.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 14 Apr 2014 19:20:12 +0000 (21:20 +0200)]
net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has
been wrongly decoded by commit a8fc927780 ("sk-filter: Add ability to
get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS
although it should have been decoded as BPF_LD|BPF_W|BPF_ABS.
In practice, this should not have much side-effect though, as such
conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse
operation PR_GET_SECCOMP will only return the current seccomp mode, but
not the filter itself. Since the transition to the new BPF infrastructure,
it's also not used anymore, so we can simply remove this as it's
unreachable.
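The difference between the two opcodes is only the load width. The sketch below spells out the classic BPF field values (as found in the Linux UAPI BPF headers) and adds load_width_bytes(), a hypothetical helper written for this illustration.

```c
/* Classic BPF opcode fields. */
#define BPF_LD      0x00    /* instruction class: load into A */
#define BPF_W       0x00    /* size: 32-bit word */
#define BPF_H       0x08    /* size: 16-bit half word */
#define BPF_B       0x10    /* size: byte */
#define BPF_ABS     0x20    /* mode: absolute offset */

#define BPF_SIZE_MASK 0x18

/* Hypothetical helper: number of bytes a load opcode reads, or 0 for a
 * size encoding this sketch does not handle. */
static unsigned int load_width_bytes(unsigned int code)
{
    switch (code & BPF_SIZE_MASK) {
    case BPF_W:
        return 4;
    case BPF_H:
        return 2;
    case BPF_B:
        return 1;
    default:
        return 0;
    }
}
```

BPF_LD|BPF_W|BPF_ABS encodes as 0x20 and reads four bytes, while BPF_LD|BPF_B|BPF_ABS encodes as 0x30 and reads a single byte.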
Fixes: a8fc927780 ("sk-filter: Add ability to get socket filter program (v2)")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 14 Apr 2014 19:02:59 +0000 (21:02 +0200)]
seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
Linus reports that on 32-bit x86 Chromium throws the following seccomp
resp. audit log messages:
audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
syscall=172 compat=0 ip=0xb2dd9852 code=0x30000
audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
compat=0 ip=0xb2dd9852 code=0x50000
These audit messages are being triggered via audit_seccomp() through
__secure_computing() in seccomp mode (BPF) filter with seccomp return
codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
during filter runtime. Moreover, Linus reports that x86_64 Chromium
seems fine.
The underlying issue that explains this is that the implementation of
populate_seccomp_data() is wrong. Our seccomp data structure sd that
is being shared with user ABI is:
struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
};
Therefore, a simple cast to 'unsigned long *' for storing the value of
the syscall argument via syscall_get_arguments() is just wrong as on
32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
at wrong offsets in args[] member, and thus i) could leak stack memory
to user space and ii) tampers with the logic of seccomp BPF programs
that read out and check for syscall arguments:
syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
test machine through slow ssh X forwarding, but it fixes the issue on
my side. So fix it up by storing args in type correct variables, gcc
is clever and optimizes the copy away in other cases, e.g. x86_64.
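The type-correct variant can be sketched in user space as follows. The struct mirrors the layout quoted above with fixed-width types; fake_get_arguments() is a hypothetical stand-in for syscall_get_arguments() and populate_sketch() is an invented name. The arguments are first fetched into 'unsigned long' temporaries and then widened one by one into the 64-bit slots.

```c
#include <stdint.h>

struct seccomp_data_sketch {
    int nr;
    uint32_t arch;
    uint64_t instruction_pointer;
    uint64_t args[6];
};

/* Hypothetical stand-in: writes n native-width argument values. */
static void fake_get_arguments(unsigned long *args, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++)
        args[i] = 0x1000UL + i;
}

/* Store each argument through a correctly typed temporary. On a 32-bit
 * build this writes six full 64-bit slots instead of packing six 32-bit
 * values into the first three slots. */
static void populate_sketch(struct seccomp_data_sketch *sd)
{
    unsigned long args[6];

    fake_get_arguments(args, 6);
    for (unsigned int i = 0; i < 6; i++)
        sd->args[i] = args[i];
}
```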
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliad Peller [Sun, 13 Apr 2014 13:33:51 +0000 (16:33 +0300)]
wl18xx: align event mailbox with current fw
Some fields are missing from the event mailbox
struct definitions, which causes issues when
trying to handle some events.
Add the missing fields in order to align the
struct size (without adding actual support
for the new fields).
Reported-and-tested-by: Imre Kaloz <kaloz@openwrt.org>
Cc: stable@vger.kernel.org # 3.14+
Fixes: 028e724 ("wl18xx: move to new firmware (wl18xx-fw-3.bin)")
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Christian Engelmayer [Thu, 10 Apr 2014 18:37:53 +0000 (20:37 +0200)]
rsi: Fix a potential memory leak in rsi_send_auto_rate_request()
Fix a potential memory leak in the error path of function
rsi_send_auto_rate_request(). In case memory allocation for array
'selected_rates' fails, the error path exits and leaves the previously
allocated skb in place. Detected by Coverity: CID 1195575.
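The error-path pattern can be sketched with plain heap allocations. All names here are hypothetical; the counter and the fault-injection knob exist only so the behaviour can be checked, and unlike a real driver this sketch also frees the first buffer on success instead of handing it on.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs;                 /* instrumentation only */
static int allocs_until_failure = -1;   /* < 0: never fail */

static void *sketch_alloc(size_t n)
{
    void *p;

    if (allocs_until_failure == 0)
        return NULL;                    /* injected failure */
    if (allocs_until_failure > 0)
        allocs_until_failure--;
    p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

static void sketch_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

static int send_request_sketch(void)
{
    void *skb = sketch_alloc(128);
    void *selected_rates;

    if (!skb)
        return -ENOMEM;

    selected_rates = sketch_alloc(64);
    if (!selected_rates) {
        sketch_free(skb);               /* release the earlier allocation */
        return -ENOMEM;
    }

    /* ... build and send the request ... */
    sketch_free(selected_rates);
    sketch_free(skb);
    return 0;
}
```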
Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Frederic Danis [Thu, 10 Apr 2014 09:36:38 +0000 (11:36 +0200)]
cw1200: Fix cw1200_debug_link_id
This array is used in debug string to display cw1200_link_status
defined in drivers/net/wireless/cw1200/cw1200.h.
Add missing strings for CW1200_LINK_RESET and CW1200_LINK_RESET_REMAP.
Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Luciano Coelho [Thu, 10 Apr 2014 07:01:37 +0000 (10:01 +0300)]
wlcore: ignore dummy packet events in PLT mode
Sometimes the firmware sends a dummy packet event while we are in PLT
mode. This doesn't make sense, it's a firmware bug. Fix this by
ignoring dummy packet events when we're in PLT mode.
Reported-by: Yegor Yefremov <yegorslists@googlemail.com>
Reported-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Luciano Coelho <luca@coelho.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Christian Engelmayer [Wed, 9 Apr 2014 19:28:54 +0000 (21:28 +0200)]
rsi: Fix a potential memory leak in rsi_set_channel()
Fix a potential memory leak in function rsi_set_channel() that is used to
program channel changes. The channel check block for the frequency bands
directly exits the function in case of an error, thus leaving an already
allocated skb unreferenced. Move the checks above allocating the skb.
Detected by Coverity: CID 1195576.
Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Geert Uytterhoeven [Sun, 6 Apr 2014 13:30:39 +0000 (15:30 +0200)]
rsi: Add missing initialization of ii
drivers/net/wireless/rsi/rsi_91x_core.c: In function ‘rsi_core_determine_hal_queue’:
drivers/net/wireless/rsi/rsi_91x_core.c:91: warning: ‘ii’ may be used uninitialized in this function
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
John W. Linville [Mon, 14 Apr 2014 18:21:07 +0000 (14:21 -0400)]
Merge branch 'for-john' of git://git./linux/kernel/git/iwlwifi/iwlwifi-fixes
John W. Linville [Mon, 14 Apr 2014 17:47:01 +0000 (13:47 -0400)]
Merge branch 'for-john' of git://git./linux/kernel/git/jberg/mac80211
David S. Miller [Mon, 14 Apr 2014 17:43:58 +0000 (13:43 -0400)]
Merge branch 'qlcnic'
Shahed Shaikh says:
====================
qlcnic: Bug fixes
This patch series contains following bug fixes -
* Send INIT_NIC_FUNC mailbox command as first mailbox
* Fix a panic because of uninitialized delayed_work.
* Fix inconsistent calculation of max rings count.
* Fix PVID configuration issue. Driver needs to clear older
PVID before adding new one.
* Fix QLogic application/driver interface by packing vNIC information
array.
* Fix a crash when user tries to disable SR-IOV while VFs are
still assigned to VMs.
Please apply to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Mon, 14 Apr 2014 14:02:23 +0000 (10:02 -0400)]
qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
o Disabling SR-IOV while VFs are assigned to VMs crashes the host, so
return -EPERM when the user requests disabling SR-IOV through pci sysfs
while VFs are still assigned to VMs.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Mon, 14 Apr 2014 14:02:22 +0000 (10:02 -0400)]
qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
o The application expects the vNIC number as the array index, but the
driver interface returns the configuration in array index form.
o Pack the vNIC information array in the buffer such that application can
access it using vNIC number as the array index.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Mon, 14 Apr 2014 14:02:21 +0000 (10:02 -0400)]
qlcnic: Fix PVID configuration on eSwitch port.
Clear the older PVID before adding a newer PVID to the eSwitch port.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Mon, 14 Apr 2014 14:02:20 +0000 (10:02 -0400)]
qlcnic: Fix max ring count calculation
Do not read max rings count from qlcnic_get_nic_info(). Use driver defined
values for 82xx adapters. In case of 83xx adapters, use minimum of firmware
provided and driver defined values.
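The selection rule reads roughly as follows. The enum and function names are hypothetical and the values passed in are placeholders, not the driver's constants.

```c
enum adapter_family_sketch { ADAPTER_82XX, ADAPTER_83XX };

/* 82xx: always the driver-defined limit.
 * 83xx: the smaller of the firmware-provided and driver-defined limits. */
static unsigned int max_rings_sketch(enum adapter_family_sketch family,
                                     unsigned int fw_max,
                                     unsigned int drv_max)
{
    if (family == ADAPTER_83XX && fw_max < drv_max)
        return fw_max;
    return drv_max;
}
```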
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Mon, 14 Apr 2014 14:02:19 +0000 (10:02 -0400)]
qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
o INIT_NIC_FUNC should be the first mailbox command sent. Send the DCB
capability and parameter query commands after it.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Mon, 14 Apr 2014 14:02:18 +0000 (10:02 -0400)]
qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
o An AEN event was being received before the delayed_work struct and its
handlers were initialized, which resulted in a crash. This patch fixes it.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Apr 2014 17:41:42 +0000 (13:41 -0400)]
Merge branch 'be2net'
Sathya Perla says:
====================
be2net: patch set
Patch 1/2 is a v2 of a patch that was submitted earlier (as a part of a
different patch-set). v2 incorporates a suggestion given by David Laight
for how long to poll for pending TX completions while disabling a device.
Patch 2/2 fixes a crash in be_remove()->be_close()
path after be2net has aborted an EEH error recovery
due to a permanent failure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Mon, 14 Apr 2014 10:42:41 +0000 (16:12 +0530)]
be2net: Fix invocation of be_close() after be_clear()
In the EEH error recovery path, when a permanent failure occurs,
we clean up adapter structure (i.e. destroy queues etc) by calling
be_clear() and return PCI_ERS_RESULT_DISCONNECT.
After this the stack tries to remove device from bus and calls
be_remove() which invokes netdev_unregister()->be_close().
be_close() operating on destroyed queues results in a
NULL dereference.
This patch fixes this problem by introducing a flag to keep track
of the setup state.
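A sketch of such a guard, with hypothetical names throughout: the flag is set once the queues exist, cleared when they are destroyed, and the close path checks it before touching them.

```c
#define FLAG_SETUP_DONE_SKETCH 0x1u

struct be_adapter_sketch {
    unsigned int flags;
    int queues_alive;
};

static void setup_sketch(struct be_adapter_sketch *a)
{
    a->queues_alive = 1;
    a->flags |= FLAG_SETUP_DONE_SKETCH;
}

static void clear_sketch(struct be_adapter_sketch *a)
{
    a->queues_alive = 0;
    a->flags &= ~FLAG_SETUP_DONE_SKETCH;
}

/* Returns 0 when there was nothing to tear down, 1 when live queues
 * were closed, and -1 if destroyed queues would have been touched. */
static int close_sketch(struct be_adapter_sketch *a)
{
    if (!(a->flags & FLAG_SETUP_DONE_SKETCH))
        return 0;
    return a->queues_alive ? 1 : -1;
}
```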
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Mon, 14 Apr 2014 10:42:40 +0000 (16:12 +0530)]
be2net: Fix to reap TX compls till HW doesn't respond for some time
be_close() currently waits for a max of 200ms to receive all pending
TX compls. This timeout value was roughly calculated based on 10G
transmission speeds and the TX queue depth. This timeout may not be
enough when the link is operating at lower speeds or in multi-channel/SR-IOV
configs with TX-rate limiting setting.
It is hard to calculate a "proper timeout value" that works in all
configurations. This patch solves this problem by continuing to reap
TX completions till the HW is completely silent for a period of 10ms or
a HW error is detected.
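The "reap until silent" scheme can be simulated in user space. The hardware model below is hypothetical: each poll consumes one simulated millisecond and returns the number of completions that arrived in it. Only the 10 ms quiet window comes from the description above.

```c
#include <stddef.h>

#define SILENT_MS_SKETCH 10

/* Hypothetical HW model: compl_per_ms[t] completions arrive in ms t. */
struct fake_hw {
    const int *compl_per_ms;
    size_t len;
    size_t now;
    int hw_error;
};

/* Consume one simulated millisecond and return its completions. */
static int poll_compl_sketch(struct fake_hw *hw)
{
    int n = hw->now < hw->len ? hw->compl_per_ms[hw->now] : 0;

    hw->now++;
    return n;
}

/* Keep reaping until the HW has been silent for SILENT_MS_SKETCH
 * consecutive milliseconds or a HW error is flagged. */
static int reap_tx_compls_sketch(struct fake_hw *hw)
{
    int total = 0;
    int quiet = 0;

    while (quiet < SILENT_MS_SKETCH && !hw->hw_error) {
        int n = poll_compl_sketch(hw);

        if (n > 0) {
            total += n;
            quiet = 0;      /* activity restarts the quiet window */
        } else {
            quiet++;
        }
    }
    return total;
}
```

A completion arriving after an 8 ms gap is still collected, while one arriving after a full 10 ms of silence is not.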
v2: implements the new scheme (as suggested by David Laight) instead of
just waiting longer than 200ms for reaping all completions.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Mon, 14 Apr 2014 08:17:22 +0000 (11:17 +0300)]
net/mlx4_core: Defer VF initialization till PF is fully initialized
Fix in commit [1] is not sufficient since a deferred VF initialization
could happen after pci_enable_sriov() is finished, but before the PF is
fully initialized.
Need to prevent VFs from initializing till the PF is fully ready and
comm channel is operational.
[1] - 9798935 "net/mlx4_core: mlx4_init_slave() shouldn't access comm
channel before PF is ready"
CC: Stuart Hayes <Stuart_Hayes@Dell.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel J Blueman [Fri, 11 Apr 2014 08:14:26 +0000 (16:14 +0800)]
bnx2: Don't build unused suspend/resume functions not enabled
When CONFIG_PM_SLEEP isn't enabled, bnx2_suspend/resume are unused; don't
build them when they aren't used.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 11 Apr 2014 04:23:36 +0000 (21:23 -0700)]
ipv6: Limit mtu to 65575 bytes
Francois reported that setting a big mtu on the loopback device could
prevent tcp sessions from making progress.
We do not support (yet?) IPv6 Jumbograms and cook corrupted packets.
We must limit the IPv6 MTU to (65535 + 40) bytes in theory.
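The arithmetic behind the limit can be written out as a small C sketch. The macro and function names below are hypothetical and chosen for illustration; only the numbers (a 16-bit payload length and a 40-byte fixed IPv6 header) come from the text above.

```c
#include <assert.h>

#define IPV6_HDR_LEN		40u	/* fixed IPv6 header size */
#define IPV6_MAX_PAYLOAD	65535u	/* largest value of the 16-bit payload length field */
#define IPV6_MAX_MTU		(IPV6_MAX_PAYLOAD + IPV6_HDR_LEN)

/* Compile-time check that the sum matches the value in the subject line. */
static_assert(IPV6_MAX_MTU == 65575u, "IPv6 MTU limit should be 65575");

/* Hypothetical helper: clamp a requested MTU to the non-Jumbogram limit. */
static unsigned int clamp_ipv6_mtu(unsigned int mtu)
{
	return mtu > IPV6_MAX_MTU ? IPV6_MAX_MTU : mtu;
}
```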
Tested:
ifconfig lo mtu 70000
netperf -H ::1
Before patch : Throughput : 0.05 Mbits
After patch : Throughput : 35484 Mbits
Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sat, 12 Apr 2014 11:17:57 +0000 (13:17 +0200)]
netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4
nft_cmp_fast is used for equality comparisons of size <= 4. For
comparisons of size < 4 bytes a mask is calculated that is applied to
both the data from userspace (during initialization) and the register
value (during runtime). Both values are stored using (in effect) memcpy
to a memory area that is then interpreted as u32 by nft_cmp_fast.
This works fine on little endian since smaller types have the same base
address, however on big endian this is not true and the smaller types
are interpreted as a big number with trailing zero bytes.
The mask therefore must not include the lower bytes, but the higher bytes
on big endian. Add a helper function that does a cpu_to_le32 to switch
the bytes on big endian. Since we're dealing with a mask of just consecutive
bits, this works out fine.
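The masking idea can be tried out in userspace with the following sketch. It is not the kernel helper: to_le32() is a hand-written stand-in for cpu_to_le32, and cmp_mask() and cmp_fast_eq() are hypothetical names. Both operands are assumed to be 4-byte areas filled by memcpy, as described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for cpu_to_le32: identity on LE, byte swap on BE. */
static uint32_t to_le32(uint32_t v)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	if (first == 1)
		return v;	/* little endian host */
	return ((v & 0xffu) << 24) | ((v & 0xff00u) << 8) |
	       ((v >> 8) & 0xff00u) | (v >> 24);
}

/* Mask covering the first nbytes bytes in memory order; nbytes is 1..4. */
static uint32_t cmp_mask(unsigned int nbytes)
{
	assert(nbytes >= 1 && nbytes <= 4);
	return to_le32(~0u >> (32 - 8 * nbytes));
}

/* Compare the first nbytes bytes of two 4-byte areas through a u32 view. */
static int cmp_fast_eq(const void *reg, const void *data, unsigned int nbytes)
{
	uint32_t a, b, mask = cmp_mask(nbytes);

	memcpy(&a, reg, sizeof(a));
	memcpy(&b, data, sizeof(b));
	return (a & mask) == (b & mask);
}
```

Without the swap, a big endian host would mask the low-order bits of the u32, which are the trailing bytes in memory rather than the bytes that were actually copied in.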
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Andrey Vagin [Fri, 11 Apr 2014 17:34:20 +0000 (21:34 +0400)]
netfilter: nf_conntrack: initialize net.ct.generation
[ 251.920788] INFO: trying to register non-static key.
[ 251.921386] the code is fine but needs lockdep annotation.
[ 251.921386] turning off the locking correctness validator.
[ 251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294
[ 251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 251.921386] 0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd
[ 251.921386] ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0
[ 251.921386] ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172
[ 251.921386] Call Trace:
[ 251.921386] [<ffffffff816b7ecd>] dump_stack+0x45/0x56
[ 251.921386] [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c
[ 251.921386] [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40
[ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat]
[ 251.921386] [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40
[ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat]
[ 251.921386] [<ffffffff810c7272>] lock_acquire+0xa2/0x120
[ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
[ 251.921386] [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack]
[ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
[ 251.921386] [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
[ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
[ 251.921386] [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0
[ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
[ 251.921386] [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190
[ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
[ 251.921386] [<ffffffff815e98f2>] ip_output+0x92/0x100
[ 251.921386] [<ffffffff815e8df9>] ip_local_out+0x29/0x90
[ 251.921386] [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0
[ 251.921386] [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0
[ 251.921386] [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960
[ 251.921386] [<ffffffff81602d82>] tcp_connect+0x812/0x960
[ 251.921386] [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70
[ 251.921386] [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0
[ 251.921386] [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470
[ 251.921386] [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330
[ 251.921386] [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0
[ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10
[ 251.921386] [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0
[ 251.921386] [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50
[ 251.921386] [<ffffffff8158b157>] SYSC_connect+0xe7/0x120
[ 251.921386] [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0
[ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10
[ 251.921386] [<ffffffff8158c36e>] SyS_connect+0xe/0x10
[ 251.921386] [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b
[ 312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333)
[ 312.015097] INFO: Stall ended before state dump start
Fixes: 93bb0ceb75be ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Mathias Krause [Sun, 13 Apr 2014 16:23:33 +0000 (18:23 +0200)]
filter: prevent nla extensions to peek beyond the end of the message
The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
for a minimal message length before testing the supplied offset to be
within the bounds of the message. This allows the subtraction of the nla
header to underflow and therefore -- as the data type is unsigned --
allows far too big offset and length values for the search of the
netlink attribute.
The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
also wrong. It has the minuend and subtrahend mixed up, therefore
calculates a huge length value, allowing to overrun the end of the
message while looking for the netlink attribute.
The following three BPF snippets will trigger the bugs when attached to
a UNIX datagram socket and parsing a message with length 1, 2 or 3.
,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
| ld #0x87654321
| ldx #42
| ld #nla
| ret a
`---
,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
| ld #0x87654321
| ldx #42
| ld #nlan
| ret a
`---
,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
| ; (needs a fake netlink header at offset 0)
| ld #0
| ldx #42
| ld #nlan
| ret a
`---
Fix the first issue by ensuring the message length fulfills the minimal
size constraints of a nla header. Fix the second bug by getting the math
for the remainder calculation right.
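Both bugs come down to unsigned arithmetic, which the following userspace sketch tries to show. It is not the filter code itself: the function names are hypothetical, plain uint32_t values stand in for the message length and the supplied offset, and NLA_HDR_LEN assumes the usual 4-byte netlink attribute header.

```c
#include <assert.h>
#include <stdint.h>

#define NLA_HDR_LEN 4u	/* assumed size of a netlink attribute header */

/* Buggy bounds check: msg_len - NLA_HDR_LEN wraps when msg_len < 4. */
static int offset_ok_buggy(uint32_t msg_len, uint32_t off)
{
	return !(off > msg_len - NLA_HDR_LEN);
}

/* Fixed check: reject messages too short to hold a header at all. */
static int offset_ok_fixed(uint32_t msg_len, uint32_t off)
{
	if (msg_len < NLA_HDR_LEN)
		return 0;
	return !(off > msg_len - NLA_HDR_LEN);
}

/* Buggy remainder: minuend and subtrahend swapped, wraps to a huge value. */
static uint32_t remainder_buggy(uint32_t msg_len, uint32_t off)
{
	return off - msg_len;
}

/* Fixed remainder: bytes left in the message after the offset. */
static uint32_t remainder_fixed(uint32_t msg_len, uint32_t off)
{
	assert(off <= msg_len);	/* caller has already validated the offset */
	return msg_len - off;
}
```

With a 1-byte message the buggy check accepts the offset 0x87654321 used in the PoCs, because 1 - 4 wraps around to a value close to the 32-bit maximum.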
Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>