Dan Carpenter [Sat, 20 Apr 2013 11:24:55 +0000 (14:24 +0300)]
ipvs: off by one in set_sctp_state()
The sctp_events[] come from sch->type in set_sctp_state(). They are
between 0-255 so that means we need 256 elements in the array.
I believe that because of how the code is aligned there is normally a
hole after sctp_events[] so this patch doesn't actually change anything.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
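The sizing argument in the set_sctp_state() commit above can be illustrated with a small userspace model. This is only a sketch in Python, not the kernel code: the table contents and the helper name lookup() are hypothetical, and the only fact taken from the log is that the index ranges over 0-255.

```python
# Userspace model of a table indexed by an 8-bit value (0..255).
INDEX_VALUES = 256  # an index in the range 0-255 has 256 possible values


def lookup(table, chunk_type):
    """Return the entry for chunk_type, or None if the table is too short."""
    if not 0 <= chunk_type < INDEX_VALUES:
        raise ValueError("chunk type must be in the range 0-255")
    # C would silently read past the end of the array here;
    # the bounds check makes the off-by-one visible instead.
    return table[chunk_type] if chunk_type < len(table) else None


short_table = ["event"] * 255  # one element short: index 255 does not exist
full_table = ["event"] * 256   # covers every possible index
```

With 255 elements the highest index has no slot of its own, which is why the array needs 256 elements.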
Simon Horman [Fri, 19 Apr 2013 01:33:59 +0000 (10:33 +0900)]
ipvs: Use min3() in ip_vs_dbg_callid()
There are two motivations for this:
1. It improves readability to my eyes
2. Using nested min() calls results in a shadowed _min1 variable,
which is a bit untidy. Sparse complained about this.
I have also replaced (size_t)64 with a variable of type size_t and value 64.
This also improves readability to my eyes.
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
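As a rough illustration of the min3() change above, the following Python sketch compares the nested form with the three-argument form. Which three quantities ip_vs_dbg_callid() actually compares is not stated in the log; the buffer length and Call-ID length parameters below are assumptions, and only the limit of 64 comes from the commit message.

```python
def min3(a, b, c):
    """Smallest of three values, written as a single call."""
    return min(a, b, c)


def nested_min(a, b, c):
    """The older nested form: same result, one extra level of nesting."""
    return min(a, min(b, c))


def dbg_copy_len(buf_len, callid_len, limit=64):
    """Hypothetical helper: bound a debug copy by all three limits at once."""
    return min3(buf_len, callid_len, limit)
```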
Simon Horman [Fri, 19 Apr 2013 01:25:42 +0000 (10:25 +0900)]
ipvs: Avoid shadowing net variable in ip_vs_leave()
Flagged by sparse.
Compile and sparse tested only.
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Wed, 17 Apr 2013 20:50:49 +0000 (23:50 +0300)]
ipvs: fix sparse warnings for some parameters
Some service fields are in network order:
- netmask: used once in network order and also as prefix len for IPv6
- port
Other parameters are in host order:
- struct ip_vs_flags: flags and mask moved between user and kernel only
- sync state: moved between user and kernel only
- syncid: sent over network as single octet
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
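The host order versus network order split listed in the commit above can be sketched in userspace with Python's struct module. This is an illustration of the byte-order concept only, not of the ipvs structures; the helper names are hypothetical.

```python
import struct


def port_to_wire(port):
    """Host-order integer -> 2 bytes in network (big-endian) order."""
    return struct.pack("!H", port)


def port_from_wire(data):
    """2 bytes in network order -> host-order integer."""
    return struct.unpack("!H", data)[0]


def syncid_to_wire(syncid):
    """A single octet has no byte order, so no conversion is needed."""
    return struct.pack("B", syncid)
```

A port is a multi-byte value, so its in-memory representation depends on the convention in use, which is the kind of mismatch sparse annotations are meant to catch. A one-octet syncid is the same in either order.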
Julian Anastasov [Wed, 17 Apr 2013 20:50:47 +0000 (23:50 +0300)]
ipvs: fix sparse warnings in lblc and lblcr
kbuild test robot reports sparse warnings in commits
c2a4ffb70eef39 ("ipvs: convert lblc scheduler to rcu") and
c5549571f975ab ("ipvs: convert lblcr scheduler to rcu").
Fix it by removing extra __rcu annotation.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Wed, 17 Apr 2013 20:50:50 +0000 (23:50 +0300)]
ipvs: fix the remaining sparse warnings in ip_vs_ctl.c
- RCU annotations for ip_vs_info_seq_start and _stop
- __percpu for cpustats
- properly dereference svc->pe in ip_vs_genl_fill_service
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Wed, 17 Apr 2013 20:50:46 +0000 (23:50 +0300)]
ipvs: fix sparse warnings for ip_vs_conn listing
kbuild test robot reports sparse warnings in commit
088339a57d6042 ("ipvs: convert connection locking"):
net/netfilter/ipvs/ip_vs_conn.c:962:13: warning: context imbalance
in 'ip_vs_conn_array' - wrong count at exit
include/linux/rcupdate.h:326:30: warning: context imbalance in
'ip_vs_conn_seq_next' - unexpected unlock
include/linux/rcupdate.h:326:30: warning: context imbalance in
'ip_vs_conn_seq_stop' - unexpected unlock
Fix it by running ip_vs_conn_array under RCU lock
to avoid conditional locking and by adding proper RCU
annotations.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Wed, 17 Apr 2013 20:50:45 +0000 (23:50 +0300)]
ipvs: properly dereference dest_dst in ip_vs_forget_dev
Use rcu_dereference_protected to resolve
sparse warning, found by kbuild test robot:
net/netfilter/ipvs/ip_vs_ctl.c:1464:35: warning: dereference of
noderef expression
Problem from commit 026ace060dfe29
("ipvs: optimize dst usage for real server")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Daniel Borkmann [Thu, 18 Apr 2013 21:59:37 +0000 (21:59 +0000)]
net: sctp: minor: remove dead code from sctp_packet
struct sctp_packet is currently embedded into sctp_transport or
sits on the stack as 'singleton' in sctp_outq_flush(). Therefore,
its member 'malloced' is always 0, thus a kfree() is never called.
Because of that, we can just remove this code.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Stevens [Fri, 19 Apr 2013 00:36:26 +0000 (00:36 +0000)]
VXLAN: Allow L2 redirection with L3 switching
Allow L2 redirection when VXLAN L3 switching is enabled
This patch restricts L3 switching to destination MAC addresses that are
marked as routers in order to allow virtual IP appliances that do L2
redirection to function with VXLAN L3 switching enabled.
We use L3 switching on VXLAN networks to avoid extra hops when the nominal
router for cross-subnet traffic for a VM is remote and the ultimate
destination may be local, or closer to the local node. Currently, the
destination IP address takes precedence over the MAC address in all cases.
Some network appliances receive packets for a virtualized IP address and
redirect by changing the destination MAC address (only) to be the final
destination for packet processing. VXLAN tunnel endpoints with L3 switching
enabled may then overwrite this destination MAC address based on the packet IP
address, resulting in potential loops and, at least, breaking L2 redirections
that travel through tunnel endpoints.
This patch limits L3 switching to the intended case where the original
destination MAC address is a next-hop router and relies on the destination
MAC address for all other cases, thus allowing L2 redirection and L3 switching
to coexist peacefully.
Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Wed, 17 Apr 2013 22:17:50 +0000 (22:17 +0000)]
net: Remove return value from list_netdevice()
The return value from list_netdevice() is not used and not needed, so remove it.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 19 Apr 2013 21:29:25 +0000 (14:29 -0700)]
net: remove a stale comment for dl_next
dl_next member in struct request_sock doesn't need to be first.
We expect to insert a "struct common_sock" or a subset of it,
so this claim had to be verified.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Raspl [Mon, 22 Apr 2013 01:12:29 +0000 (01:12 +0000)]
qeth: Fix missing pointer update
qeth_hdr_chk_and_bounce() can possibly shift the skb->data
pointer. However, the existing code didn't update the hdr pointer,
which should point to skb->data, accordingly.
Symptoms of this issue are sporadic recoveries.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Raspl [Mon, 22 Apr 2013 01:12:28 +0000 (01:12 +0000)]
qeth: remove unused variable
remove unused variable
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Yanfei [Mon, 22 Apr 2013 01:12:27 +0000 (01:12 +0000)]
qeth: remove cast for kzalloc return value
remove cast for kzalloc return value.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu [Mon, 22 Apr 2013 02:20:43 +0000 (02:20 +0000)]
xen-netback: don't disconnect frontend when seeing oversize packet
Some frontend drivers are sending packets > 64 KiB in length. This length
overflows the length field in the first slot making the following slots have
an invalid length.
Turn this error back into a non-fatal error by dropping the packet. To avoid
the following slots triggering fatal errors, consume all slots in the
packet.
This does not reopen the security hole in XSA-39: if the packet has an
invalid number of slots it will still hit the fatal error case.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
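The length overflow described in the oversize packet commit above can be modelled in a few lines. This Python sketch assumes a 16-bit length field, which is consistent with the 64 KiB figure in the message but is not spelled out there; stored_size() is a hypothetical helper, not a netback function.

```python
SIZE_FIELD_BITS = 16  # assumed width of the length field in the first slot


def stored_size(packet_len):
    """Value left in the length field after truncation to its width."""
    return packet_len & ((1 << SIZE_FIELD_BITS) - 1)
```

A packet of exactly 64 KiB wraps to a stored length of 0, so the lengths derived for the following slots no longer describe the real packet.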
Wei Liu [Mon, 22 Apr 2013 02:20:42 +0000 (02:20 +0000)]
xen-netback: coalesce slots in TX path and fix regressions
This patch tries to coalesce tx requests when constructing grant copy
structures. It enables netback to deal with the situation where the frontend's
MAX_SKB_FRAGS is larger than the backend's MAX_SKB_FRAGS.
With the help of coalescing, this patch tries to address two regressions and
avoid reopening the security hole in XSA-39.
Regression 1. The reduction of the number of supported ring entries (slots)
per packet (from 18 to 17). This regression has been around for some time but
remained unnoticed until the XSA-39 security fix. This is fixed by coalescing
slots.
Regression 2. The XSA-39 security fix turning "too many frags" errors from
just dropping the packet to a fatal error and disabling the VIF. This is fixed
by coalescing slots (handling 18 slots when backend's MAX_SKB_FRAGS is 17)
which rules out false positive (using 18 slots is legit) and dropping packets
using 19 to `max_skb_slots` slots.
To avoid reopening the security hole in XSA-39, a frontend sending a packet
using more than max_skb_slots slots is considered malicious.
The behavior of netback for a packet is thus:
1-18 slots: valid
19-max_skb_slots slots: drop and respond with an error
more than max_skb_slots slots: fatal error
max_skb_slots is configurable by admin, default value is 20.
Also change variable name from "frags" to "slots" in netbk_count_requests.
Please note that the RX path still has a dependency on MAX_SKB_FRAGS. This
will be fixed with a separate patch.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
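The three slot ranges listed in the coalescing commit above amount to a small classification rule. The following Python sketch models that rule only; the function name classify_slots() and the string results are hypothetical, while the thresholds (18 legitimate slots, default max_skb_slots of 20) come from the commit message.

```python
LEGIT_SLOTS = 18  # using up to 18 slots is legitimate per the commit message


def classify_slots(slots, max_skb_slots=20):
    """Map a packet's slot count to the behaviour described for netback."""
    if slots < 1:
        raise ValueError("a packet uses at least one slot")
    if slots <= LEGIT_SLOTS:
        return "valid"
    if slots <= max_skb_slots:
        return "drop"   # drop the packet and respond with an error
    return "fatal"      # frontend is considered malicious
```

Raising max_skb_slots widens only the middle "drop" band; the valid range stays at 18 slots.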
Wei Liu [Mon, 22 Apr 2013 02:20:41 +0000 (02:20 +0000)]
xen-netfront: reduce gso_max_size to account for max TCP header
The maximum packet including header that can be handled by netfront / netback
wire format is 65535. Reduce gso_max_size accordingly.
Drop skb and print warning when skb->len > 65535. This can 1) save the effort
of sending a malformed packet to netback, 2) help spot misconfiguration of
netfront in the future.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu [Mon, 22 Apr 2013 02:20:40 +0000 (02:20 +0000)]
xen-netfront: frags -> slots in log message
Also fix a typo in comment.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Craig Hada [Sun, 21 Apr 2013 23:28:18 +0000 (23:28 +0000)]
be2net: enable IOMMU pass through for be2net
This patch sets the coherent DMA mask to 64-bit after the be2net driver has
confirmed that the system is 64-bit DMA capable. The coherent DMA
mask is examined by the Intel IOMMU driver to determine whether to allow
pass through context mapping for all devices. With this patch, the be2net
driver combined with be2net compatible hardware provides comparable
performance to the case where vt-d is disabled. The main use-case for this
change is to decrease the time necessary to copy virtual machine memory
during KVM live migration instantiations.
This patch was tested on a system that enables the IOMMU in non-coherent
mode. Two DMA remapper issues were encountered in the previous version and
both patches have been committed.
commit ea2447f700cab264019b52e2b417d689e052dcfd
commit 2e12bc29fc5a12242d68e11875db3dd58efad9ff
Signed-off-by: Craig Hada <craig.hada@hp.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Sun, 21 Apr 2013 23:28:17 +0000 (23:28 +0000)]
be2net: Use GET_PROFILE_CONFIG V1 cmd for BE3-R
Use GET_PROFILE_CONFIG_V1 cmd for BE3-R, to query the maximum number of
TX rings available per function. On SH-R the same is queried via the
GET_FUNCTION_CONFIG cmd.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Sun, 21 Apr 2013 23:28:16 +0000 (23:28 +0000)]
be2net: Avoid flashing BE3 UFI on BE3-R chip.
Avoid flashing BE3 UFI on BE3-R chip by verifying asic_revision
number of the chip.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Sun, 21 Apr 2013 23:28:15 +0000 (23:28 +0000)]
be2net: Don't log "Out of MCCQ wrbs" error
Don't log "Out of MCCQ wrbs" msg. When the driver doesn't receive any
response from the FW it already logs a "FW not responding" message.
The driver runs out of MCCQ wrbs much later. Also, this message can
swamp the kernel log in HW/FW error scenarios.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Sun, 21 Apr 2013 23:28:14 +0000 (23:28 +0000)]
be2net: Use TXQ_CREATE_V2 cmd
Skyhawk-R and BE3-R (SuperNIC profile) require V2 version
of TXQ_CREATE cmd to be used.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Mon, 22 Apr 2013 03:48:11 +0000 (03:48 +0000)]
bnx2x: update version to 1.78.17-0
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Mon, 22 Apr 2013 03:48:10 +0000 (03:48 +0000)]
bnx2x: allow nvram test to run when device is down
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Mon, 22 Apr 2013 03:48:09 +0000 (03:48 +0000)]
bnx2x: add additional regions for CRC memory test
a. Common tree of `dir` structures.
b. Multi-port devices structures.
CC: Francious Romieu <romieu@fz.zoreil.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Mon, 22 Apr 2013 03:48:08 +0000 (03:48 +0000)]
bnx2x: remove non-necessary assignment
CC: Francious Romieu <romieu@fz.zoreil.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Mon, 22 Apr 2013 03:48:07 +0000 (03:48 +0000)]
bnx2x: fix byte-by-byte nvram write for BE machines
CC: Francious Romieu <romieu@fz.zoreil.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Mon, 22 Apr 2013 03:48:06 +0000 (03:48 +0000)]
bnx2x: refactor nvram read procedure
introduce a procedure to read in u32 granularity.
CC: Francious Romieu <romieu@fz.zoreil.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 21 Apr 2013 00:09:34 +0000 (00:09 +0000)]
qeth: fix VLAN related compilation errors
drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_add_vlan_mc':
>> drivers/s390/net/qeth_l3_main.c:1662:3: error: too few arguments to function '__vlan_find_dev_deep'
include/linux/if_vlan.h:88:27: note: declared here
drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_add_vlan_mc6':
>> drivers/s390/net/qeth_l3_main.c:1723:3: error: too few arguments to function '__vlan_find_dev_deep'
include/linux/if_vlan.h:88:27: note: declared here
drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_free_vlan_addresses4':
>> drivers/s390/net/qeth_l3_main.c:1767:2: error: too few arguments to function '__vlan_find_dev_deep'
include/linux/if_vlan.h:88:27: note: declared here
drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_free_vlan_addresses6':
>> drivers/s390/net/qeth_l3_main.c:1797:2: error: too few arguments to function '__vlan_find_dev_deep'
include/linux/if_vlan.h:88:27: note: declared here
drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_process_inbound_buffer':
>> drivers/s390/net/qeth_l3_main.c:1980:6: error: too few arguments to function '__vlan_hwaccel_put_tag'
include/linux/if_vlan.h:234:31: note: declared here
drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_verify_vlan_dev':
>> drivers/s390/net/qeth_l3_main.c:2089:3: error: too few arguments to function '__vlan_find_dev_deep'
include/linux/if_vlan.h:88:27: note: declared here
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 21 Apr 2013 00:09:32 +0000 (00:09 +0000)]
net: vlan: fix up vlan_proto_idx() for CONFIG_BUG=n
Add missing return statement for CONFIG_BUG=n.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sat, 20 Apr 2013 23:51:41 +0000 (23:51 +0000)]
net: vlan: fix dummy function signatures for CONFIG_VLAN=n
Fix up some function signatures for CONFIG_VLAN=n that were missed during
the 802.1ad support patches.
Found by the kbuild robot.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sat, 20 Apr 2013 23:34:40 +0000 (23:34 +0000)]
net: vlan: fix memory leak in vlan_info_rcu_free()
The following leak is reported by kmemleak:
[ 86.812073] kmemleak: Found object by alias at 0xffff88006ecc76f0
[ 86.816019] Pid: 739, comm: kworker/u:1 Not tainted 3.9.0-rc5+ #842
[ 86.816019] Call Trace:
[ 86.816019] <IRQ> [<ffffffff81151c58>] find_and_get_object+0x8c/0xdf
[ 86.816019] [<ffffffff8190e90d>] ? vlan_info_rcu_free+0x33/0x49
[ 86.816019] [<ffffffff81151cbe>] delete_object_full+0x13/0x2f
[ 86.816019] [<ffffffff8194bbb6>] kmemleak_free+0x26/0x45
[ 86.816019] [<ffffffff8113e8c7>] slab_free_hook+0x1e/0x7b
[ 86.816019] [<ffffffff81141c05>] kfree+0xce/0x14b
[ 86.816019] [<ffffffff8190e90d>] vlan_info_rcu_free+0x33/0x49
[ 86.816019] [<ffffffff810d0b0b>] rcu_do_batch+0x261/0x4e7
The reason is that in vlan_info_rcu_free() we don't take the VLAN protocol
into account when iterating over the vlan_devices_array.
Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 21:55:29 +0000 (17:55 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
The following patchset contains a small batch of Netfilter
updates for your net-next tree, they are:
* Three patches that provide more accurate error reporting to
user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing
code and NAT, from Patrick McHardy.
* Update copyright statements in Netfilter filters of
Patrick McHardy, from himself.
* Add Kconfig dependency on the raw/mangle tables to the
rpfilter, from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Tue, 16 Apr 2013 14:46:00 +0000 (14:46 +0000)]
bond: add support to read speed and duplex via ethtool
This patch adds support for the get_settings ethtool op to the bonding
driver. This was motivated by users who wanted to get the speed of the
bond and compare that against throughput to understand utilization.
The behavior before this patch was added was problematic when computing
line utilization after trying to get link-speed and throughput via SNMP.
Output from ethtool looks like this for a round-robin bond:
Settings for bond0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: 11000Mb/s
Duplex: Full
Port: Other
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
MDI-X: Unknown
Link detected: yes
I tested this and verified it works as expected. A test was also done
on a version backported to an older kernel and it worked well there.
v2: Switch to using ethtool_cmd_speed_set to set speed, added check to
SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations
as they were not needed, and set port type to 'Other.'
v3: Fix useless assignment and checkpatch warning.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
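One way to read the bonding commit above is that the reported bond speed is the sum of the speeds of the usable slaves, with no mode-specific adjustment. The Python sketch below models that reading; it is an assumption based on the changelog notes, not a copy of the driver. The Slave class, the interface names, and the 10000 + 1000 split are hypothetical (the log only shows the 11000Mb/s total).

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    speed_mbps: int
    ok: bool  # stands in for the SLAVE_IS_OK check mentioned in v2


def bond_speed(slaves):
    """Sum the speeds of usable slaves, ignoring the bonding mode."""
    return sum(s.speed_mbps for s in slaves if s.ok)


def utilization(throughput_mbps, slaves):
    """Fraction of the aggregate speed in use, or None if no slave is up."""
    speed = bond_speed(slaves)
    return throughput_mbps / speed if speed else None
```

The utilization() helper reflects the motivation given in the message: comparing measured throughput against the bond speed.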
Daniel Borkmann [Tue, 16 Apr 2013 01:29:11 +0000 (01:29 +0000)]
packet: move hw/sw timestamp extraction into a small helper
This patch introduces a small, internal helper function, that is used by
PF_PACKET. Based on the flags that are passed, it extracts the packet
timestamp in the receive path. This is merely a refactoring to remove
some duplicate code in tpacket_rcv(), to make it more readable, and to
enable others to use this function in PF_PACKET as well, e.g. for TX.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 16 Apr 2013 01:29:10 +0000 (01:29 +0000)]
net: socket: move ktime2ts to ktime header api
Currently, ktime2ts is a small helper function that is only used in
net/socket.c. Move this helper into the ktime API as a small inline
function, so that i) it's maintained together with ktime routines,
and ii) also other files can make use of it. The function is named
ktime_to_timespec_cond() and placed into the generic part of ktime,
since we internally make use of ktime_to_timespec(). ktime_to_timespec()
itself does not check the ktime variable for zero, hence, we name
this function ktime_to_timespec_cond() for only a conditional
conversion, and adapt its users to it.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
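The difference between the unconditional and the conditional conversion described in the ktime commit above can be sketched as follows. This is a Python model that reuses the kernel function names for readability; representing ktime as an integer nanosecond count and returning None instead of a boolean plus an output parameter are simplifications made for this sketch.

```python
NSEC_PER_SEC = 1_000_000_000


def ktime_to_timespec(kt_ns):
    """Unconditional conversion: nanoseconds -> (seconds, nanoseconds)."""
    return divmod(kt_ns, NSEC_PER_SEC)


def ktime_to_timespec_cond(kt_ns):
    """Convert only when the ktime value is non-zero, else report nothing."""
    if kt_ns == 0:
        return None
    return ktime_to_timespec(kt_ns)
```

A caller can then treat a zero ktime as "no timestamp available" rather than as the epoch.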
David S. Miller [Fri, 19 Apr 2013 20:36:12 +0000 (16:36 -0400)]
net: Add .gitignore to networking selftests directory.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 20:08:23 +0000 (16:08 -0400)]
net: Add missing netdev feature strings for NETIF_F_HW_VLAN_STAG_*
Noticed by Ben Hutchings.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 20:16:42 +0000 (16:16 -0400)]
Merge branch 'qlcnic'
Rajesh Borundia says:
====================
* "qlcnic: Change 82xx adapter VLAN id endian type".
- Adapter requires VLAN id in little endian. VLAN id was being
converted to __le16 and then passed as a parameter. Pass VLAN id
as u16 and then use cpu_to_le16 at appropriate places. It is
appropriate for net-next as SR-IOV patches have a dependency on it.
* "qlcnic: Fix loopback test for SR-IOV PF".
- It is appropriate for net-next as change is needed for SRIOV PF
only.
* Remaining patches add enhancements to SR-IOV functionality like
- FLR handling
- Adapter reset recovery handling
- iproute2 tool support for configuring MAC address, Tx rate and
VLAN id.
- Mailbox polling support for SR-IOV PF in case mailbox interrupts
are disabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Fri, 19 Apr 2013 07:01:15 +0000 (07:01 +0000)]
qlcnic: Update version to 5.2.41
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Fri, 19 Apr 2013 07:01:14 +0000 (07:01 +0000)]
qlcnic: Support polling for mailbox events.
o When mailbox interrupt is disabled PF should be
able to process request from VF. Enable polling
for such cases.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Fri, 19 Apr 2013 07:01:13 +0000 (07:01 +0000)]
qlcnic: Fix loopback test for SR-IOV PF.
o Do not disable mailbox interrupts while running
loopback test through SR-IOV PF.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Fri, 19 Apr 2013 07:01:12 +0000 (07:01 +0000)]
qlcnic: Support VLAN id config.
o Add support for VLAN id configuration per VF using
iproute2 tool.
o VLAN id's 1-4094 are treated as PVID by the PF and
Guest VLAN tagging is not allowed by default.
o PVID is disabled when the VLAN id is set to 0
o Guest VLAN tagging is allowed when the VLAN id is set to 4095.
o Only one Guest VLAN id is supported.
o VLAN id can be changed only when the VF driver is not loaded.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
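The VLAN id rules listed in the qlcnic commit above can be summarised as a small decision function. The Python sketch below is a model of those bullet points only; the function names and result strings are hypothetical and do not correspond to driver symbols.

```python
def vf_vlan_mode(vlan_id):
    """Interpret a per-VF VLAN id as described in the commit message."""
    if vlan_id == 0:
        return "pvid-disabled"
    if 1 <= vlan_id <= 4094:
        return "pvid"           # PF treats the id as PVID
    if vlan_id == 4095:
        return "guest-tagging"  # guest VLAN tagging is allowed
    raise ValueError("VLAN id must be in the range 0-4095")


def can_change_vlan(vf_driver_loaded):
    """The VLAN id may be changed only while the VF driver is not loaded."""
    return not vf_driver_loaded
```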
Rajesh Borundia [Fri, 19 Apr 2013 07:01:11 +0000 (07:01 +0000)]
qlcnic: Support MAC address, Tx rate config.
o Add support for MAC address and Tx rate configuration
per VF via iproute2 tool.
o Tx rate change is allowed while the guest is running
and the VF driver is loaded.
o MAC address change is allowed only when VF driver
is not loaded.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Fri, 19 Apr 2013 07:01:10 +0000 (07:01 +0000)]
qlcnic: VF reset recovery implementation.
o Implement recovery mechanism for VF to recover from
adapter resets.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Fri, 19 Apr 2013 07:01:09 +0000 (07:01 +0000)]
qlcnic: VF FLR implementation.
o FLR from Hypervisor - When hypervisor issues a VF FLR request,
adapter notifies the parent PF driver of the FLR request for PF
driver to perform any cleanup on behalf of that VF.
o FLR from VF Driver - VF driver may initiate a VF FLR request,
if VF state needs to be cleaned up before a re-initialization.
VF re-initialization during kdump is an example.
o PF driver cleans up all resources allocated on behalf of a VF,
on VF FLR notifications from the adapter or from the VF driver.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Fri, 19 Apr 2013 07:01:08 +0000 (07:01 +0000)]
qlcnic: Change 82xx adapter VLAN id endian type.
o 82xx adapter requires VLAN id in little endian format.
Instead of passing vlan id parameter as __le16, pass the
parameter as u16 and use cpu_to_le16 at appropriate places.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 19:37:09 +0000 (15:37 -0400)]
Merge branch 'netlink-mmap'
Patrick McHardy says:
====================
The following patches contain an implementation of memory mapped I/O for
netlink. The implementation is modelled after AF_PACKET memory mapped I/O
with a few differences:
- In order to perform memory mapped I/O to userspace, the kernel allocates
skbs with the data area pointing to the data area of the mapped frames.
All netlink subsystems assume a linear data area, so for the sake of
simplicity, the mapped data area is not attached to the paged area but
to skb->data. This requires introduction of a special skb allocation
function that just allocates an skb head without the data area. Since this
is a quite rare use case, I introduced a new function based on __alloc_skb
instead of splitting it up into head and data allocation. The alternative
would be to introduce an __alloc_skb_head and __alloc_skb_data function,
which would actually be useful for a specific error case in memory mapped
netlink, but would require a couple of extra instructions for the common
skb allocation case, so it doesn't really seem worth it.
In order to get the destination memory area for skb->data before message
construction, memory mapped netlink I/O needs to look up the destination
socket during allocation instead of during transmission because the
ring is owned by the receiving socket/process. A special skb allocation
function (netlink_alloc_skb) taking the destination pid as an argument is
used for this, all subsystems that want to support memory mapped I/O need
to use this function, automatic fallback to the receive queue happens
for unconverted subsystems. Dumps automatically use memory mapped I/O if
the receiving socket has enabled it.
The visible effect of looking up the destination socket during allocation
instead of transmission is that message ordering in userspace might
change in case allocation and transmission aren't performed atomically.
This usually doesn't matter since most subsystems have a BKL-like lock
like the rtnl mutex, to my knowledge the currently only existing case
where it might matter is nfnetlink_queue combined with the recently
introduced batched verdicts, but a) that subsystem already includes
sequence numbers which allow userspace to reorder messages in case it
cares to, also the reordering window is quite small and b) with memory
mapped transmission batching can be performed in a subsystem independent
manner.
- AF_NETLINK contains flow control for database dumps, with regular I/O
dump continuations are triggered based on the socket's receive queue space
and by recvmsg() calls. Since with memory mapped I/O there are no
recvmsg() calls under normal operation, this is done in netlink_poll(),
under the assumption that userspace has processed all pending frames
before invoking poll(), thus the ring is expected to have room for new
messages. Dumps currently don't benefit as much as they could from
memory mapped I/O because each single continuation requires a poll()
call. A more aggressive approach seems like a good idea to me, especially
in case the socket is not subscribed to any multicast groups (IOW only
receiving explicitly requested data).
Besides that, the memory mapped netlink implementation extends the states
defined by AF_PACKET between userspace and the kernel by a SKIP status, this
is intended for the case that userspace wants to queue frames (specifically
when using nfnetlink_queue, an IDS and stream reassembly, requested by
Eric Leblond) for a longer period of time. The kernel skips over all frames
marked with SKIP when looking for unused frames and only fails when not finding
a free frame or when having skipped the entire ring.
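The frame search with the SKIP status described above can be sketched as a scan over a ring of frame states. This is a Python model built only from that description: the status names, the list-based ring, and find_unused_frame() are assumptions for illustration and are not the kernel's identifiers.

```python
UNUSED, VALID, SKIP = "unused", "valid", "skip"  # illustrative status names


def find_unused_frame(ring, head):
    """Scan at most one lap from head, stepping over SKIP frames.

    Returns the index of the first unused frame, or None when the next
    non-skipped frame is not free or every frame in the ring was skipped.
    """
    n = len(ring)
    for step in range(n):
        idx = (head + step) % n
        status = ring[idx]
        if status == UNUSED:
            return idx
        if status != SKIP:
            return None  # frame still in use: no free frame available
    return None          # skipped the entire ring
```

Bounding the scan to one lap is what keeps a ring full of SKIP frames from stalling the search.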
Also noteworthy is memory mapped sendmsg: the kernel performs validation
of messages before accepting and processing them, in order to prevent
userspace from changing the message contents after validation, the
kernel checks that the ring is only mapped once and the file descriptor
is not shared (in order to avoid having userspace set up another mapping
after the first mentioned check). If either condition is not true, the
message is copied to an allocated skb and processed as with regular I/O.
I'd especially appreciate review of this part since I'm not really versed
in memory, file and process management.
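The fallback rule for memory mapped sendmsg described above reduces to a two-condition check. The Python sketch below states that rule in isolation; the parameter names and return strings are hypothetical, and how the kernel actually counts mappings and detects shared descriptors is not covered by this summary.

```python
def sendmsg_path(ring_mapping_count, fd_shared):
    """Choose how a message from the TX ring is handed to the kernel."""
    if ring_mapping_count == 1 and not fd_shared:
        # Userspace has no second view of the ring through which it could
        # alter the message after validation, so it can be used in place.
        return "in-place"
    # Otherwise copy into an allocated skb and process as with regular I/O.
    return "copy"
```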
The remaining interesting details are included in the changelogs of the
individual patches and the documentation, so I won't repeat them here.
As an example, nfnetlink_queue is converted to support memory mapped
I/O. Other subsystems that would probably benefit are nfnetlink_log,
audit and maybe ISCSI, not sure.
Following are some numbers collected by Florian Westphal based on a
slightly older version, which included an experimental patch for the
nfnetlink_queue ordering issue.
===
Test hardware is a 12-core machine
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
ixgbe interfaces are used (i.e., multiqueue nics).
irqs are distributed across the cpus.
I've made several tests.
The simple one consists of 3GBit UDP traffic, packets are 1500 bytes
in size (i.e., no fragmentation), with a single nfqueue
and the test client programs in libmnl examples directory.
Packets are sent from one /24 net to another /24 net, i.e.
there are a few hundred flows active at any given time.
I've also tested with snort, but I disabled all rules.
6Gbit UDP traffic is generated in the snort case, and
6 nfqueues are used (i.e., 6 snorts run in parallel).
I've tested with 3 different kernels, all based on 3.7.1.
- 3.7.1, without the mmap patches
- 3.7.1, with Patrick's mmap patches
- 3.7.1, with mmap patches and extended spinlock to ensure packet ids are
monotonically increasing and cannot be re-ordered. This is what we
currently ship in our product.
[ the spinlock that is extended is the per nfqueue spinlock, it will
be held from the time the netlink skb is allocated until the netlink
skb is sent to userspace:
http://1984.lsi.us.es/git/nf-next/commit/?h=mmap-netlink3&id=b8eb19c46650fef4e9e4fe53f367f99bbf72afc9
]
snort is normally used in "batch mode", i.e., after processing 25 packets
a single "batch verdict" is sent to accept the packets seen so far.
"mmap snort" means RX_RING + sendmsg(), i.e. TX_RING is not used at this
time (except where noted below).
One reason is that snort has a reload thread, so kernel needs to copy;
also in the snort case no payload rewrite takes place, so compared
to the rx path the tx path is cheap.
Results:
3.7.1, without mmap patches, i.e. recv()+sendmsg() for everyone
nfq-queue: 1.7 gbit out
snort-recv-batch-25 5.1 gbit out
snort-recv-no-batch 3.1 gbit out
3.7.1 + mmap + without extended spinlocked section
nfq-queue: 1.7 gbit out (recv/sendmsg)
nfq-queue-mmap: 2.4 gbit out
snort-mmap-batch-25 5.6 gbit out (warning: since ids can be
re-ordered, this version is "broken").
snort-recv-batch-25 5.1 gbit out
snort-mmap-no-batch 4.6 gbit out (i.e., one verdict per packet)
Kernel 3.7.1 + mmap + extended spinlock section:
nfq-queue: 1.4 gbit out
nfq-queue-mmap: 2.3 gbit out
snort: 5.6 gbit out
Conclusions:
- The "extended spinlocked section" hurts performance in the
single queue case; with 6 snorts there is no measurable slowdown.
- I tried to re-write the mmap-snort to work without batch verdicts, but
results were not very encouraging:
kernel 3.7.1 + mmap (without extended spinlocked section):
snort-mmap-batch-25 5.6 gbit out (what we currently ship)
snort-recv-batch-25 5.1 gbit out (without using mmap)
snort-mmap-batch-1 4.6 gbit out (with mmap but without batch verdicts)
snort-mmap-txring-25 5.2 gbit out (with mmap but without batch verdicts)
snort-mmap-txring-1 4.6 gbit out (with mmap but without batch verdicts)
The difference between the last two is that in the txring-25 case, we
put a verdict into the tx ring after every packet, but will only
invoke sendmsg(, NULL, 0) after processing 25 packets. So the only
difference is the number of sendmsg calls/context switches.
So, i.o.w, kernel 3.7.1 + mmap + the extra locking crap is faster
than 3.7.1 + mmap-without-extra-locking and single-verdict-per packet.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:09 +0000 (06:47 +0000)]
nfnetlink: add support for memory mapped netlink
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:08 +0000 (06:47 +0000)]
netfilter: rename netlink related "pid" variables to "portid"
Get rid of the confusing mix of pid and portid and use portid consistently
for all netlink related socket identities.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:07 +0000 (06:47 +0000)]
netlink: add documentation for memory mapped I/O
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:06 +0000 (06:47 +0000)]
netlink: add RX/TX-ring support to netlink diag
Based on AF_PACKET.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:05 +0000 (06:47 +0000)]
netlink: add flow control for memory mapped I/O
Add flow control for memory mapped RX. Since user-space usually doesn't
invoke recvmsg() when using memory mapped I/O, flow control is performed
in netlink_poll(). Dumps are allowed to continue if at least half of the
ring frames are unused.
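The "at least half of the ring frames are unused" rule can be written as a
small predicate. This is a sketch only; the function and parameter names are
hypothetical and the kernel code may compute the threshold differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if a dump may continue, i.e. at least half of the ring
 * frames are unused. The doubled comparison avoids rounding on rings
 * with an odd number of frames.
 */
bool dump_may_continue(size_t unused_frames, size_t total_frames)
{
    assert(unused_frames <= total_frames);

    return total_frames > 0 && 2 * unused_frames >= total_frames;
}
```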
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:04 +0000 (06:47 +0000)]
netlink: implement memory mapped recvmsg()
Add support for mmap'ed recvmsg(). To allow the kernel to construct messages
into the mapped area, a dataless skb is allocated and the data pointer is
set to point into the ring frame. This means frames will be delivered to
userspace in order of allocation instead of order of transmission. This
usually doesn't matter since the order is either not determinable by
userspace or message creation/transmission is serialized. The only case
where this can have a visible difference is nfnetlink_queue. Userspace
can't assume mmap'ed messages have ordered IDs anymore and needs to check
this if using batched verdicts.
For non-mapped sockets, nothing changes.
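A userspace check for the ordering caveat might look like the sketch below.
It assumes that a batched verdict covers every packet ID up to the one named
in the verdict, so a batch is only safe if the IDs seen so far are strictly
increasing. The function name is hypothetical and ID wraparound is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if the packet IDs were received in strictly increasing
 * order. Only then can a single batched verdict on the last ID be
 * assumed to cover exactly the packets in ids[0..n-1].
 */
bool ids_in_order(const uint32_t *ids, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (ids[i] <= ids[i - 1])
            return false;   /* re-ordered or duplicate ID */
    }
    return true;
}
```

If the check fails, a caller could fall back to sending one verdict per
packet for that batch.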
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:03 +0000 (06:47 +0000)]
netlink: implement memory mapped sendmsg()
Add support for mmap'ed sendmsg() to netlink. Since the kernel validates
received messages before processing them, the code makes sure userspace
can't modify the message contents after invoking sendmsg(). To do that
only a single mapping of the TX ring is allowed to exist and the socket
must not be shared. If either of these two conditions does not hold, it
falls back to copying.
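The two conditions can be summarized as a predicate. This is a simplified
model, not the kernel code: the parameter names are assumptions, and how the
kernel actually counts mappings and detects a shared socket is not shown here.

```c
#include <stdbool.h>

/*
 * Returns true if a message may be processed directly from the TX ring.
 * Otherwise the message has to be copied into a regular skb first, so
 * that userspace cannot change it after validation.
 */
bool tx_can_avoid_copy(unsigned int tx_ring_mappings, bool socket_shared)
{
    return tx_ring_mappings == 1 && !socket_shared;
}
```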
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:02 +0000 (06:47 +0000)]
netlink: add mmap'ed netlink helper functions
Add helper functions for looking up mmap'ed frame headers, reading and
writing their status, allocating skbs with mmap'ed data areas and a poll
function.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:01 +0000 (06:47 +0000)]
netlink: mmaped netlink: ring setup
Add support for mmap'ed RX and TX ring setup and teardown based on the
af_packet.c code. The following patches will use this to add the real
mmap'ed receive and transmit functionality.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:47:00 +0000 (06:47 +0000)]
netlink: add netlink_skb_set_owner_r()
For mmap'ed I/O a netlink specific skb destructor needs to be invoked
after the final kfree_skb() to clean up state. This doesn't work currently
since the skb's ownership is transferred to the receiving socket using
skb_set_owner_r(), which orphans the skb, thereby invoking the destructor
prematurely.
Since netlink doesn't account skbs to the originating socket, there's no
need to orphan the skb. Add a netlink specific skb_set_owner_r() variant
that does not orphan the skb and use a netlink specific destructor to
call sock_rfree().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:46:59 +0000 (06:46 +0000)]
netlink: don't orphan skb in netlink_trim()
Netlink doesn't account skbs to the sending socket, so there's no
need to orphan the skb before trimming it.
Removing the skb_orphan() call is required for mmap'ed netlink, which uses
a netlink specific skb destructor that must not be invoked before the
final freeing of the skb.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:46:58 +0000 (06:46 +0000)]
net: add function to allocate sk_buff head without data area
Add a function to allocate a sk_buff head without any data. This will
be used by memory mapped netlink to attach data from the mmaped area
to the skb.
Additionally change skb_release_all() to check whether the skb has a
data area to allow the skb destructor to clear the data pointer in case
only a head has been allocated.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:46:57 +0000 (06:46 +0000)]
netlink: rename ssk to sk in struct netlink_skb_params
Memory mapped netlink needs to store the receiving userspace socket
when sending from the kernel to userspace. Rename 'ssk' to 'sk' to
avoid confusion.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Apr 2013 06:46:56 +0000 (06:46 +0000)]
netlink: add symbolic value for congested state
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 18:46:27 +0000 (14:46 -0400)]
Merge branch '8021ad'
Patrick McHardy says:
====================
The following patches add support for 802.1ad (provider tagging) to the
VLAN driver. The patchset consists of the following parts:
- renaming of the NETIF_F_HW_VLAN feature flags to indicate that they only
operate on CTAGs
- preparation for 802.1ad VLAN filtering offload by adding a proto argument
to the rx_{add,kill}_vid net_device_ops callbacks
- preparation of the VLAN code to support multiple protocols by making the
protocol used for tagging a property of the VLAN device and converting
the device lookup functions accordingly
- second step of preparation of the VLAN code by making the packet tagging
functions take a protocol argument
- introduction of 802.1ad support in the VLAN code, consisting mainly of
checking for ETH_P_8021AD in a couple of places and testing the netdevice
offload feature checks to take the protocol into account
- announcement of STAG offloading capabilities in a couple of drivers for
virtual network devices
The patchset is based on net-next.git and has been tested with single and
double tagging with and without HW acceleration (for CTAGs).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Fri, 19 Apr 2013 02:04:32 +0000 (02:04 +0000)]
net: vlan: announce STAG offload capability in some drivers
- macvlan: propagate STAG filtering capabilities from underlying device
- ifb: announce STAG tagging support in addition to CTAG tagging support
- veth: announce STAG tagging/stripping support in addition to CTAG support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Fri, 19 Apr 2013 02:04:31 +0000 (02:04 +0000)]
net: vlan: add 802.1ad support
Add support for 802.1ad VLAN devices. This mainly consists of checking for
ETH_P_8021AD in addition to ETH_P_8021Q in a couple of places and checking
offloading capabilities based on the used protocol.
Configuration is done using "ip link":
# ip link add link eth0 eth0.1000 \
type vlan proto 802.1ad id 1000
# ip link add link eth0.1000 eth0.1000.1000 \
type vlan proto 802.1q id 1000
52:54:00:12:34:56 > 92:b1:54:28:e4:8c, ethertype 802.1Q (0x8100), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
20.1.0.2 > 20.1.0.1: ICMP echo request, id 3003, seq 8, length 64
92:b1:54:28:e4:8c > 52:54:00:12:34:56, ethertype 802.1Q-QinQ (0x88a8), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 47944, offset 0, flags [none], proto ICMP (1), length 84)
20.1.0.1 > 20.1.0.2: ICMP echo reply, id 3003, seq 8, length 64
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Fri, 19 Apr 2013 02:04:30 +0000 (02:04 +0000)]
net: vlan: add protocol argument to packet tagging functions
Add a protocol argument to the VLAN packet tagging functions. In case of HW
tagging, we need that protocol available in the ndo_start_xmit functions,
so it is stored in a new field in the skb. The new field fits into a hole
(on 64 bit) and doesn't increase the skb's size.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Fri, 19 Apr 2013 02:04:29 +0000 (02:04 +0000)]
net: vlan: prepare for 802.1ad support
Make the encapsulation protocol value a property of VLAN devices and change
the device lookup functions to take the protocol value into account.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Fri, 19 Apr 2013 02:04:28 +0000 (02:04 +0000)]
net: vlan: prepare for 802.1ad VLAN filtering offload
Change the rx_{add,kill}_vid callbacks to take a protocol argument in
preparation of 802.1ad support. The protocol argument used so far is
always htons(ETH_P_8021Q).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Fri, 19 Apr 2013 02:04:27 +0000 (02:04 +0000)]
net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*
Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
service provider tagging (STAGs) and require the distinction for hardware not
supporting accelerating both.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 18:19:07 +0000 (14:19 -0400)]
Merge branch 'intel'
Jeff Kirsher says:
====================
This series contains updates to ixgbe and igb.
The ixgbe changes contain 2 patches from the community, one of which is a
fix from akepner for an issue where netif_running() in shutdown was
not done under rtnl_lock. The other community fix from Joe Perches
cleans up #ifdef CONFIG_DEBUG_FS which is no longer necessary. The
last ixgbe patch, from Jacob Keller, adds support for WoL on 82599
SFP+ LOM.
The remaining patches are against igb, 10 of which were previously
submitted in a pull request where changes were requested.
The following igb patches:
igb: Support for 100base-fx SFP
igb: Support to read and export SFF-8472/8079 data
are v2 based on feedback from Dan Carpenter and Ben Hutchings in
the previous pull request.
The largest set of changes is in my patch to clean up code comments
and whitespace to align the igb driver with the networking style of
code comments. While cleaning up the code comments, I also fixed several
other whitespace/checkpatch.pl code formatting issues.
Other notable igb patches make EEE capable devices query the PHY to
determine what the link partner is advertising, add support for
i354 devices and add support for spoofchk config.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Carolyn Wyborny [Thu, 18 Apr 2013 22:21:30 +0000 (22:21 +0000)]
igb: Add support for i354 devices
This patch adds base support for new i354 devices. Loopback test is
unsupported for this release.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lior Levy [Sun, 3 Mar 2013 20:27:48 +0000 (20:27 +0000)]
igb: add support for spoofchk config
Add support for spoofchk configuration per VF via iproute2 tool.
Signed-off-by: Lior Levy <lior.levy@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Matthew Vick [Thu, 21 Feb 2013 03:32:52 +0000 (03:32 +0000)]
igb: Enable EEE LP advertisement
On EEE-capable devices, query the PHY to determine what the link partner is
advertising.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Sat, 23 Feb 2013 07:29:56 +0000 (07:29 +0000)]
igb: Fix code comments and whitespace
Aligns the multi-line code comments with the desired style for the
networking tree. Also cleaned up whitespace issues found during the
cleanup of code comments (i.e. remove unnecessary blank lines,
use tabs where possible, properly wrap lines and keep strings on a
single line)
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Akeem G. Abodunrin [Sat, 16 Feb 2013 07:09:06 +0000 (07:09 +0000)]
igb: Fix sparse warnings on function pointers
This patch fixes sparse warnings on function pointers that are not
defined as static.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 12 Feb 2013 02:31:01 +0000 (02:31 +0000)]
igb: Use rx/tx_itr_setting when setting up initial value of itr
It turns out that the InterruptThrottleRate module parameter was only
having the effect of locking the ITR at the starting ITR value. This was
because the values stored in rx_itr_setting and tx_itr_setting were being
ignored when configuring the initial itr_val of the q_vector.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Thu, 7 Feb 2013 08:55:46 +0000 (08:55 +0000)]
igb: Pull adapter out of main path in igb_xmit_frame_ring
We only need the adapter pointer in the case of ptp. As such we can pull the
adapter out of the main path and place it inside the if statement to avoid
the temptation of accessing the adapter pointer in the fast path.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Fri, 1 Feb 2013 08:56:47 +0000 (08:56 +0000)]
igb: Mask off check of frag_off as we only want fragment offset
We were incorrectly checking the entire frag_off field when we only wanted the
fragment offset. As a result we were not pulling in TCP headers when the DF
(don't fragment) flag was set.
To correct that we will now check for frag off using the IP_OFFSET mask.
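The difference between the two checks can be shown in a few lines of C. This
is a userspace sketch, not the driver code: the helper names are hypothetical,
and IP_OFFSET is defined locally with the value of the 13-bit fragment offset
mask (0x1FFF).

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

#define IP_OFFSET 0x1FFF    /* fragment offset bits of frag_off */

/* Old check: any bit set in frag_off, including the DF and MF flags. */
bool frag_off_nonzero(uint16_t frag_off)
{
    return frag_off != 0;
}

/*
 * Masked check: true only if the fragment offset itself is non-zero.
 * frag_off is the raw header field in network byte order.
 */
bool has_fragment_offset(uint16_t frag_off)
{
    return (frag_off & htons(IP_OFFSET)) != 0;
}
```

For an unfragmented packet with only the DF bit (0x4000) set, the first helper
returns true while the masked one returns false, which is the case the patch
description refers to.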
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Akeem G. Abodunrin [Tue, 29 Jan 2013 10:15:31 +0000 (10:15 +0000)]
igb: random code and comments fix
This patch fixes code and comments as identified in the driver.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Akeem G. Abodunrin [Tue, 29 Jan 2013 10:15:26 +0000 (10:15 +0000)]
igb: Implement support to power sfp cage and turn on I2C
Based on original patch from Aurélien Guillaume <footplus@gmail.com>
This patch adds support to turn on I2C, with sfp cage powered.
CC: Aurélien Guillaume <footplus@gmail.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Akeem G. Abodunrin [Thu, 11 Apr 2013 06:36:35 +0000 (06:36 +0000)]
igb: Support to read and export SFF-8472/8079 data
This patch adds support to read and export SFF-8472/8079 (SFP data)
over i2c, through Ethtool.
v2: Changed implementation to accommodate any offset within SFF module
length boundary.
Reported-by: Aurélien Guillaume <footplus@gmail.com>
CC: Aurélien Guillaume <footplus@gmail.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Florian Westphal [Wed, 17 Apr 2013 22:45:25 +0000 (22:45 +0000)]
netfilter: xt_rpfilter: depend on raw or mangle table
rpfilter is only valid in raw/mangle PREROUTING, i.e.
RPFILTER=y|m is useless without raw or mangle table support.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Akeem G. Abodunrin [Fri, 5 Apr 2013 16:49:06 +0000 (16:49 +0000)]
igb: Support for 100base-fx SFP
This patch adds support for 100base-fx SFP and report proper link speed/duplex
via Ethtool.
v2: fix smatch warnings
CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Joe Perches [Fri, 12 Apr 2013 17:12:54 +0000 (17:12 +0000)]
ixgbe: Remove unnecessary #ifdef CONFIG_DEBUG_FS tests
Add some empty static inlines instead to make
the code more readable.
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 3 Apr 2013 04:41:37 +0000 (04:41 +0000)]
ixgbe: Add support for WoL on 82599 SFP+ LOM
This patch adds software support for WoL for the 82599 SFP+ LOM device,
(ID 0x8976)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
akepner [Wed, 13 Mar 2013 14:54:58 +0000 (14:54 +0000)]
ixgbe: in shutdown, do netif_running() under rtnl_lock
During shutdown it's possible for __dev_close() (which holds
rtnl_lock) to clear the __LINK_STATE_START bit, and for ixgbe
to then read that bit (without holding rtnl_lock), and then
fail to free irqs, etc. The result is a crash like this:
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:313!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 1
Pid: 5910, comm: reboot Tainted: P ---------------- 2.6.32 #1 empty
RIP: 0010:[<ffffffff81305c2b>] [<ffffffff81305c2b>] free_msi_irqs+0x11b/0x130
RSP: 0018:ffff880185c9bc88 EFLAGS: 00010282
RAX: ffff880219f58bc0 RBX: ffff88021ac53b00 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: 000000000000004a
RBP: ffff880185c9bcc8 R08: 0000000000000002 R09: 0000000000000106
R10: 0000000000000000 R11: 0000000000000006 R12: ffff88021e524778
R13: 0000000000000001 R14: ffff88021e524000 R15: 0000000000000000
FS: 00007f90821b7700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f90818bd010 CR3: 0000000132c64000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process reboot (pid: 5910, threadinfo ffff880185c9a000, task ffff88021bf04a80)
Stack:
ffff880185c9bc98 000000018130529d ffff880185c9bcc8 ffff88021e524000
<0> 0000000000000004 ffff88021948c700 0000000000000000 ffff880185c9bda7
<0> ffff880185c9bce8 ffffffff81305cbd ffff880185c9bce8 ffff88021948c700
Call Trace:
[<ffffffff81305cbd>] pci_disable_msix+0x3d/0x50
[<ffffffffa00501d5>] ixgbe_reset_interrupt_capability+0x65/0x90 [ixgbe]
[<ffffffffa00512f6>] ixgbe_clear_interrupt_scheme+0xb6/0xd0 [ixgbe]
[<ffffffffa005330b>] __ixgbe_shutdown+0x5b/0x200 [ixgbe]
[<ffffffffa00534ca>] ixgbe_shutdown+0x1a/0x60 [ixgbe]
[<ffffffff812f6c7c>] pci_device_shutdown+0x2c/0x50
[<ffffffff813727fb>] device_shutdown+0x4b/0x160
[<ffffffff8107d98c>] kernel_restart_prepare+0x2c/0x40
ehci timer_action, mod_timer io_watchdog
[<ffffffff8107d9e6>] kernel_restart+0x16/0x60
[<ffffffff8107dbfd>] sys_reboot+0x1ad/0x200
[<ffffffff811676cf>] ? __d_free+0x3f/0x60
[<ffffffff81167748>] ? d_free+0x58/0x60
[<ffffffff8116f7c0>] ? mntput_no_expire+0x30/0x100
[<ffffffff81152b11>] ? __fput+0x191/0x200
[<ffffffff816565fe>] ? do_page_fault+0x3e/0xa0
[<ffffffff8100b132>] system_call_fastpath+0x16/0x1b
Code: 4c 89 ef e8 98 8c e3 ff 4d 39 f4 48 8b 43 10 75 cf 48 83 c4 18 5b 41 5c
41 5d 41 5e 41 5f c9 c3 49 8b 7d 20 e8 07 5a d3 ff eb c9 <0f> 0b 0f 1f 00 eb fb
66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
ehci timer_action, mod_timer io_watchdog
RIP [<ffffffff81305c2b>] free_msi_irqs+0x11b/0x130
RSP <ffff880185c9bc88>
---[ end trace 27de882a0fe75593 ]---
(This was seen on a pretty old kernel/driver, but looks like
the same bug is still possible.)
Signed-off-by: <akepner@riverbed.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 18 Apr 2013 19:00:59 +0000 (15:00 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
This series contains updates to ixgbe only.
v2- Dropped the following 2 patches from the series:
ixgbe: Support using build_skb in the case that jumbo frames are disabled
ixgbe: walk pci-e bus to find minimum width
Ben Hutchings found a bug with Alex's patch, so that patch was dropped
permanently. Jacob's "walk PCIe bus" patch is being re-worked for
a more generic solution so that other drivers can benefit.
In the remaining patches...
Alex provides a fix where we were incorrectly checking the entire frag_off
field when we only wanted the fragment offset. Alex also cleans up
the check for PAGE_SIZE, since the default configuration allocates 32K
for all buffers.
Emil provides a change to the calculation of eerd so that it is consistent
between the read and write functions by using | instead of +.
Jacob adds support for displaying PCIe Gen3 link speed, which was
previously missing from the ixgbe driver. He also provides a patch
to clean up ixgbe_get_bus_info_generic to call some conversion
functions, which are used also in another patch provided by Jacob.
Jacob modifies the driver to enable certain devices (which have an
internal switch) to read from the physical slot rather than reading
data from the internal switch.
Don provides a couple of fixes (which are more appropriate for net-next),
one of which resolves an issue where ixgbe was only turning on the laser
when the adapter was up which caused issues for those who wanted to
access the MNG firmware while the port was in a down state. The other
fix is for WoL when currently linked at 1G. Lastly Don bumps the driver
version to keep the in-kernel driver up to date with the current functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 18 Apr 2013 06:52:51 +0000 (06:52 +0000)]
tcp: introduce TCPSpuriousRtxHostQueues SNMP counter
Host queues (Qdisc + NIC) can hold packets so long that TCP can
eventually retransmit a packet before the first transmit even left
the host.
It's not clear right now if we could avoid this in the first place:
- We could arm RTO timer not at the time we enqueue packets, but
at the time we TX complete them (tcp_wfree())
- Cancel the sending of the new copy of the packet if prior one
is still in queue.
This patch adds instrumentation so that we can at least see how
often this problem happens.
TCPSpuriousRtxHostQueues SNMP counter is incremented every time
we detect the fast clone is not yet freed in tcp_transmit_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Thu, 18 Apr 2013 02:54:39 +0000 (02:54 +0000)]
fec: Remove unneeded asm header files
There is nothing in the driver that requires <asm/coldfire.h> and
<asm/mcfsim.h>.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sat, 6 Apr 2013 13:24:29 +0000 (15:24 +0200)]
netfilter: add my copyright statements
Add copyright statements to all netfilter files which have had significant
changes done by myself in the past.
Some notes:
- nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter
Core Team when it got split out of nf_conntrack_core.c. The copyrights
even state a date which lies six years before it was written. It was
written in 2005 by Harald and myself.
- net/ipv{4,6}/netfilter.c, net/netfilter/nf_queue.c were missing copyright
statements. I've added the copyright statement from net/netfilter/core.c,
where this code originated
- for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want
it to give the wrong impression
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Don Skidmore [Fri, 1 Mar 2013 07:09:43 +0000 (07:09 +0000)]
ixgbe: bump version number
Bump the version number to reflect the corresponding functionality in the
out of tree driver.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Thu, 28 Feb 2013 08:08:44 +0000 (08:08 +0000)]
ixgbe: Fix 1G link WoL
We reset during the shutdown path which will reset AUTOC register. This
would change LMS to 10G. If we were currently linked at 1G we will lose
link, which is a bad thing if we wanted WoL to work. For the fix I needed
to know if WoL is supported so I created a new bool in the ixgbe_hw struct.
If this is set we will not allow the reset to change the current LMS value
in AUTOC.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Thu, 21 Feb 2013 03:00:04 +0000 (03:00 +0000)]
ixgbe: fix MNG FW support when adapter not up
We were only turning the laser on when the adapter was up. This
causes issues for those who wanted to access the MNG FW while the
port was in a down state. This patch makes sure the laser is turned
on in probe and remain up even after the port is brought down.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 9 Apr 2013 07:20:09 +0000 (07:20 +0000)]
ixgbe: enable devices with internal switch to read pci parent
This patch modifies the driver to enable certain devices, which have an internal
switch, to read data from the physical slot rather than reading data from the
internal switch. The internal switch will always report the same PCI width and
speed, which is not useful compared to knowing the width and speed of the slot
the physical card is plugged into.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 15 Feb 2013 09:18:15 +0000 (09:18 +0000)]
ixgbe: create conversion functions from link_status to bus/speed
This patch cleans up ixgbe_get_bus_info_generic to call some conversion
functions, which are used also in a follow on patch that needs to convert
between the link_status PCIe config values into ixgbe's internal enum
representations.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 15 Feb 2013 09:18:10 +0000 (09:18 +0000)]
ixgbe: Enable support for recognizing PCI-e Gen3 link speed
This patch adds support for displaying PCIe Gen3 link speed, which was
previously missing from the driver.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Sat, 9 Feb 2013 01:19:55 +0000 (01:19 +0000)]
ixgbe: Drop check for PAGE_SIZE from ixgbe_xmit_frame_ring
The check for PAGE_SIZE is pointless now that the default configuration is to
allocate 32K for all buffers. Since the Tx descriptor limit is 16K we can
just drop the check and always compare the descriptors to the maximum size
supported.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 5 Feb 2013 09:43:26 +0000 (09:43 +0000)]
ixgbe: don't do arithmetic operations on bitmasks
Make the calculation of eerd consistent between the read and write functions
by using | instead of + for IXGBE_EEPROM_RW_REG_START
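A short sketch of why | is the safer operator for flag bits. The macro names
and values below are local stand-ins chosen for illustration (an address shift
of 2 and a start bit of 1); they are assumptions, not quotes from the driver
headers.

```c
#include <stdint.h>

#define EEPROM_RW_ADDR_SHIFT 2      /* assumed: address starts at bit 2 */
#define EEPROM_RW_REG_START  1u     /* assumed: start flag in bit 0 */

/* Build a read command word: EEPROM word offset plus the start flag. */
uint32_t eerd_command(uint16_t offset)
{
    return ((uint32_t)offset << EEPROM_RW_ADDR_SHIFT) | EEPROM_RW_REG_START;
}
```

As long as the start bit is clear, + and | give the same result. Once the bit
is already set, OR-ing it again is a no-op, while adding it carries into the
address bits and silently changes the offset.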
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>