huangdaode [Thu, 17 Sep 2015 06:51:47 +0000 (14:51 +0800)]
net: add Hisilicon Network Subsystem MDIO support
The MDIO support for Hisilicon Network Subsystem. It is used in Hislicon
hip04, hip05 and Hi1610 SoC to control the external PHY
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: Kenneth Lee <liguozhu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
huangdaode [Thu, 17 Sep 2015 06:51:46 +0000 (14:51 +0800)]
net: add Hisilicon Network Subsystem support (config and documents)
The Hisilicon Network Subsystem is a long term evolution IP which is
supposed to be used in Hisilicon ICT SoC. The IP, which is called hns
for short, is a TCP/IP acceleration engine, which can directly decode
TCP/IP stream and distribute them to different ring buffers.
HNS can be configured to work on different mode for different scenario.
This patch make use only some of the mode to make it as standard
ethernet NIC. The other mode will be added soon.
The whole function has 4 kernel sub-modules:
hnae: the HNS acceleration engine framework. It provides a abstract
interface between the engine and the upper layers which make use of the
engine by ring buffer.
hns_enet_drv: a standard ethernet driver that base on the ring buffer.
hns_dsaf: one of the implementation of HNS acceleration engine, which is
applied on Hililicon hip05, Hi1610 and other later-on SoCs
hns_mdio: the mdio control to the PHY, used by acceleration engine
This submit add basic config and documents
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: Kenneth Lee <liguozhu@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
chas williams [Wed, 16 Sep 2015 20:28:25 +0000 (16:28 -0400)]
xen-netfront: always set num queues if possible
If netfront connects with two (or more) queues and then reconnects with
only one queue it fails to delete or rewrite the multi-queue-num-queues
key and netback will try to use the wrong number of queues.
Always write the num-queues field if the backend has multi-queue support.
Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 17 Sep 2015 23:37:13 +0000 (16:37 -0700)]
sch_dsmark: improve memory locality
Memory placement in sch_dsmark is silly : Better place mask/value
in the same cache line.
Also, we can embed small arrays in the first cache line and
remove a potential cache miss.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Sep 2015 05:17:14 +0000 (22:17 -0700)]
Merge branch 'bcmgenet-irq-coalesce'
Florian Fainelli says:
====================
net: bcmgenet: Interrupt coalescing
This patch series adds support for interrupt coalescing for GENET
adapters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 16 Sep 2015 23:47:40 +0000 (16:47 -0700)]
net: bcmgenet: Implement RX coalescing control knobs
Add support for the ethtool rx-frames coalescing parameter which allows
defining the number of RX interrupts per frames received. The RDMA
engine supports a configurable timeout with a resolution of
approximately 8.192 us.
We can no longer enable the BDONE/PDONE interrupts as those would
fire for each packet/buffer received, which would defeat the MBDONE
interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a
PDONE/BDONE interrupt when the threshold is set to 1.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 16 Sep 2015 23:47:39 +0000 (16:47 -0700)]
net: bcmgenet: Implement TX coalescing control knobs
Configuring the ethtool tx-frames property, which translates into N
packets before a TX interrupt is the simplest configuration scheme
because it requires no locking neither at the softare nor hardware
level, and is completely indepedent from the link speed. Since ethtool
does not allow per-tx queue coalescing parameters, we apply the same
setting to any transmit queue.
We can no longer enable the BDONE/PDONE interrupts as those would fire
for each packet/buffer received, which would defeat the MBDONE interrupt
purpose. The MBDONE interrupt is guaranteed to correspond to a
PDONE/BDONE interrupt when the threshold is set to 1, but offers
interrupt coalescing when the value is > 1.
Since the HW is configured to generate an interrupt when the ring
becomes emtpy, we have to deny any timeout/timer settings coming from
user-space to indicate we can only generate an interrupt very <N>
packets.
While we are at it, fix the DMA_INTR_THRESHOLD_MASK value which was off
by one bit (0xff vs. 0x1ff).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung.Huh@microchip.com [Wed, 16 Sep 2015 23:41:19 +0000 (23:41 +0000)]
lan78xx: Remove not defined MAC_CR_GMII_EN_ bit from MAC_CR.
Remove not defined MAC_CR_GMII_EN_ bit from MAC_CR.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung.Huh@microchip.com [Wed, 16 Sep 2015 23:41:14 +0000 (23:41 +0000)]
lan78xx: Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control.
Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung.Huh@microchip.com [Wed, 16 Sep 2015 23:41:07 +0000 (23:41 +0000)]
lan78xx: Remove phy defines in lan78xx.h and use defines in include/linux/microchipphy.h
Remove phy defines in lan78xx.h and use defines in include/linux/microchipphy.h.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung.Huh@microchip.com [Wed, 16 Sep 2015 23:40:54 +0000 (23:40 +0000)]
lan78xx: Update to use phylib instead of mii_if_info.
Update to use phylib instead of mii_if_info.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung.Huh@microchip.com [Wed, 16 Sep 2015 23:40:47 +0000 (23:40 +0000)]
lan78xx: Add PHYLIB and MICROCHIP_PHY as default config.
Add PHYLIB and MICROCHIP_PHY as default configuration for lan78xx.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung.Huh@microchip.com [Wed, 16 Sep 2015 23:40:39 +0000 (23:40 +0000)]
lan78xx: Check device ready bit (PMT_CTL_READY_) after reset the PHY
Check device ready bit (PMT_CTL_READY_) after reset the PHY.
Device may not be ready even if PHY_RST_ is cleared depends on configuration.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 16 Sep 2015 16:16:39 +0000 (10:16 -0600)]
net: Initialize table in fib result
Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard:
[ 0.877040] BUG: unable to handle kernel NULL pointer dereference at
0000000000000056
[ 0.877597] IP: [<
ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
[ 0.877597] PGD
3fa14067 PUD
3fa6e067 PMD 0
[ 0.877597] Oops: 0000 [#1] SMP
[ 0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio
[ 0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1
[ 0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 0.877597] task:
ffff88003fab0bc0 ti:
ffff88003faa8000 task.ti:
ffff88003faa8000
[ 0.877597] RIP: 0010:[<
ffffffff8155b5e2>] [<
ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
[ 0.877597] RSP: 0018:
ffff88003ed03ba0 EFLAGS:
00010202
[ 0.877597] RAX:
0000000000000046 RBX:
00000000ffffff8f RCX:
0000000000000020
[ 0.877597] RDX:
ffff88003fab50b8 RSI:
0000000000000200 RDI:
ffffffff8152b4b8
[ 0.877597] RBP:
ffff88003ed03c50 R08:
0000000000000000 R09:
0000000000000000
[ 0.877597] R10:
0000000000000000 R11:
0000000000000000 R12:
ffff88003fab6f00
[ 0.877597] R13:
ffff88003fab5000 R14:
0000000000000000 R15:
ffffffff81cb5600
[ 0.877597] FS:
00007f6de5751700(0000) GS:
ffff88003ed00000(0000) knlGS:
0000000000000000
[ 0.877597] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 0.877597] CR2:
0000000000000056 CR3:
000000003fa6d000 CR4:
00000000000006e0
[ 0.877597] Stack:
[ 0.877597]
0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0
[ 0.877597]
ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00
[ 0.877597]
0000000000000000 0000000000000046 0000000000000000 0000000400000000
[ 0.877597] Call Trace:
[ 0.877597] <IRQ>
[ 0.877597] [<
ffffffff812bfa1f>] ? cpumask_next_and+0x2f/0x40
[ 0.877597] [<
ffffffff8158e13c>] arp_process+0x39c/0x690
[ 0.877597] [<
ffffffff8158e57e>] arp_rcv+0x13e/0x170
[ 0.877597] [<
ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00
[ 0.877597] [<
ffffffff81515795>] ? __build_skb+0x25/0x100
[ 0.877597] [<
ffffffff81515795>] ? __build_skb+0x25/0x100
[ 0.877597] [<
ffffffff81521ff6>] __netif_receive_skb+0x16/0x70
[ 0.877597] [<
ffffffff81522078>] netif_receive_skb_internal+0x28/0x90
[ 0.877597] [<
ffffffff8152288f>] napi_gro_receive+0x7f/0xd0
[ 0.877597] [<
ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net]
[ 0.877597] [<
ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net]
[ 0.877597] [<
ffffffff815234cd>] net_rx_action+0x1dd/0x2f0
[ 0.877597] [<
ffffffff81053228>] __do_softirq+0x98/0x260
[ 0.877597] [<
ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30
The root cause is use of res.table uninitialized.
Thanks to Nikolay for noticing the uninitialized use amongst the maze of
gotos.
As Nikolay pointed out the second initialization is not required to fix
the oops, but rather to fix a related problem where a valid lookup should
be invalidated before creating the rth entry.
Fixes:
b7503e0cdb5d ("net: Add FIB table id to rtable")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reported-by: Richard Alpe <richard.alpe@ericsson.com>
Reported-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Sep 2015 04:09:07 +0000 (21:09 -0700)]
Merge branch 'bpf_avoid_clone'
Alexei Starovoitov says:
====================
bpf: performance improvements
v1->v2: dropped redundant iff_up check in patch 2
At plumbers we discussed different options on how to get rid of skb_clone
from bpf_clone_redirect(), the patch 2 implements the best option.
Patch 1 adds 'integrated exts' to cls_bpf to improve performance by
combining simple actions into bpf classifier.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Wed, 16 Sep 2015 06:05:43 +0000 (23:05 -0700)]
bpf: add bpf_redirect() helper
Existing bpf_clone_redirect() helper clones skb before redirecting
it to RX or TX of destination netdev.
Introduce bpf_redirect() helper that does that without cloning.
Benchmarked with two hosts using 10G ixgbe NICs.
One host is doing line rate pktgen.
Another host is configured as:
$ tc qdisc add dev $dev ingress
$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
so it receives the packet on $dev and immediately xmits it on $dev + 1
The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
that does bpf_clone_redirect() and performance is 2.0 Mpps
$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
which is using bpf_redirect() - 2.4 Mpps
and using cls_bpf with integrated actions as:
$ tc filter add dev $dev root pref 10 \
bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
performance is 2.5 Mpps
To summarize:
u32+act_bpf using clone_redirect - 2.0 Mpps
u32+act_bpf using redirect - 2.4 Mpps
cls_bpf using redirect - 2.5 Mpps
For comparison linux bridge in this setup is doing 2.1 Mpps
and ixgbe rx + drop in ip_rcv - 7.8 Mpps
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 16 Sep 2015 06:05:42 +0000 (23:05 -0700)]
cls_bpf: introduce integrated actions
Often cls_bpf classifier is used with single action drop attached.
Optimize this use case and let cls_bpf return both classid and action.
For backwards compatibility reasons enable this feature under
TCA_BPF_FLAG_ACT_DIRECT flag.
Then more interesting programs like the following are easier to write:
int cls_bpf_prog(struct __sk_buff *skb)
{
/* classify arp, ip, ipv6 into different traffic classes
* and drop all other packets
*/
switch (skb->protocol) {
case htons(ETH_P_ARP):
skb->tc_classid = 1;
break;
case htons(ETH_P_IP):
skb->tc_classid = 2;
break;
case htons(ETH_P_IPV6):
skb->tc_classid = 3;
break;
default:
return TC_ACT_SHOT;
}
return TC_ACT_OK;
}
Joint work with Daniel Borkmann.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Junwei Zhang [Fri, 18 Sep 2015 04:00:05 +0000 (00:00 -0400)]
net: only check perm protocol when register proto
The permanent protocol nodes are at the head of the list,
So only need check all these nodes.
No matter the new node is permanent or not,
insert the new node after the last permanent protocol node,
If the new node conflicts with existing permanent node,
return error.
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Sep 2015 22:24:28 +0000 (15:24 -0700)]
bonding: use l4 hash if available
If skb carries a l4 hash, no need to perform a flow dissection.
Performance is slightly better :
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39012e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39393e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39988e+06
After patch :
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.43579e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.44304e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.44312e+06
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Sep 2015 22:24:20 +0000 (15:24 -0700)]
tcp: provide skb->hash to synack packets
In commit
b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf
on xmit"), Tom provided a l4 hash to most outgoing TCP packets.
We'd like to provide one as well for SYNACK packets, so that all packets
of a given flow share same txhash, to later enable bonding driver to
also use skb->hash to perform slave selection.
Note that a SYNACK retransmit shuffles the tx hash, as Tom did
in commit
265f94ff54d62 ("net: Recompute sk_txhash on negative routing
advice") for established sockets.
This has nice effect making TCP flows resilient to some kind of black
holes, even at connection establish phase.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Sep 2015 00:18:38 +0000 (17:18 -0700)]
Merge branch 'nf_hook_netns'
Eric W. Biederman says:
====================
Passing net through the netfilter hooks
My primary goal with this patchset and it's follow ups is to cleanup the
network routing paths so that we do not look at the output device to
derive the network namespace. My plan is to pass the network namespace
of the transmitting socket through the output path, to replace code that
looks at the output network device today. Once that is done we can have
routes with output devices outside of the current network namespace.
Which should allow reception and transmission of packets in network
namespaces to be as fast as normal packet reception and transmission
with early demux disabled, because it will same code path.
Once skb_dst(skb)->dev is a little better under control I think it will
also be possible to use rcu to cleanup the ancient hack that sets
dst->dev to loopback_dev when a network device is removed.
The work to get there is a series of code cleanups. I am starting with
passing net into the netfilter hooks and into the functions that are
called after the netfilter hooks. This removes from netfilter the
need to guess which network namespace it is working on.
To get there I perform a series of minor prep patches so the big changes
at the end are possible to audit without getting lost in the noise. In
particular I have a lot of patches computing net into a local variable
and then using it through out the function.
So this patchset encompases removing dead code, sorting out the _sk
functions that were added last time someone pushed a prototype change
through the post netfilter functions. Cleaning up individual functions
use of the network namespace. Passing net into the netfilter hooks.
Passing net into the post netfilter functions. Using state->net in
the netfilter code where it is available and trivially usable.
Pablo, Dave I don't know whose tree this makes more sense to go
through. I am assuming at least initially Pablos as netfilter is
involved. From what I have seen there will be a lot of back and forth
between the netfilter code paths and the routing code paths.
The patches are also available (against 4.3-rc1) at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next.git master
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 17 Sep 2015 22:21:31 +0000 (17:21 -0500)]
netfilter: Add blank lines in callers of netfilter hooks
In code review it was noticed that I had failed to add some blank lines
in places where they are customarily used. Taking a second look at the
code I have to agree blank lines would be nice so I have added them
here.
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:18 +0000 (20:04 -0500)]
netfilter: Pass net into okfn
This is immediately motivated by the bridge code that chains functions that
call into netfilter. Without passing net into the okfns the bridge code would
need to guess about the best expression for the network namespace to process
packets in.
As net is frequently one of the first things computed in continuation functions
after netfilter has done it's job passing in the desired network namespace is in
many cases a code simplification.
To support this change the function dst_output_okfn is introduced to
simplify passing dst_output as an okfn. For the moment dst_output_okfn
just silently drops the struct net.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:17 +0000 (20:04 -0500)]
netfilter: Use nf_hook_state.net
Instead of saying "net = dev_net(state->in?state->in:state->out)"
just say "state->net". As that information is now availabe,
much less confusing and much less error prone.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:16 +0000 (20:04 -0500)]
netfilter: Pass struct net into the netfilter hooks
Pass a network namespace parameter into the netfilter hooks. At the
call site of the netfilter hooks the path a packet is taking through
the network stack is well known which allows the network namespace to
be easily and reliabily.
This allows the replacement of magic code like
"dev_net(state->in?:state->out)" that appears at the start of most
netfilter hooks with "state->net".
In almost all cases the network namespace passed in is derived
from the first network device passed in, guaranteeing those
paths will not see any changes in practice.
The exceptions are:
xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
In all cases these exceptions seem to be a better expression for the
network namespace the packet is being processed in then the historic
"dev_net(in?in:out)". I am documenting them in case something odd
pops up and someone starts trying to track down what happened.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:15 +0000 (20:04 -0500)]
bridge: Add br_netif_receive_skb remove netif_receive_skb_sk
netif_receive_skb_sk is only called once in the bridge code, replace
it with a bridge specific function that calls netif_receive_skb.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:14 +0000 (20:04 -0500)]
bridge: Cache net in br_nf_pre_routing_finish
This is prep work for passing net to the netfilter hooks.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:13 +0000 (20:04 -0500)]
bridge: Pass net into br_nf_push_frag_xmit
When struct net starts being passed through the ipv4 and ipv6 fragment
routines br_nf_push_frag_xmit will need to take a net parameter.
Prepare br_nf_push_frag_xmit before that is needed and introduce
br_nf_push_frag_xmit_sk for the call sites that still need the old
calling conventions.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:12 +0000 (20:04 -0500)]
bridge: Pass net into br_nf_ip_fragment
This is a prep work for passing struct net through ip_do_fragment and
later the netfilter okfn. Doing this independently makes the later
code changes clearer.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:11 +0000 (20:04 -0500)]
ipv6: Compute net once in raw6_send_hdrinc
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:10 +0000 (20:04 -0500)]
ipv6: Cache net in ip6_output
Keep net in a local variable so I can use it in NF_HOOK_COND
when I pass struct net to all of the netfilter hooks.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:09 +0000 (20:04 -0500)]
ipv6: Only compute net once in ip6_finish_output2
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:08 +0000 (20:04 -0500)]
ipv6: Don't recompute net in ip6_rcv
Avoid silly redundant code
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:07 +0000 (20:04 -0500)]
net: Remove dev_queue_xmit_sk
A function with weird arguments that it will never use to accomdate a
netfilter callback prototype is absolutely in the core of the
networking stack. Frankly it does not make sense and it causes a lot
of confusion as to why arguments that are never used are being passed
to the function.
As I am preparing to make a second change to arguments to the okfn even
the names stops making sense.
As I have removed the two callers of this function remove this confusion
from the networking stack.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:06 +0000 (20:04 -0500)]
bridge: Introduce br_send_bpdu_finish
The function dev_queue_xmit_skb_sk is unncessary and very confusing.
Introduce br_send_bpdu_finish to remove the need for dev_queue_xmit_skb_sk,
and have br_send_bpdu_finish call dev_queue_xmit.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:05 +0000 (20:04 -0500)]
arp: Introduce arp_xmit_finish
The function dev_queue_xmit_skb_sk is unncessary and very confusing.
Introduce arp_xmit_finish to remove the need for dev_queue_xmit_skb_sk,
and have arp_xmit_finish call dev_queue_xmit.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:04 +0000 (20:04 -0500)]
ipv6: Only compute net once in ip6mr_forward2_finish
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:03 +0000 (20:04 -0500)]
ipv4: Only compute net once in ipmr_forward_finish
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:02 +0000 (20:04 -0500)]
ipv4: Only compute net once in ip_rcv_finish
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:01 +0000 (20:04 -0500)]
ipv4: Only compute net once in ip_finish_output2
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:04:00 +0000 (20:04 -0500)]
ipv4: Explicitly compute net in ip_fragment
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:59 +0000 (20:03 -0500)]
ipv4: Only compute net once in ip_do_fragment
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:58 +0000 (20:03 -0500)]
ipv4: Don't recompute net in ipmr_queue_xmit
Calling dev_net(dev) for is just silly.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:57 +0000 (20:03 -0500)]
ipv4: Remember the net in ip_output and ip_mc_output
This is a prepatory patch to passing net int the netfilter hooks,
where net will be used again.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:56 +0000 (20:03 -0500)]
ipv4: Compute net once in ip_rcv
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:55 +0000 (20:03 -0500)]
ipv4: Compute net once in ip_forward_finish
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:54 +0000 (20:03 -0500)]
ipv4: Compute net once in ip_forward
Compute struct net from the input device in ip_forward before it is
used.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:53 +0000 (20:03 -0500)]
net: Merge dst_output and dst_output_sk
Add a sock paramter to dst_output making dst_output_sk superfluous.
Add a skb->sk parameter to all of the callers of dst_output
Have the callers of dst_output_sk call dst_output.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:52 +0000 (20:03 -0500)]
xfrm: Remove unused afinfo method init_dst
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:51 +0000 (20:03 -0500)]
netfilter: Pass net to nf_hook_thresh
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:50 +0000 (20:03 -0500)]
netfilter: Store net in nf_hook_state
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 16 Sep 2015 01:03:49 +0000 (20:03 -0500)]
netfilter: Remove !CONFIG_NETFITLER definition of nf_hook_thresh
The !CONFIG_NETFILTER definition of nf_hook_thresh calls okfn when
the CONFIG_NETFITLER defintion does not, making it buggy.
As the !CONFIG_NETFILTER defintion of nf_hook_thresh is not used remove
it rather than fix it.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Sep 2015 23:50:36 +0000 (16:50 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-09-15
This series contains updates to ixgbe and fm10k.
Don fixes a ixgbe issue by adding checks for systems that do not have
SFP's to avoid incorrectly acting on interrupts that are falsely
interpreted as SFP events.
Alex Williamson adds a fix for ixgbe to disable SR-IOV prior to
unregistering the netdev to avoid issues with guest OS's which do not
support hot-unplug or their hot-unplug is broken.
Alex Duyck update the lowest limit for adaptive interrupt interrupt
moderation to about 12K interrupts per second for ixgbe. This change
increases the performance for ixgbe. Also fixed up fm10k to remove
the optimization that assumed that all fragments would be limited to
page size, since that assumption is incorrect as the TCP allocator can
provide up to a 32K page fragment. Updated fm10k to add the MAC
address to the list of values recorded on driver load. Fixes fm10k
so that we only trigger the data path reset if the fabric is ready to
handle traffic to avoid triggering the reset unless the switch API is
ready for us.
Jacob updates the fm10k driver to disable the service task during
suspend and re-enable it after we resume. If we don't do this, the
device could be UP when you suspend and come back from resume as
DOWN. Also update fm10k to prevent the removal of default VID rules,
and correctly remove the stack layers information of the VLAN, but then
return to forwarding that VID as untagged frames. If we deleted the VID
rules here, we would begin dropping traffic due to VLAN membership
violations. Fixed fm10k to use pcie_get_minimum_link(), which is useful
in cases where we connect to a slot at Gen3, but the slot is behind a bus
which is only connected at Gen2. Updated fm10k to update the netdev
permanent address during reinit instead of up to enable users to
immediately see the new MAC address on the VF even if the device is not
up. Adds the creation of VLAN interfaces on a device, even while the
device is down for fm10k. Fixed an issue where we request the incorrect
MAC/VLAN combinations, and prevents us from accidentally reporting some
frames as VLAN tagged. Provided a couple of trivial fixes for fm10k
to fix code style and typos in code comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thadeu Lima de Souza Cascardo [Tue, 15 Sep 2015 21:28:00 +0000 (18:28 -0300)]
net-sysfs: get_netdev_queue_index() cleanup
Redo commit
ed1acc8cd8c22efa919da8d300bab646e01c2dce.
Commit
822b3b2ebfff8e9b3d006086c527738a7ca00cd0 ("net: Add max rate tx queue
attribute") moved get_netdev_queue_index around, but kept the old version.
Probably because of a reuse of the original patch from before Eric's change to
that function.
Remove one inline keyword, and no need for a loop to find
an index into a table.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Fixes:
822b3b2ebfff ("net: Add max rate tx queue attribute")
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Thu, 10 Sep 2015 19:26:04 +0000 (21:26 +0200)]
net: smc91x: convert pxa dma to dmaengine
Convert the dma transfers to be dmaengine based, now pxa has a dmaengine
slave driver. This makes this driver a bit more PXA agnostic.
The driver was tested on pxa27x (mainstone) and pxa310 (zylonite),
ie. only pxa platforms.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Wed, 16 Sep 2015 18:32:41 +0000 (11:32 -0700)]
net: fix cdc-phonet.c dependency and build error
Fix build error caused by missing Kconfig dependency:
ERROR: "cdc_parse_cdc_header" [drivers/net/usb/cdc-phonet.ko] undefined!
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Wed, 16 Sep 2015 01:10:16 +0000 (11:10 +1000)]
cdc: add header guards
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 24 Jun 2015 20:34:50 +0000 (13:34 -0700)]
fm10k: fix iov_msg_mac_vlan_pf VID checks
The VF will send a message to request multicast addresses with the
default VID. In the current code, if the PF has statically assigned a
VLAN to a VF, then the VF will not get the multicast addresses. Fix up
all of the various VLAN messages to use identical checks (since each
check was different). Also use set as a variable, so that it simplifies
our check for whether VLAN matches the pf_vid.
The new logic will allow set of a VLAN if it is zero, automatically
converting to the default VID. Otherwise it will allow setting the PF
VID, or any VLAN if PF has not statically assigned a VLAN. This is
consistent behavior, and allows VF to request either 0 or the
default_vid without silently failing.
Note that we need the check for zero since VFs might not get the default
VID message in time to actually request non-zero VLANs.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 24 Jun 2015 20:34:49 +0000 (13:34 -0700)]
fm10k: Only trigger data path reset if fabric is up
This change makes it so that we only trigger the data path reset if the
fabric is ready to handle traffic. The general idea is to avoid
triggering the reset unless the switch API is ready for us. Otherwise
we can just postpone the reset until we receive a switch ready
notification.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 24 Jun 2015 20:34:48 +0000 (13:34 -0700)]
fm10k: re-enable VF after a full reset on detection of a Malicious event
Modify behavior of Malicious Driver Detection events. Presently, the
hardware disables the VF queues and re-assigns them to the PF. This
causes the VF in question to continuously Tx hang, because it assumes
that it can transmit over the queues in question. For transient events,
this results in continuous logging of malicious events.
New behavior is to reset the LPORT and VF state, so that the VF will
have to reset and re-enable itself. This does mean that malicious VFs
will possibly be able to continue and attempt malicious events again.
However, it is expected that system administrators will step in and
manually remove or disable the VF in question.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 24 Jun 2015 20:34:47 +0000 (13:34 -0700)]
fm10k: TRIVIAL fix typo in fm10k_netdev.c
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 24 Jun 2015 20:34:46 +0000 (13:34 -0700)]
fm10k: send traffic on default VID to VLAN device if we have one
This patch ensures that VLAN traffic on the default VID will go to the
corresponding VLAN device if it exists. To do this, mask the rx_ring VID
if we have an active VLAN on that VID.
For this to work correctly, we need to update fm10k_process_skb_fields
to correctly mask off the VLAN_PRIO_MASK bits and compare them
separately, otherwise we incorrectly compare the priority bits with the
cleared flag. This also happens to fix a related bug where having
priority bits set causes us to incorrectly classify traffic.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 24 Jun 2015 20:34:44 +0000 (13:34 -0700)]
fm10k: TRIVIAL fix up ordering of __always_unused and style
Fix some style issues in debugfs code, and correct ordering of void and
__always_unused. Technically, the order does not matter, but preferred
style is to put the macro between the type and name.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 24 Jun 2015 20:34:41 +0000 (13:34 -0700)]
fm10k: remove is_slot_appropriate
This function is no longer used now that we have updated fm10k_slot_warn
functionality.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 19 Jun 2015 17:56:10 +0000 (10:56 -0700)]
fm10k: don't store sw_vid at reset
If we store the sw_vid at reset of PF, then we accidentally prevent the
VF from receiving the message to update its default VID. This only
occurs if the VF is created before the PF has come up, which is the
standard way of creating VFs when using the module parameter.
This fixes an issue where we request the incorrect MAC/VLAN
combinations, and prevents us from accidentally reporting some frames as
VLAN tagged.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 19 Jun 2015 17:56:09 +0000 (10:56 -0700)]
fm10k: allow creation of VLAN interfaces even while down
We re-sync upon going up, so there is little reason to worry about not
syncing immediately with switch. This prevents an error that occurs if
you add a VLAN interface while down.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Fri, 19 Jun 2015 02:41:10 +0000 (19:41 -0700)]
fm10k: Report MAC address on driver load
This change adds the MAC address to the list of values recorded on driver
load. The MAC address represents the serial number of the unit and allows
us to track the value should a card be replaced in a system.
The log message should now be similar in output to that of ixgbe.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 16 Jun 2015 18:47:12 +0000 (11:47 -0700)]
fm10k: Don't assume page fragments are page size
This change pulls out the optimization that assumed that all fragments
would be limited to page size. That hasn't been the case for some time now
and to assume this is incorrect as the TCP allocator can provide up to a
32K page fragment.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 15 Jun 2015 22:00:56 +0000 (15:00 -0700)]
fm10k: update netdev perm_addr during reinit, instead of at up
Update the netdev permanent address during fm10k_reinit enables the user
to immediately see the new MAC address on the VF even if the device
isn't up. The previous code required that the device by opened before
changes would appear.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 15 Jun 2015 22:00:55 +0000 (15:00 -0700)]
fm10k: update fm10k_slot_warn to use pcie_get_minimum link
This is useful in cases where we connect to a slot at Gen3, but the slot
is behind a bus which only connected at Gen2. This generally only
happens when a PCIe switch is in the sequence of devices, and can be
very confusing when you see slow performance with no obvious cause.
I am aware this patch has a few lines that break 80 characters, but
there does not seem to be a readable way to format them to less than 80
characters. Suggestions welcome.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 15 Jun 2015 22:00:53 +0000 (15:00 -0700)]
fm10k: only prevent removal of default VID rules
This allows us to correctly add a VLAN even if it matches our default
VID. However, we don't want to remove the VID rules once that VLAN is
deleted. Correctly remove the stack layers information of the VLAN, but
then return to forwarding that VID as untagged frames. If we deleted the
VID rules here, we would begin dropping traffic due to VLAN membership
violations.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 15 Jun 2015 22:00:51 +0000 (15:00 -0700)]
fm10k: disable service task during suspend
The service task reads some registers as part of its normal routine,
even while the interface is down. Normally this is ok. However, during
suspend we have disabled the PCI device. Due to this, registers will
read in the same way as a surprise-remove event. Disable the service
task while we suspend, and re-enable it after we resume. If we don't do
this, the device could be UP when you suspend and come back from resume
as closed (since fm10k closes the device when it gets a surprise
remove).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Thu, 30 Jul 2015 22:19:28 +0000 (15:19 -0700)]
ixgbe: Limit lowest interrupt rate for adaptive interrupt moderation to 12K
This patch updates the lowest limit for adaptive interrupt interrupt
moderation to roughly 12K interrupts per second.
The way I came about reaching 12K as the desired interrupt rate is by
testing with UDP flows. Specifically I had a simple test that ran a
netperf UDP_STREAM test at varying sizes. What I found was as the packet
sizes increased the performance fell steadily behind until we were only
able to receive at ~4Gb/s with a message size of 65507. A bit of digging
found that we were dropping packets for the socket in the network stack,
and looking at things further what I found was I could solve it by increasing
the interrupt rate, or increasing the rmem_default/rmem_max. What I found was
that when the interrupt coalescing resulted in more data being processed
per interrupt than could be stored in the socket buffer we started losing
packets and the performance dropped. So I reached 12K based on the
following math.
rmem_default = 212992
skb->truesize = 2994
212992 / 2994 = 71.14 packets to fill the buffer
packet rate at 1514 packet size is 812744pps
71.14 / 812744 = 87.9us to fill socket buffer
From there it was just a matter of choosing the interrupt rate and
providing a bit of wiggle room which is why I decided to go with 12K
interrupts per second as that uses a value of 84us.
The data below is based on VM to VM over a direct assigned ixgbe interface.
The test run was:
netperf -H <ip> -t UDP_STREAM"
Socket Message Elapsed Messages CPU Service
Size Size Time Okay Errors Throughput Util Demand
bytes bytes secs # # 10^6bits/sec % SS us/KB
Before:
212992 65507 60.00
1100662 0 9613.4 10.89 0.557
212992 60.00 473474 4135.4 11.27 0.576
After:
212992 65507 60.00
1100413 0 9611.2 10.73 0.549
212992 60.00 974132 8508.3 11.69 0.598
Using bare metal the data is similar but not as dramatic as the throughput
increases from about 8.5Gb/s to 9.5Gb/s.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alex Williamson [Wed, 29 Jul 2015 20:38:21 +0000 (14:38 -0600)]
ixgbe: Teardown SR-IOV before unregister_netdev()
When the .remove() callback for a PF is called, SR-IOV support for the
device is disabled, which requires unbinding and removing the VFs.
The VFs may be in-use either by the host kernel or userspace, such as
assigned to a VM through vfio-pci. In this latter case, the VFs may
be removed either by shutting down the VM or hot-unplugging the
devices from the VM. Unfortunately in the case of a Windows 2012 R2
guest, hot-unplug is broken due to the ordering of the PF driver
teardown. Disabling SR-IOV prior to unregister_netdev() avoids this
issue.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 2 Sep 2015 20:47:54 +0000 (13:47 -0700)]
ixgbe: fix issue with SFP events with new X550 devices
Add checks for systems that don't have SFP's to avoid incorrectly
acting on interrupts that are falsely interpreted as SFP events.
This also includes a modified check generating the EICR mask to be
more forward-looking.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sowmini Varadhan [Fri, 11 Sep 2015 20:48:48 +0000 (16:48 -0400)]
rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats
Many commonly used functions like getifaddrs() invoke RTM_GETLINK
to dump the interface information, and do not need the
the AF_INET6 statististics that are always returned by default
from rtnl_fill_ifinfo().
Computing the statistics can be an expensive operation that impacts
scaling, so it is desirable to avoid this if the information is
not needed.
This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that
can be passed with netlink_request() to avoid statistics computation
for the ifinfo path.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Sep 2015 20:25:03 +0000 (13:25 -0700)]
cdc: Fix build warning.
In file included from drivers/usb/gadget/function/u_serial.h:16:0,
from drivers/usb/gadget/function/f_acm.c:23:
>> include/linux/usb/cdc.h:47:5: warning: 'struct usb_interface' declared inside parameter list
int buflen);
^
>> include/linux/usb/cdc.h:47:5: warning: its scope is only this definition or declaration, which is probably not what you want
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 10 Sep 2015 00:40:56 +0000 (17:40 -0700)]
mv643xx_eth: Neaten mv643xx_eth_program_multicast_filter
The code around the allocation and loops are a bit obfuscated.
Neaten it by using:
o kcalloc with decimal count and sizeof(u32)
o Decimal loop indexing and i++ not i += 4
o A promiscuous block using a similar style
to the multicast block
o Remove unnecessary variables
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Sep 2015 19:47:47 +0000 (12:47 -0700)]
Merge branch 'xgene-2nd-10gbe-port'
Iyappan Subramanian says:
====================
driver: net: xgene: Enable 2nd 10GbE port on APM X-Gene SoC
This patch adds support for 2nd 10GbE on APM X-Gene SoC
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Tue, 8 Sep 2015 22:50:27 +0000 (15:50 -0700)]
dtb: xgene: Add 2nd 10GbE node
Adding the second 10GbE dt node for APM X-Gene SoC device tree
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Tue, 8 Sep 2015 22:50:26 +0000 (15:50 -0700)]
driver: net: xgene: Add support for 2nd 10GbE port
Adding support for the second 10GbE port on APM X-Gene SoC
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Mon, 7 Sep 2015 14:05:42 +0000 (16:05 +0200)]
cdc-phonet: use common parser
This moves cdc-phonet to the common parser for CDC users
to reduce code duplication.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Mon, 7 Sep 2015 14:05:41 +0000 (16:05 +0200)]
qmi-wwan: use common parser
This moves qmi-wwan to the common parser for CDC user
to reduce code duplication.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Mon, 7 Sep 2015 14:05:40 +0000 (16:05 +0200)]
cdc-ether: switch to common CDC parser
This patch uses the common parser to parse extra CDC
headers in order to reduce code duplication.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Mon, 7 Sep 2015 14:05:39 +0000 (16:05 +0200)]
cdc-ncm: use common parser
This moves cdc-ncm to the common parser for CDC user
to reduce code duplication.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Mon, 7 Sep 2015 14:05:38 +0000 (16:05 +0200)]
CDC: common parser for extra headers
CDC drivers all implement their own parser for the extra headers.
This patch fixes the code duplication introducing a single common
parser in usbnet.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Mon, 7 Sep 2015 09:46:44 +0000 (15:16 +0530)]
drivers: net: cpsw: Add support to drive gpios for ethernet to be functional
In DRA72x EVM, by default slave 1 is connected to the onboard
phy, but slave 2 pins are also muxed with video input module
which is controlled by pcf857x gpio and currently to select slave
0 to connect to phy gpio hogging is used, but with
omap2plus_defconfig the pcf857x gpio is built as module. So when
using NFS on DRA72x EVM, board doesn't boot as gpio hogging do
not set proper gpio state to connect slave 0 to phy as it is
built as module and you do not see any errors for not setting
gpio and just mentions dhcp reply not got.
To solve this issue, introducing "mode-gpios" in DT when gpio
based muxing is required. This will throw a warning when gpio
get fails and returns probe defer. When gpio-pcf857x module is
installed, cpsw probes again and ethernet becomes functional.
Verified this on DRA72x with pcf as module and ramdisk.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Sep 2015 19:04:22 +0000 (12:04 -0700)]
Merge branch 'dsa-mv88e6xxx-ATU'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: improve ATU move operations
This patchset completes the set of available Address Translation Unit
operations.
These Marvell switches have 4 operations to flush or (re)move, all or
only non-static MAC addresses, from the entire set of databases or from
just a particular one.
The first 3 patches introduce a generic _mv88e6xxx_atu_flush_move
function. The 4 remaining patches update a few FID operations in the
driver on setup, when a port join or leave a VLAN, or change state.
This is a step forward improving the hardware bridging support in DSA
and
88E6352-compatible switches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Sep 2015 18:34:16 +0000 (14:34 -0400)]
net: dsa: mv88e6xxx: remove all MACs when disabling a port
When we're moving a port from Learning or Forwarding state to Disabled
or Blocking or Listening state, remove all non-static MAC addresses
mapped to this port in the entire set of databases, not only one.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Sep 2015 18:34:15 +0000 (14:34 -0400)]
net: dsa: mv88e6xxx: remove addresses when a port leaves a VLAN
Add a new _mv88e6xxx_atu_move function to prepare the ATU data register
for the move operation. The ports vector will contain the source port
and destination port of the Move operation. If the destination port is
0xF, the MAC addresses mapped to the source port are removed for the
address database(s).
Then add a _mv88e6xxx_atu_remove wrapper to remove the MAC addresses
from a VLAN database that are mapped to a given port, when it leaves it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Sep 2015 18:34:14 +0000 (14:34 -0400)]
net: dsa: mv88e6xxx: flush all addresses when adding a VLAN
When choosing an address database for a new VLAN, flush every entries,
not only the non-static ones.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Sep 2015 18:34:13 +0000 (14:34 -0400)]
net: dsa: mv88e6xxx: flush ATU on initial setup
Purge all MAC addresses from the entire set of address databases when
the driver initializes the device.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Sep 2015 18:34:12 +0000 (14:34 -0400)]
net: dsa: mv88e6xxx: rework ATU Flush operation
These Marvell switches have 4 operations to flush or (re)move, all or
only non-static MAC addresses, from the entire set of databases or from
just a particular one.
The value of the EntryState bits will determine if the operation is
either a Flush (0x0) or a Move (0xF).
When moving entries from one port to another, entries will be removed if
the destination port is 0xF.
This patch renames these operations for consistency, add a new generic
_mv88e6xxx_atu_flush_move function, and change _mv88e6xxx_flush_fid to
use it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Sep 2015 18:34:11 +0000 (14:34 -0400)]
net: dsa: mv88e6xxx: extract ATU data write access
Other ATU commands need to write the ATU data register. To ease the
introduction of such commands, extract the ATU data write access from
_mv88e6xxx_atu_load to its own function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Sep 2015 18:34:10 +0000 (14:34 -0400)]
net: dsa: mv88e6xxx: extract FID write from ATU command
Not every ATU commands apply to an FID, thus remove the FID writing from
mv88e6xxx_atu_cmd and write it explicitly where needed, in order to ease
introduction of such commands.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 2 Sep 2015 20:58:36 +0000 (13:58 -0700)]
net: Allow user to get table id from route lookup
rt_fill_info which is called for 'route get' requests hardcodes the
table id as RT_TABLE_MAIN which is not correct when multiple tables
are used. Use the newly added table id in the rtable to send back
the correct table similar to what is done for IPv6.
To maintain current ABI a new request flag, RTM_F_LOOKUP_TABLE, is
added to indicate the actual table is wanted versus the hardcoded
response.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 2 Sep 2015 20:58:35 +0000 (13:58 -0700)]
net: Add FIB table id to rtable
Add the FIB table id to rtable to make the information available for
IPv4 as it is for IPv6.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 2 Sep 2015 20:58:34 +0000 (13:58 -0700)]
net: Refactor rtable initialization
All callers to rt_dst_alloc have nearly the same initialization following
a successful allocation. Consolidate it into rt_dst_alloc.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 Sep 2015 16:42:32 +0000 (09:42 -0700)]
Merge tag 'sound-fix-4.3-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes since the last update: the HD-audio quirks
as usual with a USB-audio fix and a trivial fix for the old sparc
driver"
* tag 'sound-fix-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Change internal PCM order
ALSA: hda - Fix white noise on Dell M3800
ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437
ALSA: hda - Enable headphone jack detect on old Fujitsu laptops
ALSA: sparc: amd7930: Fix module autoload for OF platform driver
ALSA: hda - Add some FIXUP quirks for white noise on Dell laptop.
Linus Torvalds [Fri, 11 Sep 2015 16:35:56 +0000 (09:35 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Just a bunch of fixes to squeeze in before -rc1:
- three nouveau regression fixes
- one qxl regression fix
- a bunch of i915 fixes
... and some core displayport/atomic fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/device: enable c800 quirk for tecra w50
drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x
drm/nouveau/gr/nv04: fix big endian setting on gr context
drm/qxl: validate monitors config modes
drm/i915: Allow DSI dual link to be configured on any pipe
drm/i915: Don't try to use DDR DVFS on CHV when disabled in the BIOS
drm/i915: Fix CSR MMIO address check
drm/i915: Limit the number of loops for reading a split 64bit register
drm/i915: Fix broken mst get_hw_state.
drm/i915: Pass hpd_status_i915[] to intel_get_hpd_pins() in pre-g4x
uapi/drm/i915_drm.h: fix userspace compilation.
drm/i915: Always mark the object as dirty when used by the GPU
drm/dp: Add dp_aux_i2c_speed_khz module param to set the assume i2c bus speed
drm/dp: Adjust i2c-over-aux retry count based on message size and i2c bus speed
drm/dp: Define AUX_RETRY_INTERVAL as 500 us
drm/atomic: Fix bookkeeping with TEST_ONLY, v3.