John Fastabend [Sat, 13 Sep 2014 03:05:27 +0000 (20:05 -0700)]
net: rcu-ify tcf_proto
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.
This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Sat, 13 Sep 2014 03:04:52 +0000 (20:04 -0700)]
net: qdisc: use rcu prefix and silence sparse warnings
Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.
Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 11 Sep 2014 13:57:22 +0000 (09:57 -0400)]
sunvnet: Avoid sending superfluous LDC messages.
When sending out a burst of packets across multiple descriptors,
it is sufficient to send one LDC "start" trigger for
the first descriptor, so do not send an LDC "start" for every
pass through vnet_start_xmit. Similarly, it is sufficient to send
one "DRING_STOPPED" trigger for the last dring (and if that
fails, hold off and send the trigger later).
Optimizations to the number of LDC messages helps avoid
filling up the LDC channel with superfluous LDC messages
that risk triggering flow-control on the channel,
and also boosts performance.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subbaraya Sundeep Bhatta [Thu, 11 Sep 2014 09:23:33 +0000 (14:53 +0530)]
net: axienet: remove unnecessary ether_setup after alloc_etherdev
calling ether_setup is redundant since alloc_etherdev calls
it.
Signed-off-by: Subbaraya Sundeep Bhatta <sbhatta@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Varka Bhadram [Thu, 11 Sep 2014 07:20:50 +0000 (12:50 +0530)]
ethernet: amd: use pr_info_once()
It will use pr_info_one() to print the version info of the
driver in probe function only once. No need to use the static
variable here.
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Wood [Thu, 11 Sep 2014 02:23:18 +0000 (21:23 -0500)]
udp: Fix inverted NAPI_GRO_CB(skb)->flush test
Commit
2abb7cdc0d ("udp: Add support for doing checksum unnecessary
conversion") caused napi_gro_cb structs with the "flush" field zero to
take the "udp_gro_receive" path rather than the "set flush to 1" path
that they would previously take. As a result I saw booting from an NFS
root hang shortly after starting userspace, with "server not
responding" messages.
This change to the handling of "flush == 0" packets appears to be
incidental to the goal of adding new code in the case where
skb_gro_checksum_validate_zero_check() returns zero. Based on that and
the fact that it breaks things, I'm assuming that it is unintentional.
Fixes:
2abb7cdc0d ("udp: Add support for doing checksum unnecessary conversion")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 12 Sep 2014 21:51:32 +0000 (17:51 -0400)]
Merge branch 'sock_queue_err_skb'
Alexander Duyck says:
====================
Address reference counting issues with sock_queue_err_skb
After looking over the code for skb_clone_sk after some comments made by
Eric Dumazet I have come to the conclusion that skb_clone_sk is taking the
correct approach in how to handle the sk_refcnt when creating a buffer that
is eventually meant to be returned to the socket via the sock_queue_err_skb
function.
However upon review of other callers I found what I believe to be a
possible reference count issue in the path for handling "wifi ack" packets.
To address this I have applied the same logic that is currently in place so
that the sk_refcnt will be forced to stay at least 1, or we will not
provide an skb to return in the sk_error_queue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 10 Sep 2014 22:05:42 +0000 (18:05 -0400)]
mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path
There is a possible issue with the use, or lack thereof of sk_refcnt and
sk_wmem_alloc in the wifi ack status functionality.
Specifically if a socket were to request acknowledgements, and the socket
were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
to reach 0 it would be possible to have sock_queue_err_skb orphan the last
buffer, resulting in __sk_free being called on the socket. After this the
buffer is enqueued on sk_error_queue, however the queue has already been
flushed resulting in at least a memory leak, if not a data corruption.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 10 Sep 2014 22:05:26 +0000 (18:05 -0400)]
skb: Add documentation for skb_clone_sk
This change adds some documentation to the call skb_clone_sk. This is
meant to help clarify the purpose of the function for other developers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sébastien Barré [Wed, 10 Sep 2014 16:20:23 +0000 (18:20 +0200)]
Revert "ipv4: Clarify in docs that accept_local requires rp_filter."
This reverts commit
c801e3cc1925 ("ipv4: Clarify in docs that accept_local requires rp_filter.").
It is not needed anymore since commit
1dced6a85482 ("ipv4: Restore accept_local behaviour in fib_validate_source()").
Suggested-by: Julian Anastasov <ja@ssi.bg>
Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 10 Sep 2014 13:01:02 +0000 (15:01 +0200)]
net: bpf: only build bpf_jit_binary_{alloc, free}() when jit selected
Since BPF JIT depends on the availability of module_alloc() and
module_free() helpers (HAVE_BPF_JIT and MODULES), we better build
that code only in case we have BPF_JIT in our config enabled, just
like with other JIT code. Fixes builds for arm/marzen_defconfig
and sh/rsk7269_defconfig.
====================
kernel/built-in.o: In function `bpf_jit_binary_alloc':
/home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc'
kernel/built-in.o: In function `bpf_jit_binary_free':
/home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free'
make: *** [vmlinux] Error 1
====================
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes:
738cbe72adc5 ("net: bpf: consolidate JIT binary allocator")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Sep 2014 21:02:37 +0000 (14:02 -0700)]
Merge branch 'cxgb4-next'
Hariprasad Shenai says:
====================
cxgb4: Allow FW size upto 1MB, support for S25FL032P flash and misc. fixes
This patch series adds support to allow FW size upto 1MB, support for S25FL032P
flash. Fix t4_flash_erase_sectors to throw an error, when erase sector aren't in
the flash and also warning message when adapters have flashes less than 2Mb.
Adds device id of new adapter and removes device id of debug adapter.
The patches series is created against 'net-next' tree.
And includes patches on cxgb4 driver and cxgb4vf driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 10 Sep 2014 12:14:31 +0000 (17:44 +0530)]
cxgb4/cxgb4vf: Add device ID for new adapter and remove for dbg adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 10 Sep 2014 12:14:30 +0000 (17:44 +0530)]
cxgb4: Add warning msg when attaching to adapters which have FLASHes smaller than 2Mb
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 10 Sep 2014 12:14:29 +0000 (17:44 +0530)]
cxgb4: Fix t4_flash_erase_sectors() to throw an error when requested to erase sectors which aren't in the FLASH
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 10 Sep 2014 12:14:28 +0000 (17:44 +0530)]
cxgb4: Add support to S25FL032P flash
Add support for Spansion S25FL032P flash
Based on original work by Dimitris Michailidis
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 10 Sep 2014 12:14:27 +0000 (17:44 +0530)]
cxgb4: Allow T4/T5 firmware sizes up to 1MB
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Wed, 10 Sep 2014 12:02:50 +0000 (14:02 +0200)]
tipc: fix sparse warnings
This fixes the following sparse warnings:
sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static?
Also, the function is changed to return bool upon success, rather than a
potentially freed pointer.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Wed, 10 Sep 2014 07:51:13 +0000 (07:51 +0000)]
net: ethernet: arc: Don't free Rockchip resources before disconnect from phy
Free resources before being disconnected from phy and calling core driver is
wrong and should not happen. It avoids a delay of 4-5s caused by the timeout of
phy_disconnect().
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Sep 2014 19:46:32 +0000 (12:46 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
nf-next pull request
The following patchset contains Netfilter/IPVS updates for your
net-next tree. Regarding nf_tables, most updates focus on consolidating
the NAT infrastructure and adding support for masquerading. More
specifically, they are:
1) use __u8 instead of u_int8_t in arptables header, from
Mike Frysinger.
2) Add support to match by skb->pkttype to the meta expression, from
Ana Rey.
3) Add support to match by cpu to the meta expression, also from
Ana Rey.
4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from
Vytas Dauksa.
5) Fix netnet and netportnet hash types the range support for IPv4,
from Sergey Popovich.
6) Fix missing-field-initializer warnings resolved, from Mark Rustad.
7) Dan Carperter reported possible integer overflows in ipset, from
Jozsef Kadlecsick.
8) Filter out accounting objects in nfacct by type, so you can
selectively reset quotas, from Alexey Perevalov.
9) Move specific NAT IPv4 functions to the core so x_tables and
nf_tables can share the same NAT IPv4 engine.
10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4.
11) Move specific NAT IPv6 functions to the core so x_tables and
nf_tables can share the same NAT IPv4 engine.
12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6.
13) Refactor code to add nft_delrule(), which can be reused in the
enhancement of the NFT_MSG_DELTABLE to remove a table and its
content, from Arturo Borrero.
14) Add a helper function to unregister chain hooks, from
Arturo Borrero.
15) A cleanup to rename to nft_delrule_by_chain for consistency with
the new nft_*() functions, also from Arturo.
16) Add support to match devgroup to the meta expression, from Ana Rey.
17) Reduce stack usage for IPVS socket option, from Julian Anastasov.
18) Remove unnecessary textsearch state initialization in xt_string,
from Bojan Prtvar.
19) Add several helper functions to nf_tables, more work to prepare
the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero.
20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from
Arturo Borrero.
21) Support NAT flags in the nat expression to indicate the flavour,
eg. random fully, from Arturo.
22) Add missing audit code to ebtables when replacing tables, from
Nicolas Dichtel.
23) Generalize the IPv4 masquerading code to allow its re-use from
nf_tables, from Arturo.
24) Generalize the IPv6 masquerading code, also from Arturo.
25) Add the new masq expression to support IPv4/IPv6 masquerading
from nf_tables, also from Arturo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 10 Sep 2014 04:17:32 +0000 (21:17 -0700)]
netfilter: Convert pr_warning to pr_warn
Use the more common pr_warn.
Other miscellanea:
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 10 Sep 2014 04:17:31 +0000 (21:17 -0700)]
iucv: Convert pr_warning to pr_warn
Use the more common pr_warn.
Coalesce formats.
Realign arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 10 Sep 2014 04:17:30 +0000 (21:17 -0700)]
pktgen: Convert pr_warning to pr_warn
Use the more common pr_warn.
Realign arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 10 Sep 2014 04:17:28 +0000 (21:17 -0700)]
atm: Convert pr_warning to pr_warn
Use the more common pr_warn.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Sep 2014 04:29:50 +0000 (21:29 -0700)]
Merge branch 'ipip_sit_gro'
Tom Herbert says:
====================
net: enable GRO for IPIP and SIT
This patch sets populates the IPIP and SIT offload structures with
gro_receive and gro_complete functions. This enables use of GRO
for these. Also, fixed a problem in IPv6 where we were not properly
initializing flush_id.
Peformance results are below. Note that these tests were done on bnx2x
which doesn't provide RX checksum offload of IPIP or SIT (i.e. does
not give CHEKCSUM_COMPLETE). Also, we don't get 4-tuple hash for RSS
only 2-tuple in this case so all the packets between two hosts are
winding up on the same queue. Net result is the interrupting CPU is
the bottleneck in GRO (checksumming every packet there).
Testing:
netperf TCP_STREAM between two hosts using bnx2x.
* Before fix
IPIP
1 connection
6.53% CPU utilization
6544.71 Mbps
20 connections
13.79% CPU utilization
9284.54 Mbps
SIT
1 connection
6.68% CPU utilization
5653.36 Mbps
20 connections
18.88% CPU utilization
9154.61 Mbps
* After fix
IPIP
1 connection
5.73% CPU utilization
9279.53 Mbps
20 connections
7.14% CPU utilization
7279.35 Mbps
SIT
1 connection
2.95% CPU utilization
9143.36 Mbps
20 connections
7.09% CPU utilization
6255.3 Mbps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Tue, 9 Sep 2014 18:23:16 +0000 (11:23 -0700)]
sit: Add gro callbacks to sit_offload
Add ipv6_gro_receive and ipv6_gro_complete to sit_offload to
support GRO.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Tue, 9 Sep 2014 18:23:15 +0000 (11:23 -0700)]
ipip: Add gro callbacks to ipip offload
Add inet_gro_receive and inet_gro_complete to ipip_offload to
support GRO.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Tue, 9 Sep 2014 18:23:14 +0000 (11:23 -0700)]
ipv6: Clear flush_id to make GRO work
In TCP gro we check flush_id which is derived from the IP identifier.
In IPv4 gro path the flush_id is set with the expectation that every
matched packet increments IP identifier. In IPv6, the flush_id is
never set and thus is uinitialized. What's worse is that in IPv6
over IPv4 encapsulation, the IP identifier is taken from the outer
header which is currently not incremented on every packet for Linux
stack, so GRO in this case never matches packets (identifier is
not increasing).
This patch clears flush_id for every time for a matched packet in
IPv6 gro_receive. We need to do this each time to overwrite the
setting that would be done in IPv4 gro_receive per the outer
header in IPv6 over Ipv4 encapsulation.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 10 Sep 2014 03:27:44 +0000 (20:27 -0700)]
drivers/net: Convert remaining uses of pr_warning to pr_warn
Use the much more common pr_warn instead of pr_warning.
Other miscellanea:
o Typo fixes submiting/submitting
o Coalesce formats
o Realign arguments
o Add missing terminating '\n' to formats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 9 Sep 2014 23:08:46 +0000 (01:08 +0200)]
net: use kfree_skb_list() helper in more places
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 Sep 2014 15:29:12 +0000 (08:29 -0700)]
ipv4: udp4_gro_complete() is static
net/ipv4/udp_offload.c:339:5: warning: symbol 'udp4_gro_complete' was
not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Fixes:
57c67ff4bd92 ("udp: additional GRO support")
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 Sep 2014 15:24:53 +0000 (08:24 -0700)]
netns: remove one sparse warning
net/core/net_namespace.c:227:18: warning: incorrect type in argument 1
(different address spaces)
net/core/net_namespace.c:227:18: expected void const *<noident>
net/core/net_namespace.c:227:18: got struct net_generic [noderef]
<asn:4>*gen
We can use rcu_access_pointer() here as read-side access to the pointer
was removed at least one grace period ago.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 Sep 2014 15:16:17 +0000 (08:16 -0700)]
ipv6: udp6_gro_complete() is static
net/ipv6/udp_offload.c:159:5: warning: symbol 'udp6_gro_complete' was
not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes:
57c67ff4bd92 ("udp: additional GRO support")
Cc: Tom Herbert <therbert@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 Sep 2014 15:11:41 +0000 (08:11 -0700)]
ipv4: rcu cleanup in ip_ra_control()
Remove one sparse warning :
net/ipv4/ip_sockglue.c:328:22: warning: incorrect type in assignment (different address spaces)
net/ipv4/ip_sockglue.c:328:22: expected struct ip_ra_chain [noderef] <asn:4>*next
net/ipv4/ip_sockglue.c:328:22: got struct ip_ra_chain *[assigned] ra
And replace one rcu_assign_ptr() by RCU_INIT_POINTER() where applicable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 9 Sep 2014 11:07:32 +0000 (13:07 +0200)]
ipv6: mcast: remove dead debugging defines
It's not used anywhere, so just remove these.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Tue, 9 Sep 2014 08:48:29 +0000 (11:48 +0300)]
irda: vlsi_ir: use %*ph specifier
Instead of looping in the code let's use kernel extension to dump small
buffers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Tue, 9 Sep 2014 03:40:28 +0000 (11:40 +0800)]
r8152: use usleep_range
Replace mdelay with usleep_range to avoid busy loop.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Mon, 8 Sep 2014 23:58:58 +0000 (19:58 -0400)]
net-timestamp: optimize sock_tx_timestamp default path
Few packets have timestamping enabled. Exit sock_tx_timestamp quickly
in this common case.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 8 Sep 2014 21:33:01 +0000 (23:33 +0200)]
net_sched: sfq: remove unused macro
not used anymore since
ddecf0f
(net_sched: sfq: add optional RED on top of SFQ).
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Tue, 9 Sep 2014 21:43:27 +0000 (14:43 -0700)]
sfc: Convert the normal transmit complete path to dev_consume_skb_any()
Convert the normal transmit completion path from dev_kfree_skb_any()
to dev_consume_skb_any() to help keep dropped packet profiling
meaningful.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Sep 2014 00:31:43 +0000 (17:31 -0700)]
Merge branch 'bond_lock_removal'
Nikolay Aleksandrov says:
====================
bonding: get rid of bond->lock
This patch-set removes the last users of bond->lock and converts the places
that needed it for sync to use curr_slave_lock or RCU as appropriate.
I've run this with lockdep and have stress-tested it via loading/unloading
and enslaving/releasing in parallel while outputting bond's proc, I didn't
see any issues. Please pay special attention to the procfs change, I've
done about an hour of stress-testing on it and have checked that the event
that causes the bonding to delete its proc entry (NETDEV_UNREGISTER) is
called before ndo_uninit() and the freeing of the dev so any readers will
sync with that. Also ran sparse checks and there were no splats.
v2: Add patch 0001/cxgb4 bond->lock removal, RTNL should be held in the
notifier call, the other patches are the same. Also tested with
allmodconfig to make sure there're no more users of bond->lock.
Changes from the RFC:
use RCU in procfs instead of RTNL since RTNL might lead to a deadlock with
unloading and also is much slower. The bond destruction syncs with proc
via the proc locks. There's one new patch that converts primary_slave to
use RCU as it was necessary to fix a longstanding bugs in sysfs and
procfs and to make it easy to migrate bond's procfs to RCU. And of course
rebased on top of net-next current.
This is the first patch-set in a series that should simplify the bond's
locking requirements and will make it easier to define the locking
conditions necessary for the various paths. The goal is to rely on RTNL
and rcu alone, an extra lock would be needed in a few special cases that
would be documented very well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 9 Sep 2014 21:17:03 +0000 (23:17 +0200)]
bonding: remove last users of bond->lock and bond->lock itself
The usage of bond->lock in bond_main.c was completely unnecessary as it
didn't help to sync with anything, most of the spots already had RTNL.
Since there're no more users of bond->lock, remove it.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 9 Sep 2014 21:17:02 +0000 (23:17 +0200)]
bonding: options: remove bond->lock usage
We're safe to remove the bond->lock use from the arp targets because
arp_rcv_probe no longer acquires bond->lock, only rcu_read_lock.
Also setting the primary slave is safe because noone uses the bond->lock
as a syncing mechanism for that anymore.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 9 Sep 2014 21:17:01 +0000 (23:17 +0200)]
bonding: procfs: clean bond->lock usage and use RCU
Use RCU to protect against slave release, the proc show function will sync
with the bond destruction by the proc locks and the fact that the bond is
released after NETDEV_UNREGISTER which causes the bonding to remove the
proc entry.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 9 Sep 2014 21:17:00 +0000 (23:17 +0200)]
bonding: convert primary_slave to use RCU
This is necessary mainly for two bonding call sites: procfs and
sysfs as it was dereferenced without any real protection.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 9 Sep 2014 21:16:59 +0000 (23:16 +0200)]
bonding: alb: clean bond->lock
We can remove the lock/unlock as it's no longer necessary since
RTNL should be held while calling bond_alb_set_mac_address().
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 9 Sep 2014 21:16:58 +0000 (23:16 +0200)]
bonding: 3ad: use curr_slave_lock instead of bond->lock
In 3ad mode the only syncing needed by bond->lock is for the wq
and the recv handler, so change them to use curr_slave_lock.
There're no locking dependencies here as 3ad doesn't use
curr_slave_lock at all.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 9 Sep 2014 21:16:57 +0000 (23:16 +0200)]
cxgb4: remove bond->lock
RTNL should be already held in the notifier call so the slave list can
be traversed without a problem, remove the unnecessary bond->lock.
CC: Hariprasad S <hariprasad@chelsio.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Mon, 8 Sep 2014 17:14:50 +0000 (17:14 +0000)]
ARM: dts: Enable emac node on the rk3188-radxarock boards
This enables EMAC Rockchip support on radxa rock boards.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Mon, 8 Sep 2014 17:14:49 +0000 (17:14 +0000)]
ARM: dts: Add emac nodes to the rk3188 device tree
This adds support for EMAC Rockchip driver on RK3188 SoCs.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Mon, 8 Sep 2014 17:14:48 +0000 (17:14 +0000)]
dt-bindings: Document EMAC Rockchip
This adds the necessary binding documentation for the EMAC Rockchip platform
driver found in RK3066 and RK3188 SoCs.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Mon, 8 Sep 2014 17:14:47 +0000 (17:14 +0000)]
ethernet: arc: Add support for Rockchip SoC layer device tree bindings
This patch defines a platform glue layer for Rockchip SoCs which
support arc-emac driver. It ensures that regulator for the rmii is on
before trying to connect to the ethernet controller. It applies right
speed and mode changes to the grf when ethernet settings change.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Sep 2014 23:59:03 +0000 (16:59 -0700)]
Merge branch 'bpf-next'
Daniel Borkmann says:
====================
BPF updates
[ Set applies on top of current net-next but also on top of
Alexei's latest patches. Please see individual patches for
more details. ]
Changelog:
v1->v2:
- Removed paragraph in 1st commit message
- Rest stays the same
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 8 Sep 2014 06:04:49 +0000 (08:04 +0200)]
net: bpf: be friendly to kmemcheck
Reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have special kmemcheck annotation
for bitfields used in bpf_prog structure.
We currently have jited:1, len:31 and thus when accessing len
while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.
As we don't need the whole bit universe for pages member, we
can just split it to u16 and use a bool flag for jited instead
of a bitfield.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 8 Sep 2014 06:04:48 +0000 (08:04 +0200)]
net: bpf: arm: address randomize and write protect JIT code
This is the ARM variant for
314beb9bcab ("x86: bpf_jit_comp: secure bpf
jit against spraying attacks").
It is now possible to implement it due to commits
75374ad47c64 ("ARM: mm:
Define set_memory_* functions for ARM") and
dca9aa92fc7c ("ARM: add
DEBUG_SET_MODULE_RONX option to Kconfig") which added infrastructure for
this facility.
Thus, this patch makes sure the BPF generated JIT code is marked RO, as
other kernel text sections, and also lets the generated JIT code start
at a pseudo random offset instead on a page boundary. The holes are filled
with illegal instructions.
JIT tested on armv7hl with BPF test suite.
Reference: http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Mircea Gherzan <mgherzan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 8 Sep 2014 06:04:47 +0000 (08:04 +0200)]
net: bpf: consolidate JIT binary allocator
Introduced in commit
314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") and later on replicated in
aa2d2c73c21f
("s390/bpf,jit: address randomize and write protect jit code") for
s390 architecture, write protection for BPF JIT images got added and
a random start address of the JIT code, so that it's not on a page
boundary anymore.
Since both use a very similar allocator for the BPF binary header,
we can consolidate this code into the BPF core as it's mostly JIT
independant anyway.
This will also allow for future archs that support DEBUG_SET_MODULE_RONX
to just reuse instead of reimplementing it.
JIT tested on x86_64 and s390x with BPF test suite.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 8 Sep 2014 15:06:07 +0000 (08:06 -0700)]
tcp: remove dst refcount false sharing for prequeue mode
Alexander Duyck reported high false sharing on dst refcount in tcp stack
when prequeue is used. prequeue is the mechanism used when a thread is
blocked in recvmsg()/read() on a TCP socket, using a blocking model
rather than select()/poll()/epoll() non blocking one.
We already try to use RCU in input path as much as possible, but we were
forced to take a refcount on the dst when skb escaped RCU protected
region. When/if the user thread runs on different cpu, dst_release()
will then touch dst refcount again.
Commit
093162553c33 (tcp: force a dst refcount when prequeue packet)
was an example of a race fix.
It turns out the only remaining usage of skb->dst for a packet stored
in a TCP socket prequeue is IP early demux.
We can add a logic to detect when IP early demux is probably going
to use skb->dst. Because we do an optimistic check rather than duplicate
existing logic, we need to guard inet_sk_rx_dst_set() and
inet6_sk_rx_dst_set() from using a NULL dst.
Many thanks to Alexander for providing a nice bug report, git bisection,
and reproducer.
Tested using Alexander script on a 40Gb NIC, 8 RX queues.
Hosts have 24 cores, 48 hyper threads.
echo 0 >/proc/sys/net/ipv4/tcp_autocorking
for i in `seq 0 47`
do
for j in `seq 0 2`
do
netperf -H $DEST -t TCP_STREAM -l 1000 \
-c -C -T $i,$i -P 0 -- \
-m 64 -s 64K -D &
done
done
Before patch : ~6Mpps and ~95% cpu usage on receiver
After patch : ~9Mpps and ~35% cpu usage on receiver.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Tue, 9 Sep 2014 23:37:11 +0000 (16:37 -0700)]
ath5k: Add missing vmalloc.h include.
After merging the wireless-next tree, today's linux-next build (powerpc
allyesconfig) failed like this:
drivers/net/wireless/ath/ath5k/debug.c: In function 'open_file_eeprom':
drivers/net/wireless/ath/ath5k/debug.c:933:2: error: implicit declaration of function 'vmalloc' [-Werror=implicit-function-declaration]
buf = vmalloc(eesize);
^
drivers/net/wireless/ath/ath5k/debug.c:933:6: warning: assignment makes pointer from integer without a cast
buf = vmalloc(eesize);
^
drivers/net/wireless/ath/ath5k/debug.c:960:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
vfree(buf);
^
Caused by commit
db906eb2101b ("ath5k: added debugfs file for dumping
eeprom"). Also reported by Guenter Roeck.
I have used Geert Uytterhoeven's suggested fix of including vmalloc.h
and so added this patch for today:
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 8 Sep 2014 18:39:23 +1000
Subject: [PATCH] ath5k: fix debugfs addition
Reported-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Varka Bhadram [Mon, 8 Sep 2014 03:58:19 +0000 (09:28 +0530)]
ethernet: ti: remove unwanted THIS_MODULE macro
It removes the owner field updation of driver structure.
It will be automatically updated by module_platform_driver()
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Sat, 6 Sep 2014 11:06:11 +0000 (19:06 +0800)]
openvswitch: change the data type of error status to atomic_long_t
Change the date type of error status from u64 to atomic_long_t, and use atomic
operation, then remove the lock which is used to protect the error status.
The operation of atomic maybe faster than spin lock.
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rami Rosen [Sat, 6 Sep 2014 10:08:08 +0000 (13:08 +0300)]
bridge: Cleanup of unncessary check.
This patch removes an unncessary check in the br_afspec() method of
br_netlink.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Sep 2014 18:30:05 +0000 (11:30 -0700)]
Merge branch 'bridge_rtnl_link'
Jiri Pirko says:
====================
bridge: implement rtnl_link options for getting and setting bridge options
So far, only sysfs is complete interface for getting and setting bridge
options. This patchset follows-up on the similar bonding code and
allows userspace to get/set bridge master/port options using Netlink
IFLA_INFO_DATA/IFLA_INFO_SLAVE_DATA attr.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 5 Sep 2014 13:51:32 +0000 (15:51 +0200)]
bridge: implement rtnl_link_ops->changelink
Allow rtnetlink users to set bridge master info via IFLA_INFO_DATA attr
This initial part implements forward_delay, hello_time, max_age options.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 5 Sep 2014 13:51:31 +0000 (15:51 +0200)]
bridge: implement rtnl_link_ops->get_size and rtnl_link_ops->fill_info
Allow rtnetlink users to get bridge master info in IFLA_INFO_DATA attr
This initial part implements forward_delay, hello_time, max_age options.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 5 Sep 2014 13:51:30 +0000 (15:51 +0200)]
bridge: implement rtnl_link_ops->slave_changelink
Allow rtnetlink users to set port info via IFLA_INFO_SLAVE_DATA attr
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 5 Sep 2014 13:51:29 +0000 (15:51 +0200)]
bridge: implement rtnl_link_ops->get_slave_size and rtnl_link_ops->fill_slave_info
Allow rtnetlink users to get port info in IFLA_INFO_SLAVE_DATA attr
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 5 Sep 2014 13:51:28 +0000 (15:51 +0200)]
bridge: switch order of rx_handler reg and upper dev link
The thing is that netdev_master_upper_dev_link calls
call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev). That generates rtnl
link message and during that, rtnl_link_ops->fill_slave_info is called.
But with current ordering, rx_handler and IFF_BRIDGE_PORT are not set
yet so there would have to be check for that in fill_slave_info callback.
Resolve this by reordering to similar what bonding and team does to
avoid the check.
Also add removal of IFF_BRIDGE_PORT flag into error path.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Bernat [Fri, 5 Sep 2014 13:09:03 +0000 (15:09 +0200)]
net/ipv4: bind ip_nonlocal_bind to current netns
net.ipv4.ip_nonlocal_bind sysctl was global to all network
namespaces. This patch allows to set a different value for each
network namespace.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Sep 2014 17:27:22 +0000 (10:27 -0700)]
Merge branch 'ebpf'
Alexei Starovoitov says:
====================
load imm64 insn and uapi/linux/bpf.h
V9->V10
- no changes, added Daniel's ack
Note they're on top of Hannes's patch in the same area [1]
V8 thread with 'why' reasoning and end goal [2]
Original set [3] of ~28 patches I'm planning to present in 4 stages:
I. this 2 patches to fork off llvm upstreaming
II. bpf syscall with manpage and map implementation
III. bpf program load/unload with verifier testsuite (1st user of
instruction macros from bpf.h and 1st user of load imm64 insn)
IV. tracing, etc
[1] http://patchwork.ozlabs.org/patch/385266/
[2] https://lkml.org/lkml/2014/8/27/628
[3] https://lkml.org/lkml/2014/8/26/859
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 5 Sep 2014 05:17:18 +0000 (22:17 -0700)]
net: filter: split filter.h and expose eBPF to user space
allow user space to generate eBPF programs
uapi/linux/bpf.h: eBPF instruction set definition
linux/filter.h: the rest
This patch only moves macro definitions, but practically it freezes existing
eBPF instruction set, though new instructions can still be added in the future.
These eBPF definitions cannot go into uapi/linux/filter.h, since the names
may conflict with existing applications.
Full eBPF ISA description is in Documentation/networking/filter.txt
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 5 Sep 2014 05:17:17 +0000 (22:17 -0700)]
net: filter: add "load 64-bit immediate" eBPF instruction
add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register.
All previous instructions were 8-byte. This is first 16-byte instruction.
Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction:
insn[0].code = BPF_LD | BPF_DW | BPF_IMM
insn[0].dst_reg = destination register
insn[0].imm = lower 32-bit
insn[1].code = 0
insn[1].imm = upper 32-bit
All unused fields must be zero.
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM
which loads 32-bit immediate value into a register.
x64 JITs it as single 'movabsq %rax, imm64'
arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn
Note that old eBPF programs are binary compatible with new interpreter.
It helps eBPF programs load 64-bit constant into a register with one
instruction instead of using two registers and 4 instructions:
BPF_MOV32_IMM(R1, imm32)
BPF_ALU64_IMM(BPF_LSH, R1, 32)
BPF_MOV32_IMM(R2, imm32)
BPF_ALU64_REG(BPF_OR, R1, R2)
User space generated programs will use this instruction to load constants only.
To tell kernel that user space needs a pointer the _pseudo_ variant of
this instruction may be added later, which will use extra bits of encoding
to indicate what type of pointer user space is asking kernel to provide.
For example 'off' or 'src_reg' fields can be used for such purpose.
src_reg = 1 could mean that user space is asking kernel to validate and
load in-kernel map pointer.
src_reg = 2 could mean that user space needs readonly data section pointer
src_reg = 3 could mean that user space needs a pointer to per-cpu local data
All such future pseudo instructions will not be carrying the actual pointer
as part of the instruction, but rather will be treated as a request to kernel
to provide one. The kernel will verify the request_for_a_pointer, then
will drop _pseudo_ marking and will store actual internal pointer inside
the instruction, so the end result is the interpreter and JITs never
see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that
loads 64-bit immediate into a register. User space never operates on direct
pointers and verifier can easily recognize request_for_pointer vs other
instructions.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arturo Borrero [Mon, 8 Sep 2014 11:45:00 +0000 (13:45 +0200)]
netfilter: nf_tables: add new nft_masq expression
The nft_masq expression is intended to perform NAT in the masquerade flavour.
We decided to have the masquerade functionality in a separated expression other
than nft_nat.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arturo Borrero [Thu, 4 Sep 2014 12:06:49 +0000 (14:06 +0200)]
netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (ie. nftables).
The patch includes the addition of an atomic counter to the masquerade
notifier: the stuff to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.
This factorization only involves IPv6; a similar patch exists to
handle IPv4.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arturo Borrero [Thu, 4 Sep 2014 12:06:33 +0000 (14:06 +0200)]
netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (ie. nftables).
The patch includes the addition of an atomic counter to the masquerade
notifier: the stuff to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.
This factorization only involves IPv4; a similar patch follows to
handle IPv6.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Nicolas Dichtel [Mon, 8 Sep 2014 12:11:45 +0000 (14:11 +0200)]
netfilter: ebtables: create audit records for replaces
This is already done for x_tables (family AF_INET and AF_INET6), let's
do it for AF_BRIDGE also.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arturo Borrero [Thu, 4 Sep 2014 12:06:14 +0000 (14:06 +0200)]
netfilter: nft_nat: include a flag attribute
Both SNAT and DNAT (and the upcoming masquerade) can have additional
configuration parameters, such as port randomization and NAT addressing
persistence. We can cover these scenarios by simply adding a flag
attribute for userspace to fill when needed.
The flags to use are defined in include/uapi/linux/netfilter/nf_nat.h:
NF_NAT_RANGE_MAP_IPS
NF_NAT_RANGE_PROTO_SPECIFIED
NF_NAT_RANGE_PROTO_RANDOM
NF_NAT_RANGE_PERSISTENT
NF_NAT_RANGE_PROTO_RANDOM_FULLY
NF_NAT_RANGE_PROTO_RANDOM_ALL
The caller must take care of not messing up with the flags, as they are
added unconditionally to the final resulting nf_nat_range.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arturo Borrero [Tue, 2 Sep 2014 14:42:26 +0000 (16:42 +0200)]
netfilter: nf_tables: extend NFT_MSG_DELTABLE to support flushing the ruleset
This patch extend the NFT_MSG_DELTABLE call to support flushing the entire
ruleset.
The options now are:
* No family speficied, no table specified: flush all the ruleset.
* Family specified, no table specified: flush all tables in the AF.
* Family specified, table specified: flush the given table.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arturo Borrero [Tue, 2 Sep 2014 14:42:25 +0000 (16:42 +0200)]
netfilter: nf_tables: add helpers to schedule objects deletion
This patch refactor the code to schedule objects deletion.
They are useful in follow-up patches.
In order to be able to use these new helper functions in all the code,
they are placed in the top of the file, with all the dependant functions
and symbols.
nft_rule_disactivate_next has been renamed to nft_rule_deactivate.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Bojan Prtvar [Mon, 8 Sep 2014 07:51:12 +0000 (09:51 +0200)]
netfilter: xt_string: Remove unnecessary initialization of struct ts_state
The skb_find_text() accepts uninitialized textsearch state variable.
Signed-off-by: Bojan Prtvar <prtvar.b@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Julian Anastasov [Tue, 2 Sep 2014 21:02:49 +0000 (00:02 +0300)]
ipvs: reduce stack usage for sockopt data
Use union to reserve the required stack space for sockopt data
which is less than the currently hardcoded value of 128.
Now the tables for commands should be more readable.
The checks added for readability are optimized by compiler,
others warn at compile time if command uses too much
stack or exceeds the storage of set_arglen and get_arglen.
As Dan Carpenter points out, we can run for unprivileged user,
so we can silent some error messages.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
CC: Dan Carpenter <dan.carpenter@oracle.com>
CC: Andrey Utkin <andrey.krieger.utkin@gmail.com>
CC: David Binderman <dcb314@hotmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ana Rey [Tue, 2 Sep 2014 18:36:14 +0000 (20:36 +0200)]
netfilter: nf_tables: add devgroup support in meta expresion
Add devgroup support to let us match device group of a packets incoming
or outgoing interface.
Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arturo Borrero [Tue, 2 Sep 2014 14:42:24 +0000 (16:42 +0200)]
netfilter: nf_tables: rename nf_table_delrule_by_chain()
For the sake of homogenize the function naming scheme, let's rename
nf_table_delrule_by_chain() to nft_delrule_by_chain().
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arturo Borrero [Tue, 2 Sep 2014 14:42:23 +0000 (16:42 +0200)]
netfilter: nf_tables: add helper to unregister chain hooks
This patch adds a helper function to unregister chain hooks in the chain
deletion path. Basically, a code factorization.
The new function is useful in follow-up patches.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arturo Borrero [Tue, 2 Sep 2014 14:42:21 +0000 (16:42 +0200)]
netfilter: nf_tables: refactor rule deletion helper
This helper function always schedule the rule to be removed in the following
transaction.
In follow-up patches, it is interesting to handle separately the logic of rule
activation/disactivation from the transaction mechanism.
So, this patch simply splits the original nf_tables_delrule_one() in two
functions, allowing further control.
While at it, for the sake of homigeneize the function naming scheme, let's
rename nf_tables_delrule_one() to nft_delrule().
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Tue, 9 Sep 2014 14:31:09 +0000 (16:31 +0200)]
netfilter: nft_chain_nat_ipv6: use generic IPv6 NAT code from core
Use the exported IPv6 NAT functions that are provided by the core. This
removes duplicated code so iptables and nft use the same NAT codebase.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Mon, 25 Aug 2014 10:05:27 +0000 (12:05 +0200)]
netfilter: nat: move specific NAT IPv6 to core
Move the specific NAT IPv6 core functions that are called from the
hooks from ip6table_nat.c to nf_nat_l3proto_ipv6.c. This prepares the
ground to allow iptables and nft to use the same NAT engine code that
comes in a follow up patch.
This also renames nf_nat_ipv6_fn to nft_nat_ipv6_fn in
net/ipv6/netfilter/nft_chain_nat_ipv6.c to avoid a compilation breakage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Mon, 8 Sep 2014 23:43:58 +0000 (16:43 -0700)]
Merge tag 'master-2014-09-08' of git://git./linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
pull request: wireless-next 2014-09-08
Please pull this batch of updates intended for the 3.18 stream...
For the mac80211 bits, Johannes says:
"Not that much content this time. Some RCU cleanups, crypto
performance improvements, and various patches all over,
rather than listing them one might as well look into the
git log instead."
For the Bluetooth bits, Gustavo says:
"The changes consists of:
- Coding style fixes to HCI drivers
- Corrupted ack value fix for the H5 HCI driver
- A couple of Enhanced L2CAP fixes
- Conversion of SMP code to use common L2CAP channel API
- Page scan optimizations when using the kernel-side whitelist
- Various mac802154 and and ieee802154 6lowpan cleanups
- One new Atheros USB ID"
For the iwlwifi bits, Emmanuel says:
"We have a new big thing coming up which is called Dynamic Queue
Allocation (or DQA). This is a completely new way to work with the
Tx queues and it requires major refactoring. This is being done by
Johannes and Avri. Besides this, Johannes disables U-APSD by default
because of APs that would disable A-MPDU if the association supports
U-ASPD. Luca contributed to the power area which he was cleaning
up on the way while working on CSA. A few more random things here
and there."
For the Atheros bits, Kalle says:
"For ath6kl we had two small fixes and a new SDIO device id.
For ath10k the bigger changes are:
* support for new firmware version 10.2 (Michal)
* spectral scan support (Simon, Sven & Mathias)
* export a firmware crash dump file (Ben & me)
* cleaning up of pci.c (Michal)
* print pci id in all messages, which causes most of the churn (Michal)"
Beyond that, we have the usual collection of various updates to ath9k,
b43, mwifiex, and wil6210, as well as a few other bits here and there.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Mon, 8 Sep 2014 23:08:34 +0000 (19:08 -0400)]
inet: remove dead inetpeer sequence code
inetpeer sequence numbers are no longer incremented, so no need to
check and flush the tree. The function that increments the sequence
number was already dead code and removed in in "ipv4: remove unused
function" (
068a6e18). Remove the code that checks for a change, too.
Verifying that v4_seq and v6_seq are never incremented and thus that
flush_check compares bp->flush_seq to 0 is trivial.
The second part of the change removes flush_check completely even
though bp->flush_seq is exactly !0 once, at initialization. This
change is correct because the time this branch is true is when
bp->root == peer_avl_empty_rcu, in which the branch and
inetpeer_invalidate_tree are a NOOP.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Mon, 8 Sep 2014 17:24:02 +0000 (22:54 +0530)]
drivers: net: cpsw: Add support for pause frames
CPSW supports both rx and tx pause frames for flow control.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Mon, 8 Sep 2014 15:31:32 +0000 (08:31 -0700)]
hp100: Convert the normal skb free path to dev_consume_skb_any()
A bit of floor sweeping in a dusty old corner. Convert the "normal"
skb free calls to dev_consume_skb_any() so packet drop tracing will
be more sane.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 8 Sep 2014 15:29:12 +0000 (08:29 -0700)]
net: Fix GRE RX to use skb_transport_header for GRE header offset
GRE assumes that the GRE header is at skb_network_header +
ip_hrdlen(skb). It is more general to use skb_transport_header
and this allows the possbility of inserting additional header
between IP and GRE (which is what we will done in Generic UDP
Encapsulation for GRE).
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 8 Sep 2014 15:25:34 +0000 (11:25 -0400)]
dp83640: Make use of skb_queue_purge instead of reimplementing the code
This change makes it so that dp83640_remove can use skb_queue_purge
instead of looping through itself to flush any entries out of the queue.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Mon, 8 Sep 2014 15:14:56 +0000 (11:14 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless
David S. Miller [Mon, 8 Sep 2014 04:41:53 +0000 (21:41 -0700)]
Merge git://git./linux/kernel/git/davem/net
Linus Torvalds [Mon, 8 Sep 2014 03:20:16 +0000 (20:20 -0700)]
Merge branch 'for-3.17-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This pull request includes Alban's patch to disallow '\n' in cgroup
names.
Two other patches from Li to fix a possible oops when cgroup
destruction races against other file operations and one from Vivek to
fix a unified hierarchy devel behavior"
* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: check cgroup liveliness before unbreaking kernfs
cgroup: delay the clearing of cgrp->kn->priv
cgroup: Display legacy cgroup files on default hierarchy
cgroup: reject cgroup names with '\n'
Linus Torvalds [Mon, 8 Sep 2014 03:10:06 +0000 (20:10 -0700)]
Merge branch 'for-3.17-fixes' of git://git./linux/kernel/git/tj/percpu
Pull percpu fixes from Tejun Heo:
"One patch to fix a failure path in the alloc path. The bug is
dangerous but probably not too likely to actually trigger in the wild
given that there hasn't been any report yet.
The other two are low impact fixes"
* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: free percpu allocation info for uniprocessor system
percpu: perform tlb flush after pcpu_map_pages() failure
percpu: fix pcpu_alloc_pages() failure path
Linus Torvalds [Mon, 8 Sep 2014 03:06:44 +0000 (20:06 -0700)]
Merge branch 'for-3.17-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Two patches are to add PCI IDs for ICH9 and all others are device
specific fixes. Nothing too interesting"
* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci_xgene: Fix the link down in first attempt for the APM X-Gene SoC AHCI SATA host controller driver.
ahci_xgene: Skip the PHY and clock initialization if already configured by the firmware.
ahci: add pcid for Marvel 0x9182 controller
ata: Disabling the async PM for JMicron chip 363/361
ata_piix: Add Device IDs for Intel 9 Series PCH
ahci: Add Device IDs for Intel 9 Series PCH
ata: ahci_tegra: Read calibration fuse
Linus Torvalds [Mon, 8 Sep 2014 02:56:38 +0000 (19:56 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix skb leak in mac802154, from Martin Townsend
2) Use select not depends on NF_NAT for NFT_NAT, from Pablo Neira
Ayuso
3) Fix union initializer bogosity in vxlan, from Gerhard Stenzel
4) Fix RX checksum configuration in stmmac driver, from Giuseppe
CAVALLARO
5) Fix TSO with non-accelerated VLANs in e1000, e1000e, bna, ehea,
i40e, i40evf, mvneta, and qlge, from Vlad Yasevich
6) Fix capability checks in phy_init_eee(), from Giuseppe CAVALLARO
7) Try high order allocations more sanely for SKBs, specifically if a
high order allocation fails, fall back directly to zero order pages
rather than iterating down one order at a time. From Eric Dumazet
8) Fix a memory leak in openvswitch, from Li RongQing
9) amd-xgbe initializes wrong spinlock, from Thomas Lendacky
10) RTNL locking was busted in setsockopt for anycast and multicast, fix
from Sabrina Dubroca
11) Fix peer address refcount leak in ipv6, from Nicolas Dichtel
12) DocBook typo fixes, from Masanari Iida
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
ipv6: restore the behavior of ipv6_sock_ac_drop()
amd-xgbe: Enable interrupts for all management counters
amd-xgbe: Treat certain counter registers as 64 bit
greth: moved TX ring cleaning to NAPI rx poll func
cnic : Cleanup CONFIG_IPV6 & VLAN check
net: treewide: Fix typo found in DocBook/networking.xml
bnx2x: Fix link problems for 1G SFP RJ45 module
3c59x: avoid panic in boomerang_start_xmit when finding page address:
netfilter: add explicit Kconfig for NETFILTER_XT_NAT
ipv6: use addrconf_get_prefix_route() to remove peer addr
ipv6: fix a refcnt leak with peer addr
net-timestamp: only report sw timestamp if reporting bit is set
drivers/net/fddi/skfp/h/skfbi.h: Remove useless PCI_BASE_2ND macros
l2tp: fix race while getting PMTU on PPP pseudo-wire
ipv6: fix rtnl locking in setsockopt for anycast and multicast
VMXNET3: Check for map error in vmxnet3_set_mc
openvswitch: distinguish between the dropped and consumed skb
amd-xgbe: Fix initialization of the wrong spin lock
openvswitch: fix a memory leak
netfilter: fix missing dependencies in NETFILTER_XT_TARGET_LOG
...
Beniamino Galvani [Fri, 5 Sep 2014 22:28:23 +0000 (00:28 +0200)]
net: phy: mdio-sun4i: don't select REGULATOR
The mdio-sun4i driver automatically selects REGULATOR and
REGULATOR_FIXED_VOLTAGE because it uses the regulator API. But a
driver selecting a subsystem increases the chance of generating
circular Kconfig dependencies, especially when other drivers depend on
the selected symbol.
Since the regulator API functions are replaced with no-ops when
REGULATOR is disabled, the driver can be built successfully even
without regulator support and so those 'select' dependencies can be
safely dropped.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 7 Sep 2014 23:11:10 +0000 (16:11 -0700)]
Merge tag 'master-2014-09-04' of git://git./linux/kernel/git/linville/wireless
John W. Linville says:
====================
pull request: wireless 2014-09-05
Please pull this batch of fixes intended for the 3.17 stream...
For the mac80211 bits, Johannes says:
"Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
shouldn't be necessary but since many places treat it as a string we
couldn't move to just sending two bytes.
In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
fix for the recently introduced RX aggregation offload, a revert for
a broken patch (that luckily didn't really cause any harm) and a small
fix for alignment in debugfs."
For the iwlwifi bits, Emmanuel says:
"I revert a patch that disabled CTS to self in dvm because users
reported issues. The revert is CCed to stable since the offending
patch was sent to stable too. I also bump the firmware API versions
since a new firmware is coming up. On top of that, Marcel fixes a
bug I introduced while fixing a bug in our Kconfig file."
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>