GitHub/LineageOS/android_kernel_motorola_exynos9610.git
8 years agoipvs: count pre-established TCP states as active
Michal Kubecek [Fri, 3 Jun 2016 15:56:50 +0000 (17:56 +0200)]
ipvs: count pre-established TCP states as active

Some users observed that "least connection" distribution algorithm doesn't
handle well bursts of TCP connections from reconnecting clients after
a node or network failure.

This is because the algorithm counts active connection as worth 256
inactive ones where for TCP, "active" only means TCP connections in
ESTABLISHED state. In case of a connection burst, new connections are
handled before previous ones have finished the three way handshaking so
that all are still counted as "inactive", i.e. cheap ones. The become
"active" quickly but at that time, all of them are already assigned to one
real server (or few), resulting in highly unbalanced distribution.

Address this by counting the "pre-established" states as "active".

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
8 years agonetfilter: nf_log: fix error on write NONE to logger choice sysctl
Pavel Tikhomirov [Fri, 1 Jul 2016 13:53:54 +0000 (16:53 +0300)]
netfilter: nf_log: fix error on write NONE to logger choice sysctl

It is hard to unbind nf-logger:

  echo NONE > /proc/sys/net/netfilter/nf_log/0
  bash: echo: write error: No such file or directory

  sysctl -w net.netfilter.nf_log.0=NONE
  sysctl: setting key "net.netfilter.nf_log.0": No such file or directory
  net.netfilter.nf_log.0 = NONE

You need explicitly send '\0', for instance like:

  echo -e "NONE\0" > /proc/sys/net/netfilter/nf_log/0

That seem to be strange, so fix it using proc_dostring.

Now it works fine:
   modprobe nfnetlink_log
   echo nfnetlink_log > /proc/sys/net/netfilter/nf_log/0
   cat /proc/sys/net/netfilter/nf_log/0
   nfnetlink_log
   echo NONE > /proc/sys/net/netfilter/nf_log/0
   cat /proc/sys/net/netfilter/nf_log/0
   NONE

v2: add missed error check for proc_dostring

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: Convert FWINV<[foo]> macros and uses to NF_INVF
Joe Perches [Fri, 24 Jun 2016 20:25:22 +0000 (13:25 -0700)]
netfilter: Convert FWINV<[foo]> macros and uses to NF_INVF

netfilter uses multiple FWINV #defines with identical form that hide a
specific structure variable and dereference it with a invflags member.

$ git grep "#define FWINV"
include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg))
net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg))
net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg)))
net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg)))
net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg)))
net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg)))

Consolidate these macros into a single NF_INVF macro.

Miscellanea:

o Neaten the alignment around these uses
o A few lines are > 80 columns for intelligibility

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: Remove references to obsolete CONFIG_IP_ROUTE_FWMARK
Moritz Sichert [Thu, 30 Jun 2016 09:46:28 +0000 (11:46 +0200)]
netfilter: Remove references to obsolete CONFIG_IP_ROUTE_FWMARK

This option was removed in commit 47dcf0cb1005 ("[NET]: Rethink mark field
in struct flowi").

Signed-off-by: Moritz Sichert <moritz+linux@sichert.me>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agoetherdevice.h & bridge: netfilter: Add and use ether_addr_equal_masked
Joe Perches [Fri, 24 Jun 2016 18:32:26 +0000 (11:32 -0700)]
etherdevice.h & bridge: netfilter: Add and use ether_addr_equal_masked

There are code duplications of a masked ethernet address comparison here
so make it a separate function instead.

Miscellanea:

o Neaten alignment of FWINV macro uses to make it clearer for the reader

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: x_tables: simplify ip{6}table_mangle_hook()
Pablo Neira Ayuso [Fri, 24 Jun 2016 17:48:30 +0000 (19:48 +0200)]
netfilter: x_tables: simplify ip{6}table_mangle_hook()

No need for a special case to handle NF_INET_POST_ROUTING, this is
basically the same handling as for prerouting, input, forward.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_tables: add support for inverted logic in nft_lookup
Arturo Borrero [Thu, 23 Jun 2016 10:24:08 +0000 (12:24 +0200)]
netfilter: nf_tables: add support for inverted logic in nft_lookup

Introduce a new configuration option for this expression, which allows users
to invert the logic of set lookups.

In _init() we will now return EINVAL if NFT_LOOKUP_F_INV is in anyway
related to a map lookup.

The code in the _eval() function has been untangled and updated to sopport the
XOR of options, as we should consider 4 cases:
 * lookup false, invert false -> NFT_BREAK
 * lookup false, invert true -> return w/o NFT_BREAK
 * lookup true, invert false -> return w/o NFT_BREAK
 * lookup true, invert true -> NFT_BREAK

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_tables: get rid of NFT_BASECHAIN_DISABLED
Pablo Neira Ayuso [Wed, 22 Jun 2016 12:26:33 +0000 (14:26 +0200)]
netfilter: nf_tables: get rid of NFT_BASECHAIN_DISABLED

This flag was introduced to restore rulesets from the new netdev
family, but since 5ebe0b0eec9d6f7 ("netfilter: nf_tables: destroy
basechain and rules on netdevice removal") the ruleset is released
once the netdev is gone.

This also removes nft_register_basechain() and
nft_unregister_basechain() since they have no clients anymore after
this rework.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: conntrack: allow increasing bucket size via sysctl too
Florian Westphal [Wed, 22 Jun 2016 11:26:10 +0000 (13:26 +0200)]
netfilter: conntrack: allow increasing bucket size via sysctl too

No need to restrict this to module parameter.

We export a copy of the real hash size -- when user alters the value we
allocate the new table, copy entries etc before we update the real size
to the requested one.

This is also needed because the real size is used by concurrent readers
and cannot be changed without synchronizing the conntrack generation
seqcnt.

We only allow changing this value from the initial net namespace.

Tested using http-client-benchmark vs. httpterm with concurrent

while true;do
 echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
done

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nft_hash: support deletion of inactive elements
Pablo Neira Ayuso [Mon, 20 Jun 2016 22:12:26 +0000 (00:12 +0200)]
netfilter: nft_hash: support deletion of inactive elements

New elements are inactive in the preparation phase, and its
NFT_SET_ELEM_BUSY_MASK flag is set on.

This busy flag doesn't allow us to delete it from the same transaction,
following a sequence like:

begin transaction
add element X
delete element X
end transaction

This sequence is valid and may be triggered by robots. To resolve this
problem, allow deactivating elements that are active in the current
generation (ie. those that has been just added in this batch).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nft_rbtree: check for next generation when deactivating elements
Pablo Neira Ayuso [Mon, 20 Jun 2016 22:12:15 +0000 (00:12 +0200)]
netfilter: nft_rbtree: check for next generation when deactivating elements

set->ops->deactivate() is invoked from nft_del_setelem() that happens
from the transaction path, so we have to check if the object is active
in the next generation, not the current.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_tables: add generation mask to sets
Pablo Neira Ayuso [Sun, 12 Jun 2016 20:52:45 +0000 (22:52 +0200)]
netfilter: nf_tables: add generation mask to sets

Similar to ("netfilter: nf_tables: add generation mask to tables").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_tables: add generation mask to chains
Pablo Neira Ayuso [Sun, 12 Jun 2016 17:21:31 +0000 (19:21 +0200)]
netfilter: nf_tables: add generation mask to chains

Similar to ("netfilter: nf_tables: add generation mask to tables").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_tables: add generation mask to tables
Pablo Neira Ayuso [Tue, 14 Jun 2016 15:29:18 +0000 (17:29 +0200)]
netfilter: nf_tables: add generation mask to tables

This patch addresses two problems:

1) The netlink dump is inconsistent when interfering with an ongoing
   transaction update for several reasons:

1.a) We don't honor the internal NFT_TABLE_INACTIVE flag, and we should
     be skipping these inactive objects in the dump.

1.b) We perform speculative deletion during the preparation phase, that
     may result in skipping active objects.

1.c) The listing order changes, which generates noise when tracking
     incremental ruleset update via tools like git or our own
     testsuite.

2) We don't allow to add and to update the object in the same batch,
   eg. add table x; add table x { flags dormant\; }.

In order to resolve these problems:

1) If the user requests a deletion, the object becomes inactive in the
   next generation. Then, ignore objects that scheduled to be deleted
   from the lookup path, as they will be effectively removed in the
   next generation.

2) From the get/dump path, if the object is not currently active, we
   skip it.

3) Support 'add X -> update X' sequence from a transaction.

After this update, we obtain a consistent list as long as we stay
in the same generation. The userspace side can detect interferences
through the generation counter so it can restart the dumping.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_tables: add generic macros to check for generation mask
Pablo Neira Ayuso [Sun, 12 Jun 2016 16:07:07 +0000 (18:07 +0200)]
netfilter: nf_tables: add generic macros to check for generation mask

Thus, we can reuse these to check the genmask of any object type, not
only rules. This is required now that tables, chain and sets will get a
generation mask field too in follow up patches.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: xt_NFLOG: nflog-range does not truncate packets
Vishwanath Pai [Tue, 21 Jun 2016 18:58:46 +0000 (14:58 -0400)]
netfilter: xt_NFLOG: nflog-range does not truncate packets

li->u.ulog.copy_len is currently ignored by the kernel, we should truncate
the packet to either li->u.ulog.copy_len (if set) or copy_range before
sending it to userspace. 0 is a valid input for copy_len, so add a new
flag to indicate whether this was option was specified by the user or not.

Add two flags to indicate whether nflog-size/copy_len was set or not.
XT_NFLOG_F_COPY_LEN is for XT_NFLOG and NFLOG_F_COPY_LEN for nfnetlink_log

On the userspace side, this was initially represented by the option
nflog-range, this will be replaced by --nflog-size now. --nflog-range would
still exist but does not do anything.

Reported-by: Joe Dollard <jdollard@akamai.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_reject_ipv4: don't send tcp RST if the packet is non-TCP
Liping Zhang [Mon, 20 Jun 2016 13:26:28 +0000 (21:26 +0800)]
netfilter: nf_reject_ipv4: don't send tcp RST if the packet is non-TCP

In iptables, if the user add a rule to send tcp RST and specify the
non-TCP protocol, such as UDP, kernel will reject this request. But
in nftables, this validity check only occurs in nft tool, i.e. only
in userspace.

This means that user can add such a rule like follows via nfnetlink:
  "nft add rule filter forward ip protocol udp reject with tcp reset"

This will generate some confusing tcp RST packets. So we should send
tcp RST only when it is TCP packet.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: Allow xt_owner in any user namespace
Eric W. Biederman [Tue, 14 Jun 2016 22:14:12 +0000 (15:14 -0700)]
netfilter: Allow xt_owner in any user namespace

Making this work is a little tricky as it really isn't kosher to
change the xt_owner_match_info in a check function.

Without changing xt_owner_match_info we need to know the user
namespace the uids and gids are specified in.  In the common case
net->user_ns == current_user_ns().  Verify net->user_ns ==
current_user_ns() in owner_check so we can later assume it in
owner_mt.

In owner_check also verify that all of the uids and gids specified are
in net->user_ns and that the expected min/max relationship exists
between the uids and gids in xt_owner_match_info.

In owner_mt get the network namespace from the outgoing socket, as this
must be the same network namespace as the netfilter rules, and use that
network namespace to find the user namespace the uids and gids in
xt_match_owner_info are encoded in.  Then convert from their encoded
from into the kernel internal format for uids and gids and perform the
owner match.

Similar to ping_group_range, this code does not try to detect
noncontiguous UID/GID ranges.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: move zone info into struct nf_conn
Florian Westphal [Sat, 11 Jun 2016 19:57:35 +0000 (21:57 +0200)]
netfilter: move zone info into struct nf_conn

Curently we store zone information as a conntrack extension.
This has one drawback: for every lookup we need to fetch the zone data
from the extension area.

This change place the zone data directly into the main conntrack object
structure and then removes the zone conntrack extension.

The zone data is just 4 bytes, it fits into a padding hole before
the tuplehash info, so we do not even increase the nf_conn structure size.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_log: Remove NULL check
Shivani Bhardwaj [Sat, 11 Jun 2016 18:56:10 +0000 (00:26 +0530)]
netfilter: nf_log: Remove NULL check

If 'logger' was NULL, there would be a direct jump to the label 'out',
since it has already been checked for NULL, remove this unnecessary
check.

Signed-off-by: Shivani Bhardwaj <shivanib134@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: make comparision helpers stub functions in ZONES=n case
Florian Westphal [Fri, 10 Jun 2016 21:09:01 +0000 (23:09 +0200)]
netfilter: make comparision helpers stub functions in ZONES=n case

Those comparisions are useless in case of ZONES=n; all conntracks
will reside in the same zone by definition.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: conntrack: align nf_conn on cacheline boundary
Florian Westphal [Fri, 10 Jun 2016 20:25:22 +0000 (22:25 +0200)]
netfilter: conntrack: align nf_conn on cacheline boundary

increases struct size by 32 bytes (288 -> 320), but it is the right thing,
else any attempt to (re-)arrange nf_conn members by cacheline won't work.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: xt_TRACE: add explicitly nf_logger_find_get call
Liping Zhang [Wed, 8 Jun 2016 12:43:19 +0000 (20:43 +0800)]
netfilter: xt_TRACE: add explicitly nf_logger_find_get call

Consider such situation, if nf_log_ipv4 kernel module is not installed,
and the user add a following iptables rule:
  # iptables -t raw -I PREROUTING -j TRACE

There will be no trace log generated until the user install nf_log_ipv4
module manully. So we should add request related nf_log module
appropriately here.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: nf_log: handle NFPROTO_INET properly in nf_logger_[find_get|put]
Liping Zhang [Wed, 8 Jun 2016 12:43:17 +0000 (20:43 +0800)]
netfilter: nf_log: handle NFPROTO_INET properly in nf_logger_[find_get|put]

When we request NFPROTO_INET, it means both NFPROTO_IPV4 and NFPROTO_IPV6.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: x_tables: fix possible ZERO_SIZE_PTR pointer dereferencing error.
Xiubo Li [Thu, 2 Jun 2016 02:59:56 +0000 (10:59 +0800)]
netfilter: x_tables: fix possible ZERO_SIZE_PTR pointer dereferencing error.

Since we cannot make sure that the 'hook_mask' will always be none
zero here. If it equals to zero, the num_hooks will be zero too,
and then kmalloc() will return ZERO_SIZE_PTR, which is (void *)16.

Then the following error check will fails:
  ops = kmalloc(sizeof(*ops) * num_hooks, GFP_KERNEL);
  if (ops == NULL)
          return ERR_PTR(-ENOMEM);

So this patch will fix this with just doing the zero check before
kmalloc() is called.

Maybe the case above will never happen here, but in theory.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: helper: avoid extra expectation iterations on unregister
Florian Westphal [Sun, 15 May 2016 17:50:14 +0000 (19:50 +0200)]
netfilter: helper: avoid extra expectation iterations on unregister

The expectation table is not duplicated per net namespace anymore, so we can move
the expectation table and conntrack table iteration out of the per-net loop.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agobridge: netfilter: checkpatch data type fixes
Tobin C Harding [Tue, 10 May 2016 01:26:57 +0000 (11:26 +1000)]
bridge: netfilter: checkpatch data type fixes

checkpatch produces data type 'checks'.

This patch amends them by changing, for example:
uint8_t -> u8

Signed-off-by: Tobin C Harding <me@tobin.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agoMerge branch 'vrf-local'
David S. Miller [Mon, 6 Jun 2016 22:19:07 +0000 (15:19 -0700)]
Merge branch 'vrf-local'

David Ahern says:

====================
net: vrf: Add support for local traffic to local addresses

Add support for locally originated traffic to VRF-local addresses,
be it addresses on enslaved devices or addresses on the VRF device:

$ ip addr show dev red
33: red: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc pfifo_fast state UP group default qlen 1000
    link/ether be:00:53:b5:e4:25 brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.1/32 scope global red
       valid_lft forever preferred_lft forever
    inet6 1111:1::1/128 scope global
       valid_lft forever preferred_lft forever

$ ip addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
    link/ether 02:e0:f9:79:34:bd brd ff:ff:ff:ff:ff:ff
    inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2100:1::1/120 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
       valid_lft forever preferred_lft forever

$ ping -c1 -I red 10.100.1.1
    ping: Warning: source address might be selected on device other than red.
    PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
    64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

$ ping -c1 -I red 1.1.1.1
PING 1.1.1.1 (1.1.1.1) from 1.1.1.1 red: 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.136 ms

--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms

$ ping6 -c1 -I red  2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.167 ms

--- 2100:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.167/0.167/0.167/0.000 ms

$ ping6 -c1 -I red 1111::1
PING 1111::1(1111::1) from 1111:1::1 red: 56 data bytes
64 bytes from 1111::1: icmp_seq=1 ttl=64 time=0.187 ms

--- 1111::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.187/0.187/0.187/0.000 ms

This change also enables use of loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8

$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: vrf: ipv6 support for local traffic to local addresses
David Ahern [Thu, 2 Jun 2016 20:15:12 +0000 (13:15 -0700)]
net: vrf: ipv6 support for local traffic to local addresses

Add support for locally originated traffic to VRF-local IPv6 addresses.
Similar to IPv4 a local dst is set on the skb and the packet is
reinserted with a call to netif_rx. With this patch, ping, tcp and udp
packets to a local IPv6 address are successfully routed:

    $ ip addr show dev eth1
    4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
        link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
        inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet6 2100:1::1/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
           valid_lft forever preferred_lft forever

    $ ping6 -c1 -I red 2100:1::1
    ping6: Warning: source address might be selected on device other than red.
    PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
    64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms

ip6_input is exported so the VRF driver can use it for the dst input
function. The dst_alloc function for IPv4 defaults to setting the input and
output functions; IPv6's does not. VRF does not need to duplicate the Rx path
so just export the ipv6 input function.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: vrf: ipv4 support for local traffic to local addresses
David Ahern [Thu, 2 Jun 2016 20:15:11 +0000 (13:15 -0700)]
net: vrf: ipv4 support for local traffic to local addresses

Add support for locally originated traffic to VRF-local addresses. If
destination device for an skb is the loopback or VRF device then set
its dst to a local version of the VRF cached dst_entry and call netif_rx
to insert the packet onto the rx queue - similar to what is done for
loopback. This patch handles IPv4 support; follow on patch handles IPv6.

With this patch, ping, tcp and udp packets to a local IPv4 address are
successfully routed:

    $ ip addr show dev eth1
    4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
        link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
        inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet6 2100:1::1/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
           valid_lft forever preferred_lft forever

    $ ping -c1 -I red 10.100.1.1
    ping: Warning: source address might be selected on device other than red.
    PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
    64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

This patch also enables use of IPv4 loopback address on the VRF device:
    $ ip addr add dev red 127.0.0.1/8

    $ ping -c1 -I red 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: vrf: Minor refactoring for local address patches
David Ahern [Thu, 2 Jun 2016 20:15:10 +0000 (13:15 -0700)]
net: vrf: Minor refactoring for local address patches

Move the stripping of the ethernet header from is_ip_tx_frame into the
ipv4 and ipv6 outbound functions. If the packet is destined to a local
address the header is retained since the packet is sent back to netif_rx.

Collapse vrf_send_v4_prep into vrf_process_v4_outbound.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'hv_netvsc-cleanups'
David S. Miller [Mon, 6 Jun 2016 03:16:50 +0000 (23:16 -0400)]
Merge branch 'hv_netvsc-cleanups'

Vitaly Kuznetsov says:

====================
hv_netvsc: cleanup after untangling the pointer mess

Changes since v1:
- resend when net-next is open [David Miller]
- rebased to current net-next.

After we made traveling through our internal structures explicit it became
obvious that some functions take arguments they don't need just to do
redundant pointer travel and get to what they really need while their
callers already have the required information.

This is just a cleanup series with no functional changes intended. It
doesn't pretend to be complete, additional cleanup of other functions may
follow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: pass struct net_device to rndis_filter_set_offload_params()
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:51:02 +0000 (17:51 +0200)]
hv_netvsc: pass struct net_device to rndis_filter_set_offload_params()

The only caller rndis_filter_device_add() has 'struct net_device' pointer
already.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: pass struct net_device to rndis_filter_set_device_mac()
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:51:01 +0000 (17:51 +0200)]
hv_netvsc: pass struct net_device to rndis_filter_set_device_mac()

We unpack 'struct net_device' in netvsc_set_mac_addr() to get to
'struct hv_device' pointer which we use in rndis_filter_set_device_mac()
to get back to 'struct net_device'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: pass struct netvsc_device to rndis_filter_{open, close}()
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:51:00 +0000 (17:51 +0200)]
hv_netvsc: pass struct netvsc_device to rndis_filter_{open, close}()

Both rndis_filter_open()/rndis_filter_close() use struct hv_device to
reach to struct netvsc_device only and all callers have it already.
While on it, rename net_device to nvdev in rndis_filter_open() as
net_device is misleading.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: introduce {net, hv}_device_to_netvsc_device() helpers
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:50:59 +0000 (17:50 +0200)]
hv_netvsc: introduce {net, hv}_device_to_netvsc_device() helpers

Make it easier to get 'struct netvsc_device' from 'struct net_device' and
'struct hv_device' by introducing inline helpers.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: remove redundant assignment in netvsc_recv_callback()
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:50:58 +0000 (17:50 +0200)]
hv_netvsc: remove redundant assignment in netvsc_recv_callback()

net_device_ctx is assigned in the very beginning of the function and 'net'
pointer doesn't change.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: disable fragment reassembly if high_thresh is zero
Michal Kubeček [Fri, 27 May 2016 15:53:52 +0000 (17:53 +0200)]
net: disable fragment reassembly if high_thresh is zero

Before commit 6d7b857d541e ("net: use lib/percpu_counter API for
fragmentation mem accounting"), setting the reassembly high threshold
to 0 prevented fragment reassembly as first fragment would be always
evicted before second could be added to the queue. While inefficient,
some users apparently relied on this method.

Since the commit mentioned above, a percpu counter is used for
reassembly memory accounting and high batch size avoids taking slow path
in most common scenarios. As a result, a whole full sized packet can be
reassembled without the percpu counter's main counter changing its value
so that even with high_thresh set to 0, fragmented packets can be still
reassembled and processed.

Add explicit check preventing reassembly if high threshold is zero.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'hns-acpi'
David S. Miller [Sun, 5 Jun 2016 01:32:46 +0000 (21:32 -0400)]
Merge branch 'hns-acpi'

Kejian Yan says:

====================
net: hns: add support of ACPI

This series adds HNS support of acpi. The routine will call some ACPI
helper functions, like acpi_dev_found() and acpi_evaluate_dsm(), which
are not included in other cases. In order to make system compile
successfully in other cases except ACPI, it needs to add relative stub
functions to linux/acpi.h. And we use device property functions instead
of serial helper functions to suport both DT and ACPI cases. And then
add the supports of ACPI for HNS.

change log:
 v3->v4:
  mii-id gets from dev-name instead of address

 v2->v3:
 1. add Review-by: Andy Shevchenko
 2. fix the potential memory leak

 v1 -> v2:
 1. use acpi_dev_found() instead of acpi_match_device_ids() to check if
it is a acpi node.
 2. use is_of_node() instead of IS_ENABLED() to check if it is a DT node.
 3. split the patch("add support of acpi for hns-mdio") into two patches:
    3.1 Move to use fwnode_handle
    3.2 Add ACPI
 4. add the patch which subject is dsaf misc operation method
 5. fix the comments by Andy Shevchenko
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: net: hns: enet adds support of acpi
Kejian Yan [Fri, 3 Jun 2016 02:55:21 +0000 (10:55 +0800)]
net: hns: net: hns: enet adds support of acpi

Enet needs to get configration parameter by acpi. This patch
adds support of ACPI for enet. The configuration parameter will
be configed in BIOS.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: implement the miscellaneous operation by asl
Kejian Yan [Fri, 3 Jun 2016 02:55:20 +0000 (10:55 +0800)]
net: hns: implement the miscellaneous operation by asl

The miscellaneous operation is implemented in BIOS, the kernel can call
_DSM method help to call the implementation in ACPI case. Here is a patch
to do that.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: register phy device in each mac initial sequence
Kejian Yan [Fri, 3 Jun 2016 02:55:19 +0000 (10:55 +0800)]
net: hns: register phy device in each mac initial sequence

In ACPI case, there is no interface to register phy device to mdio-bus.
Phy device has to be registered itself to mdio-bus, and then enet can
get the phy device's info so that it can config the phy-device to help
to trasmit and receive data.
HNS hardware topology is as below. The MDIO controller may control several
PHY-devices, and each PHY-device connects to a MAC device. PHY-devices
will register when each mac find PHY device in initial sequence.

                       cpu
                        |
                        |
     -------------------------------------------
    |                   |                       |
    |                   |                       |
    |                  dsaf                     |
   MDIO                 |                      MDIO
    |      ---------------------------          |
    |     |         |         |       |         |
    |     |         |         |       |         |
    |    MAC       MAC       MAC     MAC        |
    |     |         |         |       |         |
     ---- |-------- |-------- |       | --------
         ||        ||        ||       ||
         PHY       PHY       PHY     PHY

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: dsaf adds support of acpi
Kejian Yan [Fri, 3 Jun 2016 02:55:18 +0000 (10:55 +0800)]
net: hns: dsaf adds support of acpi

Dsaf needs to get configuration parameter by ACPI, so this patch add
support of ACPI.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: add dsaf misc operation method
Kejian Yan [Fri, 3 Jun 2016 02:55:17 +0000 (10:55 +0800)]
net: hns: add dsaf misc operation method

The misc operation for different hw platform may be different, if using
current implementation, it will add a new branch on each function for
every new hw platform, so we add a method for this operation.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: add uniform interface for phy connection
Kejian Yan [Fri, 3 Jun 2016 02:55:16 +0000 (10:55 +0800)]
net: hns: add uniform interface for phy connection

As device_node is only used by DT case, HNS needs to treat the other
cases including ACPI. It needs to use uniform ways to handle both of
DT and ACPI. This patch chooses phy_device, and of_phy_connect and
of_phy_attach are only used by DT case. It needs to use uniform interface
to handle that sequence by both DT and ACPI.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: enet specify a reference to dsaf by fwnode_handle
Kejian Yan [Fri, 3 Jun 2016 02:55:15 +0000 (10:55 +0800)]
net: hns: enet specify a reference to dsaf by fwnode_handle

As device_node is only used by DT case, it is expected to find uniform
ways. So fwnode_handle is the suitable method.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: use platform_get_irq instead of irq_of_parse_and_map
Kejian Yan [Fri, 3 Jun 2016 02:55:14 +0000 (10:55 +0800)]
net: hns: use platform_get_irq instead of irq_of_parse_and_map

As irq_of_parse_and_map is only used by DT case, it is excepted to use
a uniform interface. So it is used platform_get_irq() instead.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hns: use device_* APIs instead of of_* APIs
Kejian Yan [Fri, 3 Jun 2016 02:55:13 +0000 (10:55 +0800)]
net: hns: use device_* APIs instead of of_* APIs

OF series functions can be used only for DT case. Use unified
device property function instead to support both DT and ACPI.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hisilicon: add support of acpi for hns-mdio
Kejian Yan [Fri, 3 Jun 2016 02:55:12 +0000 (10:55 +0800)]
net: hisilicon: add support of acpi for hns-mdio

hns-mdio needs to register itself to mii-bus. The info of the device can
be read by both DT and ACPI.
HNS tries to call Linux PHY driver to help access PHY-devices, the HNS
hardware topology is as below. The MDIO controller may control several
PHY-devices, and each PHY-device connects to a MAC device. The MDIO will
be registered to mdiobus, then PHY-devices will register when each mac
find PHY device.
                       cpu
                        |
                        |
     -------------------------------------------
    |                   |                       |
    |                   |                       |
    |                  dsaf                     |
   MDIO                 |                      MDIO
    |      ---------------------------          |
    |     |         |         |       |         |
    |     |         |         |       |         |
    |    MAC       MAC       MAC     MAC        |
    |     |         |         |       |         |
     ---- |-------- |-------- |       | --------
         ||        ||        ||       ||
         PHY       PHY       PHY     PHY

And the driver can handle reset sequence by _RST method in DSDT in ACPI
case.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: hisilicon: cleanup to prepare for other cases
Kejian Yan [Fri, 3 Jun 2016 02:55:11 +0000 (10:55 +0800)]
net: hisilicon: cleanup to prepare for other cases

Hns-mdio only supports DT case now. do some cleanup to prepare
for introducing other cases later, no functional change.

Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoACPI: bus: add stub acpi_evaluate_dsm() to linux/acpi.h
Kejian Yan [Fri, 3 Jun 2016 02:55:10 +0000 (10:55 +0800)]
ACPI: bus: add stub acpi_evaluate_dsm() to linux/acpi.h

acpi_evaluate_dsm() will be used to handle the _DSM method in ACPI case.
It will be compiled in non-ACPI case, but the function is in acpi_bus.h
and acpi_bus.h can only be used in ACPI case, so this patch add the stub
function to linux/acpi.h to make compiled successfully in non-ACPI cases.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoACPI: bus: add stub acpi_dev_found() to linux/acpi.h
Kejian Yan [Fri, 3 Jun 2016 02:55:09 +0000 (10:55 +0800)]
ACPI: bus: add stub acpi_dev_found() to linux/acpi.h

acpi_dev_found() will be used to detect if a given ACPI device is in the
system. It will be compiled in non-ACPI case, but the function is in
acpi_bus.h and acpi_bus.h can only be used in ACPI case, so this patch add
the stub function to linux/acpi.h to make compiled successfully in
non-ACPI cases.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'dsa-new-binding'
David S. Miller [Sat, 4 Jun 2016 21:29:55 +0000 (14:29 -0700)]
Merge branch 'dsa-new-binding'

Andrew Lunn says:

====================
New DSA bind, switches as devices

The interesting patches here are the last three. They implement a new
binding for DSA, which removes a few limitations of the current DSA
binding. In particular, it allows switches to be true Linux devices.
These devices can be on any type of bus, unlike the old DSA binding
which assumes MDIO. See the commit log for more details. The second to
last patch modifies an existing boards device tree to use the new
binding, giving a good example of how switches can be true MDIO
devices. The last patch documents the new binding.

Thanks go to Florian and Vivien for reviewing, testing and bug fixing
these patches.

Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Since V1:

* Add lots of reviewed-by's
* Fix rtable comment
* dsa2: Clear cpu port mask in dsa_cpu_port_unapply()
* dsa2: Only set dsa_port_mask when port successfully configured
* dsa: clear {dsa|cpu}_port_mask on destroy

Since RFC:

* Split the mv88e6xxx MDIO refactor into a rename patch and a refactor
  patch.
* Extend commit message with comment about wrong of_node_put()
* Fix destroy of cpu and dsa ports.
* Rename _DSA_TAG_LAST to DSA_TAG_LAST and add a comment.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Document new binding
Andrew Lunn [Sat, 4 Jun 2016 19:17:09 +0000 (21:17 +0200)]
net: dsa: Document new binding

Add the new binding to the documentation of the existing binding.
Mark the old binding as deprecated.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoarm: dt: vf610-zii-devel-b: Make use of new DSA binding
Andrew Lunn [Sat, 4 Jun 2016 19:17:08 +0000 (21:17 +0200)]
arm: dt: vf610-zii-devel-b: Make use of new DSA binding

Hang the three switches of the three MDIO busses using the new DSA
binding. Also, make use of the mdio-bus and explicitly list the phys
on one device. This is not required, but good for testing.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Add new binding implementation
Andrew Lunn [Sat, 4 Jun 2016 19:17:07 +0000 (21:17 +0200)]
net: dsa: Add new binding implementation

The existing DSA binding has a number of limitations and problems. The
main problem is that it cannot represent a switch as a linux device,
hanging off some bus. It is limited to one CPU port. The DSA platform
device is artificial, and does not really represent hardware.

Implement a new binding which can be embedded into any type of node on
a bus to represent one switch device, and its links to other switches.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: Refactor MDIO so driver registers mdio bus
Andrew Lunn [Sat, 4 Jun 2016 19:17:06 +0000 (21:17 +0200)]
net: dsa: mv88e6xxx: Refactor MDIO so driver registers mdio bus

Have the switch driver register its own MDIO bus. This allows for an
mdio property in the device tree, with child nodes for phys, which
can be referenced via phandles, etc.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: Rename _phy_ to _mdio_
Andrew Lunn [Sat, 4 Jun 2016 19:17:05 +0000 (21:17 +0200)]
net: dsa: mv88e6xxx: Rename _phy_ to _mdio_

The switch implements a generic MDIO bus, which could host more than
PHYs. It is conventional to use _mdio_ or _mii_ in the function name,
so rename them. Also postfix make the historically first read/write
function with _direct, to help distinguish it from _indirect and _ppu.

While touching these functions, remove some of the _ prefixes, which
we are deprecating.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Make mdio bus optional
Andrew Lunn [Sat, 4 Jun 2016 19:17:04 +0000 (21:17 +0200)]
net: dsa: Make mdio bus optional

The switch may want to instantiate its own MDIO bus. Only do it
centrally if the switch has not already created one, and the read op
is implemented.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Refactor selection of tag ops into a function
Andrew Lunn [Sat, 4 Jun 2016 19:17:03 +0000 (21:17 +0200)]
net: dsa: Refactor selection of tag ops into a function

Replace the two switch statements with an array lookup, and store the
result in the dsa tree structure. The drivers no longer need to know
the selected tag protocol, so remove it from the dsa switch structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: Only support EDSA tagging
Andrew Lunn [Sat, 4 Jun 2016 19:17:02 +0000 (21:17 +0200)]
net: dsa: mv88e6xxx: Only support EDSA tagging

The merged driver no longer offers the option to use DSA tagging. So
remove the code to setup the switch to do DSA tagging and hard code
the use of EDSA.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>y
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Split up creating/destroying of DSA and CPU ports
Andrew Lunn [Sat, 4 Jun 2016 19:17:01 +0000 (21:17 +0200)]
net: dsa: Split up creating/destroying of DSA and CPU ports

Refactor the code to setup a single DSA/CPU port into a function of
its own, and export it, so it can be used by the new binding.

Similarly, refactor the destroy code into a function.  When destroying
the ports, don't put the of node. They should be released at the end
along with the normal ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Copy the routing table into the switch structure
Andrew Lunn [Sat, 4 Jun 2016 19:17:00 +0000 (21:17 +0200)]
net: dsa: Copy the routing table into the switch structure

The new binding will not have a chip data structure, it will place the
routing directly into the switch structure. To enable backwards
compatibility, copy the routing from the chip data into the switch
structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Remove dynamic allocate of routing table
Andrew Lunn [Sat, 4 Jun 2016 19:16:59 +0000 (21:16 +0200)]
net: dsa: Remove dynamic allocate of routing table

With a maximum of four switches, the size of the routing table is the
same as the pointer to it. Removing it makes the code simpler.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Move port device node into port structure
Andrew Lunn [Sat, 4 Jun 2016 19:16:58 +0000 (21:16 +0200)]
net: dsa: Move port device node into port structure

Move the port device node structure into the port structure, from the
chip data. This information is needed in the next step of implementing
the new binding.

The chip data structure is used while parsing the whole old binding,
before the individual switch structures exist. With the new bindings,
this is reversed, the switches exist first, and the interconnections
between the switches is derived from the individual switch
bindings. Thus this chip data structure becomes unneeded.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
eviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: Add a ports structure and use it in the switch structure
Andrew Lunn [Sat, 4 Jun 2016 19:16:57 +0000 (21:16 +0200)]
net: dsa: Add a ports structure and use it in the switch structure

There are going to be more per-port members added to the switch
structure. So add a port structure and move the netdev into it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: tag_{e}dsa.c: Remove dependency on platform data
Andrew Lunn [Sat, 4 Jun 2016 19:16:56 +0000 (21:16 +0200)]
net: dsa: tag_{e}dsa.c: Remove dependency on platform data

The platform data nr_chips is used when validating a received packet,
to ensure it comes from a know switch chip. The number of possible
switches is limited to DSA_MAX_SWITCHES, so use this as the first
validation step. The new binding allows holes in the dst->ds[] array,
so also ensure ensure there is a valid dsa_switch for this packet.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: slave: Remove MDIO address from switch MDIO bus name
Andrew Lunn [Sat, 4 Jun 2016 19:16:55 +0000 (21:16 +0200)]
net: dsa: slave: Remove MDIO address from switch MDIO bus name

The DSA layer should no longer assume the switch is connected to an
MDIO bus. As a result, we cannot use the address on the MDIO bus when
forming the name of the switches internal MDIO bus for its builtin and
possibly external PHYs. The switch index is sufficient to make the
name unique, so drop the MDIO address.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: fix circular lock in PPU work
Vivien Didelot [Sat, 4 Jun 2016 19:16:54 +0000 (21:16 +0200)]
net: dsa: mv88e6xxx: fix circular lock in PPU work

Lock debugging shows that there is a possible circular lock in the PPU
work code. Switch the lock order of smi_mutex and ppu_mutex to fix this.

Here's the full trace:

    [    4.341325] ======================================================
    [    4.347519] [ INFO: possible circular locking dependency detected ]
    [    4.353800] 4.6.0 #4 Not tainted
    [    4.357039] -------------------------------------------------------
    [    4.363315] kworker/0:1/328 is trying to acquire lock:
    [    4.368463]  (&ps->smi_mutex){+.+.+.}, at: [<8049c758>] mv88e6xxx_reg_read+0x30/0x54
    [    4.376313]
    [    4.376313] but task is already holding lock:
    [    4.382160]  (&ps->ppu_mutex){+.+...}, at: [<8049cac0>] mv88e6xxx_ppu_reenable_work+0x28/0xd4
    [    4.390772]
    [    4.390772] which lock already depends on the new lock.
    [    4.390772]
    [    4.398963]
    [    4.398963] the existing dependency chain (in reverse order) is:
    [    4.406461]
    [    4.406461] -> #1 (&ps->ppu_mutex){+.+...}:
    [    4.410897]        [<806d86bc>] mutex_lock_nested+0x54/0x360
    [    4.416606]        [<8049a800>] mv88e6xxx_ppu_access_get+0x28/0x100
    [    4.422906]        [<8049b778>] mv88e6xxx_phy_read+0x90/0xdc
    [    4.428599]        [<806a4534>] dsa_slave_phy_read+0x3c/0x40
    [    4.434300]        [<804943ec>] mdiobus_read+0x68/0x80
    [    4.439481]        [<804939d4>] get_phy_device+0x58/0x1d8
    [    4.444914]        [<80493ed0>] mdiobus_scan+0x24/0xf4
    [    4.450078]        [<8049409c>] __mdiobus_register+0xfc/0x1ac
    [    4.455857]        [<806a40b0>] dsa_probe+0x860/0xca8
    [    4.460934]        [<8043246c>] platform_drv_probe+0x5c/0xc0
    [    4.466627]        [<804305a0>] driver_probe_device+0x118/0x450
    [    4.472589]        [<80430b00>] __device_attach_driver+0xac/0x128
    [    4.478724]        [<8042e350>] bus_for_each_drv+0x74/0xa8
    [    4.484235]        [<804302d8>] __device_attach+0xc4/0x154
    [    4.489755]        [<80430cec>] device_initial_probe+0x1c/0x20
    [    4.495612]        [<8042f620>] bus_probe_device+0x98/0xa0
    [    4.501123]        [<8042fbd0>] deferred_probe_work_func+0x4c/0xd4
    [    4.507328]        [<8013a794>] process_one_work+0x1a8/0x604
    [    4.513030]        [<8013ac54>] worker_thread+0x64/0x528
    [    4.518367]        [<801409e8>] kthread+0xec/0x100
    [    4.523201]        [<80108f30>] ret_from_fork+0x14/0x24
    [    4.528462]
    [    4.528462] -> #0 (&ps->smi_mutex){+.+.+.}:
    [    4.532895]        [<8015ad5c>] lock_acquire+0xb4/0x1dc
    [    4.538154]        [<806d86bc>] mutex_lock_nested+0x54/0x360
    [    4.543856]        [<8049c758>] mv88e6xxx_reg_read+0x30/0x54
    [    4.549549]        [<8049cad8>] mv88e6xxx_ppu_reenable_work+0x40/0xd4
    [    4.556022]        [<8013a794>] process_one_work+0x1a8/0x604
    [    4.561707]        [<8013ac54>] worker_thread+0x64/0x528
    [    4.567053]        [<801409e8>] kthread+0xec/0x100
    [    4.571878]        [<80108f30>] ret_from_fork+0x14/0x24
    [    4.577139]
    [    4.577139] other info that might help us debug this:
    [    4.577139]
    [    4.585159]  Possible unsafe locking scenario:
    [    4.585159]
    [    4.591093]        CPU0                    CPU1
    [    4.595631]        ----                    ----
    [    4.600169]   lock(&ps->ppu_mutex);
    [    4.603693]                                lock(&ps->smi_mutex);
    [    4.609742]                                lock(&ps->ppu_mutex);
    [    4.615790]   lock(&ps->smi_mutex);
    [    4.619314]
    [    4.619314]  *** DEADLOCK ***
    [    4.619314]
    [    4.625256] 3 locks held by kworker/0:1/328:
    [    4.629537]  #0:  ("events"){.+.+..}, at: [<8013a704>] process_one_work+0x118/0x604
    [    4.637288]  #1:  ((&ps->ppu_work)){+.+...}, at: [<8013a704>] process_one_work+0x118/0x604
    [    4.645653]  #2:  (&ps->ppu_mutex){+.+...}, at: [<8049cac0>] mv88e6xxx_ppu_reenable_work+0x28/0xd4
    [    4.654714]
    [    4.654714] stack backtrace:
    [    4.659098] CPU: 0 PID: 328 Comm: kworker/0:1 Not tainted 4.6.0 #4
    [    4.665286] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
    [    4.671748] Workqueue: events mv88e6xxx_ppu_reenable_work
    [    4.677174] Backtrace:
    [    4.679674] [<8010d354>] (dump_backtrace) from [<8010d5a0>] (show_stack+0x20/0x24)
    [    4.687252]  r6:80fb3c88 r5:80fb3c88 r4:80fb4728 r3:00000002
    [    4.693003] [<8010d580>] (show_stack) from [<803b45e8>] (dump_stack+0x24/0x28)
    [    4.700246] [<803b45c4>] (dump_stack) from [<80157398>] (print_circular_bug+0x208/0x32c)
    [    4.708361] [<80157190>] (print_circular_bug) from [<8015a630>] (__lock_acquire+0x185c/0x1b80)
    [    4.716982]  r10:9ec22a00 r9:00000060 r8:8164b6bc r7:00000040 r6:00000003 r5:8163a5b4
    [    4.724905]  r4:00000003 r3:9ec22de8
    [    4.728537] [<80158dd4>] (__lock_acquire) from [<8015ad5c>] (lock_acquire+0xb4/0x1dc)
    [    4.736378]  r10:60000013 r9:00000000 r8:00000000 r7:00000000 r6:9e5e9c50 r5:80e618e0
    [    4.744301]  r4:00000000
    [    4.746879] [<8015aca8>] (lock_acquire) from [<806d86bc>] (mutex_lock_nested+0x54/0x360)
    [    4.754976]  r10:9e5e9c1c r9:80e616c4 r8:9f685ea0 r7:0000001b r6:9ec22a00 r5:8163a5b4
    [    4.762899]  r4:9e5e9c1c
    [    4.765477] [<806d8668>] (mutex_lock_nested) from [<8049c758>] (mv88e6xxx_reg_read+0x30/0x54)
    [    4.774008]  r10:80e60c5b r9:80e616c4 r8:9f685ea0 r7:0000001b r6:00000004 r5:9e5e9c10
    [    4.781930]  r4:9e5e9c1c
    [    4.784507] [<8049c728>] (mv88e6xxx_reg_read) from [<8049cad8>] (mv88e6xxx_ppu_reenable_work+0x40/0xd4)
    [    4.793907]  r7:9ffd5400 r6:9e5e9c68 r5:9e5e9cb0 r4:9e5e9c10
    [    4.799659] [<8049ca98>] (mv88e6xxx_ppu_reenable_work) from [<8013a794>] (process_one_work+0x1a8/0x604)
    [    4.809059]  r9:80e616c4 r8:9f685ea0 r7:9ffd5400 r6:80e0a1c8 r5:9f5f2e80 r4:9e5e9cb0
    [    4.816910] [<8013a5ec>] (process_one_work) from [<8013ac54>] (worker_thread+0x64/0x528)
    [    4.825010]  r10:9f5f2e80 r9:00000008 r8:80e0dc80 r7:80e0a1fc r6:80e0a1c8 r5:9f5f2e98
    [    4.832933]  r4:80e0a1c8
    [    4.835510] [<8013abf0>] (worker_thread) from [<801409e8>] (kthread+0xec/0x100)
    [    4.842827]  r10:00000000 r9:00000000 r8:00000000 r7:8013abf0 r6:9f5f2e80 r5:9ec15740
    [    4.850749]  r4:00000000
    [    4.853327] [<801408fc>] (kthread) from [<80108f30>] (ret_from_fork+0x14/0x24)
    [    4.860557]  r7:00000000 r6:00000000 r5:801408fc r4:9ec15740

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: slave: chip data is optional, don't dereference NULL
Andrew Lunn [Sat, 4 Jun 2016 19:16:52 +0000 (21:16 +0200)]
net: dsa: slave: chip data is optional, don't dereference NULL

The new binding does not make use of dsa_chip_data, a.k.a cd.  When
retrieving the size of the EEPROM attached to a switch, don't assume
there is a cd attached to the switch structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: Add docbook description for 'mtu' arg to skb_gso_validate_mtu()
David S. Miller [Sat, 4 Jun 2016 05:56:28 +0000 (22:56 -0700)]
net: Add docbook description for 'mtu' arg to skb_gso_validate_mtu()

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: Fix warning in sctp_packet_transmit_chunk()
David S. Miller [Sat, 4 Jun 2016 05:53:26 +0000 (22:53 -0700)]
sctp: Fix warning in sctp_packet_transmit_chunk()

size_t objects should be printed with %Z printf format.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed: Fix next-ptr chains for BE / 32-bit
Yuval Mintz [Sat, 4 Jun 2016 05:20:16 +0000 (08:20 +0300)]
qed: Fix next-ptr chains for BE / 32-bit

Commit a91eb52abb50 ("qed: Revisit chain implementation") contains an
incorrect implementation for BE platforms, as device's regpairs containing
addresses are LE and they're not converted correctly when read back.
In addition, it raises a compilation warning for 32-bit platforms where
dma_addr_t is a 32-bit variable.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'qed-roce-iscsi'
David S. Miller [Sat, 4 Jun 2016 00:08:45 +0000 (20:08 -0400)]
Merge branch 'qed-roce-iscsi'

Yuval Mintz says:

====================
qed: RocE & iSCSI infrastructure

We plan on sending 2 new protocol drivers in the imminent future -
both our RoCE [qedr] and iSCSI [qedi] drivers. As both submissions
would be rather massive and in order to avoid collisions between them,
the common infrastructure on the qed side was prepared as an independent
patch-series to be sent ahead of those 2 submissions.

This patch series introduces in QED 2 new 'ids' - one for iscsi and
one for roce. It then goes and adds logic required for configuring
said protocols in HW. Notice it *doesn't* actually add any client using
said ids, but rather only the infrastructure to allow their later usage.

What this patch doesn't contain is the slowpath protocol-configuration
toward the firmware. I.e., it contains register-setting logic, memory
allocations, etc., but not actual flow-related configuration specific
to the protocl. Those would be sent as part of the protocol driver
submissions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed: Initialize hardware for new protocols
Yuval Mintz [Fri, 3 Jun 2016 11:35:35 +0000 (14:35 +0300)]
qed: Initialize hardware for new protocols

RoCE and iSCSI would require some added/changed hw configuration in order
to properly run; The biggest single change being the requirement of
allocating and mapping host memory for several HW blocks that aren't being
used by qede [SRC, QM, TM, etc.].

In addition, whereas qede is only using context memory for HW blocks, the
new protocol would also require task memories to be added.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed: Add iscsi/rdma personalities
Yuval Mintz [Fri, 3 Jun 2016 11:35:34 +0000 (14:35 +0300)]
qed: Add iscsi/rdma personalities

This patch adds in the ecore 2 new personalities in addition to
QED_PCI_ETH - QED_PCI_ISCSI and QED_PCI_ETH_ROCE.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed: Add common HSI for new protocols
Yuval Mintz [Fri, 3 Jun 2016 11:35:33 +0000 (14:35 +0300)]
qed: Add common HSI for new protocols

This adds the qed portion of the RoCE & iSCSI firmware HSI,
as well as adding several new common HSI files which would be required
by both qed and qed* protocols.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed: Revisit chain implementation
Yuval Mintz [Fri, 3 Jun 2016 11:35:32 +0000 (14:35 +0300)]
qed: Revisit chain implementation

RoCE driver is going to need a 32-bit chain [current chain implementation
for qed* currently supports only 16-bit producer/consumer chains].

This patch adds said support, as well as doing other slight tweaks and
modifications to qed's chain API.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: ti: cpsw: remove unused priv lock
Ivan Khoronzhuk [Thu, 2 Jun 2016 22:37:08 +0000 (01:37 +0300)]
net: ethernet: ti: cpsw: remove unused priv lock

There is no reason in this lock. At least for now.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agorxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB
Joe Perches [Thu, 2 Jun 2016 19:08:52 +0000 (12:08 -0700)]
rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB

Use the more common kernel logging style and reduce object size.

The logging message prefix changes from a mixture of
"RxRPC:" and "RXRPC:" to "af_rxrpc: ".

$ size net/rxrpc/built-in.o*
   text    data     bss     dec     hex filename
  64172    1972    8304   74448   122d0 net/rxrpc/built-in.o.new
  67512    1972    8304   77788   12fdc net/rxrpc/built-in.o.old

Miscellanea:

o Consolidate the ASSERT macros to use a single pr_err call with
  decimal and hexadecimal output and a stringified #OP argument

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Fix VF register on vlan devices
Haiyang Zhang [Thu, 2 Jun 2016 19:02:04 +0000 (12:02 -0700)]
hv_netvsc: Fix VF register on vlan devices

Added a condition to avoid vlan devices with same MAC registering
as VF.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'sctp-gso'
David S. Miller [Fri, 3 Jun 2016 23:37:26 +0000 (19:37 -0400)]
Merge branch 'sctp-gso'

Marcelo Ricardo Leitner says:

====================
sctp: Add GSO support

This patchset adds sctp GSO support.

Performance tests indicates that increases throughput by 10% if using
bigger chunk sizes, specially if bigger than MTU. For small chunks, it
doesn't help much if not using heavy firewall rules.

For small chunks it will probably be of more use once we get something
like MSG_MORE as David Laight had suggested.

overall changes:
v1->v2:
Added support for receiving GSO frames on SCTP stack, as requested by
Dave Miller.

v2->v3:
Consider sctphdr size in skb_gso_transport_seglen()
rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for
unsupported GSO")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: improve debug message to also log curr pkt and new chunk size
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:44 +0000 (15:05 -0300)]
sctp: improve debug message to also log curr pkt and new chunk size

This is useful for debugging packet sizes.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: Add GSO support
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:43 +0000 (15:05 -0300)]
sctp: Add GSO support

SCTP has this pecualiarity that its packets cannot be just segmented to
(P)MTU. Its chunks must be contained in IP segments, padding respected.
So we can't just generate a big skb, set gso_size to the fragmentation
point and deliver it to IP layer.

This patch takes a different approach. SCTP will now build a skb as it
would be if it was received using GRO. That is, there will be a cover
skb with protocol headers and children ones containing the actual
segments, already segmented to a way that respects SCTP RFCs.

With that, we can tell skb_segment() to just split based on frag_list,
trusting its sizes are already in accordance.

This way SCTP can benefit from GSO and instead of passing several
packets through the stack, it can pass a single large packet.

v2:
- Added support for receiving GSO frames, as requested by Dave Miller.
- Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
- Added heuristics similar to what we have in TCP for not generating
  single GSO packets that fills cwnd.
v3:
- consider sctphdr size in skb_gso_transport_seglen()
- rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for
  unsupported GSO")

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: delay as much as possible skb_linearize
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:42 +0000 (15:05 -0300)]
sctp: delay as much as possible skb_linearize

This patch is a preparation for the GSO one. In order to successfully
handle GSO packets on rx path we must not call skb_linearize, otherwise
it defeats any gain GSO may have had.

This patch thus delays as much as possible the call to skb_linearize,
leaving it to sctp_inq_pop() moment. For that the sanity checks
performed now know how to deal with fragments.

One positive side-effect of this is that if the socket is backlogged it
will have the chance of doing it on backlog processing instead of
during softirq.

With this move, it's evident that a check for non-linearity in
sctp_inq_pop was ineffective and is now removed. Note that a similar
check is performed a bit below this one.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoskbuff: introduce skb_gso_validate_mtu
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:41 +0000 (15:05 -0300)]
skbuff: introduce skb_gso_validate_mtu

skb_gso_network_seglen is not enough for checking fragment sizes if
skb is using GSO_BY_FRAGS as we have to check frag per frag.

This patch introduces skb_gso_validate_mtu, based on the former, which
will wrap the use case inside it as all calls to skb_gso_network_seglen
were to validate if it fits on a given TMU, and improve the check.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosk_buff: allow segmenting based on frag sizes
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:40 +0000 (15:05 -0300)]
sk_buff: allow segmenting based on frag sizes

This patch allows segmenting a skb based on its frags sizes instead of
based on a fixed value.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoskbuff: export skb_gro_receive
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:39 +0000 (15:05 -0300)]
skbuff: export skb_gro_receive

sctp GSO requires it and sctp can be compiled as a module, so we need to
export this function.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoloopback: make use of NETIF_F_GSO_SOFTWARE
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:38 +0000 (15:05 -0300)]
loopback: make use of NETIF_F_GSO_SOFTWARE

NETIF_F_GSO_SOFTWARE was defined to list all GSO software types, so lets
make use of it in loopback code. Note that veth/vxlan/others already
uses it.

Within this patch series, this patch causes lo to pick up SCTP GSO feature
automatically (as it's added to NETIF_F_GSO_SOFTWARE) and thus avoiding
segmentation if possible.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fjes: fjes_main: Remove create_workqueue
Bhaktipriya Shridhar [Thu, 2 Jun 2016 09:30:57 +0000 (15:00 +0530)]
net: fjes: fjes_main: Remove create_workqueue

alloc_workqueue replaces deprecated create_workqueue().

The workqueue adapter->txrx_wq has workitem
&adapter->raise_intr_rxdata_task per adapter. Extended Socket Network
Device is shared memory based, so someone's transmission denotes other's
reception.  raise_intr_rxdata_task raises interruption of receivers from
the sender in order to notify receivers.

The workqueue adapter->control_wq has workitem
&adapter->interrupt_watch_task per adapter. interrupt_watch_task is used
to prevent delay of interrupts.

Dedicated workqueues have been used in both cases since the workitems
on the workqueues are involved in normal device operation and require
forward progress under memory pressure.

max_active has been set to 0 since there is no need for throttling
the number of active work items.

Since network devices  may be used for memory reclaim,
WQ_MEM_RECLAIM has been set to guarantee forward progress.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed: Utilize FW 8.10.3.0
Yuval Mintz [Thu, 2 Jun 2016 07:23:29 +0000 (10:23 +0300)]
qed: Utilize FW 8.10.3.0

The New QED firmware contains several fixes, including:
  - Wrong classification of packets in 4-port devices.
  - Anti-spoof interoperability with encapsulated packets.
  - Tx-switching of encapsulated packets.
It also slightly improves Tx performance of the device.

In addition, this firmware contains the necessary logic for
supporting iscsi & rdma, for which we plan on pushing protocol
drivers in the imminent future.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: vrf: set operstate and mtu at link create
David Ahern [Thu, 2 Jun 2016 04:16:39 +0000 (21:16 -0700)]
net: vrf: set operstate and mtu at link create

The VRF device exists to define L3 domains and guide FIB lookups. As
such its operstate is not relevant. Seeing 'state UNKNOWN' in the
output of 'ip link show' can be confusing, so set operstate at link
create.

Similarly, the MTU for a VRF device is not used; any fragmentation
of the payload is done on the output path based on the real egress
device. An MTU of 1500 on the VRF device while enslaved devices
have a higher MTU can lead to confusion. Since the VRF MTU is not
relevant set to 64k similar to what is done for loopback.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoovs: set name assign type of internal port
Zhang Shengju [Tue, 31 May 2016 13:41:02 +0000 (13:41 +0000)]
ovs: set name assign type of internal port

Set name_assign_type of internal port to NET_NAME_USER.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: wiznet: Remove create_workqueue
Bhaktipriya Shridhar [Wed, 1 Jun 2016 17:59:15 +0000 (23:29 +0530)]
net: ethernet: wiznet: Remove create_workqueue

alloc_workqueue replaces deprecated create_workqueue().

A dedicated workqueue has been used since the workitems are involved
in normal device operation. Workitems &priv->rx_work and &priv->tx_work,
map to w5100_rx_work and w5100_tx_work respectively and are involved in
receiving and transmitting packets. Forward progress under
memory pressure is a requirement here.

create_workqueue has been replaced with alloc_workqueue with max_active
as 0 since there is no need for throttling the number of active work
items.

Since the driver may be used in memory reclaim path,
WQ_MEM_RECLAIM has been set to guarantee forward progress.

flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue
becomes empty. Hence the call to flush_workqueue() has been dropped.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoravb: Remove manual pause frame transmit
Masaru Nagai [Wed, 1 Jun 2016 16:41:05 +0000 (01:41 +0900)]
ravb: Remove manual pause frame transmit

Writing a non-zero value to the manual PAUSE frame register (MPR) starts
the transmission of a PAUSE frame.
A PAUSE frame is sent in ravb_emac_init(), but it is not expected behavior.

Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com>
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agostmmac: make platform drivers depend on their associated SoC
Peter Robinson [Wed, 1 Jun 2016 12:28:58 +0000 (13:28 +0100)]
stmmac: make platform drivers depend on their associated SoC

There's not much point, except compile test, enabling the stmmac
platform drivers unless their actual SoC is enabled. They're not
useful without it.

Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'macvlan-less-mcast-cloning'
David S. Miller [Thu, 2 Jun 2016 00:48:46 +0000 (17:48 -0700)]
Merge branch 'macvlan-less-mcast-cloning'

Herbert Xu says:

====================
macvlan: Avoid unnecessary multicast cloning

This patch tries to improve macvlan multicast performance by
maintaining a filter hash at the macvlan_port level so that we
can quickly determine whether a given packet is needed or not.

It is preceded by a patch that fixes a potential use-after-free
bug that I discovered while looking over this.

v2 fixed a bug where promiscuous/allmulti settings weren't handled
correctly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomacvlan: Avoid unnecessary multicast cloning
Herbert Xu [Wed, 1 Jun 2016 03:45:44 +0000 (11:45 +0800)]
macvlan: Avoid unnecessary multicast cloning

Currently we always queue a multicast packet for further processing,
even if none of the macvlan devices are subscribed to the address.

This patch optimises this by adding a global multicast filter for
a macvlan_port.

Note that this patch doesn't handle the broadcast addresses of the
individual macvlan devices correctly, if they are not all identical
to vlan->lowerdev.  However, this is already broken because there
is no mechanism in place to update the individual multicast filters
when you change the broadcast address.

If someone cares enough they should fix this by collecting all
broadcast addresses for a macvlan as we do for multicast and unicast.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomacvlan: Fix potential use-after free for broadcasts
Herbert Xu [Wed, 1 Jun 2016 03:43:00 +0000 (11:43 +0800)]
macvlan: Fix potential use-after free for broadcasts

When we postpone a broadcast packet we save the source port in
the skb if it is local.  However, the source port can disappear
before we get a chance to process the packet.

This patch fixes this by holding a ref count on the netdev.

It also delays the skb->cb modification until after we allocate
the new skb as you should not modify shared skbs.

Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoudp: avoid csum_partial() for validated skb
Eric Dumazet [Tue, 31 May 2016 22:22:41 +0000 (15:22 -0700)]
udp: avoid csum_partial() for validated skb

In commit e6afc8ace6dd5 ("udp: remove headers from UDP packets before
queueing"), udp_csum_pull_header() helper was added but missed fact
that CHECKSUM_UNNECESSARY packets were now converted to CHECKSUM_NONE
and skb->csum_valid was set to 1 for them.

Since csum_partial() is quite expensive, even for 8-byte area, it is
worth adding a test.

We also can use skb->data instead of udp_hdr() as we are pulling
UDP headers, as it is sightly faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>