David S. Miller [Wed, 11 Jul 2012 05:53:57 +0000 (22:53 -0700)]
Merge branch 'metrics_restructure'
This patch series works towards the goal of minimizing the amount
of things that can change in an ipv4 route.
In a regime where the routing cache is removed, route changes will
lead to cloning in the FIB tables or similar.
The largest trigger of route metrics writes, TCP, now has it's own
cache of dynamic metric state. The timewait timestamps are stored
there now as well.
As a result of that, pre-cowing metrics is no longer necessary,
and therefore FLOWI_FLAG_PRECOW_METRICS is removed.
Redirect and PMTU handling is moved back into the ipv4 routes. I'm
sorry for all the headaches trying to do this in the inetpeer has
caused, it was the wrong approach for sure.
Since metrics become read-only for ipv4 we no longer need the inetpeer
hung off of the ipv4 routes either. So those disappear too.
Also, timewait sockets no longer need to hold onto an inetpeer either.
After this series, we still have some details to resolve wrt. PMTU and
redirects for a route-cache-less system:
1) With just the plain route cache removal, PMTU will continue to
work mostly fine. This is because of how the local route users
call down into the PMTU update code with the route they already
hold.
However, if we wish to cache pre-computed routes in fib_info
nexthops (which we want for performance), then we need to add
route cloning for PMTU events.
2) Redirects require more work. First, redirects must be changed to
be handled like PMTU. Wherein we call down into the sockets and
other entities, and then they call back into the routing code with
the route they were using.
So we'll be adding an ->update_nexthop() method alongside
->update_pmtu().
And then, like for PMTU, we'll need cloning support once we start
caching routes in the fib_info nexthops.
But that's it, we can completely pull the trigger and remove the
routing cache with minimal disruptions.
As it is, this patch series alone helps a lot of things. For one,
routing cache entry creation should be a lot faster, because we no
longer do inetpeer lookups (even to check if an entry exists).
This patch series also opens the door for non-DST_HOST ipv4 routes,
because nothing fundamentally cares about rt->rt_dst any more. It
can be removed with the base routing cache removal patch. In fact,
that was the primary goal of this patch series.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 14:26:01 +0000 (07:26 -0700)]
ipv4: Remove inetpeer from routes.
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 14:08:18 +0000 (07:08 -0700)]
ipv4: Calling ->cow_metrics() now is a bug.
Nothing every writes to ipv4 metrics any longer.
PMTU is stored in rt->rt_pmtu.
Dynamic TCP metrics are stored in a special TCP metrics cache,
completely outside of the routes.
Therefore ->cow_metrics() can simply nothing more than a WARN_ON
trigger so we can catch anyone who tries to add new writes to
ipv4 route metrics.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 14:03:43 +0000 (07:03 -0700)]
ipv4: Kill dst_copy_metrics() call from ipv4_blackhole_route().
Blackhole routes have a COW metrics operation that returns NULL
always, therefore this dst_copy_metrics() call did absolutely
nothing.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 14:02:09 +0000 (07:02 -0700)]
ipv4: Enforce max MTU metric at route insertion time.
Rather than at every struct rtable creation.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 13:58:42 +0000 (06:58 -0700)]
ipv4: Maintain redirect and PMTU info in struct rtable again.
Maintaining this in the inetpeer entries was not the right way to do
this at all.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 12:06:14 +0000 (05:06 -0700)]
rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo().
Nobody provides non-zero values any longer.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 11:01:57 +0000 (04:01 -0700)]
inet: Kill FLOWI_FLAG_PRECOW_METRICS.
No longer needed. TCP writes metrics, but now in it's own special
cache that does not dirty the route metrics. Therefore there is no
longer any reason to pre-cow metrics in this way.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 10:58:16 +0000 (03:58 -0700)]
inet: Minimize use of cached route inetpeer.
Only use it in the absolutely required cases:
1) COW'ing metrics
2) ipv4 PMTU
3) ipv4 redirects
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 10:32:59 +0000 (03:32 -0700)]
inet: Remove ->get_peer() method.
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 10:27:56 +0000 (03:27 -0700)]
tcp: Remove tw->tw_peer
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 10:14:24 +0000 (03:14 -0700)]
tcp: Move timestamps from inetpeer to metrics cache.
With help from Lin Ming.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 07:53:48 +0000 (00:53 -0700)]
net: Kill set_dst_metric_rtt().
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 07:52:56 +0000 (00:52 -0700)]
net: Don't report route RTT metric value in cache dumps.
We don't maintain it dynamically any longer, so reporting it would
be extremely misleading. Report zero instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 07:49:14 +0000 (00:49 -0700)]
tcp: Maintain dynamic metrics in local cache.
Maintain a local hash table of TCP dynamic metrics blobs.
Computed TCP metrics are no longer maintained in the route metrics.
The table uses RCU and an extremely simple hash so that it has low
latency and low overhead. A simple hash is legitimate because we only
make metrics blobs for fully established connections.
Some tweaking of the default hash table sizes, metric timeouts, and
the hash chain length limit certainly could use some tweaking. But
the basic design seems sound.
With help from Eric Dumazet and Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Jul 2012 23:19:30 +0000 (16:19 -0700)]
tcp: Abstract back handling peer aliveness test into helper function.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Jul 2012 23:07:30 +0000 (16:07 -0700)]
tcp: Move dynamnic metrics handling into seperate file.
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 10 Jul 2012 06:18:44 +0000 (06:18 +0000)]
etherdevice: introduce eth_broadcast_addr
A lot of code has either the memset or an inefficient copy
from a static array that contains the all-ones broadcast
address. Introduce eth_broadcast_addr() to fill an address
with all ones, making the code clearer and allowing us to
get rid of some constant arrays.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jul 2012 01:05:28 +0000 (18:05 -0700)]
ipv4: Fix crashes in fib_rules_tclass().
All paths assume, when CONFIG_IP_MULTIPLE_TABLES is enabled, that any
successful call to fib_lookup() will initialize the fib_result->r
value to something.
We violated that expectation in the new fib_lookup() fast path.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Jul 2012 23:09:47 +0000 (16:09 -0700)]
Merge branch 'davem-next.r8169' of git://violet.fr.zoreil.com/romieu/linux
Francois Romieu (4):
r8169: mdio_ops signature change.
r8169: csi_ops signature change.
r8169: ephy, eri and efuse functions signature changes.
r8169: abstract out loop conditions.
Hayes Wang (2):
r8169: add RTL8106E support.
r8169: support RTL8168G
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Mon, 9 Jul 2012 21:57:36 +0000 (16:57 -0500)]
net/fsl_pq_mdio: use spin_event_timeout() to poll the indicator register
Macro spin_event_timeout() was designed for simple polling of hardware
registers with a timeout, so use it when we poll the MIIMIND register.
This allows us to return an error code instead of polling indefinitely.
Note that PHY_INIT_TIMEOUT is a count of loop iterations, so we can't use
it for spin_event_timeout(), which asks for microseconds.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Emeric Vigier [Mon, 9 Jul 2012 21:44:45 +0000 (17:44 -0400)]
smsc95xx: support ethtool get_regs
Inspired by implementation in smsc911x.c and smsc9420.c
Tested on ARM/pandaboard running android
Signed-off-by: Emeric Vigier <emeric.vigier@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devendra Naga [Sun, 8 Jul 2012 05:57:57 +0000 (05:57 +0000)]
r6040: use module_pci_driver macro
as the manual of module_pci_driver says that
it can be used when the init and exit functions of
the module does nothing but the pci_register_driver
and pci_unregister_driver.
use it for rdc's r6040 driver, as the init and exit
paths does as above, and also this reduces a little
amount of code.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 9 Jul 2012 06:02:24 +0000 (06:02 +0000)]
bnx2x: populate skb->l4_rxhash
l4_rxhash is set on skb when rxhash is obtained from canonical 4-tuple
over transport ports/addresses.
We can set skb->l4_rxhash for all incoming TCP packets on bnx2x for
free, as cqe status contains a hash type information.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Mon, 2 Jul 2012 09:23:22 +0000 (17:23 +0800)]
r8169: support RTL8168G
For RTL8111G, the settings of phy and firmware are replaced with
ocp functions. r8168g_mdio_{write / read} redirects the relative
settings to suitable ocp functions. A per-device variable is needed
to evaluate the real address of ocp functions.
rtl_writephy(tp, 0x1f, xxxx) is dedicated to keeping said variable
up-to-date.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Francois Romieu [Fri, 6 Jul 2012 12:19:23 +0000 (14:19 +0200)]
r8169: abstract out loop conditions.
Twelve functions can fail silently. Now they have a chance to complain.
Macro and pasting abuse has been kept at a level where tags and
friends should not be hurt.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Francois Romieu [Fri, 6 Jul 2012 20:40:38 +0000 (22:40 +0200)]
r8169: ephy, eri and efuse functions signature changes.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Francois Romieu [Fri, 6 Jul 2012 11:37:00 +0000 (13:37 +0200)]
r8169: csi_ops signature change.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Francois Romieu [Fri, 6 Jul 2012 18:19:42 +0000 (20:19 +0200)]
r8169: mdio_ops signature change.
Further changes need more context down in the call stack.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Hayes Wang [Mon, 2 Jul 2012 09:23:21 +0000 (17:23 +0800)]
r8169: add RTL8106E support.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Dan Carpenter [Wed, 4 Jul 2012 22:27:18 +0000 (22:27 +0000)]
small cleanup in ax25_addr_parse()
The comments were wrong here because "AX25_MAX_DIGIS" is 8 but the
comments say 6. Also I've changed the "7" to "AX25_ADDR_LEN".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Wed, 4 Jul 2012 16:05:42 +0000 (16:05 +0000)]
be2net: Fix Endian
ETH_P_IP is host Endian, skb->protocol is big Endian, when
compare them, we should change ETH_P_IP from host endian
to big endian, htons, not ntohs.
CC: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Korsgaard [Wed, 4 Jul 2012 05:05:45 +0000 (05:05 +0000)]
bcm87xx: fix reg-init comment typo
broadcom, not marvell.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Hohnstaedt [Wed, 4 Jul 2012 05:44:34 +0000 (05:44 +0000)]
phylib: Support registering a bunch of drivers
If registering of one of them fails, all already registered drivers
of this module will be unregistered.
Use the new register/unregister functions in all drivers
registering more than one driver.
amd.c, realtek.c: Simplify: directly return registration result.
Tested with broadcom.c
All others compile-tested.
Signed-off-by: Christian Hohnstaedt <chohnstaedt@innominate.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Korsgaard [Wed, 4 Jul 2012 00:33:57 +0000 (00:33 +0000)]
bcm87xx: disable autonegotiation by default
The bcm87xx phys don't support autonegotiation, so don't use it by
default, as otherwise phy_state_machine() will try to enable it (using
c22 requests, which also don't make any sense for the bcm78xx).
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mirko Lindner [Tue, 3 Jul 2012 23:38:46 +0000 (23:38 +0000)]
sky2: Fix for interrupt handler
Re-enable interrupts if it is not our interrupt
Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mirko Lindner [Tue, 3 Jul 2012 23:38:41 +0000 (23:38 +0000)]
sky2: Added support for Optima EEE
This patch adds support for the Optima EEE chipset.
Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Ferre [Tue, 3 Jul 2012 23:14:13 +0000 (23:14 +0000)]
net/macb: manage carrier state with call to netif_carrier_{on|off}()
OFF carrier state is setup in probe() open() and suspend() functions.
The carrier ON state is managed in macb_handle_link_change().
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 2 Jul 2012 09:59:24 +0000 (09:59 +0000)]
sctp: refactor sctp_packet_append_chunk and clenup some memory leaks
While doing some recent work on sctp sack bundling I noted that
sctp_packet_append_chunk was pretty inefficient. Specifially, it was called
recursively while trying to bundle auth and sack chunks. Because of that we
call sctp_packet_bundle_sack and sctp_packet_bundle_auth a total of 4 times for
every call to sctp_packet_append_chunk, knowing that at least 3 of those calls
will do nothing.
So lets refactor sctp_packet_bundle_auth to have an outer part that does the
attempted bundling, and an inner part that just does the chunk appends. This
saves us several calls per iteration that we just don't need.
Also, noticed that the auth and sack bundling fail to free the chunks they
allocate if the append fails, so make sure we add that in
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Sat, 30 Jun 2012 01:49:35 +0000 (01:49 +0000)]
bnx2i: use strlcpy() instead of memcpy() for strings
DRV_MODULE_VERSION here is "2.7.2.2" which is only 8 chars but we copy
12 bytes from the stack so it's a small information leak.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Acked-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 7 Jul 2012 23:29:29 +0000 (16:29 -0700)]
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
Eric Dumazet [Thu, 5 Jul 2012 04:31:01 +0000 (04:31 +0000)]
asix: avoid copies in tx path
I noticed excess calls to skb_copy_expand() or memmove() in asix driver.
This driver needs to push 4 bytes in front of frame (packet_len)
and maybe add 4 bytes after the end (if padlen is 4)
So it should set needed_headroom & needed_tailroom to avoid
copies. But its not enough, because many packets are cloned
before entering asix_tx_fixup() and this driver use skb_cloned()
as a lazy way to check if it can push and put additional bytes in frame.
Avoid skb_copy_expand() expensive call, using following rules :
- We are allowed to push 4 bytes in headroom if skb_header_cloned()
is false (and if we have 4 bytes of headroom)
- We are allowed to put 4 bytes at tail if skb_cloned()
is false (and if we have 4 bytes of tailroom)
TCP packets for example are cloned, but skb_header_release()
was called in tcp stack, allowing us to use headroom for our needs.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Allan Chou <allan@asix.com.tw>
Cc: Trond Wuellner <trond@chromium.org>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Paul Stewart <pstew@chromium.org>
Cc: Ming Lei <tom.leiming@gmail.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:50 +0000 (04:03 +0000)]
net/mlx4_en: Add support for drop action through ethtool
The drop action is implemented by allocating a QP and keeping it in a reset state
such that the HW drops any packets which are steered to that QP. When a drop action
is requested, we attach the relevant flow to that QP.
Sign-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:49 +0000 (04:03 +0000)]
net/mlx4_en: Manage flow steering rules with ethtool
Implement the ethtool APIs for attaching L2/L3/L4 based flow steering
rules to the netdevice RX rings. Added set_rxnfc callback and enhanced
the existing get_rxnfc callback.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:48 +0000 (04:03 +0000)]
net/mlx4: Implement promiscuous mode with device managed flow-steering
The device managed flow steering API has three promiscuous modes:
1. Uplink - captures all the packets that arrive to the port.
2. Allmulti - captures all multicast packets arriving to the port.
3. Function port - for future use, this mode is not implemented yet.
Use these modes with the flow_attach and flow_detach firmware commands
according to the promiscuous state of the netdevice.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:47 +0000 (04:03 +0000)]
net/mlx4_core: Add resource tracking for device managed flow steering rules
As with other device resources, the resource tracker is needed for supporting
device managed flow steering rules under SRIOV: make sure virtual functions
delete only rules created by them, and clean all rules attached by a crashed VF.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:46 +0000 (04:03 +0000)]
{NET, IB}/mlx4: Add device managed flow steering firmware API
The driver is modified to support three operation modes.
If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.
When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.
When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.
Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:45 +0000 (04:03 +0000)]
net/mlx4_core: Add firmware commands to support device managed flow steering
Add support for firmware commands to attach/detach a new device managed
steering mode. Such network steering rules allow the user to provide an
L2/L3/L4 flow specification to the firmware and have the device to steer
traffic that matches that specification to the provided QP.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:44 +0000 (04:03 +0000)]
net/mlx4: Set steering mode according to device capabilities
Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.
This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.
A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.
B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,
The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yevgeny Petrilin [Thu, 5 Jul 2012 04:03:43 +0000 (04:03 +0000)]
net/mlx4_en: Re-design multicast attachments flow
Currently, for every change in the net device multicast list, the driver
detaches all the addresses from the HW device, and then attaches the
updated list. This behavior is wrong from two aspects: first, it causes
a load of firmware commands and second, there is period of time where
the correct addresses are not attached, which turned into packet loss.
To improve - a copy of the multicast list is saved by the driver. For
every change in the multicast list, the multicast list copy is used
to find the delta between those two lists and add or remove multicast
addresses as needed.
Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:42 +0000 (04:03 +0000)]
net/mlx4_core: Change resource tracking ID to be 64 bit
Currently the IDs used by the resource tracker are of type u32, so far this was
ok since all the different resources we were tracking could be encoded in 32bit.
As a preparation step for tracking of resources whose IDs need > 32 bits such
as network flow steering rules, who are 64 bit in size, move to use 64 bit
based resource IDs.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:41 +0000 (04:03 +0000)]
net/mlx4_core: Change resource tracking mechanism to use red-black tree
Change the data structure used for managing the SRIOV resource tracking
mechanism from radix tree to red-black tree. This is preparation step
for supporting resource IDs which are 64bit long, such as network flow
steering rules. Such IDs can't be used as radix-tree keys on 32bit
architectures and hence the reason for the change.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 7 Jul 2012 23:18:50 +0000 (16:18 -0700)]
Merge branch 'master' of git://1984.lsi.us.es/nf-next
Devendra Naga [Fri, 6 Jul 2012 20:07:35 +0000 (20:07 +0000)]
r6040: remove duplicate call to the pci_set_drvdata
pci_set_drvdata is called twice at the remove path of driver,
call it once.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 6 Jul 2012 07:19:05 +0000 (09:19 +0200)]
ipv6: fix a bad cast in ip6_dst_lookup_tail()
Fix a bug in ip6_dst_lookup_tail(), where typeof(dst) is
"struct dst_entry **", not "struct dst_entry *"
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 5 Jul 2012 20:34:35 +0000 (20:34 +0000)]
ipv6: remove redundant declarations
remove redundant declarations, they belong in include/net/tcp.h
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 6 Jul 2012 05:13:13 +0000 (22:13 -0700)]
ipv4: Avoid overhead when no custom FIB rules are installed.
If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.
It's just pure overhead.
Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.
Also, move fib_num_tclassid_users into the ipv4 network namespace.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 6 Jul 2012 04:08:05 +0000 (21:08 -0700)]
ipoib: Need to do dst_neigh_lookup_skb() outside of priv->lock.
Otherwise local_bh_enable() complains.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Thu, 5 Jul 2012 03:18:28 +0000 (03:18 +0000)]
ipv6: Initialize the neighbour pointer of rt6_info on allocation
git commit
97cac082 (ipv6: Store route neighbour in rt6_info struct)
added a neighbour pointer to rt6_info. Currently we don't initialize
this pointer at allocation time. We assume this pointer to be valid
if it is not a null pointer, so initialize it on allocation.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 5 Jul 2012 10:44:25 +0000 (03:44 -0700)]
Merge git://git./linux/kernel/git/davem/net
alex.bluesman.smirnov@gmail.com [Sun, 1 Jul 2012 20:18:32 +0000 (20:18 +0000)]
drivers/ieee802154/at231rf230: remove unused return status
Remove excessive variable used for the return status.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alex.bluesman.smirnov@gmail.com [Sun, 1 Jul 2012 19:58:46 +0000 (19:58 +0000)]
6lowpan: revert 'reuse eth_mac_addr()'
This reverts the commit
cdf49c283e2e105da86ca575ad35b453f5ff24ea which
replaces lowpan '.ndo_set_mac_address' method by ethernet's one.
Accorind to the IEEE 802.15.4 standard, device has 8-byte length address,
so this hook loses the last 2 bytes which may rise a compatibility problems
with other IEEE 802.15.4 standard implementations.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RongQing.Li [Sun, 1 Jul 2012 17:19:00 +0000 (17:19 +0000)]
dccp: remove unnecessary codes in ipv6.c
opt always equals np->opts, so it is meaningless to define opt, and
check if opt does not equal np->opts and then try to free opt.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RongQing.Li [Sun, 1 Jul 2012 17:18:59 +0000 (17:18 +0000)]
ipv6: remove unnecessary codes in tcp_ipv6.c
opt always equals np->opts, so it is meaningless to define opt, and
check if opt does not equal np->opts and then try to free opt.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:19:00 +0000 (03:19 +0000)]
be2net: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:59 +0000 (03:18 +0000)]
bnx2x: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:58 +0000 (03:18 +0000)]
bnx2: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:57 +0000 (03:18 +0000)]
tg3: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:56 +0000 (03:18 +0000)]
myri10ge: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Jon Mason <mason@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:55 +0000 (03:18 +0000)]
cxgb4: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:54 +0000 (03:18 +0000)]
cxgb3: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:53 +0000 (03:18 +0000)]
qlge: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Cc: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Cc: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:52 +0000 (03:18 +0000)]
vxge: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:51 +0000 (03:18 +0000)]
mlx4: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 1 Jul 2012 03:18:50 +0000 (03:18 +0000)]
net-next: Add netif_get_num_default_rss_queues
Most multi-queue networking driver consider the number of online cpus when
configuring RSS queues.
This patch adds a wrapper to the number of cpus, setting an upper limit on the
number of cpus a driver should consider (by default) when allocating resources
for his queues.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 4 Jul 2012 22:30:09 +0000 (22:30 +0000)]
ipv4: defer fib_compute_spec_dst() call
ip_options_compile() can avoid calling fib_compute_spec_dst()
by default, and perform the call only if needed.
David suggested to add a helper to make the call only once.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:58:02 +0000 (22:58 -0700)]
net: Kill dst->_neighbour, accessors, and final uses.
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:53:37 +0000 (22:53 -0700)]
xfrm: No need to copy generic neighbour pointer.
Nobody reads it any longer.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 08:07:44 +0000 (01:07 -0700)]
ipv4: No need to set generic neighbour pointer.
Nobody reads it any longer.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:43:47 +0000 (22:43 -0700)]
ipv6: Store route neighbour in rt6_info struct.
This makes for a simplified conversion away from dst_get_neighbour*().
All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:35:31 +0000 (22:35 -0700)]
cxgb3: Convert t3_l2t_get() over to dst_neigh_lookup().
This means passing in a suitable destination address.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 08:01:51 +0000 (01:01 -0700)]
net: Pass neighbours and dest address into NETEVENT_REDIRECT events.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:22:18 +0000 (22:22 -0700)]
decnet: Use neighbours privately in dn_route struct.
This allows an easy conversion away from dst_get_neighbour*().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:15:37 +0000 (22:15 -0700)]
neigh: Convert over to dst_neigh_lookup_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:12:59 +0000 (22:12 -0700)]
br_netfilter: Convert to dst_neigh_lookup_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:10:55 +0000 (22:10 -0700)]
cxgb4i: Convert over to dst_neigh_lookup().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:08:58 +0000 (22:08 -0700)]
cxgbi: Convert over to dst_neigh_lookup().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:02:33 +0000 (22:02 -0700)]
qeth: Convert over to dst_neigh_lookup_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 05:00:03 +0000 (22:00 -0700)]
ipoib: Convert over to dev_lookup_neigh_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 04:57:45 +0000 (21:57 -0700)]
sch_teql: Convert over to dev_neigh_lookup_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jul 2012 04:52:24 +0000 (21:52 -0700)]
net: Add optional SKB arg to dst_ops->neigh_lookup().
Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 2 Jul 2012 09:21:03 +0000 (02:21 -0700)]
net: Do delayed neigh confirmation.
When a dst_confirm() happens, mark the confirmation as pending in the
dst. Then on the next packet out, when we have the neigh in-hand, do
the update.
This removes the dependency in dst_confirm() of dst's having an
attached neigh.
While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL. So just fix those cases up.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 2 Jul 2012 09:15:37 +0000 (02:15 -0700)]
sunrpc: Don't do a dst_confirm() on an input routes.
xs_udp_data_ready() is operating on received packets, and tries to
do a dst_confirm() on the dst attached to the SKB.
This isn't right, dst confirmation is for output routes, not input
rights. It's for resetting the timers on the nexthop neighbour entry
for the route, indicating that we've got good evidence that we've
successfully reached it.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 2 Jul 2012 09:04:13 +0000 (02:04 -0700)]
ipv4: Don't report neigh uptodate state in rtcache procfs.
Soon routes will not have a cached neigh attached, nor will we
be able to necessarily go directly to a neigh from an arbitrary
route.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 2 Jul 2012 09:02:15 +0000 (02:02 -0700)]
ipv4: Make neigh lookups directly in output packet path.
Do not use the dst cached neigh, we'll be getting rid of that.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Jul 2012 23:13:17 +0000 (16:13 -0700)]
ipv4: Fix crashes in ip_options_compile().
The spec_dst uses should be guarded by skb_rtable() being non-NULL
not just the SKB being non-null.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Krishna Kumar [Wed, 27 Jun 2012 00:59:56 +0000 (00:59 +0000)]
netfilter: nfnetlink_queue: do not allow to set unsupported flag bits
Allow setting of only supported flag bits in queue->flags.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tomasz Bursztyka [Thu, 28 Jun 2012 02:57:47 +0000 (02:57 +0000)]
netfilter: nfnetlink: check callbacks before using those in nfnetlink_rcv_msg
nfnetlink_rcv_msg() might call a NULL callback which will cause NULL pointer
dereference.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 29 Jun 2012 05:23:25 +0000 (05:23 +0000)]
netfilter: nf_ct_tcp: missing per-net support for cttimeout
This patch adds missing per-net support for the cttimeout
infrastructure to TCP.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Pablo Neira Ayuso [Fri, 29 Jun 2012 05:23:24 +0000 (05:23 +0000)]
netfilter: nf_conntrack: generalize nf_ct_l4proto_net
This patch generalizes nf_ct_l4proto_net by splitting it into chunks and
moving the corresponding protocol part to where it really belongs to.
To clarify, note that we follow two different approaches to support per-net
depending if it's built-in or run-time loadable protocol tracker.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>