GitHub/moto-9609/android_kernel_motorola_exynos9610.git
15 years agoRevert Backoff [v3]: Calculate TCP's connection close threshold as a time value.
Damian Lukowski [Wed, 26 Aug 2009 00:16:34 +0000 (00:16 +0000)]
Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.

RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
which may represent a number of allowed retransmissions or a timeout value.
Currently Linux uses sysctl_tcp_retries{1,2} to specify the thresholds
in number of allowed retransmissions.

For any desired threshold R2 (by means of time) one can specify tcp_retries2
(by means of number of retransmissions) such that TCP will not time out
earlier than R2. This is the case, because the RTO schedule follows a fixed
pattern, namely exponential backoff.

However, the RTO behaviour is not predictable any more if RTO backoffs can be
reverted, as is the case in the draft
"Make TCP more Robust to Long Connectivity Disruptions"
(http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).

In the worst case TCP would time out a connection after 3.2 seconds, if the
initial RTO equaled MIN_RTO and each backoff has been reverted.

This patch introduces a function retransmits_timed_out(N),
which calculates the timeout of a TCP connection, assuming an initial
RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.

Whenever timeout decisions are made by comparing the retransmission counter
to some value N, this function can be used instead.

The meaning of tcp_retries2 will be changed, as many more RTO retransmissions
can occur than the value indicates. However, it yields a timeout which is
similar to the one of an unpatched, exponentially backing off TCP in the same
scenario. As no application could rely on an RTO greater than MIN_RTO, there
should be no risk of a regression.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
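
The time threshold that retransmits_timed_out(N) compares against can be
sketched in user-space C as follows. This is a minimal sketch under stated
assumptions, not the kernel code: the function names, the millisecond units
and the cap at a maximum RTO are illustrative choices that do not come from
the commit message.

```c
#include <stdint.h>

/*
 * Total waiting time if the first RTO equals min_rto_ms and each of the
 * n retransmissions doubles the RTO, capped at max_rto_ms.
 * Hypothetical helper; names and units are not taken from the patch.
 */
uint64_t backoff_timeout_ms(uint32_t min_rto_ms, uint32_t max_rto_ms,
                            unsigned int n)
{
    uint64_t total = 0;
    uint64_t rto = min_rto_ms;
    unsigned int i;

    for (i = 0; i <= n; i++) {
        if (rto > max_rto_ms)
            rto = max_rto_ms;   /* assumed upper bound on the RTO */
        total += rto;           /* wait for the i-th timer to expire */
        rto *= 2;               /* exponential backoff */
    }
    return total;
}

/* Non-zero once the connection has been waiting at least that long. */
int retransmits_timed_out_sketch(uint64_t elapsed_ms, uint32_t min_rto_ms,
                                 uint32_t max_rto_ms, unsigned int n)
{
    return elapsed_ms >= backoff_timeout_ms(min_rto_ms, max_rto_ms, n);
}
```

Assuming a 200 ms minimum and a 120 s maximum RTO, N = 3 gives 3000 ms and
N = 15 gives 924600 ms. The decision depends on elapsed time only, so
reverted backoffs no longer shorten the connection lifetime.
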
15 years agoRevert Backoff [v3]: Revert RTO on ICMP destination unreachable
Damian Lukowski [Wed, 26 Aug 2009 00:16:31 +0000 (00:16 +0000)]
Revert Backoff [v3]: Revert RTO on ICMP destination unreachable

Here, an ICMP host/network unreachable message whose payload matches
TCP's SND.UNA is taken as an indication that the RTO retransmission has
not been lost due to congestion, but because of a route failure
somewhere along the path.
With true congestion, a router won't trigger such a message and the
patched TCP will operate as standard TCP.

This patch reverts one RTO backoff if an ICMP host/network unreachable
message whose payload matches TCP's SND.UNA arrives.
Based on the new RTO, the retransmission timer is reset to reflect the
remaining time, or, if the revert clocked out the timer, a retransmission
is sent out immediately.
Backoffs are only reverted if TCP is in RTO loss recovery, i.e. if
there have already been retransmissions and reversible backoffs.

Changes from v2:
1) Renaming of skb in tcp_v4_err() moved to another patch.
2) Reintroduced tcp_bound_rto() and __tcp_set_rto().
3) Fixed code comments.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
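
The revert step described in this commit can be sketched in user-space C as
follows. This is a hedged sketch only: struct rto_state, its fields and
revert_backoff_on_icmp() are hypothetical names, and times are plain
milliseconds rather than kernel jiffies.

```c
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative only. */
struct rto_state {
    unsigned int backoff;   /* number of RTO doublings currently applied */
    uint32_t base_rto_ms;   /* RTO before any backoff */
    uint32_t max_rto_ms;    /* upper bound on the RTO */
};

/* Current RTO: base doubled `backoff` times, bounded by max_rto_ms. */
static uint32_t current_rto_ms(const struct rto_state *s)
{
    uint64_t rto = s->base_rto_ms;
    unsigned int i;

    for (i = 0; i < s->backoff && rto < s->max_rto_ms; i++)
        rto *= 2;
    return rto > s->max_rto_ms ? s->max_rto_ms : (uint32_t)rto;
}

/*
 * Handle a matching ICMP unreachable: undo one backoff (if any) and return
 * the time left on the retransmission timer, given elapsed_ms since the
 * last (re)transmission.  0 means "retransmit immediately".
 */
uint32_t revert_backoff_on_icmp(struct rto_state *s, uint32_t elapsed_ms)
{
    uint32_t rto;

    if (s->backoff > 0)
        s->backoff--;       /* only existing backoffs can be reverted */
    rto = current_rto_ms(s);
    return elapsed_ms >= rto ? 0 : rto - elapsed_ms;
}
```

With a 200 ms base RTO and three backoffs applied (RTO 1600 ms), a matching
ICMP message arriving 500 ms after the last retransmission drops the RTO to
800 ms and leaves 300 ms on the timer.
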
15 years agoRevert Backoff [v3]: Rename skb to icmp_skb in tcp_v4_err()
Damian Lukowski [Wed, 26 Aug 2009 00:16:27 +0000 (00:16 +0000)]
Revert Backoff [v3]: Rename skb to icmp_skb in tcp_v4_err()

This supplementary patch renames skb to icmp_skb in tcp_v4_err() in order to
disambiguate from another sk_buff variable, which will be introduced
in a separate patch.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoixgbe: Add support for dcbnl_rtnl_ops.setapp/getapp
Yi Zou [Mon, 31 Aug 2009 12:34:28 +0000 (12:34 +0000)]
ixgbe: Add support for dcbnl_rtnl_ops.setapp/getapp

Add support for dcbnl_rtnl_ops.setapp/getapp to set or get the current user
priority bitmap for the given application protocol. Currently, 82599 only
supports setapp/getapp for the Fibre Channel over Ethernet (FCoE) protocol.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agodcbnl: Add implementations of dcbnl setapp/getapp commands
Yi Zou [Mon, 31 Aug 2009 12:33:40 +0000 (12:33 +0000)]
dcbnl: Add implementations of dcbnl setapp/getapp commands

Implements the dcbnl netlink setapp/getapp pair. When a setapp/getapp
command is received, dcbnl simply passes it on to dcbnl_rtnl_ops.setapp/getapp,
which the low-level drivers are expected to implement.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agodcbnl: Add netlink attributes for setapp/getapp to dcbnl
Yi Zou [Mon, 31 Aug 2009 12:33:20 +0000 (12:33 +0000)]
dcbnl: Add netlink attributes for setapp/getapp to dcbnl

Add defines for dcbnl netlink attributes to support netlink message passing of
setapp/getapp in dcbnl.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agodcbnl: Add support for setapp/getapp to netdev dcbnl_rtnl_ops
Yi Zou [Mon, 31 Aug 2009 12:32:55 +0000 (12:32 +0000)]
dcbnl: Add support for setapp/getapp to netdev dcbnl_rtnl_ops

Adds support of dcbnl setapp/getapp to dcbnl_rtnl_ops in netdev to allow
LLDs to implement their corresponding dcbnl setapp/getapp ops to support
the IEEE 802.1Q DCBX setapp/getapp commands.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agodcbnl: Add support for setapp/getapp commands to dcbnl
Yi Zou [Mon, 31 Aug 2009 12:32:34 +0000 (12:32 +0000)]
dcbnl: Add support for setapp/getapp commands to dcbnl

This patch adds dcbnl command definitions to support setapp/getapp
functionality from the IEEE 802.1Qaz Data Center Bridging Capability
Exchange protocol (DCBX) specification. Section 3.3 defines the
application protocol and its 802.1p user priority in DCBX, which is
implemented here as a pair of setapp/getapp commands in the kernel
dcbnl for setting and retrieving the user priority for a given
application protocol. The protocol is identified by the combination of
an id and an idtype. Currently, when idtype is 0, the corresponding
id gives the ether type of this protocol, e.g., for FCoE, it will be
0x8906; when idtype is 1, then the corresponding id gives the TCP or
UDP port number.

For more information regarding DCBX spec., please refer to the following:
http://www.ieee802.org/1/files/public/docs2008/
az-wadekar-dcbx-capability-exchange-discovery-protocol-1108-v1.01.pdf

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
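
The (idtype, id) lookup described in this commit can be illustrated with a
small user-space table. This is a sketch under assumptions: the enum names,
the table size and the app_set()/app_get() helpers are hypothetical and are
not the dcbnl API. Only the meaning of idtype 0 and 1 and the FCoE ether
type 0x8906 come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* idtype values as described in the commit message; names are hypothetical. */
enum app_idtype {
    APP_IDTYPE_ETHTYPE = 0,   /* id is an ether type, e.g. 0x8906 for FCoE */
    APP_IDTYPE_PORT    = 1    /* id is a TCP or UDP port number */
};

struct app_entry {
    int      in_use;
    uint8_t  idtype;
    uint16_t id;
    uint8_t  up;              /* 802.1p user priority, stored as a bitmap */
};

#define APP_TABLE_SIZE 8
static struct app_entry app_table[APP_TABLE_SIZE];

static struct app_entry *app_find(uint8_t idtype, uint16_t id)
{
    size_t i;

    for (i = 0; i < APP_TABLE_SIZE; i++)
        if (app_table[i].in_use && app_table[i].idtype == idtype &&
            app_table[i].id == id)
            return &app_table[i];
    return NULL;
}

/* Returns 0 on success, -1 for an unknown idtype or a full table. */
int app_set(uint8_t idtype, uint16_t id, uint8_t up)
{
    struct app_entry *e;
    size_t i;

    if (idtype != APP_IDTYPE_ETHTYPE && idtype != APP_IDTYPE_PORT)
        return -1;
    e = app_find(idtype, id);
    for (i = 0; !e && i < APP_TABLE_SIZE; i++)   /* else take a free slot */
        if (!app_table[i].in_use)
            e = &app_table[i];
    if (!e)
        return -1;
    e->in_use = 1;
    e->idtype = idtype;
    e->id = id;
    e->up = up;
    return 0;
}

/* Returns the stored priority bitmap, or 0 if none was set. */
uint8_t app_get(uint8_t idtype, uint16_t id)
{
    const struct app_entry *e = app_find(idtype, id);

    return e ? e->up : 0;
}
```

For example, app_set(APP_IDTYPE_ETHTYPE, 0x8906, 1 << 3) would record user
priority 3 for FCoE, and app_get() with the same key would return that bitmap.
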
15 years agoixgbe: Add support for the net_device_ops.ndo_fcoe_enable/disable to 82599
Yi Zou [Mon, 31 Aug 2009 12:32:14 +0000 (12:32 +0000)]
ixgbe: Add support for the net_device_ops.ndo_fcoe_enable/disable to 82599

This adds support for net_device_ops.ndo_fcoe_enable/disable to 82599. It
allows the FCoE offload feature to be turned on or off dynamically
upon incoming calls to ndo_fcoe_enable/disable. When this happens, FCoE offload
features are enabled/disabled accordingly, regardless of whether
DCB is turned on.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agovlan: Add support for net_devices_ops.ndo_fcoe_enable/_disable to VLAN
Yi Zou [Mon, 31 Aug 2009 12:31:55 +0000 (12:31 +0000)]
vlan: Add support for net_devices_ops.ndo_fcoe_enable/_disable to VLAN

This adds an implementation of net_device_ops.ndo_fcoe_enable/_disable to
the VLAN driver. It checks whether the real_dev supports ndo_fcoe_enable/
ndo_fcoe_disable and, if so, passes the call on to the associated real_dev.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
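
The pass-through described for the VLAN driver can be sketched with plain
function pointers. The types and names below (sketch_dev, sketch_dev_ops,
hw_ops) are hypothetical stand-ins, not the kernel's net_device structures,
and the -1 error value is an assumption, since the commit message does not
say what is returned when the hook is missing.

```c
#include <stddef.h>

struct sketch_dev;

/* Hypothetical stand-in for the two new net_device_ops hooks. */
struct sketch_dev_ops {
    int (*fcoe_enable)(struct sketch_dev *dev);
    int (*fcoe_disable)(struct sketch_dev *dev);
};

struct sketch_dev {
    const struct sketch_dev_ops *ops;
    struct sketch_dev *real_dev;   /* underlying device of a VLAN device */
    int fcoe_on;
};

/* Example hardware driver hooks. */
static int hw_fcoe_enable(struct sketch_dev *dev)
{
    dev->fcoe_on = 1;
    return 0;
}

static int hw_fcoe_disable(struct sketch_dev *dev)
{
    dev->fcoe_on = 0;
    return 0;
}

const struct sketch_dev_ops hw_ops = { hw_fcoe_enable, hw_fcoe_disable };

/* VLAN side: forward to real_dev if it implements the hook, else fail. */
int vlan_fcoe_enable(struct sketch_dev *vlan)
{
    struct sketch_dev *real = vlan->real_dev;

    if (real && real->ops && real->ops->fcoe_enable)
        return real->ops->fcoe_enable(real);
    return -1;   /* assumed error value */
}

int vlan_fcoe_disable(struct sketch_dev *vlan)
{
    struct sketch_dev *real = vlan->real_dev;

    if (real && real->ops && real->ops->fcoe_disable)
        return real->ops->fcoe_disable(real);
    return -1;   /* assumed error value */
}
```

The VLAN device keeps no FCoE state of its own in this sketch; the state
lives in the real device, which is the one that owns the hardware.
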
15 years agonet: Add ndo_fcoe_enable/ndo_fcoe_disable to net_device_ops
Yi Zou [Mon, 31 Aug 2009 12:31:36 +0000 (12:31 +0000)]
net: Add ndo_fcoe_enable/ndo_fcoe_disable to net_device_ops

Add ndo_fcoe_enable/_disable to net_device_ops so the corresponding
HW can initialize itself for FCoE traffic or clean up after FCoE traffic is
done. This is expected to be called by the kernel FCoE stack upon receiving
a request for creating an FCoE instance on the corresponding netdev interface.
When implemented by the actual HW, the HW driver checks the op code to perform
corresponding initialization or clean up for FCoE. The initialization normally
includes allocating extra queues for FCoE, setting corresponding HW registers
for FCoE, indicating FCoE offload features via netdev, etc. The clean-up would
include releasing the resources allocated for FCoE.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: convert bulk of drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:58 +0000 (19:50 +0000)]
netdev: convert bulk of drivers to netdev_tx_t

In a couple of cases collapse some extra code like:
   int retval = NETDEV_TX_OK;
   ...
   return retval;
into
   return NETDEV_TX_OK;

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agowireless: convert drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:57 +0000 (19:50 +0000)]
wireless: convert drivers to netdev_tx_t

Mostly just simple conversions:
  * ray_cs had bogus return of NET_TX_LOCKED but driver
    was not using NETIF_F_LLTX
  * hostap and ipw2x00 had some code that returned value
    from a called function that also had to change to return netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoappletalk: convert drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:56 +0000 (19:50 +0000)]
appletalk: convert drivers to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agointel: convert drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:55 +0000 (19:50 +0000)]
intel: convert drivers to netdev_tx_t

Get rid of some bogus return wrapping as well.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years ago3com: convert drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:54 +0000 (19:50 +0000)]
3com: convert drivers to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotulip: convert drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:53 +0000 (19:50 +0000)]
tulip: convert drivers to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agouwb: convert to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:52 +0000 (19:50 +0000)]
uwb: convert to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: convert pseudo drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:51 +0000 (19:50 +0000)]
netdev: convert pseudo drivers to netdev_tx_t

These are all drivers that don't touch real hardware.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoirda: convert to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:50 +0000 (19:50 +0000)]
irda: convert to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: convert pcmcia drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:49 +0000 (19:50 +0000)]
netdev: convert pcmcia drivers to netdev_tx_t

Update all the pcmcia network drivers for netdev_tx_t.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agohdlc: convert to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:48 +0000 (19:50 +0000)]
hdlc: convert to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agowan: convert drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:47 +0000 (19:50 +0000)]
wan: convert drivers to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotokenring: convert to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:46 +0000 (19:50 +0000)]
tokenring: convert to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agousbnet: convert to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:45 +0000 (19:50 +0000)]
usbnet: convert to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoisdn: convert to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:44 +0000 (19:50 +0000)]
isdn: convert to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoconvert hamradio drivers to netdev_txreturnt_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:43 +0000 (19:50 +0000)]
convert hamradio drivers to netdev_txreturnt_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoconvert ATM drivers to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:42 +0000 (19:50 +0000)]
convert ATM drivers to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: convert pseudo-devices to netdev_tx_t
Stephen Hemminger [Mon, 31 Aug 2009 19:50:41 +0000 (19:50 +0000)]
netdev: convert pseudo-devices to netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdev: change transmit to limited range type
Stephen Hemminger [Mon, 31 Aug 2009 19:50:40 +0000 (19:50 +0000)]
netdev: change transmit to limited range type

The transmit function should only return one of three possible values;
some drivers got confused and returned errnos or other values.
This changes the definition so that this can be caught at compile time.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
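
The idea of a limited-range return type can be sketched as follows. The
names (sketch_tx_t, sketch_start_xmit) are hypothetical and the enumerator
values are not the kernel's. Standard C still converts between int and an
enum implicitly, so how much is actually flagged depends on the compiler
warnings or static checker in use.

```c
/* Hypothetical three-value transmit status, mirroring the idea of netdev_tx_t. */
enum sketch_tx {
    SKETCH_TX_OK,       /* packet accepted */
    SKETCH_TX_BUSY,     /* queue full, try again later */
    SKETCH_TX_LOCKED    /* driver lock was contended */
};
typedef enum sketch_tx sketch_tx_t;

struct sketch_queue {
    unsigned int used;  /* slots currently occupied */
    unsigned int size;  /* total slots */
};

/* A transmit hook declared with the limited-range type instead of int. */
sketch_tx_t sketch_start_xmit(struct sketch_queue *q)
{
    if (q->used >= q->size)
        return SKETCH_TX_BUSY;   /* a status value, not an errno */
    q->used++;
    return SKETCH_TX_OK;
}
```

Declaring the hook with the enum type documents the contract in the
signature, so a driver that returns an errno stands out in review and to
tools that check enum conversions.
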
15 years agos2io: Generate complete messages using single line DBG_PRINTs
Joe Perches [Tue, 25 Aug 2009 08:52:00 +0000 (08:52 +0000)]
s2io: Generate complete messages using single line DBG_PRINTs

Single line log messages should be emitted by a single call
where possible.

Converted multiple calls to DBG_PRINT to single call form.
Removed "s2io:" preface from DBG_PRINTs.

The DBG_PRINT macro now emits a log level and is surrounded by
a do {...} while (0)

All s2io log output is now prefaced with KBUILD_MODNAME ": "
via pr_fmt.

The DBG_PRINT macro should probably be converted to use the
dev_<level> form eventually.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
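
The macro shape described in this commit (a single call per message, a
module-name prefix and a do { ... } while (0) wrapper) can be sketched in
user space as below. This is not the driver's macro: SKETCH_MODNAME stands
in for KBUILD_MODNAME, the level names are invented, and the message is
written to a buffer instead of the kernel log so that it can be inspected.

```c
#include <stdio.h>

#define SKETCH_MODNAME "s2io_sketch"      /* stands in for KBUILD_MODNAME */

enum { SKETCH_ERR = 0, SKETCH_INIT = 1, SKETCH_INFO = 2 };  /* assumed levels */

static int debug_level = SKETCH_ERR;
static char last_msg[128];                /* captures the last message */

/*
 * One call emits one complete, prefixed message.  The format string must be
 * a string literal so that it can be concatenated with the prefix.  The
 * do { } while (0) wrapper makes the macro behave like a single statement,
 * so it is safe inside an unbraced if/else.
 */
#define DBG_PRINT(level, ...)                                   \
    do {                                                        \
        if ((level) <= debug_level)                             \
            snprintf(last_msg, sizeof(last_msg),                \
                     SKETCH_MODNAME ": " __VA_ARGS__);          \
    } while (0)
```

With debug_level set to SKETCH_INIT, DBG_PRINT(SKETCH_INIT, "ring %d full\n", 3)
produces the single line "s2io_sketch: ring 3 full", while a SKETCH_INFO
message is suppressed.
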
15 years agos2io.c: Convert skipped nic->config.tx_cfg[i]. to tx_cfg->
Joe Perches [Mon, 24 Aug 2009 17:29:48 +0000 (17:29 +0000)]
s2io.c: Convert skipped nic->config.tx_cfg[i]. to tx_cfg->

Missed doing the conversion in earlier patch.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agos2io.c: Standardize statistics accessors
Joe Perches [Mon, 24 Aug 2009 17:29:47 +0000 (17:29 +0000)]
s2io.c: Standardize statistics accessors

Regularize the declaration and uses of
struct config_param *config = &sp->config;
struct mac_info *mac_control = &sp->mac_control;
and use
struct stat_block *stats = mac_control->stats_info;
struct swStat *swstats = &stats->sw_stat;
struct xpakStat *xstats = &stats->xpak_stat;
and convert the longish uses like
nic->mac_control.stats_info->sw_stat.<foo>
to
swstats-><foo>
etc.

This also makes the statistics code marginally smaller
and presumably faster.

Old:
$ size s2io.o
   text    data     bss     dec     hex filename
 114289     516   33360  148165   242c5 s2io.o
New:
$ size s2io.o
   text    data     bss     dec     hex filename
 114097     516   33360  147973   24205 s2io.o

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agos2io.c: fix spelling explaination
Joe Perches [Mon, 24 Aug 2009 17:29:46 +0000 (17:29 +0000)]
s2io.c: fix spelling explaination

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agos2io.c: convert printks to pr_<level>
Joe Perches [Mon, 24 Aug 2009 17:29:45 +0000 (17:29 +0000)]
s2io.c: convert printks to pr_<level>

Fixed trivial typo as well

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agos2io.c: Make more conforming to normal kernel style
Joe Perches [Mon, 24 Aug 2009 17:29:44 +0000 (17:29 +0000)]
s2io.c: Make more conforming to normal kernel style

Still has a few long lines.

checkpatch was:
total: 263 errors, 53 warnings, 8751 lines checked
is:
total: 4 errors, 35 warnings, 8767 lines checked

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agos2io.c: use kzalloc
Joe Perches [Mon, 24 Aug 2009 17:29:43 +0000 (17:29 +0000)]
s2io.c: use kzalloc

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agos2io.c: Use calculated size in kmallocs
Joe Perches [Mon, 24 Aug 2009 17:29:42 +0000 (17:29 +0000)]
s2io.c: Use calculated size in kmallocs

Use consistent style.  Don't calculate the kmalloc size multiple times

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agos2io.c: Shorten code line length by using intermediate pointers
Joe Perches [Mon, 24 Aug 2009 17:29:41 +0000 (17:29 +0000)]
s2io.c: Shorten code line length by using intermediate pointers

Repeated variable use and line wrapping is hard to read.
Use temp variables instead of direct references.

struct fifo_info *fifo = &mac_control->fifos[i];
struct ring_info *ring = &mac_control->rings[i];
struct tx_fifo_config *tx_cfg = &config->tx_cfg[i];
struct rx_ring_config *rx_cfg = &config->rx_cfg[i];

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agos2io.c: Use const for strings
Joe Perches [Mon, 24 Aug 2009 17:29:40 +0000 (17:29 +0000)]
s2io.c: Use const for strings

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopkt_sched: Fix resource limiting in pfifo_fast
Krishna Kumar [Mon, 31 Aug 2009 05:20:28 +0000 (22:20 -0700)]
pkt_sched: Fix resource limiting in pfifo_fast

pfifo_fast_enqueue has this check:
        if (skb_queue_len(list) < qdisc_dev(qdisc)->tx_queue_len) {

which allows each band to enqueue up to tx_queue_len skbs for a
total of 3*tx_queue_len skbs. I am not sure if this was the
intention of limiting in qdisc.

The patch compiled, and 32 simultaneous netperf tests ran fine. Also:
# tc -s qdisc show dev eth2
qdisc pfifo_fast 0: root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 16835026752 bytes 373116 pkt (dropped 0, overlimits 0 requeues 25)
 rate 0bit 0pps backlog 0b 0p requeues 25

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
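
The difference between a per-band limit and a whole-qdisc limit can be shown
with a small counter model. The commit message does not spell out the new
check, so the second function below is an assumption about the intended
behaviour (one tx_queue_len budget shared by all bands). All names are
hypothetical.

```c
#define SKETCH_BANDS 3

struct sketch_qdisc {
    unsigned int band_len[SKETCH_BANDS];  /* packets queued per band */
    unsigned int total_len;               /* packets queued in all bands */
    unsigned int tx_queue_len;            /* configured limit */
};

/* Per-band check, as quoted above: each band may grow to tx_queue_len. */
int enqueue_per_band_limit(struct sketch_qdisc *q, unsigned int band)
{
    if (band >= SKETCH_BANDS || q->band_len[band] >= q->tx_queue_len)
        return -1;                        /* drop */
    q->band_len[band]++;
    q->total_len++;
    return 0;
}

/* Whole-qdisc check (assumed fix): all bands share one budget. */
int enqueue_total_limit(struct sketch_qdisc *q, unsigned int band)
{
    if (band >= SKETCH_BANDS || q->total_len >= q->tx_queue_len)
        return -1;                        /* drop */
    q->band_len[band]++;
    q->total_len++;
    return 0;
}
```

With tx_queue_len = 2 and four enqueue attempts per band, the per-band check
accepts 6 packets in total (3 * tx_queue_len), while the whole-qdisc check
accepts 2.
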
15 years agonet: convert remaining non-symbolic return values in dev_queue_xmit
Krishna Kumar [Sat, 29 Aug 2009 20:21:36 +0000 (20:21 +0000)]
net: convert remaining non-symbolic return values in dev_queue_xmit

The patch compiled, and 32 simultaneous netperf tests ran fine.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonetdevice: Consolidate to use existing macros where available.
Krishna Kumar [Sat, 29 Aug 2009 20:21:21 +0000 (20:21 +0000)]
netdevice: Consolidate to use existing macros where available.

The patch compiled, and 32 simultaneous netperf tests ran fine.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agocan: use correct NET_RX_ return values
Oliver Hartkopp [Sat, 29 Aug 2009 06:45:09 +0000 (06:45 +0000)]
can: use correct NET_RX_ return values

Dropped skb's should be documented by an appropriate return value.
Use the correct NET_RX_DROP and NET_RX_SUCCESS values for that reason.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoWAN: bit and/or confusion
roel kluin [Thu, 20 Aug 2009 04:04:40 +0000 (04:04 +0000)]
WAN: bit and/or confusion

Fix the tests that check whether Frame* bits are not set

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoucc_geth: Implement suspend/resume and Wake-On-LAN support
Anton Vorontsov [Thu, 27 Aug 2009 07:35:57 +0000 (07:35 +0000)]
ucc_geth: Implement suspend/resume and Wake-On-LAN support

This patch implements suspend/resume and WOL support for UCC Ethernet
driver.

We support two wake up events: wake on PHY/link changes and wake
on magic packet.

In some CPUs (like MPC8569) QE shuts down during sleep, so magic packet
detection is unusable, and also on resume we should fully reinitialize
UCC structures.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoucc_geth: Remove UGETH_MAGIC_PACKET Kconfig symbol and code
Anton Vorontsov [Thu, 27 Aug 2009 07:35:56 +0000 (07:35 +0000)]
ucc_geth: Remove UGETH_MAGIC_PACKET Kconfig symbol and code

This patch removes currently unused UGETH_MAGIC_PACKET Kconfig symbol
and code, i.e. magic_packet_detection_{enable,disable} functions.

The two functions each contain just two steps that we'll place into
suspend/resume code path under CONFIG_PM.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoucc_geth: Factor out MAC initialization steps into a call
Anton Vorontsov [Thu, 27 Aug 2009 07:35:54 +0000 (07:35 +0000)]
ucc_geth: Factor out MAC initialization steps into a call

This patch factors out MAC initialization into ucc_geth_init_mac()
function that we'll use for suspend/resume.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopowerpc/qe: Implement qe_alive_during_sleep() helper function
Anton Vorontsov [Thu, 27 Aug 2009 07:35:50 +0000 (07:35 +0000)]
powerpc/qe: Implement qe_alive_during_sleep() helper function

On some CPUs (e.g. MPC8569) the QE shuts down completely during sleep;
drivers may want to know that so they can reinitialize registers and buffer
descriptors.

This patch implements the qe_alive_during_sleep() helper function. So far
it just checks whether an MPC8569-compatible power management controller is
present, which is a sign that the QE turns off during sleep.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoucc_geth: Fix NULL pointer dereference in uec_get_ethtool_stats()
Anton Vorontsov [Thu, 27 Aug 2009 07:35:47 +0000 (07:35 +0000)]
ucc_geth: Fix NULL pointer dereference in uec_get_ethtool_stats()

In commit 3e73fc9a12679a546284d597c1f19165792d0b83 ("ucc_geth: Fix IO
memory (un)mapping code") I fixed the ug_regs IO memory leak by properly
freeing the allocated memory. But the ethtool_stats() callback doesn't
check for ug_regs being NULL, and that causes the following oops if
'ethtool -S' is executed on a closed eth device:

  Unable to handle kernel paging request for data at address 0x00000180
  Faulting instruction address: 0xc0208228
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [c0208228] uec_get_ethtool_stats+0x38/0x140
  LR [c02559a0] ethtool_get_stats+0xf8/0x23c
  Call Trace:
  [ef87bcd0] [c025597c] ethtool_get_stats+0xd4/0x23c (unreliable)
  [ef87bd00] [c025706c] dev_ethtool+0xfe8/0x11bc
  [ef87be00] [c0252b5c] dev_ioctl+0x454/0x6a8
  ...
  ---[ end trace 77fff1162a9586b0 ]---
  Segmentation fault

This patch fixes the issue.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
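
The commit message does not show the fix itself. A guard of the following
kind is one plausible shape for it, sketched here in user-space C with
hypothetical types (sketch_regs, sketch_priv) rather than the driver's own
structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block and private data; not the driver's layout. */
struct sketch_regs {
    uint32_t tx_frames;
    uint32_t rx_frames;
};

struct sketch_priv {
    struct sketch_regs *ug_regs;   /* NULL while the device is closed */
};

/*
 * Copy up to n statistics into data and return how many were written.
 * A closed device has no mapped registers, so nothing is read from it.
 */
size_t sketch_get_stats(const struct sketch_priv *priv, uint64_t *data,
                        size_t n)
{
    size_t written = 0;

    if (priv->ug_regs == NULL)
        return 0;                  /* guard: never dereference NULL */
    if (written < n)
        data[written++] = priv->ug_regs->tx_frames;
    if (written < n)
        data[written++] = priv->ug_regs->rx_frames;
    return written;
}
```

On a closed device the function returns 0 and leaves the caller's buffer
untouched instead of faulting on the unmapped register block.
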
15 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluet...
David S. Miller [Mon, 31 Aug 2009 04:30:39 +0000 (21:30 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/holtmann/bluetooth-next-2.6

15 years agotg3: Update version to 3.101
Matt Carlson [Fri, 28 Aug 2009 14:03:44 +0000 (14:03 +0000)]
tg3: Update version to 3.101

This patch updates the tg3 version to 3.101.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Move per-int tx members to a per-int struct
Matt Carlson [Fri, 28 Aug 2009 14:03:21 +0000 (14:03 +0000)]
tg3: Move per-int tx members to a per-int struct

This patch moves the tx_prod, tx_cons, tx_pending, tx_ring, and
tx_buffers transmit ring device members to a per-interrupt structure.
It also adds a new transmit producer mailbox member (prodmbox) and
converts the code to use it rather than a preprocessor constant.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
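
The layout change described by this series can be sketched as follows. The
member names tx_prod, tx_cons, tx_pending and prodmbox follow the commit
message; the struct names, the vector count and the mailbox base and stride
are invented for the example and say nothing about the real hardware.

```c
#include <stdint.h>

#define SKETCH_IRQ_MAX 4

struct sketch_tg3;

/* Per-interrupt state; member names follow the commit message. */
struct sketch_tg3_napi {
    struct sketch_tg3 *tp;   /* back pointer to the device structure */
    uint32_t tx_prod;
    uint32_t tx_cons;
    uint32_t tx_pending;
    uint32_t prodmbox;       /* transmit producer mailbox for this vector */
};

struct sketch_tg3 {
    struct sketch_tg3_napi napi[SKETCH_IRQ_MAX];
};

/*
 * Give each vector its own mailbox value instead of one compile-time
 * constant.  mbox_base and mbox_stride are made-up parameters.
 */
void sketch_init_napi(struct sketch_tg3 *tp, uint32_t mbox_base,
                      uint32_t mbox_stride)
{
    unsigned int i;

    for (i = 0; i < SKETCH_IRQ_MAX; i++) {
        struct sketch_tg3_napi *tnapi = &tp->napi[i];

        tnapi->tp = tp;
        tnapi->tx_prod = 0;
        tnapi->tx_cons = 0;
        tnapi->tx_pending = 0;
        tnapi->prodmbox = mbox_base + i * mbox_stride;
    }
}
```

Code that used to read a preprocessor constant would read tnapi->prodmbox
instead, so the same transmit path can serve any interrupt vector.
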
15 years agotg3: Move per-int rx members to per-int struct
Matt Carlson [Fri, 28 Aug 2009 14:03:01 +0000 (14:03 +0000)]
tg3: Move per-int rx members to per-int struct

This patch moves the rx_rcb, rx_rcb_mapping, and rx_rcb_ptr return ring
device members to a per-interrupt structure.  It also adds a new return
ring consumer mailbox register member (consmbox) and converts the code
to use it rather than a preprocessor constant.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Move general int members to a per-int struct
Matt Carlson [Fri, 28 Aug 2009 14:02:40 +0000 (14:02 +0000)]
tg3: Move general int members to a per-int struct

This patch moves the last_tag, last_tag_irq, and hw_status device
members to a per-interrupt structure.  It also adds a new interrupt
mailbox member (int_mbox) and converts the code to use it rather than a
direct preprocessor constant.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Convert napi handlers to use tnapi
Matt Carlson [Fri, 28 Aug 2009 14:02:18 +0000 (14:02 +0000)]
tg3: Convert napi handlers to use tnapi

This patch converts the napi interrupt handler functions to accept and
use tg3_napi structures.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Convert ISR parameter to tnapi
Matt Carlson [Fri, 28 Aug 2009 14:01:57 +0000 (14:01 +0000)]
tg3: Convert ISR parameter to tnapi

This patch migrates the ISR parameter from struct net_device to struct
tg3_napi.  Checkpatch complains about the preexisting
IRQF_SAMPLE_RANDOM flag.  I've opted to keep this patch conservative and
let it continue to exist until the flag gets officially purged from the
kernel.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Move napi to per-int struct
Matt Carlson [Fri, 28 Aug 2009 14:01:37 +0000 (14:01 +0000)]
tg3: Move napi to per-int struct

This patch creates a per-interrupt data structure, moves the napi
member over, and creates a tg3 pointer back to the device structure.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Cleanup interrupt setup / teardown
Matt Carlson [Fri, 28 Aug 2009 14:01:15 +0000 (14:01 +0000)]
tg3: Cleanup interrupt setup / teardown

Later patches will be adding MSIX support, which will complicate
interrupt initialization.  This patch prepares for the integration by
breaking out the interrupt setup and teardown code into separate
functions and cleaning up the error return paths.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Use ext rx bds
Matt Carlson [Fri, 28 Aug 2009 14:00:55 +0000 (14:00 +0000)]
tg3: Use ext rx bds

The 5717 only uses extended buffer descriptors for the jumbo producer
ring.  Extended buffer descriptors are available on all devices that
support a separate jumbo producer ring so make the change universal.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Create a new prodring_set structure
Matt Carlson [Fri, 28 Aug 2009 14:00:25 +0000 (14:00 +0000)]
tg3: Create a new prodring_set structure

This patch migrates most of the rx producer ring variables to a new
tg3_rx_prodring_set structure and modifies the code accordingly.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Create rx producer ring setup routines
Matt Carlson [Fri, 28 Aug 2009 13:59:57 +0000 (13:59 +0000)]
tg3: Create rx producer ring setup routines

Later patches are going to complicate the ring initialization routines.
This patch breaks out the setup and teardown of the rx producer rings
into separate functions to make the code more readable.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Clarify rx buffer relationships
Matt Carlson [Fri, 28 Aug 2009 13:58:46 +0000 (13:58 +0000)]
tg3: Clarify rx buffer relationships

This patch attempts to document the various rx buffer sizes used by the
driver and how they relate to each other.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Move the JUMBO_CAPABLE and SUPPORT_MSI flags
Matt Carlson [Fri, 28 Aug 2009 13:58:24 +0000 (13:58 +0000)]
tg3: Move the JUMBO_CAPABLE and SUPPORT_MSI flags

This patch moves where the jumbo capable and msi support flags are
located.  This is prep work for the addition of msix support flags.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Break out mini producer ring handling
Matt Carlson [Fri, 28 Aug 2009 13:57:12 +0000 (13:57 +0000)]
tg3: Break out mini producer ring handling

This patch separates the code that sets up the mini producer ring from
the code that sets up the jumbo producer rings.  The 5717 asic rev
devices do not have a mini ring, but do have a jumbo frame
implementation similar to the 5704 and previous devices.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Reformat NVRAM case statements
Matt Carlson [Fri, 28 Aug 2009 12:29:16 +0000 (12:29 +0000)]
tg3: Reformat NVRAM case statements

This patch fixes up the NVRAM detection switch statements to conform
to the kernel coding style.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Add new 5785 10/100 only device ID
Matt Carlson [Fri, 28 Aug 2009 12:28:45 +0000 (12:28 +0000)]
tg3: Add new 5785 10/100 only device ID

This patch adds a new device ID for those 5785 devices that will only
use 10/100 phys.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotg3: Delay mdio bus init until fw finishes
Matt Carlson [Fri, 28 Aug 2009 12:27:50 +0000 (12:27 +0000)]
tg3: Delay mdio bus init until fw finishes

The device firmware uses the MDIO bus during early setup.  If the driver
modifies the MDIO bus configuration while it is in use by the firmware,
any number of bad things can happen.  This patch delays MDIO setup until
after the firmware posts its magic signature, signifying initialization
is complete.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotipc: fix test of bearer_priority range in tipc_register_media()
roel kluin [Thu, 27 Aug 2009 02:03:15 +0000 (02:03 +0000)]
tipc: fix test of bearer_priority range in tipc_register_media()

For the bearer_priority to be less than TIPC_MIN_LINK_PRI and greater than
TIPC_MAX_LINK_PRI is logically impossible.
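
A minimal sketch of the kind of range test described above. The bounds and
function names here are hypothetical stand-ins for illustration, not the
definitions from the TIPC headers or the actual patch:

```c
#include <stdbool.h>

/* Hypothetical bounds, chosen only for illustration. */
#define EXAMPLE_MIN_PRI 1
#define EXAMPLE_MAX_PRI 31

/* Broken form: no value is both below the minimum and above the
 * maximum, so this test can never report an out-of-range priority. */
bool example_out_of_range_broken(int pri)
{
	return pri < EXAMPLE_MIN_PRI && pri > EXAMPLE_MAX_PRI;
}

/* Corrected form: a priority is out of range if it falls below the
 * minimum or above the maximum. */
bool example_out_of_range_fixed(int pri)
{
	return pri < EXAMPLE_MIN_PRI || pri > EXAMPLE_MAX_PRI;
}
```

With these assumed bounds, the broken test accepts 0 and 32 as valid, while
the corrected test rejects both and still accepts 16.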

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agocan: switch to seq_file
Alexey Dobriyan [Fri, 28 Aug 2009 09:57:21 +0000 (09:57 +0000)]
can: switch to seq_file

create_proc_read_entry() is going to be removed soon.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonet: sh_eth: add value of ether_link pin in platform_data
Yoshihiro Shimoda [Thu, 27 Aug 2009 23:25:03 +0000 (23:25 +0000)]
net: sh_eth: add value of ether_link pin in platform_data

How the ETHER_LINK pin is used depends on the board.
This patch adds the following parameters:
 - no_ether_link          : If set to 1, do not use ETHER_LINK
 - ether_link_active_low  : If set to 1, ETHER_LINK is active low.

Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoTI DaVinci EMAC: delay DaVinci EMAC initialization
Rajashekhara, Sudhakar [Wed, 19 Aug 2009 10:39:55 +0000 (10:39 +0000)]
TI DaVinci EMAC: delay DaVinci EMAC initialization

On TI's DA850/OMAP-L138 EVM, MAC address is stored in SPI
flash which is accessed using MTD interface.

This patch delays the initialization of DaVinci EMAC driver
by changing module_init to late_initcall. This helps SPI and
MTD drivers to get initialized before EMAC thereby enabling
EMAC driver to read the MAC address while booting and use it.

Tested with NFS on DM644x, DM6467, DA830/OMAP-L137 and
DA850/OMAP-L138 EVMs.

Signed-off-by: Sudhakar Rajashekhara <sudhakar.raj@ti.com>
Reviewed-by: Chaithrika U S <chaithrika@ti.com>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoWAN/LMC: Fix type_trans().
Krzysztof Halasa [Wed, 19 Aug 2009 23:56:20 +0000 (23:56 +0000)]
WAN/LMC: Fix type_trans().

Fix lmc_proto_type() invocation.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agolib/vsprintf.c: Add "%pI6c" - print pointer as compressed ipv6 address
Joe Perches [Mon, 17 Aug 2009 12:29:44 +0000 (12:29 +0000)]
lib/vsprintf.c: Add "%pI6c" - print pointer as compressed ipv6 address

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Jens Rosenboom <jens@mcbone.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotcp: Remove redundant copy of MD5 authentication key
John Dykstra [Wed, 19 Aug 2009 09:47:41 +0000 (09:47 +0000)]
tcp: Remove redundant copy of MD5 authentication key

Remove the copy of the MD5 authentication key from tcp_check_req().
This key has already been copied by tcp_v4_syn_recv_sock() or
tcp_v6_syn_recv_sock().

Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoSpeed-up pfifo_fast lookup using a private bitmap
Krishna Kumar [Tue, 18 Aug 2009 21:55:59 +0000 (21:55 +0000)]
Speed-up pfifo_fast lookup using a private bitmap

Maintain a per-qdisc bitmap for pfifo_fast giving availability
of skbs for each band. This allows faster lookup for a skb when
there are no high priority skbs. Also, it helps in (rare) cases
when there are no skbs on the list, where an immediate lookup is
faster than iterating through the three bands.
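
The idea can be sketched roughly as follows. This is an illustration under
stated assumptions, not the patch itself: the struct, the function names and
the per-band counters are hypothetical, and real skb lists are replaced by
simple counts. Bit i of the bitmap is set while band i holds packets, and a
small table maps each bitmap value to the highest-priority non-empty band:

```c
#define EXAMPLE_BANDS 3

/* Hypothetical stand-in for the per-qdisc private data. */
struct example_qdisc {
	unsigned int bitmap;       /* bit i set => band i is non-empty */
	int qlen[EXAMPLE_BANDS];   /* packets queued in each band */
};

/* Maps a 3-bit bitmap to the lowest-numbered (highest-priority)
 * non-empty band, or -1 when every band is empty. */
static const int example_bitmap2band[1 << EXAMPLE_BANDS] = {
	-1, 0, 1, 0, 2, 0, 1, 0
};

void example_enqueue(struct example_qdisc *q, int band)
{
	q->qlen[band]++;
	q->bitmap |= 1u << band;
}

/* Returns the band a packet was taken from, or -1 if nothing is queued.
 * The empty case is a single table lookup rather than a scan of bands. */
int example_dequeue(struct example_qdisc *q)
{
	int band = example_bitmap2band[q->bitmap];

	if (band < 0)
		return -1;
	if (--q->qlen[band] == 0)
		q->bitmap &= ~(1u << band);
	return band;
}
```

Dequeue cost no longer depends on how many higher-priority bands are empty.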

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipv6: Update Neighbor Cache when IPv6 RA is received on a router
David Ward [Sat, 29 Aug 2009 07:04:09 +0000 (00:04 -0700)]
ipv6: Update Neighbor Cache when IPv6 RA is received on a router

When processing a received IPv6 Router Advertisement, the kernel
creates or updates an IPv6 Neighbor Cache entry for the sender --
but presently this does not occur if IPv6 forwarding is enabled
(net.ipv6.conf.*.forwarding = 1), or if IPv6 Router Advertisements
are not accepted (net.ipv6.conf.*.accept_ra = 0), because in these
cases processing of the Router Advertisement has already halted.

This patch allows the Neighbor Cache to be updated in these cases,
while still avoiding any modification to routes or link parameters.

This continues to satisfy RFC 4861, since any entry created in the
Neighbor Cache as the result of a received Router Advertisement is
still placed in the STALE state.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agobnx2: Update firmware to 5.0.0.j3.
Michael Chan [Sat, 29 Aug 2009 07:02:46 +0000 (00:02 -0700)]
bnx2: Update firmware to 5.0.0.j3.

- Better small packet receive performance.
- Better handling of Flow control on 5709.
- Fixed iSCSI TMP ABORT TASK problem.
- Added iSCSI TCP timestamp option.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotcp: fix premature termination of FIN_WAIT2 time-wait sockets
Octavian Purdila [Sat, 29 Aug 2009 07:00:35 +0000 (00:00 -0700)]
tcp: fix premature termination of FIN_WAIT2 time-wait sockets

There is a race condition in the time-wait sockets code that can lead
to premature termination of FIN_WAIT2 and, subsequently, to RST
generation when the FIN,ACK from the peer finally arrives:

Time     TCP header
0.000000 30755 > http [SYN] Seq=0 Win=2920 Len=0 MSS=1460 TSV=282912 TSER=0
0.000008 http > 30755 [SYN, ACK] Seq=0 Ack=1 Win=2896 Len=0 MSS=1460 TSV=...
0.136899 HEAD /1b.html?n1Lg=v1 HTTP/1.0 [Packet size limited during capture]
0.136934 HTTP/1.0 200 OK [Packet size limited during capture]
0.136945 http > 30755 [FIN, ACK] Seq=187 Ack=207 Win=2690 Len=0 TSV=270521...
0.136974 30755 > http [ACK] Seq=207 Ack=187 Win=2734 Len=0 TSV=283049 TSER=...
0.177983 30755 > http [ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283089 TSER=...
0.238618 30755 > http [FIN, ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283151...
0.238625 http > 30755 [RST] Seq=188 Win=0 Len=0

Say twdr->slot = 1 and we are running inet_twdr_hangman and in this
instance inet_twdr_do_twkill_work returns 1. At that point we will
mark slot 1 and schedule inet_twdr_twkill_work. We will also make
twdr->slot = 2.

Next, a connection is closed and tcp_time_wait(TCP_FIN_WAIT2, timeo)
is called which will create a new FIN_WAIT2 time-wait socket and will
place it in the last to be reached slot, i.e. twdr->slot = 1.

At this point say inet_twdr_twkill_work will run which will start
destroying the time-wait sockets in slot 1, including the just added
TCP_FIN_WAIT2 one.

To avoid this issue we increment the slot only if all entries in the
slot have been purged.

This change may delay the slots cleanup by a time-wait death row
period but only if the worker thread didn't have the time to run/purge
the current slot in the next period (6 seconds with default sysctl
settings). However, on such a busy system even without this change we
would probably see delays...

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agofib_trie: resize rework
Jens Låås [Sat, 29 Aug 2009 06:57:15 +0000 (23:57 -0700)]
fib_trie: resize rework

Here is rework and cleanup of the resize function.

Some bugs we had. We were using ->parent when we should use
node_parent(). Also we used ->parent which is not assigned by
inflate in inflate loop.

Also a fix to set thresholds to power 2 to fit halve
and double strategy.

max_resize is renamed to max_work which better indicates
its function.

Reaching max_work is not an error, so warning is removed.
max_work only limits amount of work done per resize.
(limits CPU-usage, outstanding memory etc).

The clean-up makes it relatively easy to add fixed sized
root-nodes if we would like to decrease the memory pressure
on routers with large routing tables and dynamic routing.
If we'll need that...

It's been tested with 280k routes.

Work done together with Robert Olsson.

Signed-off-by: Jens Låås <jens.laas@its.uu.se>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agosit: allow ip fragmentation when using nopmtudisc to fix package loss
Sascha Hlusiak [Sat, 29 Aug 2009 06:53:53 +0000 (23:53 -0700)]
sit: allow ip fragmentation when using nopmtudisc to fix package loss

If tunnel parameters have frag_off set to IP_DF, pmtudisc on the ipv4 link
will be performed by deriving the mtu from the ipv4 link and setting the
DF-Flag of the encapsulating IPv4 Header. If fragmentation is needed on the
way, the IPv4 pmtu gets adjusted, the ipv6 packet will be resent eventually,
using the new and lower mtu and everyone is happy.

If the frag_off parameter is unset, the mtu for the tunnel will be derived
from the tunnel device or the ipv6 pmtu, which might be higher than the ipv4
pmtu. In that case we must allow the fragmentation of the IPv4 packet because
the IPv6 mtu wouldn't 'learn' from the adjusted IPv4 pmtu, resulting in
frequent icmp_frag_needed and packet loss on the IPv6 layer.

This patch allows fragmentation when tunnel was created with parameter
nopmtudisc, like in ipip/gre tunnels.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agonet: ip_rt_send_redirect() optimization
Eric Dumazet [Sat, 29 Aug 2009 06:52:01 +0000 (23:52 -0700)]
net: ip_rt_send_redirect() optimization

While doing some forwarding benchmarks, I noticed
ip_rt_send_redirect() is rather expensive, even if send_redirects is
false for the device.

The fix is to avoid two atomic ops; we don't really need to take a
reference on in_dev.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agotcp: keepalive cleanups
Eric Dumazet [Sat, 29 Aug 2009 06:48:54 +0000 (23:48 -0700)]
tcp: keepalive cleanups

Introduce keepalive_probes(tp) helper, and use it, like
keepalive_time_when(tp) and keepalive_intvl_when(tp)
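
A helper of this kind typically returns the per-socket setting when one has
been configured and falls back to the system-wide default otherwise. The
sketch below assumes that pattern; the struct, the variable and the function
names are hypothetical stand-ins, not the kernel's definitions:

```c
/* Hypothetical stand-in for the per-socket TCP state. */
struct example_tcp_sock {
	int keepalive_probes;   /* 0 means "not set on this socket" */
};

/* Hypothetical stand-in for the system-wide default (a sysctl in the
 * kernel). */
int example_sysctl_keepalive_probes = 9;

/* Per-socket value if set, otherwise the system-wide default. */
int example_keepalive_probes(const struct example_tcp_sock *tp)
{
	return tp->keepalive_probes ? tp->keepalive_probes
				    : example_sysctl_keepalive_probes;
}
```

Centralising the fallback in one helper keeps callers from open-coding the
"socket value or default" test at every use site.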

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agocnic: Put uio init in separate function.
Michael Chan [Wed, 26 Aug 2009 09:49:23 +0000 (09:49 +0000)]
cnic: Put uio init in separate function.

This will allow the 10G iSCSI code to reuse the function.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agocnic: Put rx/tx ring allocation in separate function.
Michael Chan [Wed, 26 Aug 2009 09:49:22 +0000 (09:49 +0000)]
cnic: Put rx/tx ring allocation in separate function.

This will allow the 10G iSCSI code to reuse the function.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoipv4: af_inet.c cleanups
Eric Dumazet [Sat, 29 Aug 2009 06:45:21 +0000 (23:45 -0700)]
ipv4: af_inet.c cleanups

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: use proc_create_data()
Alexey Dobriyan [Sat, 29 Aug 2009 06:34:43 +0000 (23:34 -0700)]
pktgen: use proc_create_data()

It looks like the device proc entry is unusable after a rename,
because it has no ->read_proc or ->proc_fops.

And create_proc_entry() is deprecated.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: increase version
Stephen Hemminger [Thu, 27 Aug 2009 13:55:20 +0000 (13:55 +0000)]
pktgen: increase version

Increase module version, and cleanup module info.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: cleanup checkpatch warnings
Stephen Hemminger [Thu, 27 Aug 2009 13:55:19 +0000 (13:55 +0000)]
pktgen: cleanup checkpatch warnings

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: use common idle routine
Stephen Hemminger [Thu, 27 Aug 2009 13:55:18 +0000 (13:55 +0000)]
pktgen: use common idle routine

Simpler to have one place that spins and accounts for delays,
this will also make the last packet be detected faster for more
repeatable timing.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: spin using hrtimer
Stephen Hemminger [Sat, 29 Aug 2009 06:41:29 +0000 (23:41 -0700)]
pktgen: spin using hrtimer

This changes how the pktgen thread spins/waits between
packets if delay is configured. It uses a high res timer to
wait for time to arrive.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: convert to use ktime_t
Stephen Hemminger [Thu, 27 Aug 2009 13:55:16 +0000 (13:55 +0000)]
pktgen: convert to use ktime_t

The kernel ktime_t is a nice generic infrastructure for managing
high resolution times, as is done in pktgen.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: avoid calling gettimeofday
Stephen Hemminger [Thu, 27 Aug 2009 13:55:15 +0000 (13:55 +0000)]
pktgen: avoid calling gettimeofday

If not using delay then no need to update next_tx after
each packet sent. This allows pktgen to send faster especially
on systems with slower clock sources.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: reorganize transmit loop
Stephen Hemminger [Thu, 27 Aug 2009 13:55:14 +0000 (13:55 +0000)]
pktgen: reorganize transmit loop

Handle standard (and non-standard) return values in a switch.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: use netdev_alloc_skb
Stephen Hemminger [Thu, 27 Aug 2009 13:55:13 +0000 (13:55 +0000)]
pktgen: use netdev_alloc_skb

netdev_alloc_skb is NUMA node aware.
Also, don't exhaust atomic emergency pool. Don't want pktgen
to cause OOM behaviour.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: cleanup clone count test
Stephen Hemminger [Thu, 27 Aug 2009 13:55:12 +0000 (13:55 +0000)]
pktgen: cleanup clone count test

The if statement to test for "should a new packet be used"
can be simplified.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: xmit logic reorganization
Stephen Hemminger [Thu, 27 Aug 2009 13:55:11 +0000 (13:55 +0000)]
pktgen: xmit logic reorganization

Do some reorganization of transmit logic path:
   * move transmit queue full idle to separate routine
   * add a cpu_relax()
   * eliminate some of the unneeded goto's
   * if queue is still stopped, go back to main thread loop.
   * don't give up transmitting if quantum is exhausted (be greedy)

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: stop_device cleanup
Stephen Hemminger [Thu, 27 Aug 2009 13:55:10 +0000 (13:55 +0000)]
pktgen: stop_device cleanup

All the callers were freeing skb after stopping device.
Remove unneeded forward decl.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: mark read-only/mostly variables
Stephen Hemminger [Thu, 27 Aug 2009 13:55:09 +0000 (13:55 +0000)]
pktgen: mark read-only/mostly variables

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agopktgen: change inlining
Stephen Hemminger [Thu, 27 Aug 2009 13:55:08 +0000 (13:55 +0000)]
pktgen: change inlining

Don't force inlining where not needed. Gcc does a better job
of deciding to inline local functions.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>