Stephen Hemminger [Sun, 8 Jul 2007 05:59:14 +0000 (22:59 -0700)]
[NET]: netdevice locking assumptions documentation
Update the documentation about locking assumptions.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sun, 8 Jul 2007 05:54:56 +0000 (22:54 -0700)]
[BNX2]: Seems to not need net/tcp.h
Got bored to always recompile it for no reason.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:52:37 +0000 (22:52 -0700)]
[BNX2]: Update version to 1.6.2.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:52:02 +0000 (22:52 -0700)]
[BNX2]: Print management firmware version.
Add management firmware version for ethtool -i.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:51:36 +0000 (22:51 -0700)]
[BNX2]: Enhance the heartbeat.
In addition to the periodic heartbeat, we're adding a heartbeat
request interrupt when the heartbeat is late. This is needed during
netpoll where the timer is not available. -rt kernels will also
benefit since the timer is not as accurate.
[ We discussed this patch last time and we decided that the -rt
kernel problem alone did not justify this patch. I think the
netpoll problem makes this patch necessary. ]
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:51:03 +0000 (22:51 -0700)]
[BNX2]: Reduce spurious INTA interrupts.
Spurious interrupts are often encountered especially on systems
using the 8259 PIC mode. This is because the I/O write to deassert
the interrupt is posted and won't get to the chip immediately. As
a result, the IRQ may remain asserted after the IRQ handler exits,
causing spurious interrupts.
Add read back to flush the I/O write to deassert the IRQ immediately.
We also store the last_status_idx immediately in the IRQ handler to
help detect whether the interrupt is ours or not when the IRQ is
entered again before ->poll gets called.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:50:37 +0000 (22:50 -0700)]
[BNX2]: Modify link up message.
Modify the link up dmesg to report remote copper or Serdes link.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:50:15 +0000 (22:50 -0700)]
[BNX2]: Add ethtool support for remote PHY.
Modify the driver's ethtool_ops->get_settings and set_settings
functions to support remote PHY. Users control the remote copper
PHY settings by specifying link settings for the tp (twisted pair)
port.
The nway_reset function is also modified to support remote PHY.
mii-tool operations are not supported on remote PHY and we will
return -EOPNOTSUPP.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:49:43 +0000 (22:49 -0700)]
[BNX2]: Add support for remote PHY.
In blade servers, the Serdes PHY in 5708S can control the remote
copper PHY through autonegotiation on the backplane. This patch adds
the logic to interface with the firmware to control the remote PHY
autonegotiation and to handle remote PHY link events.
When remote PHY is present, the 5708S Serdes device practically
becomes a copper device with full control over the 1000Base-T
link settings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:48:31 +0000 (22:48 -0700)]
[BNX2]: Add remote PHY bit definitions.
Add new fields in struct bnx2 and other bit definitions in shared
memory to support remote PHY.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 8 Jul 2007 05:48:00 +0000 (22:48 -0700)]
[BNX2]: Add bnx2_set_default_link().
Put existing code to setup the default link settings in this new
function. This makes it easier to support the remote PHY feature in
the next few patches.
Also change ETHTOOL_ALL_FIBRE_SPEED to include 2500Mbps if supported.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Balazs Scheidler [Sun, 8 Jul 2007 05:41:01 +0000 (22:41 -0700)]
[NETFILTER]: x_tables: add more detail to error message about match/target mask mismatch
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:40:26 +0000 (22:40 -0700)]
[NETFILTER]: nf_queue: Use RCU and mutex for queue handlers
Queue handlers are registered/unregistered in only process context.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:40:08 +0000 (22:40 -0700)]
[NETFILTER]: nfnetlink_queue: don't unregister handler of other subsystem
The queue handlers registered by ip[6]_queue.ko at initialization should
not be unregistered according to requests from userland program
using nfnetlink_queue. If we allow that, there is no way to register
the handlers of built-in ip[6]_queue again.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:39:38 +0000 (22:39 -0700)]
[NETFILTER]: Convert DEBUGP to pr_debug
Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:39:16 +0000 (22:39 -0700)]
[NETFILTER]: xt_helper: use RCU
The ->helper pointer is protected by RCU, no need to take
nf_conntrack_lock. Also remove excessive debugging.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:38:54 +0000 (22:38 -0700)]
[NETFILTER]: nf_conntrack_h323: turn some printks into DEBUGPs
Don't spam the ringbuffer with decoding errors. The only printks remaining
are for dropped packets when we're certain they are H.323.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:38:30 +0000 (22:38 -0700)]
[NETFILTER]: ipt_CLUSTERIP: add compat code
Adjust structure size and don't expect pointers passed in from
userspace to be valid. Also replace an enum in an ABI structure
by a fixed size type.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:38:07 +0000 (22:38 -0700)]
[NETFILTER]: ipt_SAME: add to feature-removal-schedule
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:37:38 +0000 (22:37 -0700)]
[NETFILTER]: nf_conntrack: early_drop improvement
When the maximum number of conntrack entries is reached and a new
one needs to be allocated, conntrack tries to drop an unassured
connection from the same hash bucket the new conntrack would hash
to. Since with a properly sized hash the average number of entries
per bucket is 1, the chances of actually finding one are not very
good. This patch makes it walk the hash until a minimum number of
8 entries are checked.
Based on patch by Vasily Averin <vvs@sw.ru>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:37:03 +0000 (22:37 -0700)]
[NETFILTER]: nf_conntrack: mark helpers __read_mostly
Most are __read_mostly already, this changes the remaining ones.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:36:46 +0000 (22:36 -0700)]
[NETFILTER]: nf_conntrack_helper: use hashtable for conntrack helpers
Eliminate the last global list searched for every new connection.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:36:24 +0000 (22:36 -0700)]
[NETFILTER]: nf_conntrack_expect: introduce nf_conntrack_expect_max sysct
As a last step of preventing DoS by creating lots of expectations, this
patch introduces a global maximum and a sysctl to control it. The default
is initialized to 4 * the expectation hash table size, which results in
1/64 of the default maxmimum of conntracks.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:35:56 +0000 (22:35 -0700)]
[NETFILTER]: nf_conntrack_expect: maintain per conntrack expectation list
This patch brings back the per-conntrack expectation list that was
removed around 2.6.10 to avoid walking all expectations on expectation
eviction and conntrack destruction.
As these were the last users of the global expectation list, this patch
also kills that.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:35:21 +0000 (22:35 -0700)]
[NETFILTER]: nf_conntrack_helper/nf_conntrack_netlink: convert to expectation hash
Convert from the global expectation list to the hash table.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:34:07 +0000 (22:34 -0700)]
[NETFILTER]: nf_conntrack_expect: convert proc functions to hash
Convert from the global expectation list to the hash table.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:33:47 +0000 (22:33 -0700)]
[NETFILTER]: nf_conntrack: use hashtable for expectations
Currently all expectations are kept on a global list that
- needs to be searched for every new conncetion
- needs to be walked for evicting expectations when a master connection
has reached its limit
- needs to be walked on connection destruction for connections that
have open expectations
This is obviously not good, especially when considering helpers like
H.323 that register *lots* of expectations and can set up permanent
expectations, but it also allows for an easy DoS against firewalls
using connection tracking helpers.
Use a hashtable for expectations to avoid incurring the search overhead
for every new connection. The default hash size is 1/256 of the conntrack
hash table size, this can be overriden using a module parameter.
This patch only introduces the hash table for expectation lookups and
keeps other users to reduce the noise, the following patches will get
rid of it completely.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:32:53 +0000 (22:32 -0700)]
[NETFILTER]: nf_conntrack: move expectaton related init code to nf_conntrack_expect.c
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:32:34 +0000 (22:32 -0700)]
[NETFILTER]: nf_conntrack_netlink: sync expectation dumping with conntrack table dumping
Resync expectation table dumping code with conntrack dumping: don't
rely on the unique ID anymore since that requires to walk the list
backwards, which doesn't work with the upcoming conversion to hlists.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:32:03 +0000 (22:32 -0700)]
[NETFILTER]: nf_conntrack_expect: avoid useless list walking
Don't walk the list when unexpecting an expectation, we already
have a reference and the timer check is enough to guarantee
that it still is on the list.
This comment suggests that it was copied there by mistake from
expectation eviction:
/* choose the oldest expectation to evict */
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:31:32 +0000 (22:31 -0700)]
[NETFILTER]: nf_conntrack: reduce masks to a subset of tuples
Since conntrack currently allows to use masks for every bit of both
helper and expectation tuples, we can't hash them and have to keep
them on two global lists that are searched for every new connection.
This patch removes the never used ability to use masks for the
destination part of the expectation tuple and completely removes
masks from helpers since the only reasonable choice is a full
match on l3num, protonum and src.u.all.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:31:07 +0000 (22:31 -0700)]
[NETFILTER]: nf_conntrack_ftp: use nf_ct_expect_init
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:30:49 +0000 (22:30 -0700)]
[NETFILTER]: nf_conntrack_expect: function naming unification
Currently there is a wild mix of nf_conntrack_expect_, nf_ct_exp_,
expect_, exp_, ...
Consistently use nf_ct_ as prefix for exported functions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:30:27 +0000 (22:30 -0700)]
[NETFILTER]: nf_nat: use hlists for bysource hash
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:30:08 +0000 (22:30 -0700)]
[NETFILTER]: nf_conntrack: export hash allocation/destruction functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:28:42 +0000 (22:28 -0700)]
[NETFILTER]: nf_conntrack: remove 'ignore_conntrack' argument from nf_conntrack_find_get
All callers pass NULL, this also doesn't seem very useful for modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:28:14 +0000 (22:28 -0700)]
[NETFILTER]: nf_conntrack: use hlists for conntrack hash
Convert conntrack hash to hlists to reduce its size and cache
footprint. Since the default hashsize to max. entries ratio
sucks (1:16), this patch doesn't reduce the amount of memory
used for the hash by default, but instead uses a better ratio
of 1:8, which results in the same max. entries value.
One thing worth noting is early_drop. It really should use LRU,
so it now has to iterate over the entire chain to find the last
unconfirmed entry. Since chains shouldn't be very long and the
entire operation is very rare this shouldn't be a problem.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:27:33 +0000 (22:27 -0700)]
[NETFILTER]: nf_conntrack: round up hashsize to next multiple of PAGE_SIZE
Don't let the rest of the page go to waste.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:27:06 +0000 (22:27 -0700)]
[NETFILTER]: nf_conntrack_extend: use __read_mostly for struct nf_ct_ext_type
Also make them static.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:26:35 +0000 (22:26 -0700)]
[NETFILTER]: nf_nat: merge nf_conn and nf_nat_info
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:26:16 +0000 (22:26 -0700)]
[NETFILTER]: nf_nat: kill global 'destroy' operation
This kills the global 'destroy' operation which was used by NAT.
Instead it uses the extension infrastructure so that multiple
extensions can register own operations.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:25:51 +0000 (22:25 -0700)]
[NETFILTER]: nf_conntrack: remove old memory allocator of conntrack
Now memory space for help and NAT are allocated by extension
infrastructure.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:25:28 +0000 (22:25 -0700)]
[NETFILTER]: nf_nat: remove unused nf_nat_module_is_loaded
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:24:28 +0000 (22:24 -0700)]
[NETFILTER]: nf_nat: use extension infrastructure
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:24:04 +0000 (22:24 -0700)]
[NETFILTER]: nf_nat: add reference to conntrack from entry of bysource list
I will split 'struct nf_nat_info' out from conntrack. So I cannot use
'offsetof' to get the pointer to conntrack from it.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:23:42 +0000 (22:23 -0700)]
[NETFILTER]: nf_conntrack: use extension infrastructure for helper
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:23:21 +0000 (22:23 -0700)]
[NETFILTER]: nf_conntrack: introduce extension infrastructure
Old space allocator of conntrack had problems about extensibility.
- It required slab cache per combination of extensions.
- It expected what extensions would be assigned, but it was impossible
to expect that completely, then we allocated bigger memory object than
really required.
- It needed to search helper twice due to lock issue.
Now basic informations of a connection are stored in 'struct nf_conn'.
And a storage for extension (helper, NAT) is allocated by kmalloc.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:22:33 +0000 (22:22 -0700)]
[NETFILTER]: nf_nat: move NAT declarations from nf_conntrack_ipv4.h to nf_nat.h
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 8 Jul 2007 05:22:02 +0000 (22:22 -0700)]
[NETFILTER]: x_tables: mark matches and targets __read_mostly
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jozsef Kadlecsik [Sun, 8 Jul 2007 05:21:23 +0000 (22:21 -0700)]
[NETFILTER]: x_tables: add TRACE target
The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick NcHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 8 Jul 2007 05:20:36 +0000 (22:20 -0700)]
[NETFILTER]: Add u32 match
Along comes... xt_u32, a revamped ipt_u32 from POM-NG,
Plus:
* 2007-06-02: added ipv6 support
* 2007-06-05: uses kmalloc for the big buffer
* 2007-06-05: added inversion
* 2007-06-20: use skb_copy_bits() and get rid of the big buffer
and lock (suggested by Pablo Neira Ayuso)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jerome Borsboom [Sun, 8 Jul 2007 05:19:48 +0000 (22:19 -0700)]
[NETFILTER]: nf_nat_sip: only perform RTP DNAT if SIP session was SNATed
DNAT of the the RTP session is only necessary if the SIP session has
been SNATed.
Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 8 Jul 2007 05:19:08 +0000 (22:19 -0700)]
[NETFILTER]: Remove redundant parentheses/braces
Removes redundant parentheses and braces (And add one pair in a
xt_tcpudp.c macro).
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 8 Jul 2007 05:17:36 +0000 (22:17 -0700)]
[NETFILTER]: Remove incorrect inline markers
device_cmp: the function's address is taken (call to nf_ct_iterate_cleanup)
alloc_null_binding: referenced externally
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 8 Jul 2007 05:16:55 +0000 (22:16 -0700)]
[NETFILTER]: add some consts, remove some casts
Make a number of variables const and/or remove unneeded casts.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 8 Jul 2007 05:16:26 +0000 (22:16 -0700)]
[NETFILTER]: x_tables: switch xt_target->checkentry to bool
Switch the return type of target checkentry functions to boolean.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 8 Jul 2007 05:16:00 +0000 (22:16 -0700)]
[NETFILTER]: x_tables: switch xt_match->checkentry to bool
Switch the return type of match functions to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 8 Jul 2007 05:15:35 +0000 (22:15 -0700)]
[NETFILTER]: x_tables: switch xt_match->match to bool
Switch the return type of match functions to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 8 Jul 2007 05:15:12 +0000 (22:15 -0700)]
[NETFILTER]: x_tables: switch hotdrop to bool
Switch the "hotdrop" variables to boolean
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 8 Jul 2007 05:14:23 +0000 (22:14 -0700)]
[NETFILTER]: ip6_tables: fix explanation of valid upper protocol number
This explains the allowed upper protocol numbers. IP6T_F_NOPROTO was
introduced to use 0 as Hop-by-Hop option header, not wildcard. But that
seemed to be forgotten. 0 has been used as wildcard since 2002-08-23.
Signed-off-by: Yasuyuki Kozakai <yasuyuki@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jing Min Zhao [Sun, 8 Jul 2007 05:13:17 +0000 (22:13 -0700)]
[NETFILTER]: nf_conntrack_h323: check range first in sequence extension
Check range before checking STOP flag. This optimization may save a
nanosecond or less :)
Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Fri, 6 Jul 2007 00:08:05 +0000 (17:08 -0700)]
[UDP]: Cleanup UDP encapsulation code
This cleanup fell out after adding L2TP support where a new encap_rcv
funcptr was added to struct udp_sock. Have XFRM use the new encap_rcv
funcptr, which allows us to move the XFRM encap code from udp.c into
xfrm4_input.c.
Make xfrm4_rcv_encap() static since it is no longer called externally.
Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
G. Liakhovetski [Tue, 3 Jul 2007 05:56:57 +0000 (22:56 -0700)]
[IrDA]: tsap init routine factorisation.
This patch extracts common code from irttp_open_tsap() and irttp_dup()
into a new function to 1) avoid code duplication, 2) help avoid
forgetting object initialization in the tsap duplication path in the
future.
Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Ortiz [Tue, 3 Jul 2007 05:56:15 +0000 (22:56 -0700)]
[IrDA]: kingsun-sir.c charset fix.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Ortiz [Tue, 3 Jul 2007 05:55:31 +0000 (22:55 -0700)]
[IrDA]: Monitor mode.
Through the IrDA netlink set mode command, we switch to IrDA monitor
mode, where one IrLAP instance receives all the packets on the media,
without ever responding to them.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Ortiz [Tue, 3 Jul 2007 05:54:18 +0000 (22:54 -0700)]
[IrDA]: Netlink layer.
First IrDA configuration netlink layer implementation.
Currently, we only support the set/get mode commands.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guido Guenther [Tue, 3 Jul 2007 05:50:25 +0000 (22:50 -0700)]
[NET]: Allow group ownership of TUN/TAP devices.
Introduce a new syscall TUNSETGROUP for group ownership setting of tap
devices. The user now is allowed to send packages if either his euid or
his egid matches the one specified via tunctl (via -u or -g
respecitvely). If both, gid and uid, are set via tunctl, both have to
match.
Signed-off-by: Guido Guenther <agx@sigxcpu.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 3 Jul 2007 05:49:07 +0000 (22:49 -0700)]
[NET_SCHED]: Remove unnecessary includes
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 3 Jul 2007 05:48:13 +0000 (22:48 -0700)]
[NET_SCHED]: sch_htb: use generic estimator
Use the generic estimator instead of reimplementing (parts of) it.
For compatibility always create a default estimator for new classes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 3 Jul 2007 05:47:37 +0000 (22:47 -0700)]
[NET_SCHED]: Remove unnecessary stats_lock pointers
Remove stats_lock pointers from qdisc-internal structures, in all cases
it points to dev->queue_lock. The only case where it is necessary is for
top-level qdiscs, where it might also point to dev->ingress_lock in case
of the ingress qdisc. Also remove it from actions completely, it always
points to the actions internal lock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 3 Jul 2007 05:46:07 +0000 (22:46 -0700)]
[NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option
The generic estimator is always built in anways and all the config options
does is prevent including a minimal amount of code for setting it up.
Additionally the option is already automatically selected for most cases.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 3 Jul 2007 05:41:59 +0000 (22:41 -0700)]
[PKTGEN]: IPSEC support
Added transport mode ESP support for starters. I will send more of
these modes and types once i have resolved the tunnel mode isses.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 3 Jul 2007 05:41:14 +0000 (22:41 -0700)]
[XFRM] Introduce standalone SAD lookup
This allows other in-kernel functions to do SAD lookups.
The only known user at the moment is pktgen.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 3 Jul 2007 05:40:36 +0000 (22:40 -0700)]
[PKTGEN]: Introduce sequential flows
By default all flows in pktgen are randomly selected.
This patch introduces ability to have all defined flows to
be sent sequentially. Robert defined randomness to be the
default behavior.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 3 Jul 2007 05:39:50 +0000 (22:39 -0700)]
[PKTGEN]: Centralize packet overhead tracking
Track the extra packet overhead for VLAN tags, MPLS, IPSEC etc
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Larry Finger [Tue, 3 Jul 2007 05:36:38 +0000 (22:36 -0700)]
[MAC80211]: Set low initial rate in rc80211_simple
The initial rate for STA's using rc80211_simple is set to the last
rate in the rate table. For situations for which the signal is weak,
the rate may be too high for authentication and association. Although
the rc80211_simple module will adjust the speed, the response may not
be fast enough for a successful connection. This modification sets the
initial rate to the lowest supported value.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 3 Jul 2007 05:07:22 +0000 (22:07 -0700)]
[TCP]: SACK fastpath did override adjusted fackets_out
Do same adjustment to SACK fastpath counters provided that
they're valid.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sat, 30 Jun 2007 20:35:52 +0000 (13:35 -0700)]
[NET]: Fix secondary unicast/multicast address count maintenance
When a reference to an existing address is increased or decreased without
hitting zero, the address count is incorrectly adjusted.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter P Waskiewicz Jr [Fri, 29 Jun 2007 04:04:31 +0000 (21:04 -0700)]
[SCHED]: Qdisc changes and sch_rr added for multiqueue
Add the new sch_rr qdisc for multiqueue network device support. Allow
sch_prio and sch_rr to be compiled with or without multiqueue hardware
support.
sch_rr is part of sch_prio, and is referenced from MODULE_ALIAS. This
was done since sch_prio and sch_rr only differ in their dequeue
routine.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter P Waskiewicz Jr [Fri, 6 Jul 2007 20:36:20 +0000 (13:36 -0700)]
[CORE] Stack changes to add multiqueue hardware support API
Add the multiqueue hardware device support API to the core network
stack. Allow drivers to allocate multiple queues and manage them at
the netdev level if they choose to do so.
Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter P Waskiewicz Jr [Fri, 29 Jun 2007 03:45:47 +0000 (20:45 -0700)]
[NET]: [DOC] Multiqueue hardware support documentation
Add a brief howto to Documentation/networking for multiqueue. It
explains how to use the multiqueue API in a driver to support
multiqueue paths from the stack, as well as the qdiscs to use for
feeding a multiqueue device.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 28 Jun 2007 20:44:37 +0000 (13:44 -0700)]
[NET]: Fix TX checksum feature check
This patch fixes a boolean error in the new TX checksum check
that causes bogus TSO packets to be generated.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 27 Jun 2007 22:53:49 +0000 (15:53 -0700)]
[L2TP]: Add PPPoL2TP in-kernel documentation
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 27 Jun 2007 22:53:17 +0000 (15:53 -0700)]
[L2TP]: Add PPPoL2TP maintainer
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 Jun 2007 22:52:25 +0000 (15:52 -0700)]
[PPPOL2TP]: Use proper printf format specifier for size_t.
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 27 Jun 2007 22:49:24 +0000 (15:49 -0700)]
[L2TP]: PPP over L2TP driver core
This driver handles only L2TP data frames; control frames are handled
by a userspace application. It implements L2TP using the PPPoX socket
family. There is a PPPoX socket for each L2TP session in an L2TP
tunnel. PPP data within each session is passed through the kernel's
PPP subsystem via this driver. Kernel parameters of each socket can be
read or modified using ioctl() or [gs]etsockopt() calls.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 27 Jun 2007 22:43:43 +0000 (15:43 -0700)]
[L2TP]: Changes to existing ppp and socket kernel headers for L2TP
Add struct sockaddr_pppol2tp to carry L2TP-specific address
information for the PPPoX (PPPoL2TP) socket. Unfortunately we can't
use the union inside struct sockaddr_pppox because the L2TP-specific
data is larger than the current size of the union and we must preserve
the size of struct sockaddr_pppox for binary compatibility.
Also add a PPPIOCGL2TPSTATS ioctl to allow userspace to obtain
L2TP counters and state from the kernel.
Add new if_pppol2tp.h header.
[ Modified to use aligned_u64 in statistics structure -DaveM ]
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 27 Jun 2007 22:37:46 +0000 (15:37 -0700)]
[UDP]: Introduce UDP encapsulation type for L2TP
This patch adds a new UDP_ENCAP_L2TPINUDP encapsulation type for UDP
sockets. When a UDP socket's encap_type is UDP_ENCAP_L2TPINUDP, the
skb is delivered to a function pointed to by the udp_sock's
encap_rcv funcptr. If the skb isn't wanted by L2TP, it returns >0, which
causes it to be passed through to UDP.
Include padding to put the new encap_rcv field on a 4-byte boundary.
Previously, the only user of UDP encap sockets was ESP, so when
CONFIG_XFRM was not defined, some of the encap code was compiled
out. This patch changes that. As a result, udp_encap_rcv() will
now do a little more work when CONFIG_XFRM is not defined.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 27 Jun 2007 08:28:10 +0000 (01:28 -0700)]
[NET]: dev: secondary unicast address support
Add support for configuring secondary unicast addresses on network
devices. To support this devices capable of filtering multiple
unicast addresses need to change their set_multicast_list function
to configure unicast filters as well and assign it to dev->set_rx_mode
instead of dev->set_multicast_list. Other devices are put into promiscous
mode when secondary unicast addresses are present.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 27 Jun 2007 08:26:58 +0000 (01:26 -0700)]
[NET]: dev_mcast: switch to generic net_device address lists
Use generic net_device address lists for multicast list handling.
Some defines are used to keep drivers working.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 27 Jun 2007 08:26:19 +0000 (01:26 -0700)]
[NET]: dev: introduce generic net_device address lists
Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 27 Jun 2007 08:25:11 +0000 (01:25 -0700)]
[NET]: dev_mcast: unexport dev_mc_upload
dev_mc_add/dev_mc_delete take care of uploading the list when
necessary and thats the only interface other code should use.
Also remove two incorrect calls in DECnet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 27 Jun 2007 07:47:37 +0000 (00:47 -0700)]
[NET]: IPV6 checksum offloading in network devices
The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.
This patch:
* adds NETIF_F_IPV6_CSUM for those devices
* fixes bnx2 and tg3 devices that need it
* add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
* fixes assumptions about NETIF_F_ALL_CSUM in nat
* adjusts bridge union of checksumming computation
Signed-off-by: David S. Miller <davem@davemloft.net>
Masahide NAKAMURA [Wed, 27 Jun 2007 06:57:49 +0000 (23:57 -0700)]
[XFRM]: Add module alias for transformation type.
It is clean-up for XFRM type modules and adds aliases with its
protocol:
ESP, AH, IPCOMP, IPIP and IPv6 for IPsec
ROUTING and DSTOPTS for MIPv6
It is almost the same thing as XFRM mode alias, but it is added
new defines XFRM_PROTO_XXX for preprocessing since some protocols
are defined as enum.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Ingo Oeser <netdev@axxeo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masahide NAKAMURA [Wed, 27 Jun 2007 06:56:32 +0000 (23:56 -0700)]
[IPV6] MIP6: Loadable module support for MIPv6.
This patch makes MIPv6 loadable module named "mip6".
Here is a modprobe.conf(5) example to load it automatically
when user application uses XFRM state for MIPv6:
alias xfrm-type-10-43 mip6
alias xfrm-type-10-60 mip6
Some MIPv6 feature is not included by this modular, however,
it should not be affected to other features like either IPsec
or IPv6 with and without the patch.
We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt
separately for future work.
Loadable features:
* MH receiving check (to send ICMP error back)
* RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
* XFRM policy/state database handling for RO
These are NOT covered as loadable:
* Home Address flags and its rule on source address selection
* XFRM sub policy (depends on its own kernel option)
* XFRM functions to receive RO as IPv6 extension header
* MH sending/receiving through raw socket if user application
opens it (since raw socket allows to do so)
* RH2 sending as ancillary data
* RH2 operation with setsockopt(2)
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masahide NAKAMURA [Wed, 27 Jun 2007 06:51:41 +0000 (23:51 -0700)]
[IPV6] MIP6: Kill unnecessary ifdefs.
Kill unnecessary CONFIG_IPV6_MIP6.
o It is redundant for RAW socket to keep MH out with the config then
it can handle any protocol.
o Clean-up at AH.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 26 Jun 2007 10:23:44 +0000 (03:23 -0700)]
[RTNETLINK]: Fix rtnetlink compat attribute patch
Sent the wrong patch previously.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 25 Jun 2007 21:30:16 +0000 (14:30 -0700)]
[RTNETLINK]: Add nested compat attribute
Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.
The attribute looks like this:
struct {
[ compat contents ]
struct rtattr {
.rta_len = total size,
.rta_type = type,
} rta;
struct old_structure struct;
[ nested top-level attribute ]
struct rtattr {
.rta_len = nest size,
.rta_type = type,
} nest_attr;
[ optional 0 .. n nested attributes ]
struct rtattr {
.rta_len = private attribute len,
.rta_type = private attribute typ,
} nested_attr;
struct nested_data data;
};
Since both userspace and kernel deal correctly with attributes that are
larger than expected old versions will just parse the compat part and
ignore the rest.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 25 Jun 2007 20:49:35 +0000 (13:49 -0700)]
[NETLINK]: attr: add nested compat attribute type
Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 25 Jun 2007 11:35:20 +0000 (04:35 -0700)]
[SKBUFF]: Keep track of writable header len of headerless clones
Currently NAT (and others) that want to modify cloned skbs copy them,
even if in the vast majority of cases its not necessary because the
skb is a clone made by TCP and the portion NAT wants to modify is
actually writable because TCP release the header reference before
cloning.
The problem is that there is no clean way for NAT to find out how
long the writable header area is, so this patch introduces skb->hdr_len
to hold this length. When a headerless skb is cloned skb->hdr_len
is set to the current headroom, for regular clones it is copied from
the original. A new function skb_clone_writable(skb, len) returns
whether the skb is writable up to len bytes from skb->data. To avoid
enlarging the skb the mac_len field is reduced to 16 bit and the
new hdr_len field is put in the remaining 16 bit.
I've done a few rough benchmarks of NAT (not with this exact patch,
but a very similar one). As expected it saves huge amounts of system
time in case of sendfile, bringing it down to basically the same
amount as without NAT, with sendmsg it only helps on loopback,
probably because of the large MTU.
Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
without NAT:
- sendfile eth0, no NAT: sys 0m0.388s
- sendfile eth0, NAT: sys 0m1.835s
- sendfile eth0: NAT + path: sys 0m0.370s (~ -80%)
- sendfile lo, no NAT: sys 0m0.258s
- sendfile lo, NAT: sys 0m2.609s
- sendfile lo, NAT + patch: sys 0m0.260s (~ -90%)
- sendmsg eth0, no NAT: sys 0m2.508s
- sendmsg eth0, NAT: sys 0m2.539s
- sendmsg eth0, NAT + patch: sys 0m2.445s (no change)
- sendmsg lo, no NAT: sys 0m2.151s
- sendmsg lo, NAT: sys 0m3.557s
- sendmsg lo, NAT + patch: sys 0m2.159s (~ -40%)
I expect other users can see a similar performance improvement,
packet mangling iptables targets, ipip and ip_gre come to mind ..
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>