GitHub/MotorolaMobilityLLC/kernel-slsi.git
16 years ago[DCCP]: Ignore feature negotiation on Data packets
Gerrit Renker [Thu, 13 Dec 2007 14:48:19 +0000 (12:48 -0200)]
[DCCP]: Ignore feature negotiation on Data packets

This implements [RFC 4340, p. 32]: "any feature negotiation options received
on DCCP-Data packets MUST be ignored".
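
The rule can be sketched as a small predicate. This is only an illustration, not the patch itself: the numeric values are the packet and option type numbers from RFC 4340, while the enum and helper names are hypothetical.

```c
#include <stdbool.h>

/* RFC 4340 numbering: packet type 2 is DCCP-Data, 3 is DCCP-Ack;
 * option types 32..35 are Change L, Confirm L, Change R, Confirm R. */
enum { PKT_DATA = 2, PKT_ACK = 3 };
enum { OPT_CHANGE_L = 32, OPT_CONFIRM_L = 33,
       OPT_CHANGE_R = 34, OPT_CONFIRM_R = 35 };

/* Hypothetical helper: is this a feature-negotiation option? */
static bool is_feat_neg_option(unsigned char opt)
{
	return opt >= OPT_CHANGE_L && opt <= OPT_CONFIRM_R;
}

/* Hypothetical helper: should the option parser skip this option? */
static bool must_ignore_option(unsigned char pkt_type, unsigned char opt)
{
	return pkt_type == PKT_DATA && is_feat_neg_option(opt);
}
```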

Also added a FIXME for further processing, since the code currently (wrongly)
classifies empty Confirm options as invalid - this needs to be resolved in
a separate patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Make code assumptions explicit
Gerrit Renker [Thu, 13 Dec 2007 14:41:46 +0000 (12:41 -0200)]
[DCCP]: Make code assumptions explicit

This removes several `XXX' references which indicate missing support
for non-1-byte feature values: this is unnecessary, as all currently known
(standardised) SP feature values are 1-byte quantities.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Remove unused and redundant validation functions
Gerrit Renker [Thu, 13 Dec 2007 14:40:40 +0000 (12:40 -0200)]
[DCCP]: Remove unused and redundant validation functions

This removes two inlines which were both called in a single function only:

 1) dccp_feat_change() is always called with either DCCPO_CHANGE_L or DCCPO_CHANGE_R as argument
    * from dccp_setsockopt_change() via do_dccp_setsockopt() with DCCP_SOCKOPT_CHANGE_R/L
    * from __dccp_feat_init() via dccp_feat_init() also with DCCP_SOCKOPT_CHANGE_R/L.

    Hence the dccp_feat_is_valid_type() is completely unnecessary and always returns true.

 2) Due to (1), the length test reduces to 'len >= 4', which in turn makes
    dccp_feat_is_valid_length() unnecessary.

Furthermore, the inline function dccp_feat_is_reserved() was unfolded,
since it was only called in a single place.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Support inserting options during the 3-way handshake
Gerrit Renker [Thu, 13 Dec 2007 14:38:11 +0000 (12:38 -0200)]
[DCCP]: Support inserting options during the 3-way handshake

This provides a separate routine to insert options during the initial handshake.
The main purpose is to conduct feature negotiation; for the moment the only user
is the timestamp echo needed for the (CCID3) handshake RTT sample.

Padding of options has been put into a small separate routine, to be shared among
the two functions. This could also be used as a generic routine to finish inserting
options.
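
A minimal sketch of such a padding routine follows. It assumes that the options area must end on a 4-byte boundary and that the Padding option is a single zero byte (both per RFC 4340); the helper name and signature are hypothetical and not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

#define OPT_PADDING 0	/* RFC 4340 Padding option: a single zero byte */

/* Hypothetical helper: pad `len` bytes of options in `buf` (capacity `cap`)
 * up to the next multiple of 4. Returns the padded length, or -1 if the
 * padded options would not fit into the buffer. */
static int pad_options(unsigned char *buf, size_t len, size_t cap)
{
	size_t padded = (len + 3) & ~(size_t)3;

	if (padded > cap)
		return -1;
	memset(buf + len, OPT_PADDING, padded - len);
	return (int)padded;
}
```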

Also removed an `XXX' comment since its content was obvious.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Handle timestamps on Request/Response exchange separately
Gerrit Renker [Thu, 13 Dec 2007 14:37:19 +0000 (12:37 -0200)]
[DCCP]: Handle timestamps on Request/Response exchange separately

In DCCP, timestamps can occur on packets at any time; CCID3 uses a timestamp (and its echo) on the
Request/Response exchange. This patch addresses the following situation:
* timestamps are recorded on the listening socket;
* Responses are sent from dccp_request_sockets;
* suppose two connections reach the listening socket with very small time in between:
* the first timestamp value gets overwritten by the second connection request.

This is not really good, so this patch separates timestamps into
 * those which are received by the server during the initial handshake (on dccp_request_sock);
 * those which are received by the client or the server after connection establishment.

As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been
received (in addition, a warning message is printed if hosts send 0-valued timestamps).

The timestamp-echoing now works as follows:
 * when a timestamp is present on the initial Request, it is placed into dreq, due to the
   call to dccp_parse_options in dccp_v{4,6}_conn_request;
 * when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over
   from the request_sock into the child socket in dccp_create_openreq_child;
 * timestamps received on an (established) dccp_sock are treated as before.

Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new dccp_timestamp()
function is used, as it is expected that the time between receiving the timestamp and
sending the timestamp echo will be very small against the wrap-around time. As a byproduct,
this allows smaller timestamping-time fields.
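
The wrap-around argument can be illustrated with a short sketch. It assumes a 32-bit counter in units of 10 microseconds, which wraps after roughly 2^32 * 10 usec, or about 11.9 hours; the helper name is hypothetical and this is not the dccp_timestamp() implementation.

```c
#include <stdint.h>

/* Timestamps count units of 10 microseconds (hundredths of a millisecond).
 * Unsigned subtraction stays correct across one wrap of the 32-bit
 * counter, as long as the real gap is shorter than the wrap-around time. */
static uint32_t elapsed_10us(uint32_t now, uint32_t then)
{
	return now - then;
}
```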

Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with
'!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Add (missing) option parsing to request_sock processing
Gerrit Renker [Thu, 13 Dec 2007 14:31:26 +0000 (12:31 -0200)]
[DCCP]: Add (missing) option parsing to request_sock processing

This adds option-parsing code to processing of Acks in the listening state
on request_socks on the server, serving two purposes
 (i)  resolves a FIXME (removed);
 (ii) paves the way for feature-negotiation during connection-setup.

There is an intended subtlety here with regard to dccp_check_req:

 Parsing options happens only after testing whether the received packet is
 a retransmitted Request.  Otherwise, if the Request contained (a possibly
 large number of) feature-negotiation options, recomputing state would have to
 happen each time a retransmitted Request arrives, which opens the door to an
 easy DoS attack.  Since in a genuine retransmission the options should not be
 different from the original, reusing the already computed state seems better.

 The other point is - if there are timestamp options on the Request, they will
 not be answered; which means that in the presence of retransmission (likely
 due to loss and/or other problems), the use of Request/Response RTT sampling
 is suspended, so that startup problems here do not propagate.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Allow to parse options on Request Sockets
Gerrit Renker [Thu, 13 Dec 2007 14:29:24 +0000 (12:29 -0200)]
[DCCP]: Allow to parse options on Request Sockets

The option parsing code currently only parses on full sk's. This causes a problem for
options sent during the initial handshake (in particular timestamps and feature-negotiation
options). Therefore, this patch extends the option parsing code with an additional argument
for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise
the normal path (parsing on the sk) is used.
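
The dispatch on the extra argument might look like the following sketch. The struct and function names are simplified, hypothetical stand-ins, not the kernel's dccp_sock, dccp_request_sock or dccp_parse_options.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the full socket and the request socket. */
struct full_sock { uint32_t ts_recent; };
struct req_sock  { uint32_t ts_recent; };

/* Store a received timestamp on the request socket when one is given,
 * otherwise on the full socket (the normal path). */
static void store_timestamp(struct full_sock *sk, struct req_sock *req,
			    uint32_t ts)
{
	if (req != NULL)
		req->ts_recent = ts;
	else
		sk->ts_recent = ts;
}
```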

Subsequent patches, which implement feature negotiation during connection setup, make use
of this facility.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Collapse repeated `len' statements into one
Gerrit Renker [Thu, 13 Dec 2007 14:27:14 +0000 (12:27 -0200)]
[DCCP]: Collapse repeated `len' statements into one

This replaces 4 individual assignments for `len' with a single
one, placed where the control flow of those 4 leads to.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Support for server holding timewait state
Gerrit Renker [Thu, 13 Dec 2007 14:25:01 +0000 (12:25 -0200)]
[DCCP]: Support for server holding timewait state

This adds a socket option and signalling support for the case where the server
holds timewait state on closing the connection, as described in RFC 4340, 8.3.

Since holding timewait state at the server is the non-usual case, it is enabled
via a socket option. Documentation for this socket option has been added.

The setsockopt statement has been made resilient against different possible cases
of expressing boolean `true' values using a suggestion by Ian McDonald.
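
One way to achieve this is sketched below, under the assumption that the flag is stored in a 1-bit field: assigning the user value directly would silently turn 2 into 0, whereas comparing against zero maps every non-zero value to 1. The struct and helper names are hypothetical.

```c
/* Hypothetical option storage with a 1-bit flag. */
struct example_opts {
	unsigned int server_timewait:1;
};

/* Any non-zero value (1, 2, -1, ...) counts as `true'. */
static void set_server_timewait(struct example_opts *o, int val)
{
	o->server_timewait = (val != 0);
}
```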

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Use maximum-RTO backoff from DCCP spec
Gerrit Renker [Thu, 13 Dec 2007 14:16:23 +0000 (12:16 -0200)]
[DCCP]: Use maximum-RTO backoff from DCCP spec

This resolves another FIXME: the code used the TCP maximum RTO rather than the
value specified by the DCCP specification. Across the sections in RFC 4340, 64
seconds is consistently suggested as the maximum RTO backoff value, and this is
the value which is now used.

I have checked both termination cases for retransmissions of Close/CloseReq:
with the default value 15 of `retries2', and an initial icsk_retransmit = 0,
it takes about 614 seconds to declare a non-responding peer as dead, after
which the final terminating Reset is sent. With the TCP maximum RTO value of
120 seconds it takes (as might be expected) almost twice as long, about 23
minutes.
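
The figures can be reproduced with a small calculation. The sketch below rests on assumptions that the message does not state: 16 timer expirations in total (the first timeout plus 15 retries), a timeout that doubles after each expiration up to the cap, an initial timeout of 400 ms (2 * 0.2 sec) for the 64-second case, and an initial timeout of 3 seconds for the 120-second case.

```c
#include <stdint.h>

/* Sum the timeouts of `expirations' consecutive timer expirations, where
 * the timeout starts at `init_ms', doubles each time, and is capped at
 * `max_ms'. All values are in milliseconds. */
static uint64_t total_backoff_ms(uint64_t init_ms, uint64_t max_ms,
				 unsigned int expirations)
{
	uint64_t total = 0, rto = init_ms;
	unsigned int i;

	for (i = 0; i < expirations; i++) {
		total += rto < max_ms ? rto : max_ms;
		if (rto < max_ms)
			rto *= 2;
	}
	return total;
}
```

Under these assumptions, total_backoff_ms(400, 64000, 16) gives 614000 ms (614 seconds) and total_backoff_ms(3000, 120000, 16) gives 1389000 ms (about 23 minutes).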

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Shift the retransmit timer for active-close into output.c
Gerrit Renker [Thu, 13 Dec 2007 14:02:43 +0000 (12:02 -0200)]
[DCCP]: Shift the retransmit timer for active-close into output.c

When performing active close, RFC 4340, 8.3 requires retransmitting the
Close/CloseReq with a backoff-retransmit timer starting at initially 2 RTTs.

This patch shifts the existing code for active-close retransmit timer
into output.c, so that the retransmit timer is started when the first
Close/CloseReq is sent. Previously, the timer was started when, after
releasing the socket in dccp_close(), the actively-closing side had not yet
reached the CLOSED/TIMEWAIT state.

The patch further reduces the initial timeout from 3 seconds to the required
2 RTTs, where - in absence of a known RTT - the fallback value specified in
RFC 4340, 3.4 is used.
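
As a sketch, the initial timeout could be computed as follows. The fallback of 0.2 seconds is the value from RFC 4340, 3.4; the helper name, the microsecond unit, and the convention that 0 means "no RTT known" are assumptions for illustration.

```c
#include <stdint.h>

#define FALLBACK_RTT_US 200000u	/* 0.2 sec fallback RTT, RFC 4340, 3.4 */

/* Hypothetical helper: initial Close/CloseReq retransmit timeout of 2 RTTs;
 * rtt_us == 0 means that no RTT sample is available yet. */
static uint32_t close_timeout_us(uint32_t rtt_us)
{
	return 2 * (rtt_us ? rtt_us : FALLBACK_RTT_US);
}
```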

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: fix section mismatch warnings
Daniel Lezcano [Thu, 13 Dec 2007 13:34:58 +0000 (05:34 -0800)]
[IPV6]: fix section mismatch warnings

Removed the useless and buggy __exit section annotations in the different
ipv6 subsystems. Otherwise these functions would be called from an
init section during rollback in case of an error in the
protocol initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DCCP]: Perform SHUT_RD and SHUT_WR on receiving close
Gerrit Renker [Thu, 13 Dec 2007 13:28:43 +0000 (11:28 -0200)]
[DCCP]: Perform SHUT_RD and SHUT_WR on receiving close

This patch performs two changes:

1) Close the write-end in addition to the read-end when a fin-like segment
  (Close or CloseReq) is received by DCCP. This accounts for the fact that DCCP,
  in contrast to TCP, does not have a half-close. RFC 4340 says in this respect
  that when a fin-like segment has been sent there is no guarantee at all that
  any further data will be processed.
  Thus this patch performs SHUT_WR in addition to the SHUT_RD when a fin-like
  segment is encountered.

2) Minor change: I noted that code appears twice in different places and think it
   makes sense to put this into a self-contained function (dccp_enqueue()).
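
The first change can be sketched as setting both shutdown bits at once. The flag values and struct below are hypothetical stand-ins modelled on a per-socket shutdown bit mask, not the kernel definitions.

```c
/* Hypothetical shutdown flags: one bit per direction. */
enum { RCV_SHUT = 1, SEND_SHUT = 2 };

struct mini_sock {
	unsigned int shutdown;
};

/* On a received Close or CloseReq, shut both directions, since DCCP has
 * no half-close. */
static void on_fin_like_packet(struct mini_sock *sk)
{
	sk->shutdown |= RCV_SHUT | SEND_SHUT;
}
```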

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[DECNET]: Fix inverted wait flag in xfrm_lookup call
Herbert Xu [Thu, 13 Dec 2007 13:24:40 +0000 (05:24 -0800)]
[DECNET]: Fix inverted wait flag in xfrm_lookup call

My previous patch made the wait flag take the opposite value to what
it should be.  This patch fixes that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: Check RTNL status in unregister_netdevice
Herbert Xu [Thu, 13 Dec 2007 03:21:56 +0000 (19:21 -0800)]
[NET]: Check RTNL status in unregister_netdevice

The caller must hold the RTNL so let's check it in unregister_netdevice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPSEC]: Do not let packets pass when ICMP flag is off
Herbert Xu [Thu, 13 Dec 2007 02:54:16 +0000 (18:54 -0800)]
[IPSEC]: Do not let packets pass when ICMP flag is off

This fixes a logical error in ICMP policy checks which lets
packets through if the state ICMP flag is off.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT
Herbert Xu [Thu, 13 Dec 2007 02:48:58 +0000 (18:48 -0800)]
[IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT

This patch converts all callers of xfrm_lookup that used an
explicit value of 1 to indicate blocking to use the new flag
XFRM_LOOKUP_WAIT.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPSEC]: Fix reversed ICMP6 policy check
Herbert Xu [Thu, 13 Dec 2007 02:47:48 +0000 (18:47 -0800)]
[IPSEC]: Fix reversed ICMP6 policy check

The policy check I added for ICMP on IPv6 is reversed.  This
patch fixes that.

It also adds an skb->sp check so that unprotected packets that
fail the policy check do not crash the machine.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Fix compiler warning.
Michael Chan [Fri, 21 Dec 2007 23:04:49 +0000 (15:04 -0800)]
[BNX2]: Fix compiler warning.

Change bnx2_init_napi() to void.

Warning was noted by DaveM.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Update version to 1.7.1.
Michael Chan [Fri, 21 Dec 2007 04:02:14 +0000 (20:02 -0800)]
[BNX2]: Update version to 1.7.1.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Enable new tx ring.
Michael Chan [Fri, 21 Dec 2007 04:01:44 +0000 (20:01 -0800)]
[BNX2]: Enable new tx ring.

Enable new tx ring and add new MSIX handler and NAPI poll function
for the new tx ring.  Enable MSIX when the hardware supports it.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Add support for a new tx ring.
Michael Chan [Fri, 21 Dec 2007 04:01:19 +0000 (20:01 -0800)]
[BNX2]: Add support for a new tx ring.

To separate TX IRQs into a different MSIX vector, we need to
support a new tx ring.  The original tx ring will still be used
when not using MSIX.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Support multiple MSIX IRQs.
Michael Chan [Fri, 21 Dec 2007 03:59:30 +0000 (19:59 -0800)]
[BNX2]: Support multiple MSIX IRQs.

Change bnx2_napi struct into an array and add code to manage multiple
IRQs.  MSIX hardware structures and new registers are also added.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Move rx indexes into bnx2_napi struct.
Michael Chan [Fri, 21 Dec 2007 03:57:19 +0000 (19:57 -0800)]
[BNX2]: Move rx indexes into bnx2_napi struct.

Rx related fields used in NAPI polling are moved from the main
bnx2 struct to the bnx2_napi struct.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Move tx indexes into bnx2_napi struct.
Michael Chan [Fri, 21 Dec 2007 03:56:59 +0000 (19:56 -0800)]
[BNX2]: Move tx indexes into bnx2_napi struct.

Tx related fields used in NAPI polling are moved from the main
bnx2 struct to the bnx2_napi struct.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Introduce new bnx2_napi structure.
Michael Chan [Fri, 21 Dec 2007 03:56:37 +0000 (19:56 -0800)]
[BNX2]: Introduce new bnx2_napi structure.

Introduce a bnx2_napi structure that will hold a napi_struct and
other fields to handle NAPI polling for the napi_struct.  Various tx
and rx indexes and status block pointers will be moved from the main
bnx2 structure to this bnx2_napi structure.

Most NAPI path functions are modified to be passed this bnx2_napi
struct pointer.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Restructure IRQ datastructures.
Michael Chan [Fri, 21 Dec 2007 03:56:09 +0000 (19:56 -0800)]
[BNX2]: Restructure IRQ datastructures.

Add a table to keep track of multiple IRQs and restructure the IRQ
request and free functions so that they can be easily expanded to
handle multiple IRQs.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Add function to fetch hardware tx index.
Michael Chan [Fri, 21 Dec 2007 03:55:39 +0000 (19:55 -0800)]
[BNX2]: Add function to fetch hardware tx index.

This makes the code cleaner and easier to support different tx rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Update version to 1.6.9.
Michael Chan [Wed, 12 Dec 2007 19:20:22 +0000 (11:20 -0800)]
[BNX2]: Update version to 1.6.9.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Enable S/G for jumbo RX.
Michael Chan [Wed, 12 Dec 2007 19:19:57 +0000 (11:19 -0800)]
[BNX2]: Enable S/G for jumbo RX.

If the MTU requires more than 1 page for the SKB, enable the page ring
and calculate the size of the page ring.  This will guarantee order-0
allocation regardless of the MTU size.

Fixup loopback test packet size so that we don't deal with the pages
during loopback test.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Add fast path code to handle RX pages.
Michael Chan [Wed, 12 Dec 2007 19:19:35 +0000 (11:19 -0800)]
[BNX2]: Add fast path code to handle RX pages.

Add function to reuse a page in case of allocation or other errors.
Add code to construct the completed SKB with the additional data in
the pages.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Add init. code to handle RX pages.
Michael Chan [Wed, 12 Dec 2007 19:19:12 +0000 (11:19 -0800)]
[BNX2]: Add init. code to handle RX pages.

Add new fields to keep track of the pages and the page rings.
Add functions to allocate and free pages.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Update firmware to support S/G RX buffers.
Michael Chan [Wed, 12 Dec 2007 19:18:34 +0000 (11:18 -0800)]
[BNX2]: Update firmware to support S/G RX buffers.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Restructure RX ring init. code.
Michael Chan [Wed, 12 Dec 2007 19:17:43 +0000 (11:17 -0800)]
[BNX2]: Restructure RX ring init. code.

Factor out the common functions that will be used to initialize the
normal RX rings and the page rings.

Change the copybreak constant RX_COPY_THRESH to 128.  This same
constant will be used for the max. size of the linear SKB when pages
are used.  Copybreak will be turned off when pages are used.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Restructure RX fast path handling.
Michael Chan [Wed, 12 Dec 2007 19:17:01 +0000 (11:17 -0800)]
[BNX2]: Restructure RX fast path handling.

Add a new function to handle new SKB allocation and to prepare the
completed SKB.  This makes it easier to add support for non-linear
SKB.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[BNX2]: Add ring constants.
Michael Chan [Wed, 12 Dec 2007 19:16:19 +0000 (11:16 -0800)]
[BNX2]: Add ring constants.

Define the various ring constants to make the code cleaner.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: fix drivers/net/ns83820.c build
Andrew Morton [Wed, 12 Dec 2007 23:07:11 +0000 (15:07 -0800)]
[NET]: fix drivers/net/ns83820.c build

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPIP]: Allow rebinding the tunnel to another interface
Michal Schmidt [Wed, 12 Dec 2007 19:01:43 +0000 (11:01 -0800)]
[IPIP]: Allow rebinding the tunnel to another interface

Once created, an IP tunnel can't be bound to another device.
(reported as https://bugzilla.redhat.com/show_bug.cgi?id=419671)

To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode ipip remote 10.0.0.1 dev eth0
# try to change the bound device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ip/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.
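
As a rough sketch of the recalculation, assuming the only overhead is one outer IPv4 header without options (20 bytes); the helper name is hypothetical and the patch may apply further limits:

```c
/* Size of an IPv4 header without options; IPIP adds one per packet. */
#define IPV4_HDR_LEN 20

/* Hypothetical helper: tunnel MTU derived from the bound device's MTU. */
static int tunnel_mtu(int bound_dev_mtu)
{
	return bound_dev_mtu - IPV4_HDR_LEN;
}
```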

If the change is acceptable, I'll do the same for GRE and SIT tunnels.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: Remove unused define from loopback driver.
Pavel Emelyanov [Wed, 12 Dec 2007 19:00:04 +0000 (11:00 -0800)]
[NET]: Remove unused define from loopback driver.

The LOOPBACK_OVERHEAD is not used in this file at all.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETNS]: network namespace was passed into dev_getbyhwaddr but not used
Denis V. Lunev [Wed, 12 Dec 2007 18:47:38 +0000 (10:47 -0800)]
[NETNS]: network namespace was passed into dev_getbyhwaddr but not used

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: Remove FASTCALL macro
Harvey Harrison [Wed, 12 Dec 2007 18:46:51 +0000 (10:46 -0800)]
[NET]: Remove FASTCALL macro

X86_32 was the last user of the FASTCALL macro. Now that it
uses regparm(3) by default, this macro expands to nothing.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPSEC]: Add ICMP host relookup support
Herbert Xu [Wed, 12 Dec 2007 18:44:43 +0000 (10:44 -0800)]
[IPSEC]: Add ICMP host relookup support

RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch implements this
for ICMP traffic that originates from or terminates on localhost.

This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.

On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse
Herbert Xu [Wed, 12 Dec 2007 18:44:16 +0000 (10:44 -0800)]
[IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse

RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPSEC]: Make xfrm_lookup flags argument a bit-field
Herbert Xu [Wed, 12 Dec 2007 18:36:59 +0000 (10:36 -0800)]
[IPSEC]: Make xfrm_lookup flags argument a bit-field

This patch introduces an enum for bits in the flags argument of xfrm_lookup.
This is so that we can cram more information into it later.

Since all current users use just the values 0 and 1, XFRM_LOOKUP_WAIT has
been added with the value 1 << 0 to represent the current meaning of flags.

The test in __xfrm_lookup has been changed accordingly.
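
A minimal sketch of the bit-field idea: XFRM_LOOKUP_WAIT and its value come from the text above, whereas the second flag and the helper are hypothetical and only show why the test must mask the bit instead of comparing the whole value.

```c
enum {
	XFRM_LOOKUP_WAIT     = 1 << 0,	/* value as given in this message */
	EXAMPLE_LOOKUP_OTHER = 1 << 1,	/* hypothetical later flag */
};

/* Hypothetical helper: test the wait bit regardless of other flags. */
static int lookup_may_block(int flags)
{
	return (flags & XFRM_LOOKUP_WAIT) != 0;
}
```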

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: Remove previous loss intervals implementation
Gerrit Renker [Wed, 12 Dec 2007 16:23:08 +0000 (14:23 -0200)]
[TFRC]: Remove previous loss intervals implementation

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[CCID3]: Interface CCID3 code with newer Loss Intervals Database
Gerrit Renker [Wed, 12 Dec 2007 16:06:14 +0000 (14:06 -0200)]
[CCID3]: Interface CCID3 code with newer Loss Intervals Database

This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the upper bound of 4 seconds for an RTT sample has been reduced to
match the initial TCP RTO value of 3 seconds from [RFC 1122, 4.2.3.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: CCID3 (and CCID4) needs to access these inlines
Gerrit Renker [Wed, 12 Dec 2007 16:03:01 +0000 (14:03 -0200)]
[TFRC]: CCID3 (and CCID4) needs to access these inlines

This moves two inlines back to packet_history.h: these are not private
to packet_history.c, but are needed by CCID3/4 to detect whether a new
loss is indicated, or whether a loss is already pending.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[CCID3]: Redundant debugging output / documentation
Gerrit Renker [Wed, 12 Dec 2007 15:57:14 +0000 (13:57 -0200)]
[CCID3]: Redundant debugging output / documentation

Each time feedback is sent two lines are printed:

ccid3_hc_rx_send_feedback: client ... - entry
ccid3_hc_rx_send_feedback: Interval ...usec, X_recv=..., 1/p=...

The first line is redundant and thus removed.

Further, documentation of ccid3_hc_rx_sock (capitalisation) is made consistent.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: Ringbuffer to track loss interval history
Gerrit Renker [Wed, 12 Dec 2007 15:50:51 +0000 (13:50 -0200)]
[TFRC]: Ringbuffer to track loss interval history

A ringbuffer-based implementation of loss interval history is easier to
maintain, allocate, and update.

The `swap' routine to keep the RX history sorted is due to and was written
by Arnaldo Carvalho de Melo, simplifying an earlier macro-based variant.

Details:
 * access to the Loss Interval Records via macro wrappers (with safety checks);
 * simplified, on-demand allocation of entries (no extra memory consumption on
   lossless links); cache allocation is local to the module / exported as service;
 * provision of RFC-compliant algorithm to re-compute average loss interval;
 * provision of comprehensive, new loss detection algorithm
  - support for all cases of loss, including re-ordered/duplicate packets;
  - waiting for NDUPACK=3 packets to fill the hole;
  - updating loss records when a late-arriving packet fills a hole.
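
For illustration, the average loss interval of RFC 3448, section 5.4 can be sketched as below. The message does not say which RFC section is meant, so this is an assumption; the sketch also assumes a full history of n + 1 = 9 intervals and scales the weights by 10 to stay in integer arithmetic. The names are hypothetical, not those of the patch.

```c
#include <stdint.h>

#define NINTERVAL 8	/* n in RFC 3448, 5.4 */

/* Weights 1, 1, 1, 1, 0.8, 0.6, 0.4, 0.2, scaled by 10. */
static const uint32_t li_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };

/* li[0] is the most recent (still open) interval, li[NINTERVAL] the oldest.
 * The open interval only counts if including it increases the mean. */
static uint32_t mean_loss_interval(const uint32_t li[NINTERVAL + 1])
{
	uint32_t tot0 = 0, tot1 = 0, w_tot = 0;
	int i;

	for (i = 0; i < NINTERVAL; i++) {
		tot0 += li[i] * li_weights[i];
		tot1 += li[i + 1] * li_weights[i];
		w_tot += li_weights[i];
	}
	return (tot0 > tot1 ? tot0 : tot1) / w_tot;
}
```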

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: Loss interval code needs the macros/inlines that were moved
Gerrit Renker [Wed, 12 Dec 2007 14:28:40 +0000 (12:28 -0200)]
[TFRC]: Loss interval code needs the macros/inlines that were moved

This moves the inlines (which were previously declared as macros) back into
packet_history.h since the loss detection code needs to be able to read entries
from the RX history in order to create the relevant loss entries: it needs at
least tfrc_rx_hist_loss_prev() and tfrc_rx_hist_last_rcv(), which in turn
require the definition of the other inlines (macros).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: Put RX/TX initialisation into tfrc.c
Gerrit Renker [Wed, 12 Dec 2007 14:24:49 +0000 (12:24 -0200)]
[TFRC]: Put RX/TX initialisation into tfrc.c

This separates RX/TX initialisation and puts all packet history / loss intervals
initialisation into tfrc.c.
The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit()

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETNS]: separate af_packet netns data
Denis V. Lunev [Tue, 11 Dec 2007 12:19:54 +0000 (04:19 -0800)]
[NETNS]: separate af_packet netns data

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETNS]: struct net content re-work (v3)
Denis V. Lunev [Tue, 11 Dec 2007 12:19:17 +0000 (04:19 -0800)]
[NETNS]: struct net content re-work (v3)

Recently David Miller and Herbert Xu pointed out that struct net is becoming
bloated and unmaintainable. There are two solutions:
- provide a pointer to a network subsystem definition from struct net.
  This costs an additional dereference
- place the sub-system definition into the structure itself. This will speed up
  run-time access at the cost of recompilation time

The second approach looks better for us. Other sub-systems will follow.
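
The two options can be contrasted in a short sketch. All struct and function names here are hypothetical stand-ins and do not reflect the actual layout of struct net.

```c
/* Hypothetical per-subsystem data. */
struct example_subsys {
	int counter;
};

/* Option 1: struct net holds a pointer; each access costs an additional
 * dereference, but users only need a forward declaration. */
struct net_with_pointer {
	struct example_subsys *subsys;
};

/* Option 2: the data is embedded; access is direct, but every user of the
 * struct must see the full definition and is recompiled when it changes. */
struct net_embedded {
	struct example_subsys subsys;
};

static int read_via_pointer(const struct net_with_pointer *net)
{
	return net->subsys->counter;
}

static int read_embedded(const struct net_embedded *net)
{
	return net->subsys.counter;
}
```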

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[AF_UNIX]: Remove unused declaration of sysctl_unix_max_dgram_qlen.
Denis V. Lunev [Tue, 11 Dec 2007 12:18:41 +0000 (04:18 -0800)]
[AF_UNIX]: Remove unused declaration of sysctl_unix_max_dgram_qlen.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: make the protocol initialization to return an error code
Daniel Lezcano [Tue, 11 Dec 2007 10:25:35 +0000 (02:25 -0800)]
[IPV6]: make the protocol initialization to return an error code

This patchset makes the different protocols return an error code, so
the af_inet6 module can check whether the initialization succeeded.

raw6 was included to be consistent with the rest of the
protocols, but its registration remains in the same place.
Because raw6 has its own init function, the proto and the ops structure
can be moved inside the raw6.c file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: make inet6_register_protosw to return an error code
Daniel Lezcano [Tue, 11 Dec 2007 10:25:01 +0000 (02:25 -0800)]
[IPV6]: make inet6_register_protosw to return an error code

This patch makes inet6_register_protosw return an error code.
The different protocols can then tell whether the registration succeeded
and can pass the error to the initial caller, af_inet6.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: make frag to return an error at initialization
Daniel Lezcano [Tue, 11 Dec 2007 10:24:29 +0000 (02:24 -0800)]
[IPV6]: make frag to return an error at initialization

This patch makes frag_init return an error code, so the af_inet6
module can handle the error.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: make extended headers to return an error at initialization
Daniel Lezcano [Tue, 11 Dec 2007 10:23:54 +0000 (02:23 -0800)]
[IPV6]: make extended headers to return an error at initialization

This patch factors the code of the different init functions for rthdr,
nodata and destopt into a single function, exthdrs_init.
This function returns an error so the af_inet6 module can correctly check
the initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: make flowlabel to return an error
Daniel Lezcano [Tue, 11 Dec 2007 10:23:18 +0000 (02:23 -0800)]
[IPV6]: make flowlabel to return an error

This patch makes the flowlabel subsystem return an error code and cleans up
some procfs ifdefs.
af_inet6 will use the flowlabel init return code to check that the
initialization succeeded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV4]: Cleanup sysctl manipulations in devinet.c
Pavel Emelyanov [Tue, 11 Dec 2007 10:17:40 +0000 (02:17 -0800)]
[IPV4]: Cleanup sysctl manipulations in devinet.c

This includes:

 * moving neigh_sysctl_(un)register calls inside
   devinet_sysctl_(un)register ones, as they are always
   called in pairs;
 * making __devinet_sysctl_unregister() unregister
   the ipv4_devconf struct, while the original devinet_sysctl_unregister()
   works with the in_device to handle both the devconf and
   neigh sysctls;
 * making stubs for the CONFIG_SYSCTL=n case to get rid of
   in-code ifdefs.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV4]: Cleanup IN_DEV_MFORWARD macro
Pavel Emelyanov [Tue, 11 Dec 2007 10:16:47 +0000 (02:16 -0800)]
[IPV4]: Cleanup IN_DEV_MFORWARD macro

This is essentially IN_DEV_ANDCONF with proper arguments.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[INET]: Use BUILD_BUG_ON in inet_timewait_sock.c checks
Pavel Emelyanov [Tue, 11 Dec 2007 10:12:36 +0000 (02:12 -0800)]
[INET]: Use BUILD_BUG_ON in inet_timewait_sock.c checks

Make the INET_TWDR_TWKILL_SLOTS vs sizeof(twdr->thread_slots)
check nicer.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TCP]: Use BUILD_BUG_ON for tcp_skb_cb size checking
Pavel Emelyanov [Tue, 11 Dec 2007 10:12:04 +0000 (02:12 -0800)]
[TCP]: Use BUILD_BUG_ON for tcp_skb_cb size checking

sizeof(struct tcp_skb_cb) must not be greater than
sizeof(skb->cb). This is checked in net/ipv4/tcp.c, but
the check can be done more gracefully.
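
A compile-time size check of this kind can be sketched in user space as
follows. The macro below uses the negative-array-size trick; the structures
example_skb and example_cb and the 48-byte buffer are assumptions made for
illustration, not the kernel's definitions.

```c
#include <stddef.h>

/* Compile-time assertion: if cond is true the array size becomes -1,
 * which the compiler rejects. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical stand-in for a buffer with a generic control block. */
struct example_skb {
	char cb[48];
};

/* Hypothetical protocol-private data that must fit into cb[]. */
struct example_cb {
	unsigned int seq;
	unsigned int end_seq;
	unsigned char flags;
};

/* Returns the number of unused bytes in cb[]; the build fails if the
 * private data does not fit at all. */
static size_t example_cb_spare_bytes(void)
{
	BUILD_BUG_ON(sizeof(struct example_cb) >
		     sizeof(((struct example_skb *)0)->cb));
	return sizeof(((struct example_skb *)0)->cb) -
	       sizeof(struct example_cb);
}
```

Growing example_cb beyond 48 bytes turns the mistake into a build error
instead of a problem that only shows up at run time.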

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETLINK]: kzalloc() conversion
Eric Dumazet [Tue, 11 Dec 2007 10:09:47 +0000 (02:09 -0800)]
[NETLINK]: kzalloc() conversion

nl_pid_hash_alloc() is renamed to nl_pid_hash_zalloc().
It is now returning zeroed memory to its callers.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: dst_ifdown() cleanup
Eric Dumazet [Tue, 11 Dec 2007 10:00:30 +0000 (02:00 -0800)]
[NET]: dst_ifdown() cleanup

This cleanup shrinks size of net/core/dst.o on i386 from 1299 to 1289 bytes.
(This is because dev_hold()/dev_put() are doing atomic_inc()/atomic_dec() and
force compiler to re-evaluate memory contents.)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPSEC]: Add xfrm_input_state helper
Herbert Xu [Tue, 11 Dec 2007 09:53:43 +0000 (01:53 -0800)]
[IPSEC]: Add xfrm_input_state helper

This patch adds the xfrm_input_state helper function which returns the
current xfrm state being processed on the input path given an sk_buff.
This is currently only used by xfrm_input but will be used by ESP upon
asynchronous resumption.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[CCID3]: HC-receiver should not insert timestamps as HC-sender doesn't uses it
Gerrit Renker [Sat, 8 Dec 2007 18:26:59 +0000 (16:26 -0200)]
[CCID3]: HC-receiver should not insert timestamps as HC-sender doesn't uses it

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: The function tfrc_rx_hist_entry_delete() is not used anymore
Gerrit Renker [Sat, 8 Dec 2007 18:08:41 +0000 (16:08 -0200)]
[TFRC]: The function tfrc_rx_hist_entry_delete() is not used anymore

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: Move comment.
Gerrit Renker [Sat, 8 Dec 2007 17:08:08 +0000 (15:08 -0200)]
[TFRC]: Move comment.

Moved up the comment "Receiver routines" above the first occurrence of
RX history routines.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: Remove unused "mibalign" argument for snmp_mib_init().
YOSHIFUJI Hideaki [Thu, 24 Jan 2008 06:31:45 +0000 (22:31 -0800)]
[NET]: Remove unused "mibalign" argument for snmp_mib_init().

With fixes from Arnaldo Carvalho de Melo.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV4]: last default route is a fib table property
Denis V. Lunev [Sat, 8 Dec 2007 08:32:23 +0000 (00:32 -0800)]
[IPV4]: last default route is a fib table property

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV4]: Unify assignment of fi to fib_result
Denis V. Lunev [Sat, 8 Dec 2007 08:31:44 +0000 (00:31 -0800)]
[IPV4]: Unify assignment of fi to fib_result

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV4]: no need pass pointer to a default into fib_detect_death
Denis V. Lunev [Sat, 8 Dec 2007 08:22:13 +0000 (00:22 -0800)]
[IPV4]: no need pass pointer to a default into fib_detect_death

ipv4: no need pass pointer to a default into fib_detect_death

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: route6 remove ifdef for fib_rules
Daniel Lezcano [Sat, 8 Dec 2007 08:14:54 +0000 (00:14 -0800)]
[IPV6]: route6 remove ifdef for fib_rules

The patch defines the usual static inline functions for when the
fib6_rules code is disabled. This allows removing some ifdefs in the route.c
file and makes the code a little clearer.
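
The stub pattern can be sketched as below. CONFIG_EXAMPLE_RULES and the
example_* functions are hypothetical names chosen for illustration; the point
is only that the call site needs no #ifdef.

```c
/* Header part: real prototypes when the feature is built in,
 * empty static inline stubs otherwise. */
#ifdef CONFIG_EXAMPLE_RULES
int example_rules_init(void);
void example_rules_cleanup(void);
#else
static inline int example_rules_init(void)
{
	return 0;	/* nothing to set up, report success */
}
static inline void example_rules_cleanup(void)
{
}
#endif

/* Caller part: identical code whether or not the feature is enabled. */
static int example_route_init(void)
{
	int ret = example_rules_init();

	if (ret)
		return ret;
	return 0;
}
```

With the option undefined, the compiler inlines the stubs away and
example_route_init() simply returns 0.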

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: remove ifdef in route6 for xfrm6
Daniel Lezcano [Sat, 8 Dec 2007 08:14:11 +0000 (00:14 -0800)]
[IPV6]: remove ifdef in route6 for xfrm6

The following patch creates the usual static inline functions to stub out
the xfrm6_init and xfrm6_fini functions when XFRM is off.
This allows removing some ifdefs and makes the code a little clearer.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: create route6 proc init-fini functions
Daniel Lezcano [Sat, 8 Dec 2007 08:13:32 +0000 (00:13 -0800)]
[IPV6]: create route6 proc init-fini functions

Make the proc creation/destruction a separate function. That
allows removing the #ifdef CONFIG_PROC_FS in the init/fini functions
and makes them more readable.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET] sysctl: make sysctl_somaxconn per-namespace
Pavel Emelyanov [Sat, 8 Dec 2007 08:12:33 +0000 (00:12 -0800)]
[NET] sysctl: make sysctl_somaxconn per-namespace

Just move the variable onto struct net and adjust
its usage.

Other sysctls from the sys.net.core table are more
difficult to virtualize (i.e. make them per-namespace),
but I'll look at them as well a bit later.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET] sysctl: prepare core tables to point to netns variables
Pavel Emelyanov [Sat, 8 Dec 2007 08:11:51 +0000 (00:11 -0800)]
[NET] sysctl: prepare core tables to point to netns variables

Some of the ctl variables are going to be on the struct
net. Here's the way to adjust the ->data pointer on the
ctl_table-s to point to the right variable.

Since some pointers still point to the global variables,
I keep turning the write bits off on such tables.

This looks to become a common procedure for net sysctls,
so later parts of this code may migrate to some more
generic place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET] sysctl: make the sys.net.core sysctls per-namespace
Pavel Emelyanov [Sat, 8 Dec 2007 08:09:24 +0000 (00:09 -0800)]
[NET] sysctl: make the sys.net.core sysctls per-namespace

Making them per-namespace is required for the following
two reasons:

 First, some ctl values have a per-namespace meaning.
 Second, making them writable from the sub-namespace
 is an isolation hole.

So I introduce the pernet operations to create these
tables. For init_net I use the existing statically
declared tables; for sub-namespaces they are duplicated
and the write bits are removed from the mode.
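
The duplicate-and-strip step can be sketched in user space as follows. The
structure example_ctl and the helper example_dup_readonly() are hypothetical
stand-ins for illustration; the kernel's ctl_table and its allocation helpers
differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a sysctl table entry. */
struct example_ctl {
	const char *name;
	int *data;
	unsigned int mode;	/* permission bits, e.g. 0644 */
};

/* Copy a template table of n entries for a sub-namespace and clear
 * all write bits (0222) so the copy is read-only there.
 * Returns NULL on allocation failure; the caller frees the result. */
static struct example_ctl *example_dup_readonly(const struct example_ctl *tmpl,
						size_t n)
{
	struct example_ctl *copy = malloc(n * sizeof(*copy));
	size_t i;

	if (!copy)
		return NULL;
	memcpy(copy, tmpl, n * sizeof(*copy));
	for (i = 0; i < n; i++)
		copy[i].mode &= ~0222u;
	return copy;
}
```

The template itself is left untouched, which mirrors the idea that the
initial namespace keeps its statically declared, writable tables.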

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[SNMP]: Remove unused devconf macros.
Pavel Emelyanov [Sat, 8 Dec 2007 07:56:57 +0000 (23:56 -0800)]
[SNMP]: Remove unused devconf macros.

The SNMP_INC_STATS_OFFSET_BH is used only by ICMP6_INC_STATS_OFFSET_BH.
The ICMP6_INC_STATS_OFFSET_BH is unused.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IUCV]: use LIST_HEAD instead of LIST_HEAD_INIT
Denis Cheng [Fri, 7 Dec 2007 08:51:45 +0000 (00:51 -0800)]
[IUCV]: use LIST_HEAD instead of LIST_HEAD_INIT

These three list_heads are all local variables, but they can also use
LIST_HEAD.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[XFRM] net/xfrm/xfrm_state.c: use LIST_HEAD instead of LIST_HEAD_INIT
Denis Cheng [Fri, 7 Dec 2007 08:51:11 +0000 (00:51 -0800)]
[XFRM] net/xfrm/xfrm_state.c: use LIST_HEAD instead of LIST_HEAD_INIT

A single list_head variable initialized with LIST_HEAD_INIT can almost
always be replaced with a LIST_HEAD declaration; this shrinks the code
and looks better.
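
The conversion can be illustrated with a small user-space sketch. The two
macros are written here in the same shape as the kernel's list helpers; the
variable names old_style_list and new_style_list and the helper
example_list_empty() are made up for the example.

```c
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list points back at itself in both directions. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Declares and initializes the list head in one step. */
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Before: the type and the variable name are both spelled out. */
static struct list_head old_style_list = LIST_HEAD_INIT(old_style_list);

/* After: the same declaration, written once. */
static LIST_HEAD(new_style_list);

static int example_list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Both declarations produce an identical, empty, self-referencing list head;
only the source text gets shorter.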

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[X25]: use LIST_HEAD instead of LIST_HEAD_INIT
Denis Cheng [Fri, 7 Dec 2007 08:50:43 +0000 (00:50 -0800)]
[X25]: use LIST_HEAD instead of LIST_HEAD_INIT

A single list_head variable initialized with LIST_HEAD_INIT can almost
always be replaced with a LIST_HEAD declaration; this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[LAPB] net/lapb/lapb_iface.c: use LIST_HEAD instead of LIST_HEAD_INIT
Denis Cheng [Fri, 7 Dec 2007 08:50:15 +0000 (00:50 -0800)]
[LAPB] net/lapb/lapb_iface.c: use LIST_HEAD instead of LIST_HEAD_INIT

A single list_head variable initialized with LIST_HEAD_INIT can almost
always be replaced with a LIST_HEAD declaration; this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV4] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT
Denis Cheng [Fri, 7 Dec 2007 08:49:47 +0000 (00:49 -0800)]
[IPV4] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT

A single list_head variable initialized with LIST_HEAD_INIT can almost
always be replaced with a LIST_HEAD declaration; this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT
Denis Cheng [Fri, 7 Dec 2007 08:49:17 +0000 (00:49 -0800)]
[NET] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT

A single list_head variable initialized with LIST_HEAD_INIT can almost
always be replaced with a LIST_HEAD declaration; this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV4]: Move trie_local and trie_main into the proc iterator.
Eric W. Biederman [Fri, 7 Dec 2007 08:47:47 +0000 (00:47 -0800)]
[IPV4]: Move trie_local and trie_main into the proc iterator.

We only use these variables when displaying the trie in proc so
place them into the iterator to make this explicit.  We should
probably do something smarter to handle the CONFIG_IP_MULTIPLE_TABLES
case but at least this makes it clear that the silliness is limited
to the display in /proc.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV4]: Remove ip_fib_local_table and ip_fib_main_table defines.
Eric W. Biederman [Fri, 7 Dec 2007 08:46:11 +0000 (00:46 -0800)]
[IPV4]: Remove ip_fib_local_table and ip_fib_main_table defines.

There are only 2 users and it doesn't hurt to call fib_get_table
instead, and it makes it easier to make the fib network namespace
aware.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6] route6/fib6: Don't panic a kmem_cache_create.
Daniel Lezcano [Fri, 7 Dec 2007 08:45:16 +0000 (00:45 -0800)]
[IPV6] route6/fib6: Don't panic a kmem_cache_create.

If kmem_cache_create fails, the kernel will panic. That is
acceptable if the system is booting, but if the ipv6 protocol is
compiled as a module and loaded after the system has booted, do
we want to panic instead of just failing to initialize the protocol?

The init function now returns an error, and this error is checked
during protocol initialization. So the ipv6 protocol will fail safely.
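
The idea of returning an error and unwinding, rather than panicking, can be
sketched as follows. The example_* names and the simulated failure flags are
assumptions for illustration only and do not correspond to the actual ipv6
init code.

```c
#include <errno.h>

/* Counts how many earlier steps were rolled back (for illustration). */
static int example_steps_undone;

/* Hypothetical init step: fails with -ENOMEM when asked to. */
static int example_step(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

static void example_undo(void)
{
	example_steps_undone++;
}

/* Two-step initialization: each failure is returned to the caller,
 * and a failure in the second step undoes the first one. */
static int example_proto_init(int fail_first, int fail_second)
{
	int err;

	err = example_step(fail_first);
	if (err)
		goto out;

	err = example_step(fail_second);
	if (err)
		goto undo_first;

	return 0;

undo_first:
	example_undo();
out:
	return err;
}
```

A caller such as a module init function can then pass the error code on, so
loading the module fails cleanly instead of bringing the system down.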

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: Make af_inet6 to check ip6_route_init return value.
Daniel Lezcano [Fri, 7 Dec 2007 08:44:29 +0000 (00:44 -0800)]
[IPV6]: Make af_inet6 to check ip6_route_init return value.

The af_inet6 initialization function does not check the return code of
the route initialization, so if something goes wrong, the protocol
initialization will continue anyway.  This patch builds on
the changes made in the different route initialization
subroutines to check the return value and make the protocol
initialization fail.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: Make ip6_route_init to return an error code.
Daniel Lezcano [Fri, 7 Dec 2007 08:43:48 +0000 (00:43 -0800)]
[IPV6]: Make ip6_route_init to return an error code.

The route initialization function does not return any value to notify
whether the initialization was successful. This patch checks all
calls made for the initialization in order to return a value to the
caller.

Unfortunately, proc_net_fops_create will return a NULL pointer if
CONFIG_PROC_FS is off, so we can not check the return code without an
ifdef CONFIG_PROC_FS block in the ip6_route_init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: Make fib6_rules_init to return an error code.
Daniel Lezcano [Fri, 7 Dec 2007 08:42:52 +0000 (00:42 -0800)]
[IPV6]: Make fib6_rules_init to return an error code.

When the fib_rules initialization finishes, no return code is provided,
so the caller has no way to know whether the initialization
succeeded or failed. This patch fixes that.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: Make xfrm6_init to return an error code.
Daniel Lezcano [Fri, 7 Dec 2007 08:42:11 +0000 (00:42 -0800)]
[IPV6]: Make xfrm6_init to return an error code.

The xfrm initialization function does not return any error code, so if
there is an error, the caller cannot be advised of it.  This patch
checks the return codes of the different called functions in order to
report a successful or failed initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: Make fib6_init to return an error code.
Daniel Lezcano [Fri, 7 Dec 2007 08:40:34 +0000 (00:40 -0800)]
[IPV6]: Make fib6_init to return an error code.

If there is an error in the initialization function, nothing is
propagated to the caller. So I add a return value to the
init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: Multiple namespaces in the all dst_ifdown routines.
Denis V. Lunev [Fri, 7 Dec 2007 08:38:10 +0000 (00:38 -0800)]
[NET]: Multiple namespaces in the all dst_ifdown routines.

Move dst entries to a namespace loopback to catch refcounting leaks.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: New rx history code
Arnaldo Carvalho de Melo [Thu, 6 Dec 2007 15:18:11 +0000 (13:18 -0200)]
[TFRC]: New rx history code

Credit here goes to Gerrit Renker, who provided the initial implementation for
this new codebase.

I modified it just to try to make it closer to the existing API, renaming some
functions, adding namespacing and fixing one bug where tfrc_rx_hist_alloc was not
freeing the allocated ring entries on the error path.

Original changeset comment from Gerrit:
      -----------
This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c,
   as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[CCID3]: The receiver of a half-connection does not set window counter values
Gerrit Renker [Thu, 6 Dec 2007 14:29:07 +0000 (12:29 -0200)]
[CCID3]: The receiver of a half-connection does not set window counter values

Only the sender sets window counters [RFC 4342, sections 5 and 8.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: Rename dccp_rx_ to tfrc_rx_
Arnaldo Carvalho de Melo [Thu, 6 Dec 2007 14:28:39 +0000 (12:28 -0200)]
[TFRC]: Rename dccp_rx_ to tfrc_rx_

This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: Make the rx history slab be global
Arnaldo Carvalho de Melo [Thu, 6 Dec 2007 14:28:13 +0000 (12:28 -0200)]
[TFRC]: Make the rx history slab be global

This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency
Arnaldo Carvalho de Melo [Thu, 6 Dec 2007 14:27:49 +0000 (12:27 -0200)]
[TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>