Gerrit Renker [Tue, 14 Nov 2006 13:21:36 +0000 (11:21 -0200)]
[TCP/DCCP]: Introduce net_xmit_eval
Throughout the TCP/DCCP (and tunnelling) code, it often happens that the
return code of a transmit function needs to be tested against NET_XMIT_CN
which is a value that does not indicate a strict error condition.
This patch uses a macro for these recurring situations which is consistent
with the already existing macro net_xmit_errno, saving on duplicated code.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
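A minimal userspace sketch of the pattern. The NET_XMIT_* values below are assumed to mirror <linux/netdevice.h>, and demo_transmit_result() is a hypothetical caller. The macro treats the congestion-notification code as success and passes every other code through unchanged.

```c
/* Transmit return codes; values assumed to mirror <linux/netdevice.h>. */
#define NET_XMIT_SUCCESS 0
#define NET_XMIT_DROP    1
#define NET_XMIT_CN      2	/* congestion notification: not a hard error */

/* Fold NET_XMIT_CN into success; every other code is returned as-is. */
#define net_xmit_eval(e) ((e) == NET_XMIT_CN ? 0 : (e))

/* Hypothetical call site: previously each one repeated the CN test inline. */
static int demo_transmit_result(int rc)
{
	return net_xmit_eval(rc);
}
```

Because this is a macro, its argument is evaluated twice, so it should be given a plain variable rather than an expression with side effects.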
Peter Zijlstra [Tue, 14 Nov 2006 00:19:07 +0000 (16:19 -0800)]
[SCTP]: Cleanup of the sctp state table code.
I noticed an insanely high density of repeated characters, fixable by a
simple regular expression:
% s/{.fn = \([^,]*\),[[:space:]]\+\(\\\n[[:space:]]\+\)\?.name = "\1"}/TYPE_SCTP_FUNC(\1)/g
(NOTE: the .name for .fn = sctp_sf_do_9_2_start_shutdown didn't match)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
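A sketch of what such a macro can look like, assuming it stringizes its argument with the preprocessor `#` operator. The macro body, the table type, and the demo_* state functions are assumptions for illustration; only the name TYPE_SCTP_FUNC comes from the commit. It also shows why an entry whose .name differs from its .fn (as with sctp_sf_do_9_2_start_shutdown) cannot be converted mechanically.

```c
#include <string.h>

/* Hypothetical state-function type and table entry. */
typedef int (*demo_sf_fn)(int event);

struct demo_sm_entry {
	demo_sf_fn fn;
	const char *name;
};

/* Stringize the function name so .name can never drift from .fn. */
#define TYPE_SCTP_FUNC(func) {.fn = func, .name = #func}

static int sctp_sf_demo_noop(int event) { return event; }
static int sctp_sf_demo_discard(int event) { (void)event; return 0; }

static const struct demo_sm_entry demo_table[] = {
	TYPE_SCTP_FUNC(sctp_sf_demo_noop),
	TYPE_SCTP_FUNC(sctp_sf_demo_discard),
};

/* Look up a state function by its recorded name (e.g. for debugging). */
static demo_sf_fn demo_find(const char *name)
{
	for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
		if (strcmp(demo_table[i].name, name) == 0)
			return demo_table[i].fn;
	return NULL;
}
```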
Alexey Dobriyan [Tue, 14 Nov 2006 00:12:08 +0000 (16:12 -0800)]
[ATM] ambassador,firestream: "-1 >>" is implementation defined
6.5.7(5): The result of E1 >> E2 is E1 right-shifted E2 bit positions.
...
If E1 has a signed type and a negative value, the resulting value
is implementation defined.
So, cast -1 to unsigned type to make result well-defined.
[ Modified to use ~0U based upon recommendation from Al Viro. -DaveM ]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
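The drivers' own code is not shown here. This is a hedged illustration of the idiom using a hypothetical helper, low_bits_mask(). Right-shifting the unsigned all-ones value ~0U is fully defined, whereas right-shifting the signed value -1 is implementation defined.

```c
#include <limits.h>

/* Mask with the low `bits` bits set. Valid for 1 <= bits <= width of
 * unsigned int; bits == 0 would shift by the full width, which is undefined. */
static unsigned int low_bits_mask(unsigned int bits)
{
	const unsigned int width = sizeof(unsigned int) * CHAR_BIT;

	/* ~0U is unsigned, so the right shift fills with zeros on every compiler. */
	return ~0U >> (width - bits);
}
```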
James Morris [Tue, 14 Nov 2006 00:09:01 +0000 (16:09 -0800)]
[SELinux]: Add support for DCCP
This patch implements SELinux kernel support for DCCP
(http://linux-net.osdl.org/index.php/DCCP), which is similar in
operation to TCP in terms of connected state between peers.
The SELinux support for DCCP is thus modeled on existing handling of
TCP.
A new DCCP socket class is introduced to allow protocol
differentiation. The permissions for this class inherit all of the
socket permissions, as well as the current TCP permissions (node_bind,
name_bind etc). IPv4 and IPv6 are supported, although labeled
networking is not, at this stage.
Patches for SELinux userspace are at:
http://people.redhat.com/jmorris/selinux/dccp/user/
I've performed some basic testing, and it seems to be working as
expected. Adding policy support is similar to TCP, the only real
difference being that it's a different protocol.
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Tue, 14 Nov 2006 00:02:22 +0000 (16:02 -0800)]
[NET]: The scheduled removal of the frame diverter.
This patch contains the scheduled removal of the frame diverter.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 13 Nov 2006 15:34:38 +0000 (13:34 -0200)]
[DCCPv6]: Choose a genuine initial sequence number
This
* resolves a FIXME: all DCCPv6 connections started with
an initial sequence number of 1;
* provides a redirection `secure_dccpv6_sequence_number'
in case the init_sequence_v6 code should be updated later;
* concentrates the update of S.GAR into dccp_connect_init();
* removes a duplicate dccp_update_gss() in ipv4.c;
* uses inet->dport instead of usin->sin_port, due to the
following assignment in dccp_v4_connect():
inet->dport = usin->sin_port;
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:31:50 +0000 (13:31 -0200)]
[DCCP]: Remove redundant statements in init_sequence (ISS)
This patch removes the following redundancies:
1) The test skb->protocol == htons(ETH_P_IPV6) in dccp_v6_init_sequence
is always true since
* dccp_v6_conn_request() is the only calling function
* dccp_v6_conn_request() redirects all skb's with ETH_P_IP to
dccp_v4_conn_request()
2) The first argument, `struct sock *sk', of dccp_v{4,6}_init_sequence()
is never used.
(The same applies to tcp_v{4,6}_init_sequence; an analogous patch has been
submitted to netdev and merged.)
By the way - are the `sport' / `dport' arguments in the right order?
I have made them consistent among calls but they seem to be in the
reverse order.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:26:51 +0000 (13:26 -0200)]
[DCCP]: Remove forward declarations in timer.c
This removes 3 forward declarations by reordering 2 functions.
No code change at all.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:25:41 +0000 (13:25 -0200)]
[DCCP]: Introduce a consistent naming scheme for sysctls
In order to make their function clearer and obtain a consistent naming
scheme to identify sysctls, all existing DCCP sysctls have been prefixed
with `sysctl_dccp', following the same convention as used by TCP.
Feature-specific sysctls retain the `feat' in the middle, although the
`default' has been dropped, since it is obvious from use.
Also removed a duplicate `dccp_feat_default_sequence_window' in ipv4.c.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:23:52 +0000 (13:23 -0200)]
[DCCP]: Add sysctls to control retransmission behaviour
This adds 3 sysctls which govern the retransmission behaviour of DCCP control
packets (three-way handshake, feature negotiation).
It removes 4 FIXMEs from the code.
The close resemblance of sysctl variables to their TCP analogues is emphasised
not only by their name, but also by giving them the same initial values.
This is useful since there is not much practical experience with DCCP yet.
Furthermore, with regard to the previous patch, it is now possible to limit
the number of keepalive-Responses by setting net.dccp.default.request_retries
(also a bit like in TCP).
Lastly, added documentation of all existing DCCP sysctls.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:12:07 +0000 (13:12 -0200)]
[DCCP]: Increment sequence numbers on retransmitted Response packets
Problem:
Gerrit Renker [Mon, 13 Nov 2006 15:07:51 +0000 (13:07 -0200)]
[DCCP]: Update comments on precisely which packets can be retransmitted
This updates program documentation: spell out precise conditions about
which packets are eligible for retransmission (which is actually quite
hard to extract from RFC 4340).
It is based on the following table derived from RFC 4340:
+-----------+---------------------------------+---------------------+
| Type      | Retransmit?                     | Remark              |
+-----------+---------------------------------+---------------------+
| Request   | in client-REQUEST state         | sec. 8.1.1          |
| Response  | NEVER                           | SHOULD NOT, 8.1.3   |
| Data      | NEVER                           | unreliable protocol |
| Ack       | possible in client-PARTOPEN     | sec. 8.1.5          |
| DataAck   | NEVER                           | unreliable protocol |
| CloseReq  | only in server-CLOSEREQ state   | MUST, sec. 8.3      |
| Close     | in node-CLOSING state           | MUST, sec. 8.3      |
+-----------+-------------------------------------------------------+
| Reset     | only in response to other packets                     |
| Sync      | only in response to sequence-invalid packets (7.5.4)  |
| SyncAck   | only in response to Sync packets                      |
+-----------+-------------------------------------------------------+
Hence the only packets eligible for retransmission are:
* Requests in client-REQUEST state (sec. 8.1.1)
* Acks in client-PARTOPEN state (sec. 8.1.5)
* CloseReq in server-CLOSEREQ state (sec. 8.3)
* Close in node-CLOSING state (sec. 8.3)
I had meant to put in a check for these types too, but have left that
for later.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
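The check the author deferred could look roughly like the following sketch. The enum names are hypothetical stand-ins, not the kernel's DCCP_PKT_* and DCCP_* constants, and the function encodes only the four eligible cases derived from the table.

```c
#include <stdbool.h>

/* Hypothetical packet types and states, named after RFC 4340. */
enum demo_pkt_type {
	PKT_REQUEST, PKT_RESPONSE, PKT_DATA, PKT_ACK, PKT_DATAACK,
	PKT_CLOSEREQ, PKT_CLOSE, PKT_RESET, PKT_SYNC, PKT_SYNCACK,
};

enum demo_state {
	ST_CLIENT_REQUEST, ST_CLIENT_PARTOPEN, ST_SERVER_CLOSEREQ,
	ST_NODE_CLOSING, ST_OTHER,
};

/* True only for the four (type, state) pairs eligible for retransmission. */
static bool demo_may_retransmit(enum demo_pkt_type type, enum demo_state state)
{
	switch (type) {
	case PKT_REQUEST:  return state == ST_CLIENT_REQUEST;	/* sec. 8.1.1 */
	case PKT_ACK:      return state == ST_CLIENT_PARTOPEN;	/* sec. 8.1.5 */
	case PKT_CLOSEREQ: return state == ST_SERVER_CLOSEREQ;	/* sec. 8.3 */
	case PKT_CLOSE:    return state == ST_NODE_CLOSING;	/* sec. 8.3 */
	default:
		/* Response, Data, DataAck: never; Reset, Sync, SyncAck are only
		 * sent in response to other packets, never on a timer. */
		return false;
	}
}
```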
David S. Miller [Mon, 13 Nov 2006 07:02:01 +0000 (23:02 -0800)]
[DECNET]: Fix build regressions.
Spotted by Arnaldo.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Nov 2006 23:01:14 +0000 (15:01 -0800)]
[TCP] htcp: Better packing of struct htcp.
Based upon a patch by Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian Haley [Fri, 10 Nov 2006 22:54:51 +0000 (14:54 -0800)]
[IPv6]: Only modify checksum for UDP
Only change the upper-layer checksum from 0 to 0xFFFF for UDP (as RFC 768
states), not for other protocols, since RFC 4443 doesn't require it.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
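A small sketch of why only UDP needs the substitution. In UDP, a transmitted checksum of 0 means that no checksum was computed (RFC 768), so a computed value of 0 is sent as its one's-complement equivalent, 0xFFFF. The helper names are hypothetical, and the pseudo-header is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words, folded and complemented. */
static uint16_t demo_inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1)			/* pad a trailing odd byte with zero */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Only UDP maps a computed 0 to 0xFFFF; other protocols keep the value. */
static uint16_t demo_final_csum(uint16_t csum, int is_udp)
{
	return (is_udp && csum == 0) ? 0xFFFF : csum;
}
```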
Thomas Graf [Fri, 10 Nov 2006 22:11:04 +0000 (14:11 -0800)]
[IPv6] rules: Remove bogus tos validation check
Noticed by Al Viro:
(frh->tos & ~IPV6_FLOWINFO_MASK))
where IPV6_FLOWINFO_MASK is htonl(0xfffffff) and frh->tos
is u8, which makes no sense here...
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
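To see why the check makes no sense, note that the flow-info mask covers a 28-bit field (traffic class plus flow label) in the first 32-bit word of the IPv6 header, whereas frh->tos is a single byte. In host order, `tos & ~0x0FFFFFFF` is 0 for every u8 value. With the htonl() applied, the bits actually tested depend on host byte order. A hedged sketch of the field layout, with hypothetical demo_* names:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Host-order layout of the first IPv6 header word:
 * version (4 bits) | traffic class (8 bits) | flow label (20 bits). */
#define DEMO_FLOWINFO_MASK  0x0FFFFFFFu	/* traffic class + flow label */
#define DEMO_FLOWLABEL_MASK 0x000FFFFFu

/* word_be is the word exactly as stored in the packet (net-endian). */
static uint8_t demo_tclass(uint32_t word_be)
{
	return (uint8_t)((ntohl(word_be) >> 20) & 0xFF);
}

static uint32_t demo_flowlabel(uint32_t word_be)
{
	return ntohl(word_be) & DEMO_FLOWLABEL_MASK;
}
```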
Thomas Graf [Fri, 10 Nov 2006 22:10:15 +0000 (14:10 -0800)]
[NETLINK]: Do precise netlink message allocations where possible
Account for the netlink message header size directly in nlmsg_new()
instead of relying on the caller to calculate it correctly.
Replaces error handling of message construction functions when
constructing notifications with bug traps since a failure implies
a bug in calculating the size of the skb.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
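A sketch of the size calculation that moves into the allocator. The demo_* names are hypothetical and nlmsg_new() itself is not reproduced. The struct mirrors the 16-byte netlink header, and 4-byte alignment is assumed as in the netlink wire format.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of the 16-byte netlink message header (struct nlmsghdr). */
struct demo_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

/* 4-byte alignment; using size_t keeps the mask full width. */
#define DEMO_NLMSG_ALIGNTO ((size_t)4)
#define DEMO_NLMSG_ALIGN(len) \
	(((len) + DEMO_NLMSG_ALIGNTO - 1) & ~(DEMO_NLMSG_ALIGNTO - 1))
#define DEMO_NLMSG_HDRLEN DEMO_NLMSG_ALIGN(sizeof(struct demo_nlmsghdr))

/* Callers pass only the payload size; the header is added here, so it
 * can no longer be forgotten or miscounted at each call site. */
static size_t demo_nlmsg_total_size(size_t payload)
{
	return DEMO_NLMSG_ALIGN(DEMO_NLMSG_HDRLEN + payload);
}
```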
Gerrit Renker [Fri, 10 Nov 2006 22:06:49 +0000 (14:06 -0800)]
[TCP]: Remove dead code in init_sequence
This removes two redundancies:
1) The test (skb->protocol == htons(ETH_P_IPV6)) in tcp_v6_init_sequence()
is always true, because
* tcp_v6_conn_request() is the only function calling this one
* tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to
tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()]
2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is
never used.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Fri, 10 Nov 2006 19:43:06 +0000 (17:43 -0200)]
[DCCP]: Support for partial checksums (RFC 4340, sec. 9.2)
This patch does the following:
a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2]
b) provides necessary socket options and documentation as to how to use them
c) basic support and infrastructure for the Minimum Checksum Coverage feature
[RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user
interface
In addition, it
(1) fixes two bugs in the DCCPv4 checksum computation:
* pseudo-header used checksum_len instead of skb->len
* incorrect checksum coverage calculation based on dccph_x
(2) removes dccp_v4_verify_checksum() since it reduplicates code of the
checksum computation; code calling this function is updated accordingly.
(3) now uses skb_checksum(), which is safer than checksum_partial() if the
sk_buff is a non-linear buffer (has pages attached to it).
(4) fixes an outstanding TODO item:
* If P.CsCov is too large for the packet size, drop packet and return.
The code has been tested with applications; the latest version of tcpdump now
comes with support for partial DCCP checksums.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
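A hedged sketch of the coverage rule in RFC 4340, sec. 9.2. If CsCov is 0, the checksum covers the whole packet. Otherwise, it covers the header and options plus (CsCov - 1) * 4 bytes of application data. A coverage larger than the packet leads to a drop, as in the TODO item above. The helper and its parameters are hypothetical, and the pseudo-header is not counted.

```c
#include <stddef.h>

/* doff:  header length in 32-bit words (includes options)
 * cscov: 4-bit CsCov field (0..15)
 * len:   DCCP packet length in bytes (header + application data)
 * Returns the number of covered bytes, or -1 if the packet must be dropped. */
static long demo_csum_coverage(unsigned doff, unsigned cscov, size_t len)
{
	size_t hdr = (size_t)doff * 4;
	size_t cov;

	if (hdr > len)
		return -1;		/* malformed: header extends past packet */
	if (cscov == 0)
		return (long)len;	/* CsCov = 0: whole packet */
	cov = hdr + (size_t)(cscov - 1) * 4;
	if (cov > len)
		return -1;		/* P.CsCov too large for the packet: drop */
	return (long)cov;
}
```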
YOSHIFUJI Hideaki [Sat, 4 Nov 2006 11:11:37 +0000 (20:11 +0900)]
[IPV6]: Per-interface statistics support.
For IP MIB (RFC4293).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
YOSHIFUJI Hideaki [Fri, 13 Oct 2006 07:17:25 +0000 (16:17 +0900)]
[IPV6]: Introduce ip6_dst_idev() to get inet6_dev{} stored in dst_entry{}.
Otherwise, we will see a lot of casts...
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
YOSHIFUJI Hideaki [Thu, 19 Oct 2006 04:50:09 +0000 (13:50 +0900)]
[IPV6] ROUTE: Use &rt->u.dst instead of cast.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
YOSHIFUJI Hideaki [Fri, 13 Oct 2006 17:00:56 +0000 (02:00 +0900)]
[IPV6] ROUTE: Use macros to format /proc/net/ipv6_route.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Gerrit Renker [Fri, 10 Nov 2006 18:29:14 +0000 (16:29 -0200)]
[DCCP]: Update code comments for Step 2/3
Sorts out the comments for processing steps 2,3 in section 8.5 of RFC 4340.
All comments have been updated against this document, and the reference to step
2 has been made consistent throughout the files.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 18:08:37 +0000 (16:08 -0200)]
[DCCP]: tidy up dccp_v{4,6}_conn_request
This is a code simplification to remove reduplicated code
by concentrating and abstracting shared code.
Detailed Changes:
Ian McDonald [Fri, 10 Nov 2006 15:09:10 +0000 (13:09 -0200)]
[DCCP]: Fix logfile overflow
This patch fixes data being continually spewed into the logs. As the
code stood, if there was a large queue and long delays, timeo would go
down to zero and never get reset.
This fixes it by resetting timeo. It also moves the constant into a header.
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Ian McDonald [Fri, 10 Nov 2006 15:04:52 +0000 (13:04 -0200)]
[DCCP]: Fix DCCP Probe Typo
Fixes a typo in Kconfig. The patch is by Ian McDonald and is re-sent from
http://www.mail-archive.com/dccp@vger.kernel.org/msg00579.html
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 15:01:31 +0000 (13:01 -0200)]
[DCCPv6]: remove forward declarations in ipv6.c
This does the same for ipv6.c as the preceding one does for ipv4.c: Only the
inet_connection_sock_af_ops forward declarations remain, since at least
dccp_ipv6_mapped has a circular dependency on dccp_v6_request_recv_sock.
No code change, merely re-ordering.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 14:52:36 +0000 (12:52 -0200)]
[DCCPv4]: remove forward declarations in ipv4.c
This relates to Arnaldo's announcement in
http://www.mail-archive.com/dccp@vger.kernel.org/msg00604.html
Originally this had been part of the Oops fix and is a revised variant of
http://www.mail-archive.com/dccp@vger.kernel.org/msg00598.html
No code change, merely reshuffling, with the particular objective of
having all request_sock_ops closer together for more clarity.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 14:32:01 +0000 (12:32 -0200)]
[DCCP]: calling dccp_v{4,6}_reqsk_send_ack is a BUG
This patch removes two functions, the send_ack functions of request_sock,
which are not called/used by the DCCP code. It is correct that these
functions are not called, below is a justification why calling these
functions (on a passive socket in the LISTEN/RESPOND state) would mean
a DCCP protocol violation.
A) Background: using request_sock in TCP:
Arnaldo Carvalho de Melo [Fri, 10 Nov 2006 14:01:52 +0000 (12:01 -0200)]
[DCCP] timewait: Remove leftover extern declarations
Gerrit Renker noticed dccp_tw_deschedule and submitted a patch with a FIXME,
but, as he suggests in the same patch, the best thing is to just ditch this
declaration. While doing that I also noticed that tcp_tw_count is not
defined anywhere either, so ditch it too.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 13:46:34 +0000 (11:46 -0200)]
[DCCP]: Simplify jump labels in dccp_v{4,6}_rcv
This is a code simplification and was singled out from the
DCCPv6 Oops patch on
http://www.mail-archive.com/dccp@vger.kernel.org/msg00600.html
It mainly makes the code consistent between ipv{4,6}.c for the functions
dccp_v4_rcv
dccp_v6_rcv
and removes the do_time_wait label to simplify code somewhat.
Committer note: fixed up a trivial compile problem.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 13:22:32 +0000 (11:22 -0200)]
[DCCP]: Combine allocating & zeroing header space on skb
This is a code simplification:
it combines three often recurring operations into one inline function,
* allocate `len' bytes header space in skb
* fill these `len' bytes with zeroes
* cast the start of this header space as dccp_hdr
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
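A userspace sketch of the combined helper, with hypothetical stand-ins for the sk_buff and struct dccp_hdr. Each header is pushed downwards into headroom (as skb_push() does), zeroed, and returned as a header pointer. The real kernel helper and its signature are not reproduced here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct dccp_hdr (6 bytes, 2-byte aligned). */
struct demo_hdr {
	unsigned short sport, dport;
	unsigned char  doff;
	unsigned char  type;
};

/* Hypothetical stand-in for an sk_buff: headers grow downwards from
 * the end of a malloc()ed headroom area. */
struct demo_buf {
	unsigned char *head;
	unsigned char *data;
};

/* headroom should be a multiple of 4 so pushed headers stay aligned. */
static int demo_buf_init(struct demo_buf *b, size_t headroom)
{
	b->head = malloc(headroom);
	b->data = b->head ? b->head + headroom : NULL;
	return b->head ? 0 : -1;
}

/* Allocate len bytes of header space, zero them, and cast the start as a
 * header. len must be a multiple of 4 (DCCP header lengths are counted in
 * 32-bit words) and at least the fixed header size. */
static struct demo_hdr *demo_zeroed_hdr(struct demo_buf *b, size_t len)
{
	if (len < sizeof(struct demo_hdr) || len % 4 != 0 ||
	    (size_t)(b->data - b->head) < len)
		return NULL;
	b->data -= len;			/* like skb_push() */
	memset(b->data, 0, len);	/* zero header and option space */
	return (struct demo_hdr *)(void *)b->data;
}
```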
Gerrit Renker [Fri, 10 Nov 2006 13:13:33 +0000 (11:13 -0200)]
[DCCPv6]: Add a FIXME for missing IPV6_PKTOPTIONS
This refers to the possible memory leak pointed out in
http://www.mail-archive.com/dccp@vger.kernel.org/msg00574.html,
fixed by David Miller in
http://www.mail-archive.com/netdev@vger.kernel.org/msg24881.html
and adds a FIXME to point out where code is missing.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 04:13:56 +0000 (02:13 -0200)]
[DCCP]: set safe upper bound for option length
This is a re-send from
http://www.mail-archive.com/dccp@vger.kernel.org/msg00553.html
It is the same patch as before, but I have built in Arnaldo's suggestions
pointed out in that posting.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
David S. Miller [Fri, 10 Nov 2006 03:58:25 +0000 (19:58 -0800)]
[TCP]: Don't set SKB owner in tcp_transmit_skb().
The data itself is already charged to the SKB, doing
the skb_set_owner_w() just generates a lot of noise and
extra atomics we don't really need.
Lmbench improvements on lat_tcp are minimal:
before:
TCP latency using localhost: 23.2701 microseconds
TCP latency using localhost: 23.1994 microseconds
TCP latency using localhost: 23.2257 microseconds
after:
TCP latency using localhost: 22.8380 microseconds
TCP latency using localhost: 22.9465 microseconds
TCP latency using localhost: 22.8462 microseconds
Signed-off-by: David S. Miller <davem@davemloft.net>
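Averaging the three runs quoted above (this arithmetic is not part of the original message) shows how small the gain is:

```latex
\begin{aligned}
\bar{t}_{\text{before}} &= \tfrac{1}{3}(23.2701 + 23.1994 + 23.2257) \approx 23.232\,\mu\text{s} \\
\bar{t}_{\text{after}}  &= \tfrac{1}{3}(22.8380 + 22.9465 + 22.8462) \approx 22.877\,\mu\text{s} \\
\Delta &= \bar{t}_{\text{before}} - \bar{t}_{\text{after}} \approx 0.355\,\mu\text{s} \ (\approx 1.5\%\ \text{of}\ \bar{t}_{\text{before}})
\end{aligned}
```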
Stephen Hemminger [Fri, 10 Nov 2006 00:37:26 +0000 (16:37 -0800)]
[NET] ip-sysctl.txt: Alphabetize.
Rearrange TCP entries in alpha order.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Fri, 10 Nov 2006 00:36:36 +0000 (16:36 -0800)]
[TCP]: Allow autoloading of congestion control via setsockopt.
If the user has permission to load modules, then attempt to autoload
the TCP congestion control module.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Fri, 10 Nov 2006 00:35:15 +0000 (16:35 -0800)]
[TCP]: Restrict congestion control choices.
Allow normal users to choose only among a restricted set of congestion
control algorithms. The default set is reno plus whatever has been configured
as the default. The policy can be changed by the administrator at any time.
For example, to allow any choice:
cp /proc/sys/net/ipv4/tcp_available_congestion_control \
/proc/sys/net/ipv4/tcp_allowed_congestion_control
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
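From an application's point of view, these restrictions apply when an algorithm is selected per socket with the TCP_CONGESTION socket option. The sketch below uses hypothetical wrapper names. The specific errno values returned for a disallowed or unknown name are not assumed here; the code only reports failure.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Request a congestion-control algorithm for one socket; 0 or -errno.
 * Unprivileged callers are limited to tcp_allowed_congestion_control;
 * a name not yet loaded may be autoloaded if module loading is permitted. */
static int demo_set_cc(int fd, const char *name)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
		       (socklen_t)strlen(name)) < 0)
		return -errno;
	return 0;
}

/* Read back the algorithm in effect; len must be >= 2 so the result
 * always stays NUL-terminated. Returns 0 or -errno. */
static int demo_get_cc(int fd, char *buf, socklen_t len)
{
	socklen_t optlen = len - 1;

	memset(buf, 0, len);
	if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &optlen) < 0)
		return -errno;
	return 0;
}
```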
Stephen Hemminger [Fri, 10 Nov 2006 00:32:06 +0000 (16:32 -0800)]
[TCP]: Add tcp_available_congestion_control sysctl.
Create /proc/sys/net/ipv4/tcp_available_congestion_control
that reflects currently available TCP choices.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 10 Nov 2006 00:29:57 +0000 (16:29 -0800)]
[SCTP]: Fix warning
An alternate solution would be to make the digest a pointer, allocate
it in sctp_endpoint_init() and free it in sctp_endpoint_destroy().
I guess I should have originally done it this way...
CC [M] net/sctp/sm_make_chunk.o
net/sctp/sm_make_chunk.c: In function 'sctp_unpack_cookie':
net/sctp/sm_make_chunk.c:1358: warning: initialization discards qualifiers from pointer target type
The reason is that sctp_unpack_cookie() takes a const struct
sctp_endpoint and modifies the digest in it (digest being embedded in
the struct, not a pointer). Make digest a pointer to fix this
warning.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
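A reduced sketch of the qualifier issue, with hypothetical demo_* types. When the digest is embedded in the struct, it is const whenever the struct is accessed through a const pointer. When the struct holds only a pointer, const applies to that pointer member and not to the buffer it points at. Allocation and release correspond to the sctp_endpoint_init()/sctp_endpoint_destroy() placement mentioned above.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_DIGEST_SIZE 20	/* hypothetical size, e.g. a SHA-1 digest */

/* Before: writing digest[] through a const struct pointer needs a
 * qualifier-discarding cast, which is what triggered the warning. */
struct demo_ep_embedded {
	unsigned char digest[DEMO_DIGEST_SIZE];
};

/* After: only the pointer member is const through a const struct pointer. */
struct demo_ep {
	unsigned char *digest;
};

static int demo_ep_init(struct demo_ep *ep)
{
	ep->digest = calloc(1, DEMO_DIGEST_SIZE);
	return ep->digest ? 0 : -1;
}

static void demo_ep_destroy(struct demo_ep *ep)
{
	free(ep->digest);
	ep->digest = NULL;
}

/* Compiles cleanly: scratch data is written through a const endpoint. */
static void demo_unpack_cookie(const struct demo_ep *ep, const unsigned char *mac)
{
	memcpy(ep->digest, mac, DEMO_DIGEST_SIZE);
}
```

The trade-off is that const now promises less (the digest contents may change), and the extra allocation adds a failure path at init time.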
David S. Miller [Fri, 10 Nov 2006 00:26:09 +0000 (16:26 -0800)]
[IPV6] tcp: Fix typo _read_mostly --> __read_mostly.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Nov 2006 00:23:22 +0000 (16:23 -0800)]
[DCCP]: Fix typo _read_mostly --> __read_mostly.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 Nov 2006 10:30:37 +0000 (02:30 -0800)]
[NET]: Size listen hash tables using backlog hint
We currently allocate a fixed-size hash table of TCP_SYNQ_HSIZE=512 slots for
each LISTEN socket, regardless of various parameters (the listen backlog, for
example).
On x86_64, this means order-1 allocations (which might fail), even for 'small'
sockets expecting few connections. Conversely, a huge server wanting a
backlog of 50000 is slowed down a bit by this fixed limit.
This patch makes the size of the listen hash table a dynamic parameter,
depending on:
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default: 256, 1024 or 128)
- backlog value given by user application (2nd parameter of listen())
For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
kmalloc().
We still limit memory allocation with the two existing tunables (somaxconn &
tcp_max_syn_backlog). So for standard setups, this patch actually reduces RAM
usage.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
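A hedged sketch of a sizing rule in the spirit of this patch: clamp the listen() backlog by both tunables, apply a small floor, and round up to a power of two. The floor of 8 and the "+ 1" before rounding are assumptions of this sketch, not quoted kernel code. For comparison, the old fixed table of 512 pointer-sized slots already fills a 4 KiB page on x86_64 (512 * 8 bytes), which is consistent with the order-1 allocations mentioned above.

```c
#include <stddef.h>

/* Smallest power of two >= v (v >= 1). */
static unsigned long demo_roundup_pow_of_two(unsigned long v)
{
	unsigned long p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* backlog: value passed to listen(); somaxconn and max_syn are the
 * net.core.somaxconn and net.ipv4.tcp_max_syn_backlog tunables. */
static unsigned long demo_synq_slots(unsigned long backlog,
				     unsigned long somaxconn,
				     unsigned long max_syn)
{
	unsigned long n = backlog;

	if (n > somaxconn)
		n = somaxconn;
	if (n > max_syn)
		n = max_syn;
	if (n < 8)			/* assumed floor for tiny backlogs */
		n = 8;
	return demo_roundup_pow_of_two(n + 1);
}

/* Tables larger than a page use vmalloc() instead of kmalloc(). */
static int demo_needs_vmalloc(unsigned long slots, size_t slot_size,
			      size_t page_size)
{
	return slots * slot_size > page_size;
}
```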
David Kimdon [Fri, 10 Nov 2006 00:16:21 +0000 (16:16 -0800)]
[PKT_SCHED]: Make sch_fifo.o available when CONFIG_NET_SCHED is not set.
Based on patch by Patrick McHardy.
Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc
without requiring CONFIG_NET_SCHED.
The d80211 stack needs a generic fifo qdisc for WME. At present it
uses net/d80211/fifo_qdisc.c which is functionally equivalent to
sch_fifo.c. This patch will allow the d80211 stack to remove
net/d80211/fifo_qdisc.c and use sch_fifo.c instead.
Signed-off-by: David Kimdon <david.kimdon@devicescape.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 9 Nov 2006 23:23:20 +0000 (15:23 -0800)]
[NET] rules: Add support to invert selectors
Introduces a new flag FIB_RULE_INVERT causing rules to apply
if the specified selector doesn't match.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
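A sketch of how an invert flag can be applied after normal selector evaluation. DEMO_RULE_INVERT and the numeric interface selector are hypothetical stand-ins for FIB_RULE_INVERT and the real rule selectors.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RULE_INVERT 0x2	/* hypothetical stand-in for FIB_RULE_INVERT */

struct demo_rule {
	uint32_t flags;
	uint32_t iif;	/* hypothetical selector: 0 = any input interface */
};

static bool demo_selectors_match(const struct demo_rule *r, uint32_t pkt_iif)
{
	return r->iif == 0 || r->iif == pkt_iif;
}

/* With the invert flag set, the rule applies exactly when its selectors
 * do not match. */
static bool demo_rule_match(const struct demo_rule *r, uint32_t pkt_iif)
{
	bool ret = demo_selectors_match(r, pkt_iif);

	return (r->flags & DEMO_RULE_INVERT) ? !ret : ret;
}
```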
Thomas Graf [Thu, 9 Nov 2006 23:22:48 +0000 (15:22 -0800)]
[NET] rules: Share common attribute validation policy
Move the attribute policy for the non-specific attributes into
net/fib_rules.h and include it in the respective protocols.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 9 Nov 2006 23:22:18 +0000 (15:22 -0800)]
[NET] rules: Protocol-independent mark selector
Move mark selector currently implemented per protocol into
the protocol-independent part.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
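A protocol-independent mark selector only needs the rule's mark, a mask, and the packet's mark. The XOR-and-mask comparison below is a common way to express this. Whether the kernel uses exactly this expression is an assumption, and demo_mark_match() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only bits set in rule_mask take part in the comparison. */
static bool demo_mark_match(uint32_t rule_mark, uint32_t rule_mask,
			    uint32_t pkt_mark)
{
	return ((rule_mark ^ pkt_mark) & rule_mask) == 0;
}
```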
Thomas Graf [Thu, 9 Nov 2006 23:21:41 +0000 (15:21 -0800)]
[IPV4] nl_fib_lookup: Rename fl_fwmark to fl_mark
For the sake of consistency.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 9 Nov 2006 23:20:38 +0000 (15:20 -0800)]
[NET]: Rethink mark field in struct flowi
Now that all protocols have been made aware of the mark
field, it can be moved out of the union, thus simplifying
its usage.
The config options in the IPv4/IPv6/DECnet subsystems
to enable or disable mark-based routing only
obfuscate the code with ifdefs, the cost for the
additional comparison in the flow key is insignificant,
and most distributions have all these options enabled
by default anyway. Therefore it makes sense to remove
the config options and enable mark-based routing by
default.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 9 Nov 2006 23:19:14 +0000 (15:19 -0800)]
[NET]: Turn nfmark into generic mark
nfmark is being used in various subsystems and has become
the de facto mark field for all kinds of packets. Therefore
it makes sense to rename it to `mark' and remove the
dependency on CONFIG_NETFILTER.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle [Thu, 9 Nov 2006 07:02:19 +0000 (23:02 -0800)]
[DECNET]: Don't clear memory twice.
When dn_neigh.c was converted from kmalloc to kzalloc in commit
0da974f4f303a6842516b764507e3c0a03f41e5a, it was missed that
dn_neigh_seq_open was actually clearing the allocation twice.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Thu, 9 Nov 2006 06:46:26 +0000 (22:46 -0800)]
[XFRM]: uninline xfrm_selector_match()
Six callsites, huge.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Zijlstra [Thu, 9 Nov 2006 06:44:35 +0000 (22:44 -0800)]
[BLUETOOTH] lockdep: annotate sk_lock nesting in AF_BLUETOOTH
=============================================
[ INFO: possible recursive locking detected ]
2.6.18-1.2726.fc6 #1
Venkat Yekkirala [Wed, 8 Nov 2006 23:04:26 +0000 (17:04 -0600)]
SELinux: Fix SA selection semantics
Fix the selection of an SA for an outgoing packet to be at the same
context as the originating socket/flow. This eliminates the SELinux
policy's ability to use/sendto SAs with contexts other than the socket's.
With this patch applied, the SELinux policy will require one or more of the
following for a socket to be able to communicate with/without SAs:
1. To enable a socket to communicate without using labeled-IPSec SAs:
allow socket_t unlabeled_t:association { sendto recvfrom }
2. To enable a socket to communicate with labeled-IPSec SAs:
allow socket_t self:association { sendto };
allow socket_t peer_sa_t:association { recvfrom };
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
Venkat Yekkirala [Wed, 8 Nov 2006 23:04:09 +0000 (17:04 -0600)]
SELinux: Return correct context for SO_PEERSEC
Fix SO_PEERSEC for tcp sockets to return the security context of
the peer (as represented by the SA from the peer) as opposed to the
SA used by the local/source socket.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
Venkat Yekkirala [Wed, 8 Nov 2006 23:03:44 +0000 (17:03 -0600)]
SELinux: Various xfrm labeling fixes
Since the upstreaming of the mlsxfrm modification a few months back,
testing has resulted in the identification of the following issues/bugs that
are resolved in this patch set.
1. Fix the security context used in the IKE negotiation to be the context
of the socket as opposed to the context of the SPD rule.
2. Fix SO_PEERSEC for tcp sockets to return the security context of
the peer as opposed to the source.
3. Fix the selection of an SA for an outgoing packet to be at the same
context as the originating socket/flow.
The following would be the result of applying this patchset:
- SO_PEERSEC will now correctly return the peer's context.
- IKE daemons will receive the context of the source socket/flow
as opposed to the SPD rule's context so that the negotiated SA
will be at the same context as the source socket/flow.
- The SELinux policy will require one or more of the
following for a socket to be able to communicate with/without SAs:
1. To enable a socket to communicate without using labeled-IPSec SAs:
allow socket_t unlabeled_t:association { sendto recvfrom }
2. To enable a socket to communicate with labeled-IPSec SAs:
allow socket_t self:association { sendto };
allow socket_t peer_sa_t:association { recvfrom };
This Patch: Pass correct security context to IKE for use in negotiation
Fix the security context passed to IKE for use in negotiation to be the
context of the socket as opposed to the context of the SPD rule so that
the SA carries the label of the originating socket/flow.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
Al Viro [Wed, 8 Nov 2006 08:28:44 +0000 (00:28 -0800)]
[BLUETOOTH] rfcomm endianness bug: param_mask is little-endian on the wire
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:28:19 +0000 (00:28 -0800)]
[BLUETOOTH]: rfcomm endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:27:57 +0000 (00:27 -0800)]
[BLUETOOTH]: bnep endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:27:36 +0000 (00:27 -0800)]
[BLUETOOTH] bnep endianness bug: filtering by packet type
<= and >= don't work well on net-endian values...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
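Why ordering comparisons fail on net-endian values: on a little-endian host, the raw 16-bit values are byte-swapped, so their numeric order is meaningless. A hedged sketch of a range filter that converts to host order first (demo_proto_in_range() is hypothetical, not the bnep code):

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Is a net-endian protocol type inside a net-endian [start, end] range?
 * All three values are converted to host order before comparing. */
static bool demo_proto_in_range(uint16_t proto_be, uint16_t start_be,
				uint16_t end_be)
{
	uint16_t p = ntohs(proto_be);

	return ntohs(start_be) <= p && p <= ntohs(end_be);
}
```

For example, on a little-endian host a raw comparison would accept 0x0900 as lying within [0x0800, 0x08FF], because the swapped values are 0x0009, 0x0008 and 0xFF08.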
Al Viro [Wed, 8 Nov 2006 08:27:11 +0000 (00:27 -0800)]
[IPV6]: ip6_output annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:26:51 +0000 (00:26 -0800)]
[NETFILTER]: trivial annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:26:29 +0000 (00:26 -0800)]
[AF_PACKET]: annotate
Weirdness: the third argument of socket() is net-endian
here. Oh, well - it's documented in packet(7).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:26:05 +0000 (00:26 -0800)]
[LLC]: annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:25:41 +0000 (00:25 -0800)]
[IPV6]: annotate inet6_csk_search_req()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:25:17 +0000 (00:25 -0800)]
[IPV6]: flowlabels are net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:24:47 +0000 (00:24 -0800)]
[INET]: annotate inet_ecn.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:24:26 +0000 (00:24 -0800)]
[NET]: annotate dsfield.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:24:06 +0000 (00:24 -0800)]
[XFRM]: annotate ->new_mapping()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:23:42 +0000 (00:23 -0800)]
[AF_KEY]: annotate
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:23:14 +0000 (00:23 -0800)]
[IPV4]: encapsulation annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:22:34 +0000 (00:22 -0800)]
[SUNRPC]: annotate hash_ip()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:22:08 +0000 (00:22 -0800)]
[IPV6]: annotate ipv6 mcast
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:21:46 +0000 (00:21 -0800)]
[IPV6]: annotate struct frag_hdr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:21:21 +0000 (00:21 -0800)]
[IPV6]: annotate icmpv6 headers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:21:01 +0000 (00:21 -0800)]
[IPV6]: 'info' argument of ipv6 ->err_handler() is net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:20:21 +0000 (00:20 -0800)]
[XFRM]: misc annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:20:00 +0000 (00:20 -0800)]
[IPV6]: annotate inet6_hashtables
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:19:38 +0000 (00:19 -0800)]
[NET]: ipconfig and nfsroot annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 8 Nov 2006 08:19:09 +0000 (00:19 -0800)]
[TIPC]: endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 3 Dec 2006 05:00:06 +0000 (21:00 -0800)]
[IPV6] NDISC: Calculate packet length correctly for allocation.
MAX_HEADER does not include the IPv6 header length,
so we need to add it explicitly.
With help from YOSHIFUJI Hideaki.
Signed-off-by: David S. Miller <davem@davemloft.net>
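A simplified arithmetic sketch of the allocation fix above. In the kernel, the headroom comes from MAX_HEADER (configuration-dependent) plus the device's link-layer reservation. Here it is a single hypothetical max_header parameter, and 128 in the usage is an arbitrary example value. The fixed sizes are the standard IPv6 header (40 bytes), the ICMPv6 header including its reserved word (8 bytes) and an IPv6 target address (16 bytes).

```c
#include <assert.h>
#include <stddef.h>

#define IPV6_HDR_LEN   40  /* fixed IPv6 header */
#define ICMP6_HDR_LEN   8  /* ICMPv6 header incl. reserved word */
#define IN6_ADDR_LEN   16  /* target address of an NS/NA */

/* Before the fix: assumes max_header already covers the IPv6 header. */
static size_t ndisc_alloc_len_buggy(size_t max_header, size_t icmp_len)
{
    return max_header + icmp_len;
}

/* After the fix: headroom and the IPv6 header are counted separately. */
static size_t ndisc_alloc_len(size_t max_header, size_t icmp_len)
{
    assert(icmp_len >= ICMP6_HDR_LEN);
    return max_header + IPV6_HDR_LEN + icmp_len;
}
```

For a neighbour solicitation with no options (24 bytes of ICMPv6) and 128 bytes of headroom, the corrected length is 192 bytes. The old formula came up exactly 40 bytes short.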
Linus Torvalds [Sat, 2 Dec 2006 23:08:32 +0000 (15:08 -0800)]
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (118 commits)
[netdrvr] skge: build fix
[PATCH] NetXen: driver cleanup, removed unnecessary __iomem type casts
[PATCH] PHY: Add support for configuring the PHY connection interface
[PATCH] chelsio: transmit locking (plus bug fix).
[PATCH] chelsio: statistics improvement
[PATCH] chelsio: add MSI support
[PATCH] chelsio: use standard CRC routines
[PATCH] chelsio: cleanup pm3393 code
[PATCH] chelsio: add 1G swcixw support
[PATCH] chelsio: add support for other 10G boards
[PATCH] chelsio: remove unused mutex
[PATCH] chelsio: use kzalloc
[PATCH] chelsio: whitespace fixes
[PATCH] amd8111e use standard CRC lib
[PATCH] sky2: msi enhancements.
[PATCH] sky2: kfree_skb_any needed
[PATCH] sky2: fixes for Yukon EC_U chip revisions
[PATCH] sky2: add Dlink 560SX id
[PATCH] sky2: receive error handling fix
[PATCH] skge: don't clear MC state on link down
...
Linus Torvalds [Sat, 2 Dec 2006 16:29:04 +0000 (08:29 -0800)]
Merge branch 'for-linus' of /linux/kernel/git/drzeus/mmc
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/drzeus/mmc:
mmc: correct request error handling
mmc: Flush block queue when removing card
mmc: sdhci high speed support
mmc: Support for high speed SD cards
mmc: Fix mmc_delay() function
mmc: Add support for mmc v4 wide-bus modes
[PATCH] mmc: Add support for mmc v4 high speed mode
trivial change for mmc/Kconfig: MMC_PXA does not mean only PXA255
Make general code cleanups
Add MMC_CAP_{MULTIWRITE,BYTEBLOCK} flags
Platform device error handling cleanup
Move register definitions away from the header file
Change OMAP_MMC_{READ,WRITE} macros to use the host pointer
Replace base with virt_base and phys_base
mmc: constify mmc_host_ops vectors
mmc: remove kernel_thread()
Linus Torvalds [Sat, 2 Dec 2006 16:28:28 +0000 (08:28 -0800)]
Merge branch 'release' of /linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
Revert "ACPI: SCI interrupt source override"
Jeff Garzik [Sat, 2 Dec 2006 12:14:39 +0000 (07:14 -0500)]
[netdrvr] skge: build fix
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Len Brown [Sat, 2 Dec 2006 07:27:46 +0000 (02:27 -0500)]
Revert "ACPI: SCI interrupt source override"
This reverts commit
281ea49b0c294649a6de47a6f8fbe5611137726b,
which broke ACPI interrupt source overrides that move
the SCI from one IRQ in PIC mode to another in IOAPIC mode.
If the SCI shared an interrupt line with another device,
this would result in an "irq 18: nobody cared" type failure.
http://bugzilla.kernel.org/show_bug.cgi?id=7601
Signed-off-by: Len Brown <len.brown@intel.com>
Amit S. Kale [Fri, 1 Dec 2006 13:36:22 +0000 (05:36 -0800)]
[PATCH] NetXen: driver cleanup, removed unnecessary __iomem type casts
Signed-off-by: Amit S. Kale <amitkale@netxen.com>
netxen_nic.h | 38 ++++++++------------------------------
netxen_nic_ethtool.c | 5 ++---
netxen_nic_hw.c | 12 +++++-------
netxen_nic_main.c | 8 +++-----
4 files changed, 18 insertions(+), 45 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Andy Fleming [Fri, 1 Dec 2006 18:01:06 +0000 (12:01 -0600)]
[PATCH] PHY: Add support for configuring the PHY connection interface
Most PHYs connect to an Ethernet controller over a GMII or MII
interface. However, a growing number are connected over
other interfaces, such as RGMII or SGMII.
The Ethernet driver tells the PHY which type of connection is
in use, either by setting it manually or by passing it in through
phy_connect (or phy_attach).
Changes include:
* Updates to documentation
* Updates to PHY Lib consumers
* Changes to PHY Lib to add interface support
* Some minor changes to whitespace in phy.h
* gianfar driver now detects interface and passes appropriate
value to PHY Lib
Signed-off-by: Andrew Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
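A hypothetical sketch of the driver-side decision described above: the MAC driver works out how its PHY is wired and hands the result to PHY Lib, which according to the commit happens through phy_connect (or phy_attach). The enum loosely mirrors the kernel's phy_interface_t modes but uses invented names. The three boolean inputs stand in for whatever the hardware actually reports; they are assumptions, not gianfar's real register fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's phy_interface_t. */
enum mac_phy_iface {
    IFACE_MII,
    IFACE_GMII,
    IFACE_RMII,
    IFACE_RGMII,
    IFACE_SGMII,
};

/* Pick the MAC-PHY interface from (assumed) hardware-reported traits. */
static enum mac_phy_iface detect_iface(bool gigabit, bool reduced, bool serdes)
{
    if (serdes)
        return IFACE_SGMII;               /* serial gigabit link */
    if (gigabit)
        return reduced ? IFACE_RGMII : IFACE_GMII;
    return reduced ? IFACE_RMII : IFACE_MII;
}

static const char *iface_name(enum mac_phy_iface i)
{
    switch (i) {
    case IFACE_MII:   return "mii";
    case IFACE_GMII:  return "gmii";
    case IFACE_RMII:  return "rmii";
    case IFACE_RGMII: return "rgmii";
    case IFACE_SGMII: return "sgmii";
    }
    return "unknown";
}
```

Keeping detection in the MAC driver and configuration in PHY Lib means a PHY driver only has to honour the mode it is given. It does not need to guess how the board is wired.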
Stephen Hemminger [Sat, 2 Dec 2006 00:36:22 +0000 (16:36 -0800)]
[PATCH] chelsio: transmit locking (plus bug fix).
If the transmit lock is contended, push the return code back
and retry at a higher level.
Bugfix: if the buffer is reallocated because of a lack of headroom
and the send is blocked, drop the packet. This is necessary
because the caller would otherwise requeue a freed skb.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
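A user-space sketch of the transmit policy described above. It is not the chelsio code. A C11 atomic_flag stands in for the driver's transmit lock, the return codes are hypothetical (in the spirit of the kernel's NETDEV_TX_* values), and `reallocated` models the case where the buffer was copied to gain headroom, freeing the one the caller still points to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical return codes: the core retries on TX_BUSY_RETRY. */
enum tx_result { TX_OK, TX_BUSY_RETRY, TX_DROPPED };

struct tx_queue {
    atomic_flag lock;  /* stands in for the driver's transmit lock */
    bool ring_full;    /* "send is blocked": no free descriptors */
};

static enum tx_result xmit(struct tx_queue *q, bool reallocated)
{
    bool blocked;

    if (atomic_flag_test_and_set(&q->lock)) {
        blocked = true;          /* contended: report back, don't spin */
    } else {
        blocked = q->ring_full;
        /* if (!blocked) the buffer would be queued to hardware here */
        atomic_flag_clear(&q->lock);
    }

    if (!blocked)
        return TX_OK;
    /* A retry would make the caller requeue a buffer that was freed by
     * the reallocation, so such a packet must be dropped (and the copy
     * freed) instead. */
    return reallocated ? TX_DROPPED : TX_BUSY_RETRY;
}
```

The design choice is to push contention back to the caller instead of spinning. The one case where that is unsafe, a packet whose original buffer no longer exists, is turned into a drop.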
Stephen Hemminger [Sat, 2 Dec 2006 00:36:21 +0000 (16:36 -0800)]
[PATCH] chelsio: statistics improvement
Cleanup statistics management:
* Get rid of duplicate or unused statistics
* Convert high-volume stats to per-CPU, 64-bit counters
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
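A minimal sketch of the per-CPU, 64-bit statistics pattern, assuming a fixed hypothetical CPU count and an explicit cpu index (the kernel uses its own per-CPU allocation and smp_processor_id()). Each CPU increments only its own slot, so the hot path needs no shared lock. Readers sum all slots on demand. This sketch ignores torn 64-bit reads, which matter on 32-bit kernels.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4  /* hypothetical CPU count */

/* 64-bit fields do not wrap at 4 GiB like 32-bit byte counters do. */
struct port_stats {
    uint64_t rx_packets;
    uint64_t rx_bytes;
};

static struct port_stats percpu_stats[NR_CPUS_SKETCH];

/* Hot path: touch only the current CPU's slot. */
static void account_rx(int cpu, uint64_t len)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    percpu_stats[cpu].rx_packets++;
    percpu_stats[cpu].rx_bytes += len;
}

/* Slow path (e.g. a get_stats handler): fold all slots together. */
static struct port_stats stats_total(void)
{
    struct port_stats t = { 0, 0 };

    for (int i = 0; i < NR_CPUS_SKETCH; i++) {
        t.rx_packets += percpu_stats[i].rx_packets;
        t.rx_bytes += percpu_stats[i].rx_bytes;
    }
    return t;
}
```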
Stephen Hemminger [Sat, 2 Dec 2006 00:36:20 +0000 (16:36 -0800)]
[PATCH] chelsio: add MSI support
Using MSI avoids sharing an IRQ and the associated overhead.
Tested on PCI-X.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Sat, 2 Dec 2006 00:36:19 +0000 (16:36 -0800)]
[PATCH] chelsio: use standard CRC routines
Replace the driver's CRC calculation with the existing library.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Sat, 2 Dec 2006 00:36:18 +0000 (16:36 -0800)]
[PATCH] chelsio: cleanup pm3393 code
Replace the macro for updating RMON values with a function.
Cleanups:
* remove unused enums
* Fix comment format
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Sat, 2 Dec 2006 00:36:17 +0000 (16:36 -0800)]
[PATCH] chelsio: add 1G swcixw support
Add support for 1G versions of Chelsio devices.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Sat, 2 Dec 2006 00:36:16 +0000 (16:36 -0800)]
[PATCH] chelsio: add support for other 10G boards
Add support for other versions of the 10G Chelsio boards.
This is basically a port of the vendor driver with the
TOE features removed.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Sat, 2 Dec 2006 00:36:15 +0000 (16:36 -0800)]
[PATCH] chelsio: remove unused mutex
This mutex is unused in current (non TOE) code.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Sat, 2 Dec 2006 00:36:14 +0000 (16:36 -0800)]
[PATCH] chelsio: use kzalloc
Use kzalloc in several places.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Sat, 2 Dec 2006 00:36:13 +0000 (16:36 -0800)]
[PATCH] chelsio: whitespace fixes
Fix indentation and blank/tab issues.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Sat, 2 Dec 2006 00:10:50 +0000 (16:10 -0800)]
[PATCH] amd8111e use standard CRC lib
I noticed that this driver (and several others) reinvents its own copy of
the existing CRC library. I don't have the hardware, but I tested by
extracting the code and comparing the results.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
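A user-space sketch of the "extract the code and compare" check described above. The bit-at-a-time function performs the same reflected CRC-32 arithmetic (polynomial 0xEDB88320, caller-supplied seed, no final inversion) as the kernel's crc32_le(). The table-driven function stands in for a driver's private copy. Both names are hypothetical. As far as I know, ether_crc_le(len, data) in linux/crc32.h is crc32_le(~0, data, len), and drivers commonly take its top six bits for a multicast hash index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: reflected CRC-32, one bit at a time. */
static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc;
}

/* Stand-in for a driver's private, table-driven copy. */
static uint32_t crc_table[256];

static void crc_table_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ ((c & 1u) ? 0xEDB88320u : 0u);
        crc_table[n] = c;
    }
}

static uint32_t crc32_table(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = (crc >> 8) ^ crc_table[(crc ^ *p++) & 0xFFu];
    return crc;
}

/* Compare both on every prefix of buf; returns 1 if they all agree.
 * crc_table_init() must have been called first. */
static int crc_impls_agree(const unsigned char *buf, size_t len)
{
    for (size_t n = 0; n <= len; n++)
        if (crc32_le_sketch(~0u, buf, n) != crc32_table(~0u, buf, n))
            return 0;
    return 1;
}
```

With a seed of ~0 and a final inversion, "123456789" gives the standard CRC-32 check value 0xCBF43926. This is a quick way to confirm that an extracted driver routine computes the same function before replacing it with the library call.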