Patrick McHardy [Wed, 5 Dec 2007 07:25:26 +0000 (23:25 -0800)]
[NETFILTER]: x_tables: remove obsolete overflow check
We're not multiplying the size with the number of CPUs anymore, so the
check is obsolete.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 5 Dec 2007 07:24:56 +0000 (23:24 -0800)]
[NETFILTER]: x_tables: struct xt_table_info diet
Instead of using a big array of NR_CPUS entries, we can compute the size
needed at runtime, using nr_cpu_ids.
This should save some RAM, especially on David's machines where NR_CPUS=4096:
32 KB can be saved per table, and 64 KB for dynamically allocated ones (because
of slab/slub alignment).
In particular, the 'bootstrap' tables are no longer static (in the data
section) but on the stack, as their size is now very small.
This should also reduce the stack size used in compat functions
(get_info() declares an automatic variable that could be bigger than the kernel
stack size for big NR_CPUS).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
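The sizing idea in the commit above can be sketched in user space. This is only an illustration, not the kernel's struct xt_table_info: the struct layout, the names table_info_model, table_info_bytes and table_info_alloc, and the nr_cpu_ids_model variable are all assumptions made for the example. It shows a per-CPU pointer array declared as a flexible array member, so the allocation covers only the CPUs that can exist at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's nr_cpu_ids. */
static unsigned int nr_cpu_ids_model = 4;

/* Simplified table header (assumed layout): the per-CPU pointer array is a
 * flexible array member, so its length is chosen at allocation time. */
struct table_info_model {
    unsigned int size;   /* size of one per-CPU copy of the rules */
    unsigned int ncpus;  /* number of slots in entries[] */
    void *entries[];     /* one pointer per possible CPU */
};

/* Bytes needed for a header with room for ncpus pointers. */
static size_t table_info_bytes(unsigned int ncpus)
{
    assert(ncpus > 0);
    return offsetof(struct table_info_model, entries) + ncpus * sizeof(void *);
}

/* Allocate a zeroed header sized for the runtime CPU count. */
static struct table_info_model *table_info_alloc(unsigned int size)
{
    struct table_info_model *info = calloc(1, table_info_bytes(nr_cpu_ids_model));

    if (!info)
        return NULL;
    info->size = size;
    info->ncpus = nr_cpu_ids_model;
    return info;
}
```

With 8-byte pointers, 4096 fixed slots take 32 KB, which matches the per-table saving quoted in the commit message.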
Jan Engelhardt [Wed, 5 Dec 2007 07:24:03 +0000 (23:24 -0800)]
[NETFILTER]: x_tables: consistent and unique symbol names
Give all Netfilter modules consistent and unique symbol names.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li Zefan [Wed, 5 Dec 2007 07:22:26 +0000 (23:22 -0800)]
[NETFILTER]: replace list_for_each with list_for_each_entry
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Schnelle [Wed, 5 Dec 2007 07:21:50 +0000 (23:21 -0800)]
[NETFILTER]: x_tables: add TCPOPTSTRIP target
Signed-off-by: Sven Schnelle <svens@bitebene.org>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Tue, 4 Dec 2007 09:15:45 +0000 (01:15 -0800)]
[NET]: netns compilation speedup
This patch speeds up compilation when net_namespace.h is changed.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 4 Dec 2007 08:19:38 +0000 (00:19 -0800)]
[NETLINK]: af_netlink.c checkpatch cleanups
Fix a large number of checkpatch errors.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 4 Dec 2007 06:54:12 +0000 (22:54 -0800)]
[IPSEC]: Use the correct family for input state lookup
When merging the input paths of IPsec I accidentally left a hard-coded
AF_INET for the state lookup call. This broke IPv6, obviously. This
patch fixes it by getting the input callers to specify the family through
skb->cb.
Credit goes to Kazunori Miyazawa for diagnosing this and providing an
initial patch.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Mon, 3 Dec 2007 11:36:13 +0000 (22:36 +1100)]
[UDP]: Counter increment should be in USER mode for recvmsg
System calls should be USER. So change the BH to USER for
UDP*_INC_STATS_BH().
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Mon, 3 Dec 2007 11:34:16 +0000 (22:34 +1100)]
[UDP]: Clean up for IS_UDPLITE macro
Since we have macro IS_UDPLITE, we can use it.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Chen [Mon, 3 Dec 2007 11:33:28 +0000 (22:33 +1100)]
[UDP]: Defer InDataGrams increment until recvmsg() does checksum
Thanks to Dave, Herbert, Gerrit, Andi and others for the
discussion about this problem.
UdpInDatagrams can be confusing because it counts packets that
might be dropped later.
Move UdpInDatagrams into recvmsg() as allowed by the RFC.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
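The counting change above can be illustrated with a small user-space model. Nothing here is kernel code: the struct names, the toy 8-bit checksum and the in_errors counter are assumptions made for the sketch. The point it shows is that the "in datagrams" counter is bumped only after the receive path has verified the checksum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters; in_errors is an assumption for the dropped case. */
struct udp_mib_model {
    unsigned long in_datagrams;
    unsigned long in_errors;
};

/* Toy datagram: csum must equal the 8-bit sum of the payload.
 * This stands in for the real UDP checksum. */
struct dgram_model {
    const uint8_t *data;
    size_t len;
    uint8_t csum;
};

static bool csum_ok(const struct dgram_model *d)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < d->len; i++)
        sum = (uint8_t)(sum + d->data[i]);
    return sum == d->csum;
}

/* Model of the receive call: returns the payload length, or -1 when the
 * datagram is dropped. The datagram is counted only once it is known good. */
static long recvmsg_model(struct udp_mib_model *mib, const struct dgram_model *d)
{
    assert(mib && d);
    if (!csum_ok(d)) {
        mib->in_errors++;
        return -1;
    }
    mib->in_datagrams++;
    return (long)d->len;
}
```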
Ilpo Järvinen [Sat, 1 Dec 2007 22:48:06 +0000 (00:48 +0200)]
[TCP]: Abstract tp->highest_sack accessing & point to next skb
Pointing to the next skb is necessary to avoid referencing
already SACKed skbs which will soon be on a separate list.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sun, 30 Dec 2007 12:37:55 +0000 (04:37 -0800)]
[TCP]: Cleanup local variables of clean_rtx_queue
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 1 Dec 2007 22:48:04 +0000 (00:48 +0200)]
[TCP]: Add unlikely() to urgent handling in clean_rtx_queue
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sun, 30 Dec 2007 12:35:27 +0000 (04:35 -0800)]
[TCP]: Remove duplicated code block from clean_rtx_queue
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 1 Dec 2007 22:48:02 +0000 (00:48 +0200)]
[TCP]: Add tcp_for_write_queue_from_safe and use it in mtu_probe
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 1 Dec 2007 22:48:01 +0000 (00:48 +0200)]
[TCP]: Remove local variable and use packets_in_flight directly
Lines won't be that long and it's compiler's job to optimize
them.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 1 Dec 2007 22:48:00 +0000 (00:48 +0200)]
[TCP]: MTUprobe: prepare skb fields earlier
They had better be valid when the call to the write_queue functions is
made, once the things that follow are going in.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 1 Dec 2007 22:47:59 +0000 (00:47 +0200)]
[TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 1 Dec 2007 22:47:58 +0000 (00:47 +0200)]
[TCP]: Unite identical code from two seqno split blocks
Bogus seqno compares just mislead, the code is identical for
both sides of the seqno compare (and was even executed just
once because of return in between).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 1 Dec 2007 22:47:57 +0000 (00:47 +0200)]
[TCP]: Remove superfluous FLAG_DATA_SACKED
To get there, highest_sack must have advanced. When it advances,
a new skb is SACKed, which already sets that flag. Besides, the
original purpose of it has always puzzled me: I never understood why
the LOST bit setting of a retransmitted skb is marked with
FLAG_DATA_SACKED.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 1 Dec 2007 22:47:56 +0000 (00:47 +0200)]
[TCP]: Move LOSTRETRANS MIB outside !(L|S) check
Usually those skbs will have L set, not counting them as lost
retransmissions is misleading.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 13:59:38 +0000 (00:59 +1100)]
[IPV6]: Use ctl paths to register addrconf sysctls
This looks very much like the patch for ipv4's devinet.
This is also intended to help us with the net namespaces
and reduces the ipv6.ko size by ~320 bytes.
The difference from the first version is just the patch
offsets, which changed due to changes in patch #2.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 13:58:37 +0000 (00:58 +1100)]
[IPV6]: Unify and cleanup calls to addrconf_sysctl_register
Currently this call is (ab)used similarly to the devinet one - it
registers sysctls for devices and for the "default" confs, while
the "all" sysctls are registered separately. But unlike its
devinet brother, the passed inet6_device is needed.
The fix is to make a __addrconf_sysctl_register(), which registers
sysctls for all "devices" we need, including "default" and "all" :)
The original addrconf_sysctl_register() calls the introduced
function, passing the inet6_device, device name and ifindex (to
be used as procname and ctl_name) into it.
Thanks to Herbert again for pointing out, that we can shrink the
argument list to 1 :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 13:57:08 +0000 (00:57 +1100)]
[IPV4]: Use ctl paths to register devinet sysctls
This looks very much like the patch for neighbors.
The path is also located on the stack and is prepared
inside the function. This time, the call to the registering
function is guarded with the RTNL lock, but I decided
to keep it on the stack not to litter the devinet.c file
with unneeded names and to make it look similar to the
neighbors code.
This is also intended to help us with the net namespaces
and reduces the vmlinux size as well - this time by more
than 670 bytes.
The difference from the first version is just the patch
offsets, which changed due to changes in patch #2.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 13:55:54 +0000 (00:55 +1100)]
[IPV4]: Unify and cleanup calls to devinet_sysctl_register
Currently this call is used to register sysctls for devices
and for the "default" confs. The "all" sysctls are registered
separately.
Besides, the inet_device is passed to this function, but it is
not needed there at all - just the device name and ifindex are
required.
Thanks to Herbert, who noticed, that this call doesn't even
require the devconf pointer (the last argument) - all we need
we can take from the in_device itself.
The fix is to make a __devinet_sysctl_register(), which registers
sysctls for all "devices" we need, including "default" and "all" :)
The original devinet_sysctl_register() works with struct net_device,
not the inet_device, and calls the introduced function, passing
the device name and ifindex (to be used as procname and ctl_name)
into it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
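The split described above (an inner helper that needs only a name and a number, plus a thin per-device wrapper) can be sketched as follows. This is a user-space model under stated assumptions: the struct types, inner_register, device_register and the MODEL_CONF_* values are hypothetical and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ctl_name values for the pseudo-devices. */
#define MODEL_CONF_ALL     (-2)
#define MODEL_CONF_DEFAULT (-3)

struct net_device_model {
    const char *name;
    int ifindex;
};

struct sysctl_entry_model {
    char procname[32];
    int ctl_name;
};

/* Models __devinet_sysctl_register(): it only needs a name and a number,
 * so it can serve real devices as well as "all" and "default". */
static void inner_register(struct sysctl_entry_model *e,
                           const char *procname, int ctl_name)
{
    assert(e && procname);
    assert(strlen(procname) < sizeof(e->procname));
    snprintf(e->procname, sizeof(e->procname), "%s", procname);
    e->ctl_name = ctl_name;
}

/* Models devinet_sysctl_register(): derives both arguments from the device. */
static void device_register(struct sysctl_entry_model *e,
                            const struct net_device_model *dev)
{
    inner_register(e, dev->name, dev->ifindex);
}
```

Callers that register "all" or "default" would call the inner helper directly with one of the pseudo-device constants, while per-device callers go through the wrapper.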
John W. Linville [Wed, 21 Nov 2007 20:24:35 +0000 (15:24 -0500)]
softmac: mark as obsolete and schedule for removal
Schedule softmac for removal in the 2.6.26 development window.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Mon, 28 Jan 2008 06:48:37 +0000 (22:48 -0800)]
bcm43xx: mark as obsolete and schedule for removal
Schedule bcm43xx for removal in the 2.6.26 development window.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Wed, 21 Nov 2007 16:54:22 +0000 (11:54 -0500)]
mac80211: remove "bcn_int" and "capab" scan results info
These bits were dead code before "mac80211: Remove local->scan_flags"
(commit 6681dd3fd0e4d36a4547415853e83411baa7b705) and probably should
have been removed as part of that commit.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Rindjunsky [Thu, 29 Nov 2007 08:35:53 +0000 (10:35 +0200)]
mac80211: move A-MSDU identifier to flags
This patch moves u8 amsdu_frame in ieee80211_txrx_data to the flags
section as IEEE80211_TXRXD_RX_AMSDU
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
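A minimal sketch of what moving a u8 boolean into a flags word looks like. The flag name comes from the commit message, but its bit position, the struct txrx_data_model type and both helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Name from the commit message; the bit position is an assumption. */
#define IEEE80211_TXRXD_RX_AMSDU (1u << 4)

/* Hypothetical stand-in for ieee80211_txrx_data: one flags word replaces
 * a dedicated u8 amsdu_frame member. */
struct txrx_data_model {
    unsigned int flags;
};

/* Set or clear the A-MSDU bit without touching the other flags. */
static void set_amsdu(struct txrx_data_model *rx, bool on)
{
    assert(rx);
    if (on)
        rx->flags |= IEEE80211_TXRXD_RX_AMSDU;
    else
        rx->flags &= ~IEEE80211_TXRXD_RX_AMSDU;
}

static bool is_amsdu(const struct txrx_data_model *rx)
{
    return (rx->flags & IEEE80211_TXRXD_RX_AMSDU) != 0;
}
```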
Ron Rindjunsky [Mon, 26 Nov 2007 14:14:34 +0000 (16:14 +0200)]
mac80211: adding 802.11n configuration flows
This patch configures the 802.11n mode of operation
internally in ieee80211_conf structure and in the low-level
driver as well (through op conf_ht).
It does not include AP configuration flows.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Rindjunsky [Mon, 26 Nov 2007 14:14:33 +0000 (16:14 +0200)]
mac80211: adding 802.11n essential A-MSDU Rx capability
This patch adds the ability to receive and handle A-MSDU frames.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Rindjunsky [Mon, 26 Nov 2007 14:14:32 +0000 (16:14 +0200)]
mac80211: adding 802.11n essential A-MPDU addBA capability
This patch adds the capability to identify and answer an add block ACK
request.
As this series of patches only adds HT handling with no aggregations,
(A-MPDU aggregations acceptance is not obligatory according to 802.11n
draft) we are currently sending back a refusal upon this request.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Rindjunsky [Mon, 26 Nov 2007 14:14:31 +0000 (16:14 +0200)]
mac80211: adding 802.11n IEs handling
This patch presents the ability to parse and compose HT IEs, and to put
the IE relevant data inside the mac80211's internal HT structures
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Rindjunsky [Mon, 26 Nov 2007 14:14:30 +0000 (16:14 +0200)]
mac80211: adding 802.11n HT framework definitions
New structures:
- ieee80211_ht_info: describing STA's HT capabilities
- ieee80211_ht_bss_info: describing BSS's HT characteristics
Changed structures:
- ieee80211_hw_mode: now also holds PHY HT capabilities for each HW mode
- ieee80211_conf: ht_conf holds current self HT configuration
ht_bss_conf holds current BSS HT configuration
- flag IEEE80211_CONF_SUPPORT_HT_MODE added to indicate if HT use is
desired
- sta_info: now also holds Peer's HT capabilities
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Rindjunsky [Mon, 17 Dec 2007 00:09:26 +0000 (16:09 -0800)]
mac80211: adding MAC80211_HT_DEBUG config variable
This patch adds MAC80211_HT_DEBUG config variable
to separate HT debug features
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 28 Nov 2007 10:04:21 +0000 (11:04 +0100)]
mac80211: allow setting drop_unencrypted with wext
This patch allows wpa_supplicant to set the drop_unencrypted setting in
mac80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 28 Nov 2007 09:55:32 +0000 (10:55 +0100)]
mac80211: make ieee80211_iterate_active_interfaces not need rtnl
Interface iteration in mac80211 can be done without holding any
locks because I converted it to RCU. Initially, I thought this
wouldn't be needed for ieee80211_iterate_active_interfaces but
it's turning out that multi-BSS AP support can be much simpler
in a driver if ieee80211_iterate_active_interfaces can be called
without holding locks. This converts it to use RCU, which adds the
requirement that the callback it invokes cannot sleep.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Rindjunsky [Thu, 22 Nov 2007 17:49:12 +0000 (19:49 +0200)]
mac80211: restructuring data Rx handlers
This patch restructures the Rx handlers chain by incorporating the former
handlers ieee80211_rx_h_802_1x_pae and ieee80211_rx_h_drop_unencrypted
into ieee80211_rx_h_data, where the frame is already in 802.3 form. This scheme
follows the IEEE 802.11 data plane architecture more precisely, and will prevent
code duplication in the IEEE 802.11n A-MSDU handler.
Added functions:
- ieee80211_data_to_8023: converts 802.11 data frames to 802.3 frames
- ieee80211_deliver_skb: delivers the 802.3 frames to the upper stack
Eliminated handlers:
- ieee80211_rx_h_drop_unencrypted: now function ieee80211_drop_unencrypted
- ieee80211_rx_h_802_1x_pae: now function ieee80211_802_1x_pae
Changed handlers:
- ieee80211_rx_h_data: now contains calls to the four functions above
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yi [Thu, 22 Nov 2007 02:53:21 +0000 (10:53 +0800)]
mac80211: hardware scan rework
The scan code in mac80211 makes the software scan assumption in various
places. For example, we stop the Tx queue during a software scan so that
all the Tx packets will be queued by the stack. We also drop frames not
related to scan in the software scan process. But these are not true for
hardware scan.
Some wireless hardware (for example iwl3945/4965) has the ability to
perform the whole scan process in hardware and/or firmware. The hardware
scan is relatively powerful in that it tries to maintain normal network
traffic while doing a scan in the background. Some drivers (e.g. iwlwifi)
do provide a way to tune the hardware scan parameters (for example, if the
STA is associated, the maximum time the STA may leave the
associated channel, how long the scan is suspended after returning to
the service channel, etc). But basically this is transparent to the
stack. mac80211 should not stop Tx queues or drop Rx packets during a
hardware scan.
This patch resolves the above problem by splitting the current scan
indicator local->sta_scanning into local->sta_sw_scanning and
local->sta_hw_scanning. It then changes the scan related code to be aware
of hardware scan or software scan in various places. With this patch,
iwlwifi performs much better in the scan-while-associated condition and
disable_hw_scan=1 should never be required.
Cc: Mohamed Abbas <mohamed.abbas@intel.com>
Cc: Ben Cahill <ben.m.cahill@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
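The effect of splitting the scan indicator can be sketched in user space. The two field names follow the commit message; everything else (struct local_model, tx_queues_stopped, scan_start, rx_accept) is a hypothetical model and not the mac80211 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-device state. */
struct local_model {
    bool sta_sw_scanning;
    bool sta_hw_scanning;
    bool tx_queues_stopped;
};

/* Start a scan. Only a software scan stops the Tx queues; a hardware
 * scan leaves normal traffic alone. */
static void scan_start(struct local_model *l, bool hw_scan)
{
    assert(l && !l->sta_sw_scanning && !l->sta_hw_scanning);
    if (hw_scan) {
        l->sta_hw_scanning = true;
        return;
    }
    l->sta_sw_scanning = true;
    l->tx_queues_stopped = true;
}

/* Rx filter: during a software scan only scan-related frames are kept;
 * during a hardware scan (or no scan) every frame is accepted. */
static bool rx_accept(const struct local_model *l, bool scan_related_frame)
{
    if (l->sta_sw_scanning)
        return scan_related_frame;
    return true;
}
```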
Pavel Emelyanov [Sat, 1 Dec 2007 13:21:52 +0000 (00:21 +1100)]
[IPV6]: Cleanup the addrconf_sysctl_register
This only includes fixing the space-indented lines and
removing one unneeded else after the goto.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 13:17:46 +0000 (00:17 +1100)]
[IPV4]: Cleanup the devinet_sysctl_register
I moved the call to kmalloc() from the *t declaration into
the code (this is confusing when a variable is initialized
with the result of some call) and removed unneeded comment
near the error path. Just like I did with the neigh ctl-s.
Besides, I fixed the goto's and the labels - they were indented
with spaces :(
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 13:08:16 +0000 (00:08 +1100)]
[NEIGH]: Use the ctl paths to create neighbours sysctls
The appropriate path is prepared right inside this function. It
is prepared similarly to how the ctl tables were.
Since the path is modified, it is put on the stack, to avoid
possible races with multiple calls to neigh_sysctl_register(): it
is called by protocols and I didn't find any protection in this
case. Did I overlook the rtnl lock?
The stack growth of neigh_sysctl_register() is 40 bytes. I
believe this is OK, since this is not that much and this function
is not called with a deep stack (device/protocols register).
The device's name is stored on the template to free it later.
This will help with the net namespaces, as each namespace should
have its own set of these ctls.
Besides, this saves ~350 bytes from the neigh template :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 13:06:34 +0000 (00:06 +1100)]
[NEIGH]: Cleanup the neigh_sysctl_register
This mainly removes the err variable, as this call always
returns the same error code (-ENOBUFS).
Besides, I moved the call to kmalloc() from the *t declaration
into the code (this is confusing when a variable is initialized
with the result of some call) and removed unneeded comment near
the error path.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 12:51:01 +0000 (23:51 +1100)]
[UNIX]: Make the unix sysctl tables per-namespace
This is the core.
* add the ctl_table_header on the struct net;
* make the unix_sysctl_register and _unregister clone the table;
* move calls to them into per-net init and exit callbacks;
* move the .data pointer in the proper place.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 12:45:41 +0000 (23:45 +1100)]
[UNIX]: Use ctl paths to register unix ctl tables
Unlike previous ones, this patch is useful on its own,
as it decreases the vmlinux size :)
But it will be used later, when the per-namespace sysctl
is added.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 12:44:15 +0000 (23:44 +1100)]
[UNIX]: Move the sysctl_unix_max_dgram_qlen
This will make all the sub-namespaces always use the
default value (10) and leave the tuning via sysctl
to the init namespace only.
Per-namespace tuning is coming.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Sat, 1 Dec 2007 12:40:40 +0000 (23:40 +1100)]
[UNIX]: Extend unix_sysctl_(un)register prototypes
Add the struct net * argument to both of them to use in
the future. Also make the register one return an error code.
It is useless right now, but will make the future patches
much simpler.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Sat, 1 Dec 2007 12:31:02 +0000 (23:31 +1100)]
[DECNET]: Remove extra memset from dn_fib_check_nh
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Sat, 1 Dec 2007 12:27:18 +0000 (23:27 +1100)]
[IPSEC]: SPD auditing fix to include the netmask/prefix-length
Currently the netmask/prefix-length of an IPsec SPD entry is not included in
any of the SPD related audit messages. This can cause a problem when the
audit log is examined as the netmask/prefix-length is vital in determining
what network traffic is affected by a particular SPD entry. This patch fixes
this problem by adding two additional fields, "src_prefixlen" and
"dst_prefixlen", to the SPD audit messages to indicate the source and
destination netmasks. These new fields are only included in the audit message
when the netmask/prefix-length is less than the address length, i.e. the SPD
entry applies to a network address and not a host address.
Example audit message:
type=UNKNOWN[1415] msg=audit(1196105849.752:25): auid=0 \
subj=root:system_r:unconfined_t:s0-s0:c0.c1023 op=SPD-add res=1 \
src=192.168.0.0 src_prefixlen=24 dst=192.168.1.0 dst_prefixlen=24
In addition, this patch also fixes a few other things in the
xfrm_audit_common_policyinfo() function. The IPv4 string formatting was
converted to use the standard NIPQUAD_FMT constant, the memcpy() was removed
from the IPv6 code path and replaced with a typecast (the memcpy() was acting
as a slow, implicit typecast anyway), and two local variables were created to
make referencing the XFRM security context and selector information cleaner.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
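The conditional prefix-length field described above can be sketched as a small formatter. This is an illustration only: audit_fmt_sel and its parameters are hypothetical and do not reproduce xfrm_audit_common_policyinfo(). The field names and the "only when shorter than the address length" rule follow the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<key>=<addr>" and, only when the selector covers a network
 * rather than a single host, append " <key>_prefixlen=<n>".
 * addr_bits is 32 for IPv4 and 128 for IPv6. Returns snprintf()'s result. */
static int audit_fmt_sel(char *buf, size_t len, const char *key,
                         const char *addr, unsigned int prefixlen,
                         unsigned int addr_bits)
{
    assert(buf && key && addr);
    assert(prefixlen <= addr_bits);
    if (prefixlen < addr_bits)
        return snprintf(buf, len, "%s=%s %s_prefixlen=%u",
                        key, addr, key, prefixlen);
    return snprintf(buf, len, "%s=%s", key, addr);
}
```

For the example message above, calling it with key "src", address "192.168.0.0" and a prefix length of 24 out of 32 bits yields the "src=192.168.0.0 src_prefixlen=24" fragment; a /32 host selector would produce only the address field.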
Arnaldo Carvalho de Melo [Fri, 30 Nov 2007 00:47:15 +0000 (22:47 -0200)]
[TFRC]: Hide tx history details from the CCIDs
Based on a previous patch by Gerrit Renker.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 30 Nov 2007 12:55:42 +0000 (23:55 +1100)]
[NET]: Implement the per network namespace sysctl infrastructure
The user interface is: register_net_sysctl_table and
unregister_net_sysctl_table. Very much like the current
interface except there is a network namespace parameter.
With this any sysctl registered with register_net_sysctl_table
will only show up to tasks in the same network namespace.
All other sysctls continue to be globally visible.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 30 Nov 2007 12:54:00 +0000 (23:54 +1100)]
sysctl: Infrastructure for per namespace sysctls
This patch implements the basic infrastructure for per namespace sysctls.
A list of lists of sysctl headers is added, allowing each namespace to have
its own list of sysctl headers.
Each list of sysctl headers has a lookup function to find the first
sysctl header in the list, allowing the lists to have a per namespace
instance.
register_sysctl_root is added to tell sysctl.c about additional
lists of sysctl_headers. As all of the users are expected to be in
kernel no unregister function is provided.
sysctl_head_next is updated to walk through the list of lists.
__register_sysctl_paths is added to add a new sysctl table on
a non-default sysctl list.
The only intrusive part of this patch is propagating the information
to decide which list of sysctls to use for sysctl_check_table.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 30 Nov 2007 12:52:10 +0000 (23:52 +1100)]
sysctl: Remember the ctl_table we passed to register_sysctl_paths
By doing this we allow users of register_sysctl_paths that build
and dynamically allocate their ctl_table to be simpler. This allows
them to just remember the ctl_table_header returned from
register_sysctl_paths from which they can now find the
ctl_table array they need to free.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
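A minimal user-space sketch of the idea: the header returned at registration time keeps a pointer to the table that was passed in, so a caller that allocated the table dynamically only has to keep the header. All type, field and function names below are hypothetical models, not the kernel's definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct ctl_table_model {
    const char *procname;
    int value;
};

/* Hypothetical header: remembers the table exactly as the caller passed it. */
struct ctl_table_header_model {
    struct ctl_table_model *table_arg;
};

/* Models registration: returns a header that records the caller's table. */
static struct ctl_table_header_model *register_model(struct ctl_table_model *table)
{
    struct ctl_table_header_model *h;

    assert(table);
    h = malloc(sizeof(*h));
    if (!h)
        return NULL;
    h->table_arg = table;
    return h;
}

/* Models unregistration: frees the header and hands back the remembered
 * table so the caller can free it without tracking it separately. */
static struct ctl_table_model *unregister_model(struct ctl_table_header_model *h)
{
    struct ctl_table_model *table;

    assert(h);
    table = h->table_arg;
    free(h);
    return table;
}
```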
Eric W. Biederman [Fri, 30 Nov 2007 12:50:18 +0000 (23:50 +1100)]
sysctl: Add register_sysctl_paths function
There are a number of modules that register a sysctl table
somewhere deeply nested in the sysctl hierarchy, such as
fs/nfs, fs/xfs, dev/cdrom, etc.
They all specify several dummy ctl_tables for the path name.
This patch implements register_sysctl_paths that takes
an additional path name, and makes up dummy sysctl nodes
for each component.
This patch was originally written by Olaf Kirch and
brought to my attention and reworked some by Olaf Hering.
I have changed a few additional things so the bugs are mine.
After converting all of the easy callers, Olaf Hering observed that for
allyesconfig with ARCH=i386 the patch reduces the final binary size by 9369 bytes.
.text +897
.data -7008

    text    data     bss      dec     hex filename
26959310 4045899 4718592 35723801 2211a19 ../vmlinux-vanilla
26960207 4038891 4718592 35717690 221023a ../O-allyesconfig/vmlinux
So this change is both a space savings and a code simplification.
CC: Olaf Kirch <okir@suse.de>
CC: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
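The "make up dummy nodes for each path component" step can be sketched in user space. This is a simplified model, not the kernel implementation: struct dir_model, build_path and free_path are hypothetical names, and the real code works on ctl_table arrays rather than a linked chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One made-up directory level per path component. */
struct dir_model {
    const char *procname;     /* path component, e.g. "fs" */
    struct dir_model *child;  /* next component, NULL at the deepest level */
    const void *table;        /* caller's table, set only on the deepest node */
};

static void free_path(struct dir_model *d)
{
    while (d) {
        struct dir_model *next = d->child;

        free(d);
        d = next;
    }
}

/* Build a chain of n directory nodes and hang the caller's table off the
 * last one, so callers no longer declare the dummy levels themselves. */
static struct dir_model *build_path(const char *const *path, size_t n,
                                    const void *table)
{
    struct dir_model *root = NULL;
    struct dir_model **link = &root;
    size_t i;

    assert(path && n > 0);
    for (i = 0; i < n; i++) {
        struct dir_model *d = calloc(1, sizeof(*d));

        if (!d) {
            free_path(root);
            return NULL;
        }
        d->procname = path[i];
        if (i == n - 1)
            d->table = table;
        *link = d;
        link = &d->child;
    }
    return root;
}
```

A caller nested under fs/nfs would pass the two components "fs" and "nfs" plus its own table, instead of declaring two dummy tables by hand.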
Patrick McHardy [Thu, 29 Nov 2007 14:17:11 +0000 (01:17 +1100)]
[NETFILTER]: Convert old checksum helper names
Kill the defines again, convert to the new checksum helper names and
remove the dependency of NET_ACT_NAT on NETFILTER.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 29 Nov 2007 14:14:30 +0000 (01:14 +1100)]
[NET]: Move netfilter checksum helpers to net/core/utils.c
This allows getting rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT.
This patch redefines the old names to keep the noise low, the next patch
converts all users.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 28 Nov 2007 14:06:04 +0000 (12:06 -0200)]
[DCCP]: Remove duplicate test for CloseReq
This removes a redundant test for unexpected packet types. In dccp_rcv_state_process
it is tested twice whether a DCCP-server has received a CloseReq (Step 7):
* first in the combined if-statement,
* then in the call to dccp_rcv_closereq().
The latter is necessary since dccp_rcv_closereq() is also called from
__dccp_rcv_established().
This patch removes the duplicate test.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 28 Nov 2007 13:59:48 +0000 (11:59 -0200)]
[DCCP]: Integrate state transitions for passive-close
This adds the necessary state transitions for the two forms of passive-close
* PASSIVE_CLOSE - which is entered when a host receives a Close;
* PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.
Here is a detailed account of what the patch does in each state.
1) Receiving CloseReq
The pseudo-code in 8.5 says:
Step 13: Process CloseReq
If P.type == CloseReq and S.state < CLOSEREQ,
Generate Close
S.state := CLOSING
Set CLOSING timer.
This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.
* CLOSED: silently ignore - it may be a late or duplicate CloseReq;
* LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
* REQUEST: perform Step 13 directly (no need to enqueue packet);
* OPEN/PARTOPEN: enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.
When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.
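The CloseReq cases above can be modelled as a small state function. This is an
illustrative userspace sketch only, not the patch itself: the enum and function
names are assumptions, and packet generation and timers are reduced to a
returned action code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only: these names are hypothetical, not kernel identifiers. */
enum conn_state {
	ST_CLOSED, ST_LISTEN, ST_REQUEST, ST_RESPOND, ST_PARTOPEN, ST_OPEN,
	ST_CLOSEREQ, ST_CLOSING, ST_TIMEWAIT,
	ST_PASSIVE_CLOSE, ST_PASSIVE_CLOSEREQ
};

enum closereq_action {
	CR_IGNORE,	/* drop the CloseReq silently */
	CR_SEND_CLOSE,	/* Step 13: generate Close, set CLOSING timer */
	CR_ENQUEUE	/* queue the CloseReq so unread data can be read first */
};

/* Decide what a client does with a received CloseReq; updates *state. */
enum closereq_action client_rcv_closereq(enum conn_state *state)
{
	assert(state != NULL);

	switch (*state) {
	case ST_REQUEST:
		/* Nothing to read yet, so Step 13 can run directly. */
		*state = ST_CLOSING;
		return CR_SEND_CLOSE;
	case ST_PARTOPEN:
	case ST_OPEN:
		/* Give the application a chance to process unread data. */
		*state = ST_PASSIVE_CLOSEREQ;
		return CR_ENQUEUE;
	default:
		/*
		 * CLOSED (late or duplicate CloseReq), PASSIVE_CLOSEREQ (no
		 * second CloseReq is enqueued), CLOSING and everything else.
		 */
		return CR_IGNORE;
	}
}
```

A second CloseReq arriving in PASSIVE_CLOSEREQ falls into the default branch,
which matches the "no second CloseReq is enqueued" rule.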
2) Receiving Close
The code below from 8.5 is unconditional.
Step 14: Process Close
If P.type == Close,
Generate Reset(Closed)
Tear down connection
Drop packet and return
Thus we need to consider all states:
* CLOSED: silently ignore, since this can happen when a retransmitted or late Close arrives;
* LISTEN: dccp_rcv_state_process() will generate a Reset ("No Connection");
* REQUEST: perform Step 14 directly (no need to enqueue packet);
* RESPOND: dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
* OPEN/PARTOPEN: enter PASSIVE_CLOSE so that application has a chance to process unread data;
* CLOSEREQ: server performed active-close -- perform Step 14;
* CLOSING: simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
* PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
* TIMEWAIT: packet is ignored.
Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
is no socket for such a connection": the socket still exists, but its state indicates it is unusable.
Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).
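The per-state handling of a received Close listed above can be written as one
dispatch function. Again this is only a userspace sketch under stated
assumptions: all names are hypothetical, the CLOSING tie-breaker is reported
but not modelled, and PASSIVE_CLOSE (not listed above) is assumed to ignore a
repeated Close.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only: these names are hypothetical, not kernel identifiers. */
enum conn_state {
	ST_CLOSED, ST_LISTEN, ST_REQUEST, ST_RESPOND, ST_PARTOPEN, ST_OPEN,
	ST_CLOSEREQ, ST_CLOSING, ST_TIMEWAIT,
	ST_PASSIVE_CLOSE, ST_PASSIVE_CLOSEREQ
};

enum close_action {
	CL_IGNORE,		/* drop the Close silently */
	CL_RESET_NO_CONNECTION,	/* LISTEN: Reset("No Connection") */
	CL_RESET_PACKET_ERROR,	/* RESPOND: Reset("Packet Error") */
	CL_RESET_CLOSED,	/* Step 14: Reset(Closed), tear down */
	CL_ENQUEUE,		/* queue the Close so unread data can be read */
	CL_TIE_BREAK		/* simultaneous close, resolved elsewhere */
};

/* Decide what to do with a received Close; updates *state where needed. */
enum close_action rcv_close(enum conn_state *state)
{
	assert(state != NULL);

	switch (*state) {
	case ST_LISTEN:
		return CL_RESET_NO_CONNECTION;
	case ST_RESPOND:
		return CL_RESET_PACKET_ERROR;
	case ST_REQUEST:
	case ST_CLOSEREQ:
		*state = ST_CLOSED;
		return CL_RESET_CLOSED;
	case ST_PARTOPEN:
	case ST_OPEN:
		*state = ST_PASSIVE_CLOSE;
		return CL_ENQUEUE;
	case ST_CLOSING:
		return CL_TIE_BREAK;
	default:
		/* CLOSED, PASSIVE_CLOSEREQ, TIMEWAIT; PASSIVE_CLOSE by assumption. */
		return CL_IGNORE;
	}
}
```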
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 28 Nov 2007 13:34:53 +0000 (11:34 -0200)]
[DCCP]: Dedicated auxiliary states to support passive-close
This adds two auxiliary states to deal with passive closes:
* PASSIVE_CLOSE (reached from OPEN via reception of Close) and
* PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq)
as internal intermediate states.
These states are used to allow a receiver to process unread data before
acknowledging the received connection-termination-request (the Close/CloseReq).
Without such support, it will happen that passively-closed sockets enter CLOSED
state while there is still unprocessed data in the queue; leading to unexpected
and erratic API behaviour.
PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will
seamlessly work with inet_accept() (which tests for this state).
The state names are thanks to Arnaldo, who suggested this naming scheme
following an earlier revision of this patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 28 Nov 2007 08:35:08 +0000 (08:35 +0000)]
[DCCP]: Use AF-independent rebuild_header routine
This fixes a nasty bug: dccp_send_reset() is called by both DCCPv4 and DCCPv6, but uses
inet_sk_rebuild_header() in each case. This leads to unpredictable and weird behaviour:
under some conditions, DCCPv6 Resets were sent, in others not.
The fix is to use the AF-independent rebuild_header routine.
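The idea of an AF-independent call can be illustrated with a per-socket
operations table. This is a minimal userspace sketch, not the kernel code: the
struct and function names below are made up for the example, and the
`last_family' field exists only to show which routine ran.

```c
#include <assert.h>
#include <stddef.h>

struct conn;

/* Hypothetical per-address-family operations table. */
struct af_ops {
	int (*rebuild_header)(struct conn *c);
};

struct conn {
	const struct af_ops *af_ops;	/* set once when the socket is created */
	int last_family;		/* demo only: which routine was called */
};

static int v4_rebuild_header(struct conn *c)
{
	c->last_family = 4;
	return 0;
}

static int v6_rebuild_header(struct conn *c)
{
	c->last_family = 6;
	return 0;
}

const struct af_ops ipv4_ops = { .rebuild_header = v4_rebuild_header };
const struct af_ops ipv6_ops = { .rebuild_header = v6_rebuild_header };

/* Shared code path: never names a v4 or v6 routine directly. */
int send_reset(struct conn *c)
{
	assert(c != NULL && c->af_ops != NULL);
	return c->af_ops->rebuild_header(c);
}
```

Calling the v4 routine unconditionally from send_reset() would reproduce the
bug described above for v6 sockets; going through the table avoids it.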
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 28 Nov 2007 13:15:40 +0000 (11:15 -0200)]
[TFRC]: Migrate TX history to singly-linked list
This patch was based on another made by Gerrit Renker, his changelog was:
------------------------------------------------------
The patch set migrates TFRC TX history to a singly-linked list.
The details are:
* use of a consistent naming scheme (all TFRC functions now begin with `tfrc_');
* allocation and cleanup are taken care of internally;
* provision of a lookup function, which is used by the CCID TX infrastructure
to determine the time a packet was sent (in turn used for RTT sampling);
* integration of the new interface with the present use in CCID3.
------------------------------------------------------
Simplifications I did:
. removing the tfrc_tx_hist_head that had a pointer to the list head and
another for the slabcache.
. No need for creating a slabcache for each CCID that wants to use the TFRC
tx history routines, create a single slabcache when the dccp_tfrc_lib module
init routine is called.
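A singly-linked TX history with a lookup by sequence number can be sketched as
follows. This is a userspace illustration under assumptions: the names are not
the tfrc_ functions of the patch, malloc()/free() stand in for the slab cache,
and timestamps are plain microsecond counters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical history entry: one per sent packet. */
struct tx_hist_entry {
	struct tx_hist_entry *next;
	uint64_t seqno;		/* sequence number of the sent packet */
	uint64_t stamp_us;	/* send time in microseconds */
};

/* Push a new entry at the head; returns 0 on success, -1 if out of memory. */
int tx_hist_add(struct tx_hist_entry **headp, uint64_t seqno, uint64_t stamp_us)
{
	struct tx_hist_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return -1;
	e->seqno = seqno;
	e->stamp_us = stamp_us;
	e->next = *headp;
	*headp = e;
	return 0;
}

/* Find the entry for seqno, or NULL if it is not in the history. */
const struct tx_hist_entry *tx_hist_find(const struct tx_hist_entry *head,
					 uint64_t seqno)
{
	while (head != NULL && head->seqno != seqno)
		head = head->next;
	return head;
}

/* Free all entries and reset the head pointer. */
void tx_hist_purge(struct tx_hist_entry **headp)
{
	struct tx_hist_entry *e = *headp;

	while (e != NULL) {
		struct tx_hist_entry *next = e->next;

		free(e);
		e = next;
	}
	*headp = NULL;
}
```

An RTT sample is then the current time minus the `stamp_us' of the entry found
for the acknowledged sequence number.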
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Thu, 29 Nov 2007 13:59:07 +0000 (00:59 +1100)]
[TCP]: Two fixes to new sacktag code
1) Skip condition used to be wrong way around which made SACK
processing very broken, missed many blocks because of that.
2) Use highest_sack advancement only if some skbs are already
sacked because otherwise tcp_write_queue_next may move things
too far (occurs mainly with GSO). The other similar advancement
is not a problem because highest_sack was previously set to point
to a sacked skb.
These problems were located because of problem report from Matt
Mathis <mathis@psc.edu>.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 29 Nov 2007 13:42:42 +0000 (00:42 +1100)]
[NET]: Nicer WARN_ON in netstat_show
The
	if (statement)
		WARN_ON(1);
looks much better as
	WARN_ON(statement);
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fred L. Templin [Thu, 29 Nov 2007 11:11:40 +0000 (22:11 +1100)]
[IPV6]: Add RFC4214 support
This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the "iproute2"
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution.
This version includes the diff for ./include/linux/if.h which was
missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.
Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 29 Nov 2007 10:22:33 +0000 (21:22 +1100)]
[NET]: Eliminate unused argument from sk_stream_alloc_pskb
The 3rd argument is always zero (according to grep :) Eliminate
it and merge the function with sk_stream_alloc_skb.
This saves 44 more bytes, and together with the previous patch
we have:
add/remove: 1/0 grow/shrink: 0/8 up/down: 183/-751 (-568)
function                                     old     new   delta
sk_stream_alloc_skb                            -     183    +183
ip_rt_init                                   529     525      -4
arp_ignore                                   112     107      -5
__inet_lookup_listener                       284     274     -10
tcp_sendmsg                                 2583    2481    -102
tcp_sendpage                                1449    1300    -149
tso_fragment                                 417     258    -159
tcp_fragment                                1149     988    -161
__tcp_push_pending_frames                   1998    1837    -161
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 29 Nov 2007 09:28:50 +0000 (20:28 +1100)]
[NET]: Uninline the sk_stream_alloc_pskb
This function seems too big for inlining. Indeed, it saves
half-a-kilo when uninlined:
add/remove: 1/0 grow/shrink: 0/7 up/down: 195/-719 (-524)
function                                     old     new   delta
sk_stream_alloc_pskb                           -     195    +195
ip_rt_init                                   529     525      -4
__inet_lookup_listener                       284     274     -10
tcp_sendmsg                                 2583    2486     -97
tcp_sendpage                                1449    1305    -144
tso_fragment                                 417     267    -150
tcp_fragment                                1149     992    -157
__tcp_push_pending_frames                   1998    1841    -157
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joonwoo Park [Mon, 26 Nov 2007 15:31:24 +0000 (23:31 +0800)]
[IPV4] fib_hash: kmalloc + memset conversion to kzalloc
fib_hash: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.
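The same pattern exists in userspace, where calloc() plays the role that
kzalloc() plays in the kernel. The sketch below is an analogy only; the
`bucket' type and both function names are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table element, used only for this illustration. */
struct bucket {
	unsigned long count;
};

/* Before: allocate, then clear the memory in a second step. */
struct bucket *table_alloc_old(size_t n)
{
	struct bucket *t = malloc(n * sizeof(*t));

	if (t != NULL)
		memset(t, 0, n * sizeof(*t));
	return t;
}

/* After: a single call that returns zeroed memory. */
struct bucket *table_alloc_new(size_t n)
{
	return calloc(n, sizeof(struct bucket));
}
```

Both variants return memory with every `count' equal to zero; the second one
simply cannot forget the clearing step.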
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joonwoo Park [Mon, 26 Nov 2007 15:29:32 +0000 (23:29 +0800)]
[IPV4] fib_semantics: kmalloc + memset conversion to kzalloc
fib_semantics: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joonwoo Park [Mon, 26 Nov 2007 15:23:21 +0000 (23:23 +0800)]
[IPSEC]: kmalloc + memset conversion to kzalloc
2007/11/26, Patrick McHardy <kaber@trash.net>:
> How about also switching vmalloc/get_free_pages to GFP_ZERO
> and getting rid of the memset entirely while you're at it?
>
xfrm_hash: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Mon, 26 Nov 2007 12:17:38 +0000 (20:17 +0800)]
[TCP]: Move FRTO checks out from write queue abstraction funcs
Better place exists in update_send_head (other non-queue related
adjustments are done there as well) which is the only caller of
tcp_advance_send_head (now that the bogus call from mtu_probe is
gone).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 26 Nov 2007 12:12:58 +0000 (20:12 +0800)]
[NET]: Make macro to specify the ptype_base size
Currently this size is 16, but as the comment says this
is so only because all the chains (except one) have
length 1. I think that this may change some day, and with
a macro in place growing this hash will be much easier.
Besides, symbolic names are read better than magic constants.
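As an illustration of the idea, the size and the derived mask can be named
once and used everywhere. The macro and function names below are examples
chosen for this sketch, and the hash is assumed to be a simple low-bits mask
of the protocol number in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Example names; the size must stay a power of two for the mask to work. */
#define PTYPE_HASH_SIZE	16
#define PTYPE_HASH_MASK	(PTYPE_HASH_SIZE - 1)

/* Map a protocol number (host byte order) to a hash chain index. */
unsigned int ptype_hash(uint16_t proto)
{
	return proto & PTYPE_HASH_MASK;
}
```

Growing the table then means changing one number instead of hunting for every
literal 16 and 15 in the code.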
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 26 Nov 2007 12:10:50 +0000 (20:10 +0800)]
[NET]: Name magic constants in sock_wake_async()
The sock_wake_async() function performs slightly different actions
depending on its "how" argument. Unfortunately this argument
only has numerical magic values.
I propose to give names to their constants to help people
reading this function callers understand what's going on
without looking into this function all the time.
I suppose this is 2.6.25 material, but if it's not (or the
naming seems poor/bad/awful), I can rework it against the
current net-2.6 tree.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:14:15 +0000 (22:14 -0200)]
[DCCP]: Add support for abortive release
This continues from the previous patch and adds support for actively aborting
a DCCP connection, using a Reset Code 2, "Aborted" to inform the peer of an
abortive release.
I have tried this in various client/server settings and it works as expected.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 17 Dec 2007 00:06:03 +0000 (16:06 -0800)]
[DCCP]: Check for unread data on close
This removes one FIXME with regard to close when there is still unread data.
The mechanism is implemented similar to TCP: with regard to DCCP-specifics,
a Reset with Code 2, "Aborted" is sent to the peer.
This corresponds in part to RFC 4340, 8.1.1 and 8.1.5.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:12:06 +0000 (22:12 -0200)]
[CCID2]: Remove misleading comment
This removes a comment which identifies an `issue' with dccp_write_xmit() where there is none.
The comment assumes it is possible that a packet is sent between the calls to
ccid_hc_tx_send_packet(),
dccp_transmit_skb(),
ccid_hc_tx_packet_sent()
(in the above order) in dccp_write_xmit().
I think that this is impossible, since dccp_write_xmit() is always called under lock:
* when called as dccp_write_xmit(sk, 1) from dccp_send_close(), the socket is locked
(see code comment above dccp_send_close());
* when called as dccp_write_xmit(sk, 0) from dccp_send_msg(), it is after lock_sock() has been called;
* when called as dccp_write_xmit(sk, 0) from dccp_write_xmit_timer(), bh_lock_sock() has been called
and the if/else statement has made sure that sk_lock.owner is not set;
* there are no other places where dccp_write_xmit() is called.
Furthermore, the debug statement for printing the sequence number of the packet just sent has been
removed, since the entire list is being printed anyway and so the entry of that number appears last.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:10:29 +0000 (22:10 -0200)]
[CCID2]: Remove redundant ack-counting variable
The code used two different variables to count Acks, one of them redundant.
This patch reduces the number of Ack counters to one.
The type of the Ack counter has also been changed to u32 (twice the range of int);
and the variable has been renamed into `packets_acked' - for consistency with
RFC 3465 (and similarly named variables are used by TCP and SCTP).
Lastly, a slightly less aggressive `maxincr' increment is used (for even Ack Ratios,
maxincr was Ack Ratio/2 + 1 instead of Ack Ratio/2).
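The difference between the two increments is easiest to see side by side. This
sketch assumes the new value is the rounded-up half of Ack Ratio, which agrees
with the even case stated above; the function names are illustrative.

```c
#include <assert.h>

/* Old behaviour: halve, then always add one. */
unsigned int maxincr_old(unsigned int ack_ratio)
{
	return ack_ratio / 2 + 1;
}

/* Assumed new behaviour: ceil(ack_ratio / 2). */
unsigned int maxincr_new(unsigned int ack_ratio)
{
	return ack_ratio / 2 + ack_ratio % 2;
}
```

For odd Ack Ratios both give the same result; for even ones the new value is
smaller by one.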
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:09:35 +0000 (22:09 -0200)]
[CCID2]: Remove redundant synchronisation variable
This removes the synchronisation variable `ccid2hctx_sendwait', which is set to 1
when the CCID2 sender may send a new packet, and which is set to 0 otherwise.
The variable is redundant, since it is only used in combination with the hc_tx_send_packet/
hc_tx_packet_sent function pair. Both functions are called under socket lock, so the
following happens when the CCID2 may send a new packet:
* it sets sendwait = 1 in tx_send_packet and returns 0;
* the subsequent call to tx_packet_sent clears the sendwait flag;
* since tx_send_packet returns 0 if and only if sendwait == 1, the BUG_ON condition
in tx_packet_sent is never satisfied, since that function is never called when
tx_send_packet returns a value different from 0 (cf. dccp_write_xmit);
* the call to tx_packet_sent clears the flag so that the condition "!sendwait" is
true the next time tx_packet_sent is called.
In other words, it is sufficient to just return 0 / not-0 to synchronise tx_send_packet
and tx_packet_sent -- which is what the patch does.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:08:27 +0000 (22:08 -0200)]
[CCID2]: Redundant debugging output
This reduces the amount of redundant debugging messages:
* pipe/cwnd are printed in both tx_send_packet() and tx_packet_sent().
Both functions are called immediately after one another, so one occurrence is sufficient.
* Since tx_packet_sent() prints pipe/cwnd already, the second printk for pipe is redundant.
* In tx_packet_sent() the check_sanity function is called twice (at the beginning and at the end).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:06:52 +0000 (22:06 -0200)]
[CCID2]: Replace pipe assignment-function with assignment
The function ccid2_change_pipe only does an assignment. This patch simplifies the code by
replacing the function with the assignment it performs.
Furthermore, the type of pipe is promoted from `signed' to unsigned (increasing the range).
As a result, a BUG_ON test for negative values now becomes obsolete (for safety not removed,
but replaced with a less annoying `DCCP_BUG').
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:05:51 +0000 (22:05 -0200)]
[CCID2]: Replace cwnd assignment-function with assignment
The current function ccid2_change_cwnd in effect makes only an assignment, as
the test whether cwnd has reached 0 is only required when cwnd is halved.
This patch simplifies the code by replacing the function with the assignment
it performs.
Furthermore, since ssthresh derives from cwnd and appears in many assignments and
comparisons, the type of ssthresh has also been changed to match that of cwnd.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:04:35 +0000 (22:04 -0200)]
[CCID2]: Replace read-only variable with constant
This replaces the field member `numdupack', which was used as a read-only
constant in the code, with a #define.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 25 Nov 2007 00:01:56 +0000 (22:01 -0200)]
[CCID2]: Remove unused variable
This removes a variable `ccid2hctx_sent' which is incremented but
never referenced/read (i.e., dead code).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sat, 24 Nov 2007 23:58:33 +0000 (21:58 -0200)]
[CCID2]: Disable broken Ack Ratio adaptation algorithm
This comments out a problematic section comprising a half-finished algorithm:
- The variable `ccid2hctx_ackloss' is never initialised to a value different from 0 and
hence in fact is a read-only constant.
- The `arsent' variable counts packets other than Acks (it is incremented for every packet),
and there is no test for Ack Loss.
- The concept of counting Acks as such leads to a complex calculation, and the calculation
at the moment is inconsistent with this concept.
The problem is that the number of Acks - rather than the number of windows - is counted,
which leads to a complex (cubic/quadratic) expression - this is not even implemented.
In its current state, the commented-out algorithm interferes with normal processing by
changing Ack Ratio incorrectly, and at the wrong times.
A new algorithm is necessary, which will not necessarily use the same variables as used by
the unfinished one; hence the old variables have been removed.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sat, 24 Nov 2007 23:44:30 +0000 (21:44 -0200)]
[CCID2]: Larger initial windows also for CCID2
RFC 4341, sec. 5 states that "The cwnd parameter is initialized to at most
four packets for new connections, following the rules from [RFC3390]", which
is implemented by this patch.
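The RFC 3390 rule is min(4*MSS, max(2*MSS, 4380 bytes)). A sketch of how this
translates into a packet count follows; the function name is an assumption,
and the MSS must be non-zero because it is used as a divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Initial window in packets for a given MSS in bytes, after RFC 3390. */
uint32_t rfc3390_initial_cwnd_packets(uint32_t mss)
{
	uint64_t upper, lower, bytes;

	assert(mss > 0);
	upper = 4 * (uint64_t)mss;
	lower = 2 * (uint64_t)mss > 4380 ? 2 * (uint64_t)mss : 4380;
	bytes = upper < lower ? upper : lower;
	return (uint32_t)(bytes / mss);
}
```

The result is always between two and four packets: small segments get four,
segments of 2190 bytes or more get two.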
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Sat, 24 Nov 2007 23:42:53 +0000 (21:42 -0200)]
[DCCP]: Initialize dccp_sock before calling the ccid constructors
This is because in the next patch CCID2 will assume that dccps_mss_cache is
non-zero.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sat, 24 Nov 2007 23:40:24 +0000 (21:40 -0200)]
[CCID2]: Deadlock and spurious timeouts when Ack Ratio > cwnd
This patch removes a bug in the current code. I agree with Andrea's comment
that there is a problem here but the way it is treated does not fix it.
The problem is that whenever Ack Ratio > cwnd, starvation/deadlock occurs:
* the receiver will not send an Ack until (Ack Ratio - cwnd) data packets
have arrived;
* the sender will not send any data packet before the receipt of an Ack
advances the send window.
The only way that the connection then progresses was via RTO timeout. In one
extreme case (bulk transfer), it was observed that this happened for every single
packet; i.e. hundreds of packets, each a RTO timeout of 1..3 seconds apart:
a transfer which normally would take a fraction of a second thus grew to
several minutes.
The solution taken by this approach is to observe the relation
"Ack Ratio <= cwnd"
by using the constraint (1) from RFC 4341, 6.1.2; i.e. set
Ack Ratio = ceil(cwnd / 2)
and update it whenever either Ack Ratio or cwnd change. This ensures that
the deadlock problem can not arise.
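The constraint is simple to express in code. A minimal sketch (the function
name is hypothetical, and cwnd is assumed to be at least one packet):

```c
#include <assert.h>
#include <stdint.h>

/* Ack Ratio = ceil(cwnd / 2), computed without risking overflow. */
uint32_t ack_ratio_for_cwnd(uint32_t cwnd)
{
	assert(cwnd >= 1);
	return cwnd / 2 + cwnd % 2;
}
```

Since ceil(cwnd / 2) <= cwnd for every cwnd >= 1, the receiver never has to
wait for more data packets than the sender is allowed to have in flight.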
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sat, 24 Nov 2007 23:32:53 +0000 (21:32 -0200)]
[CCID2]: Don't assign negative values to Ack Ratio
Since it makes no sense to assign negative values to Ack Ratio, this
patch disallows this possibility.
As a consequence, a Bug test for negative Ack Ratio values becomes obsolete.
Furthermore, a check against overflow (as Ack Ratio may not exceed 2 bytes,
due to RFC 4340, 11.3) has been added.
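Both points can be shown in a few lines: an unsigned parameter makes negative
values unrepresentable, and a clamp keeps the value within two bytes. The names
are illustrative, and what to do with a value of 0 is left out of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value that fits into the two-byte Ack Ratio feature. */
#define ACK_RATIO_MAX	0xFFFFu

/* Clamp a requested Ack Ratio so that it fits into 16 bits. */
uint16_t ack_ratio_clamp(uint32_t val)
{
	return (uint16_t)(val > ACK_RATIO_MAX ? ACK_RATIO_MAX : val);
}
```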
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sat, 24 Nov 2007 22:43:59 +0000 (20:43 -0200)]
[CCID2]: Fix sequence number arithmetic/comparisons
This replaces use of normal subtraction with modulo-48 subtraction.
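DCCP sequence numbers are 48 bits wide, so differences have to wrap at 2^48.
The sketch below shows the arithmetic in plain C; the helper names are chosen
for this example and are not the ones used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ48_MASK	((UINT64_C(1) << 48) - 1)

/* (a - b) modulo 2^48 */
uint64_t seq48_sub(uint64_t a, uint64_t b)
{
	return (a - b) & SEQ48_MASK;
}

/* Signed distance from `from' to `to', in the range [-2^47, 2^47). */
int64_t seq48_delta(uint64_t from, uint64_t to)
{
	uint64_t d = seq48_sub(to, from);

	/* Sign-extend from bit 47. */
	if (d & (UINT64_C(1) << 47))
		return (int64_t)d - (INT64_C(1) << 48);
	return (int64_t)d;
}
```

With normal subtraction, a sequence number that has just wrapped to 0 would
appear to lie far behind 2^48 - 1; with the modulo-48 version it is one ahead.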
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sat, 24 Nov 2007 22:37:48 +0000 (20:37 -0200)]
[CCID2]: Bug in reading Ack Vectors
In CCID2 the receiver-history is sorted in ascending order of sequence number,
but the processing of received Ack Vectors requires the list traversal in the
opposite direction.
The current code has a bug in this regard: the list traversal is upwards. As a
consequence, only Ack Vectors with a run length of 1 will pass, in all other
Ack Vectors the remaining (acked) sequence numbers are missed, and may later
falsely be identified as lost.
Note: This bug is only visible when Ack Ratio > 1, since otherwise the run
lengths of Ack Vectors are 0.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sun, 30 Dec 2007 12:19:31 +0000 (04:19 -0800)]
[ACKVEC]: Reduce length of identifiers
This reduces the length of the struct ackvec/ackvec_record fields. It is
a purely text-based replacement:
s#dccpavr_#avr_#g;
s#dccpav_#av_#g;
and increases readability somewhat.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Mon, 26 Nov 2007 15:34:54 +0000 (23:34 +0800)]
[PCOUNTER] Fix build error without CONFIG_SMP
I keep getting this build error and couldn't find anyone fixing
it in archives. ...Maybe all net developers except me build
just SMP kernels :-).
In file included from include/net/sock.h:50,
from ipc/mqueue.c:35:
include/linux/pcounter.h: In function 'pcounter_add':
include/linux/pcounter.h:87: error: 'struct pcounter' has no
member named 'value'
make[1]: *** [ipc/mqueue.o] Error 1
make: *** [ipc] Error 2
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Fri, 23 Nov 2007 13:28:44 +0000 (21:28 +0800)]
[IPV6]: Correct the comment concerning inetsw6 table
It seems that net/ipv6/af_inet6.c was copied from net/ipv4/af_inet.c,
but one comment was not fixed.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Fri, 23 Nov 2007 12:30:01 +0000 (20:30 +0800)]
[UNIX] Move the unix sock iterators in to proper place
The first_unix_socket() and next_unix_sockets() are now used
only in the proc file and in the forall_unix_sockets macro.
The forall_unix_sockets macro is not used in this file at all, so
remove it. After this, move the helpers to where they really
belong, i.e. closer to the proc code under the #ifdef CONFIG_PROC_FS
option.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 21 Nov 2007 12:14:31 +0000 (10:14 -0200)]
[DCCP]: Update documentation on ioctls
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 21 Nov 2007 12:13:53 +0000 (10:13 -0200)]
[DCCP]: Ignore Ack Vectors / Elapsed Time on DCCP-Request also
Small update with regard to RFC 4340 (references added as documentation):
on Requests, Ack Vectors / Elapsed Time should be ignored.
Length handling of Elapsed Time also simplified.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 21 Nov 2007 12:11:52 +0000 (10:11 -0200)]
[DCCP]: Remove redundant dependency on IP_DCCP
This cleans up the consequences of an earlier patch which
introduced the `if IP_DCCP' clause into net/dccp/Kconfig.
The CCID Kconfig menu is sourced within this clause; as a
consequence, all tests of type `depends on IP_DCCP' are now
redundant.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 21 Nov 2007 12:09:56 +0000 (10:09 -0200)]
[DCCP]: Promote CCID2 as default CCID
This patch addresses the following problems:
1. DCCP relies for its proper functioning on having at least one CCID module
enabled (as in TCP pluggable congestion control). Currently it is possible to
disable both CCIDs and thus leave the DCCP module in a compiled, but entirely
non-functional state: no sockets can be created when no CCID is available.
Furthermore, the protocol is (again like TCP) not intended to be used without
CCIDs. Last, a non-empty CCID list is needed for doing CCID feature negotiation.
2. Internally the default CCID that is advertised by the Linux host is set to CCID2
(DCCPF_INITIAL_CCID in include/linux/dccp.h). Disabling CCID2 in the Kconfig
menu without changing the defaults leads to a failure `module not found' when
trying to load the dccp module (which internally tries to load the default CCID).
3. The specification (RFC 4340, sec. 10) treats CCID2 somewhat like a
`minimum common denominator'; the specification says that:
* "New connections start with CCID 2 for both endpoints"
* "A DCCP implementation intended for general use, such as an implementation in a
general-purpose operating system kernel, SHOULD implement at least CCID 2.
The intent is to make CCID 2 broadly available for interoperability [...]"
Providing CCID2 as minimum-required CCID (like Reno/Cubic in TCP) thus seems reasonable.
Hence this patch automatically selects CCID2 when DCCP is enabled. Documentation also added.
Discussions with Ian McDonald on this subject are gratefully acknowledged.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 21 Nov 2007 12:00:17 +0000 (10:00 -0200)]
[DCCP]: Update documentation
This updates the DCCP documentation, following input from Ian McDonald,
clarifying the status of DCCP, and adding a note about the test tree.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Wed, 21 Nov 2007 11:56:48 +0000 (09:56 -0200)]
[DCCP]: Honour and make use of shutdown option set by user
This extends the DCCP socket API by honouring any shutdown(2) option set by the user.
The behaviour is, as much as possible, made consistent with the API for TCP's shutdown.
This patch exploits the information provided by the user via the socket API to reduce
processing costs:
* if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID;
* if the write end is closed (SHUT_WR), the same idea applies, but with a difference -
as long as the TX queue has not been drained, we need to receive feedback to keep
congestion-control rates up to date. Hence SHUT_WR is honoured only after the last
packet (under congestion control) has been sent;
* although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner
as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c).
Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for
instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added
to this by the present patch.
There will also no longer be any delivery when the socket is in the final stages, i.e. when
one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at
that stage the connection is in its final stages anyway.
Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown
A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7).
Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make
sure the socket is a TCP socket". This should probably be extended to mean
`TCP or DCCP socket' (the code is also used by UDP and raw sockets).
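The delivery rules described above boil down to two predicates. This is a
userspace sketch under assumptions: `struct sock_model' and the function names
are invented for the example, and the two flag values follow the usual
read/write shutdown bits.

```c
#include <assert.h>
#include <stdbool.h>

#define RCV_SHUTDOWN	1u	/* read end closed (SHUT_RD) */
#define SEND_SHUTDOWN	2u	/* write end closed (SHUT_WR) */

/* Hypothetical stand-in for the socket fields that matter here. */
struct sock_model {
	unsigned int shutdown;		/* bitmask of the flags above */
	unsigned int tx_queue_len;	/* packets still waiting to be sent */
};

/* The RX CCID needs no input once the read end has been shut down. */
bool deliver_to_rx_ccid(const struct sock_model *sk)
{
	return !(sk->shutdown & RCV_SHUTDOWN);
}

/*
 * The TX CCID still needs feedback until the TX queue has been drained,
 * so SHUT_WR is honoured only after the last packet has been sent.
 */
bool deliver_to_tx_ccid(const struct sock_model *sk)
{
	return sk->tx_queue_len > 0 || !(sk->shutdown & SEND_SHUTDOWN);
}
```

With SHUT_RDWR set and an empty TX queue, both predicates are false and no
CCID processing is done for incoming packets.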
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>