GitHub/exynos8895/android_kernel_samsung_universal8895.git
14 years agoipv6: mcast: RCU conversions
Eric Dumazet [Mon, 7 Jun 2010 21:05:02 +0000 (21:05 +0000)]
ipv6: mcast: RCU conversions

- ipv6_sock_mc_join() : doesnt touch dev refcount

- ipv6_sock_mc_drop() : doesnt touch dev/idev refcounts

- ip6_mc_find_dev() becomes ip6_mc_find_dev_rcu() (called from rcu),
                    and doesnt touch dev/idev refcounts

- ipv6_sock_mc_close() : doesnt touch dev/idev refcounts

- ip6_mc_source() uses ip6_mc_find_dev_rcu()

- ip6_mc_msfilter() uses ip6_mc_find_dev_rcu()

- ip6_mc_msfget() uses ip6_mc_find_dev_rcu()

- ipv6_dev_mc_dec(), ipv6_chk_mcast_addr(),
  igmp6_event_query(), igmp6_event_report(),
  mld_sendpack(), igmp6_send() dont touch idev refcount

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocleanup: remove pppoe_xmit() declaration.
Rami Rosen [Tue, 8 Jun 2010 19:07:56 +0000 (19:07 +0000)]
cleanup: remove pppoe_xmit() declaration.

There is no need for pppoe_xmit() forward declaration in
drivers/net/pppoe.c. This patch removes this  pppoe_xmit() declaration.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoicmp: RCU conversion in icmp_address_reply()
Eric Dumazet [Mon, 7 Jun 2010 22:34:35 +0000 (22:34 +0000)]
icmp: RCU conversion in icmp_address_reply()

- rcu_read_lock() already held by caller
- use __in_dev_get_rcu() instead of in_dev_get() / in_dev_put()
- remove goto out;

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agophonet: use call_rcu for phonet device free
Jiri Pirko [Mon, 7 Jun 2010 03:27:39 +0000 (03:27 +0000)]
phonet: use call_rcu for phonet device free

Use call_rcu rather than synchronize_rcu.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoanycast: Some RCU conversions
Eric Dumazet [Mon, 7 Jun 2010 11:42:13 +0000 (11:42 +0000)]
anycast: Some RCU conversions

- dev_get_by_flags() changed to dev_get_by_flags_rcu()

- ipv6_sock_ac_join() dont touch dev & idev refcounts
- ipv6_sock_ac_drop() dont touch dev & idev refcounts
- ipv6_sock_ac_close() dont touch dev & idev refcounts
- ipv6_dev_ac_dec() dount touch idev refcount
- ipv6_chk_acast_addr() dont touch idev refcount

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: avoid two atomic ops in ip_rcv_options()
Eric Dumazet [Mon, 7 Jun 2010 03:54:46 +0000 (03:54 +0000)]
net: avoid two atomic ops in ip_rcv_options()

in_dev_get() -> __in_dev_get_rcu() in a rcu protected function.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv4: avoid two atomic ops in ip_rt_redirect()
Eric Dumazet [Tue, 8 Jun 2010 04:49:44 +0000 (21:49 -0700)]
ipv4: avoid two atomic ops in ip_rt_redirect()

in_dev_get() -> __in_dev_get_rcu() in a rcu protected function.

[ Fix build with CONFIG_IP_ROUTE_VERBOSE disabled. -DaveM ]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoigmp: avoid two atomic ops in igmp_rcv()
Eric Dumazet [Mon, 7 Jun 2010 03:17:10 +0000 (03:17 +0000)]
igmp: avoid two atomic ops in igmp_rcv()

in_dev_get() -> __in_dev_get_rcu() in a rcu protected function.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoip: Router Alert RCU conversion
Eric Dumazet [Mon, 7 Jun 2010 03:12:08 +0000 (03:12 +0000)]
ip: Router Alert RCU conversion

Straightforward conversion to RCU.

One rwlock becomes a spinlock, and is static.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomacvlan: use call_rcu for port free
Jiri Pirko [Mon, 7 Jun 2010 01:36:29 +0000 (01:36 +0000)]
macvlan: use call_rcu for port free

Use call_rcu rather than synchronize_rcu.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Remove unnecessary net action assertion
jamal [Fri, 4 Jun 2010 02:06:22 +0000 (02:06 +0000)]
net: Remove unnecessary net action assertion

The extra assertion to allow packet munging only when there are
no other ptypes listening which may have worked around an old bug
is unnecessary. It is sufficient to check if the skb is cloned before
trampling on it. Thanks to Herbert Xu for being persistent and patient
in getting this across.
[Note that cloning checks and assertions are the general rule used
by tc actions (documentation/networking/tc-actions-env-rules.txt)].

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet sched: make pedit check for clones instead
jamal [Fri, 4 Jun 2010 02:43:06 +0000 (02:43 +0000)]
net sched: make pedit check for clones instead

Now that the core path doesnt set OK to munge we detect
writable skbs by looking to see if they are cloned.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agohtb: remove two unnecessary assignments
Changli Gao [Fri, 4 Jun 2010 01:33:33 +0000 (01:33 +0000)]
htb: remove two unnecessary assignments

remove two unnecessary assignments

we don't need to assign NULL when initialize structure objects.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/sched/sch_htb.c |    2 --
 1 file changed, 2 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agogreth: Remove unnecessary memset of napi member in netdev private data
Tobias Klauser [Thu, 3 Jun 2010 20:24:21 +0000 (20:24 +0000)]
greth: Remove unnecessary memset of napi member in netdev private data

The memory for the private data is allocated using kzalloc in
alloc_etherdev (or alloc_netdev_mq respectively) so there is no need to
set the napi member to 0 explicitely.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoraw: avoid two atomics in xmit
Eric Dumazet [Thu, 3 Jun 2010 22:23:57 +0000 (22:23 +0000)]
raw: avoid two atomics in xmit

Avoid two atomic ops per raw_send_hdrinc() call

Avoid two atomic ops per raw6_send_hdrinc() call

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet-caif: Added missing lock validator constants
Alex Lorca [Mon, 7 Jun 2010 08:01:22 +0000 (01:01 -0700)]
net-caif: Added missing lock validator constants

CAIF is using "xxx-AF_MAX" strings for the lock validator. It should use
its own strings.

Signed-off-by: Alex Lorca <alex.lorca@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotcp: Fix slowness in read /proc/net/tcp
Tom Herbert [Mon, 7 Jun 2010 07:43:42 +0000 (00:43 -0700)]
tcp: Fix slowness in read /proc/net/tcp

This patch address a serious performance issue in reading the
TCP sockets table (/proc/net/tcp).

Reading the full table is done by a number of sequential read
operations.  At each read operation, a seek is done to find the
last socket that was previously read.  This seek operation requires
that the sockets in the table need to be counted up to the current
file position, and to count each of these requires taking a lock for
each non-empty bucket.  The whole algorithm is O(n^2).

The fix is to cache the last bucket value, offset within the bucket,
and the file position returned by the last read operation.   On the
next sequential read, the bucket and offset are used to find the
last read socket immediately without needing ot scan the previous
buckets  the table.  This algorithm t read the whole table is O(n).

The improvement offered by this patch is easily show by performing
cat'ing /proc/net/tcp on a machine with a lot of connections.  With
about 182K connections in the table, I see the following:

- Without patch
time cat /proc/net/tcp > /dev/null

real 1m56.729s
user 0m0.214s
sys 1m56.344s

- With patch
time cat /proc/net/tcp > /dev/null

real 0m0.894s
user 0m0.290s
sys 0m0.594s

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Update version to 3.111
Matt Carlson [Sat, 5 Jun 2010 17:24:39 +0000 (17:24 +0000)]
tg3: Update version to 3.111

This patch updates the tg3 version to 3.111.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Add 5719 PCI device and phy IDs
Matt Carlson [Sat, 5 Jun 2010 17:24:38 +0000 (17:24 +0000)]
tg3: Add 5719 PCI device and phy IDs

This patch adds the 5719 PCI device and phy IDs.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Add 5719 ASIC rev
Matt Carlson [Sat, 5 Jun 2010 17:24:37 +0000 (17:24 +0000)]
tg3: Add 5719 ASIC rev

This patch adds the 5719 ASIC revision.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Use devfn to determine function number
Matt Carlson [Sat, 5 Jun 2010 17:24:36 +0000 (17:24 +0000)]
tg3: Use devfn to determine function number

The driver sometimes needs to know which function number the current
device is.  This patch changes the code to use devfn over internal
register values.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: 5717: Allow serdes link via parallel detect
Matt Carlson [Sat, 5 Jun 2010 17:24:35 +0000 (17:24 +0000)]
tg3: 5717: Allow serdes link via parallel detect

The 5717 serdes phy brings link up via parallel detection without any
additional help from the driver.  This patch changes the
tg3_setup_fiber_mii_phy() function to detect and allow the use of this
feature.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Allow single MSI-X vector allocations
Matt Carlson [Sat, 5 Jun 2010 17:24:34 +0000 (17:24 +0000)]
tg3: Allow single MSI-X vector allocations

This patch changes the code to make it legal to allocate only one MSI-X
vector.  It also fixes a bug where the driver was not checking for error
return codes from pci_enable_msix().

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Off-by-one error in RSS setup
Matt Carlson [Sat, 5 Jun 2010 17:24:33 +0000 (17:24 +0000)]
tg3: Off-by-one error in RSS setup

The driver was incorrectly programming the indirection table such that
rx traffic intended for the second ring went to the first ring, rx
traffic intended for the third ring went to the second ring, etc.  This
patch changes the code so that rx traffic is diverted to the proper
ring.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Fix a memory leak on 5717+ devices
Matt Carlson [Sat, 5 Jun 2010 17:24:32 +0000 (17:24 +0000)]
tg3: Fix a memory leak on 5717+ devices

The rx resources for MSI-X interrupt vector 0 were not being freed
correctly.  This happens because the teardown loop continue's to the
next loop iteration if it detects the tx ring for that vector is not
setup, thus bypassing the rx teardown code.  This patch moves the
call to tg3_rx_prodring_free() earlier in the loop.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Avoid tx lockups on 5755+ devices
Matt Carlson [Sat, 5 Jun 2010 17:24:31 +0000 (17:24 +0000)]
tg3: Avoid tx lockups on 5755+ devices

In certain edge conditions, internal tx resources can get corrupted.
This patch enables a bit that will fix the problem.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Relocate APE mutex regs for 5717+
Matt Carlson [Sat, 5 Jun 2010 17:24:30 +0000 (17:24 +0000)]
tg3: Relocate APE mutex regs for 5717+

The 5717 and later devices relocate the APE mutex registers.  This patch
organizes the code so that the driver can use the mutex registers in the
old and new locations.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Mon, 7 Jun 2010 00:42:02 +0000 (17:42 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/sfc/net_driver.h
drivers/net/sfc/siena.c

14 years agor8169: fix random mdio_write failures
Timo Teräs [Sun, 6 Jun 2010 22:38:47 +0000 (15:38 -0700)]
r8169: fix random mdio_write failures

Some configurations need delay between the "write completed" indication
and new write to work reliably.

Realtek driver seems to use longer delay when polling the "write complete"
bit, so it waits long enough between writes with high probability (but
could probably break too). This patch adds a new udelay to make sure we
wait unconditionally some time after the write complete indication.

This caused a regression with XID 18000000 boards when the board specific
phy configuration writing many mdio registers was added in commit
2e955856ff (r8169: phy init for the 8169scd). Some of the configration
mdio writes would almost always fail, and depending on failure might leave
the PHY in non-working state.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoip6mr: fix a typo in ip6mr_for_each_table()
Eric Dumazet [Sun, 6 Jun 2010 22:34:40 +0000 (15:34 -0700)]
ip6mr: fix a typo in ip6mr_for_each_table()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbevf: Enable GRO by default
Shirley Ma [Sat, 5 Jun 2010 10:04:50 +0000 (03:04 -0700)]
ixgbevf: Enable GRO by default

Enable GRO by default for performance.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv6: avoid high order allocations
Eric Dumazet [Sat, 5 Jun 2010 10:03:30 +0000 (03:03 -0700)]
ipv6: avoid high order allocations

With mtu=9000, mld_newpack() use order-2 GFP_ATOMIC allocations, that
are very unreliable, on machines where PAGE_SIZE=4K

Limit allocated skbs to be at most one page. (order-0 allocations)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agofec: convert legacy PM hooks to dem_pm_ops
Denis Kirjanov [Wed, 2 Jun 2010 09:27:04 +0000 (09:27 +0000)]
fec: convert legacy PM hooks to dem_pm_ops

This patch compile tested only.

Convert legacy PM hooks to dev_pm_ops
Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: allow user-controlled output slave selection
Andy Gospodarek [Wed, 2 Jun 2010 08:40:18 +0000 (08:40 +0000)]
bonding: allow user-controlled output slave selection

v2: changed bonding module version, modified to apply on top of changes
from previous patch in series, and updated documentation to elaborate on
multiqueue awareness that now exists in bonding driver.

This patch give the user the ability to control the output slave for
round-robin and active-backup bonding.  Similar functionality was
discussed in the past, but Jay Vosburgh indicated he would rather see a
feature like this added to existing modes rather than creating a
completely new mode.  Jay's thoughts as well as Neil's input surrounding
some of the issues with the first implementation pushed us toward a
design that relied on the queue_mapping rather than skb marks.
Round-robin and active-backup modes were chosen as the first users of
this slave selection as they seemed like the most logical choices when
considering a multi-switch environment.

Round-robin mode works without any modification, but active-backup does
require inclusion of the first patch in this series and setting
the 'all_slaves_active' flag.  This will allow reception of unicast traffic on
any of the backup interfaces.

This was tested with IPv4-based filters as well as VLAN-based filters
with good results.

More information as well as a configuration example is available in the
patch to Documentation/networking/bonding.txt.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: add all_slaves_active parameter
Andy Gospodarek [Wed, 2 Jun 2010 08:39:21 +0000 (08:39 +0000)]
bonding: add all_slaves_active parameter

v2: changed parameter name from 'keep_all' to 'all_slaves_active' and
skipped setting slaves to inactive rather than creating a new flag at
Jay's suggestion.

In an effort to suppress duplicate frames on certain bonding modes
(specifically the modes that do not require additional configuration on
the switch or switches connected to the host), code was added in the
generic receive patch in 2.6.16.  The current behavior works quite well
for most users, but there are some times it would be nice to restore old
functionality and allow all frames to make their way up the stack.

This patch adds support for a new module option and sysfs file called
'all_slaves_active' that will restore pre-2.6.16 functionality if the
user desires.  The default value is '0' and retains existing behavior,
but the user can set it to '1' and allow all frames up if desired.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoskbuff: add check for non-linear to warn_if_lro and needs_linearize
Alexander Duyck [Wed, 2 Jun 2010 12:24:37 +0000 (12:24 +0000)]
skbuff: add check for non-linear to warn_if_lro and needs_linearize

We can avoid an unecessary cache miss by checking if the skb is non-linear
before accessing gso_size/gso_type in skb_warn_if_lro, the same can also be
done to avoid a cache miss on nr_frags if data_len is 0.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agofix return value of __pppoe_xmit() method.
Rami Rosen [Thu, 3 Jun 2010 05:02:29 +0000 (05:02 +0000)]
fix return value of __pppoe_xmit() method.

Hi,
 __pppoe_xmit() in drivers/net/pppoe always returns 1.
When the methods fails (via goto abort), it should return 0 and not 1.

Regards,
Rami Rosen

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosyncookies: update mss tables
Florian Westphal [Thu, 3 Jun 2010 00:43:57 +0000 (00:43 +0000)]
syncookies: update mss tables

- ipv6 msstab: account for ipv6 header size
- ipv4 msstab: add mss for Jumbograms.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosyncookies: avoid unneeded tcp header flag double check
Florian Westphal [Thu, 3 Jun 2010 00:43:44 +0000 (00:43 +0000)]
syncookies: avoid unneeded tcp header flag double check

caller: if (!th->rst && !th->syn && th->ack)
callee: if (!th->ack)

make the caller only check for !syn (common for 3whs), and move
the !rst / ack test to the callee.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosyncookies: make v4/v6 synflood warning behaviour the same
Florian Westphal [Thu, 3 Jun 2010 00:43:12 +0000 (00:43 +0000)]
syncookies: make v4/v6 synflood warning behaviour the same

both syn_flood_warning functions print a message, but
ipv4 version only prints a warning if CONFIG_SYN_COOKIES=y.

Make the v4 one behave like the v6 one.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoX25: remove duplicated #include
Huang Weiyi [Fri, 4 Jun 2010 23:14:15 +0000 (16:14 -0700)]
X25: remove duplicated #include

Remove duplicated #include('s) in drivers/net/wan/x25_asy.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotcp: use correct net ns in cookie_v4_check()
Eric Dumazet [Thu, 3 Jun 2010 05:45:47 +0000 (05:45 +0000)]
tcp: use correct net ns in cookie_v4_check()

Its better to make a route lookup in appropriate namespace.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agorps: tcp: fix rps_sock_flow_table table updates
Eric Dumazet [Thu, 3 Jun 2010 09:03:58 +0000 (09:03 +0000)]
rps: tcp: fix rps_sock_flow_table table updates

I believe a moderate SYN flood attack can corrupt RFS flow table
(rps_sock_flow_table), making RPS/RFS much less effective.

Even in a normal situation, server handling short lived sessions suffer
from bad steering for the first data packet of a session, if another SYN
packet is received for another session.

We do following action in tcp_v4_rcv() :

sock_rps_save_rxhash(sk, skb->rxhash);

We should _not_ do this if sk is a LISTEN socket, as about each
packet received on a LISTEN socket has a different rxhash than
previous one.
 -> RPS_NO_CPU markers are spread all over rps_sock_flow_table.

Also, it makes sense to protect sk->rxhash field changes with socket
lock (We currently can change it even if user thread owns the lock
and might use rxhash)

This patch moves sock_rps_save_rxhash() to a sock locked section,
and only for non LISTEN sockets.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoppp_generic: fix multilink fragment sizes
Ben McKeegan [Wed, 2 Jun 2010 23:14:33 +0000 (23:14 +0000)]
ppp_generic: fix multilink fragment sizes

Fix bug in multilink fragment size calculation introduced by
commit 9c705260feea6ae329bc6b6d5f6d2ef0227eda0a
"ppp: ppp_mp_explode() redesign"

Signed-off-by: Ben McKeegan <ben@netservers.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosyncookies: remove Kconfig text line about disabled-by-default
Florian Westphal [Thu, 3 Jun 2010 00:42:30 +0000 (00:42 +0000)]
syncookies: remove Kconfig text line about disabled-by-default

syncookies default to on since
e994b7c901ded7200b525a707c6da71f2cf6d4bb
(tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: only check pfc bits in hang logic if pfc is enabled
John Fastabend [Thu, 3 Jun 2010 17:03:45 +0000 (17:03 +0000)]
ixgbe: only check pfc bits in hang logic if pfc is enabled

Only check pfc bits in hang logic if PFC is enabled.  Previously,
if DCB was enabled but PFC was disabled the incorrect pause
bits would be checked.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: check for refcount if pop a stacked dst_entry
Steffen Klassert [Fri, 4 Jun 2010 01:57:38 +0000 (01:57 +0000)]
net: check for refcount if pop a stacked dst_entry

xfrm triggers a warning if dst_pop() drops a refcount
on a noref dst. This patch changes dst_pop() to
skb_dst_pop(). skb_dst_pop() drops the refcnt only
on a refcounted dst. Also we don't clone the child
dst_entry, so it is not refcounted and we can use
skb_dst_set_noref() in xfrm_output_one().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoqlcnic: Fix Compilation Issue when CONFIG_INET was not set
Anirban Chakraborty [Thu, 3 Jun 2010 07:50:56 +0000 (07:50 +0000)]
qlcnic: Fix Compilation Issue when CONFIG_INET was not set

Original code was placed incorrectly inside a block of code marked
with CONFIG_INET directive. Fix by moving it outside.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoFrom abbffa2aa9bd6f8df16d0d0a102af677510d8b9a Mon Sep 17 00:00:00 2001
Eric Dumazet [Fri, 4 Jun 2010 03:03:40 +0000 (20:03 -0700)]
From abbffa2aa9bd6f8df16d0d0a102af677510d8b9a Mon Sep 17 00:00:00 2001
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 3 Jun 2010 04:29:41 +0000
Subject: [PATCH 2/3] net: net/socket.c and net/compat.c cleanups

cleanup patch, to match modern coding style.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/compat.c |   47 ++++++++---------
 net/socket.c |  165 ++++++++++++++++++++++++++++------------------------------
 2 files changed, 102 insertions(+), 110 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 1cf7590..63d260e 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -81,7 +81,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
  int tot_len;

  if (kern_msg->msg_namelen) {
- if (mode==VERIFY_READ) {
+ if (mode == VERIFY_READ) {
  int err = move_addr_to_kernel(kern_msg->msg_name,
        kern_msg->msg_namelen,
        kern_address);
@@ -354,7 +354,7 @@ static int do_set_attach_filter(struct socket *sock, int level, int optname,
 static int do_set_sock_timeout(struct socket *sock, int level,
  int optname, char __user *optval, unsigned int optlen)
 {
- struct compat_timeval __user *up = (struct compat_timeval __user *) optval;
+ struct compat_timeval __user *up = (struct compat_timeval __user *)optval;
  struct timeval ktime;
  mm_segment_t old_fs;
  int err;
@@ -367,7 +367,7 @@ static int do_set_sock_timeout(struct socket *sock, int level,
  return -EFAULT;
  old_fs = get_fs();
  set_fs(KERNEL_DS);
- err = sock_setsockopt(sock, level, optname, (char *) &ktime, sizeof(ktime));
+ err = sock_setsockopt(sock, level, optname, (char *)&ktime, sizeof(ktime));
  set_fs(old_fs);

  return err;
@@ -389,11 +389,10 @@ asmlinkage long compat_sys_setsockopt(int fd, int level, int optname,
  char __user *optval, unsigned int optlen)
 {
  int err;
- struct socket *sock;
+ struct socket *sock = sockfd_lookup(fd, &err);

- if ((sock = sockfd_lookup(fd, &err))!=NULL)
- {
- err = security_socket_setsockopt(sock,level,optname);
+ if (sock) {
+ err = security_socket_setsockopt(sock, level, optname);
  if (err) {
  sockfd_put(sock);
  return err;
@@ -453,7 +452,7 @@ static int compat_sock_getsockopt(struct socket *sock, int level, int optname,
 int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
 {
  struct compat_timeval __user *ctv =
- (struct compat_timeval __user*) userstamp;
+ (struct compat_timeval __user *) userstamp;
  int err = -ENOENT;
  struct timeval tv;

@@ -477,7 +476,7 @@ EXPORT_SYMBOL(compat_sock_get_timestamp);
 int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
 {
  struct compat_timespec __user *ctv =
- (struct compat_timespec __user*) userstamp;
+ (struct compat_timespec __user *) userstamp;
  int err = -ENOENT;
  struct timespec ts;

@@ -502,12 +501,10 @@ asmlinkage long compat_sys_getsockopt(int fd, int level, int optname,
  char __user *optval, int __user *optlen)
 {
  int err;
- struct socket *sock;
+ struct socket *sock = sockfd_lookup(fd, &err);

- if ((sock = sockfd_lookup(fd, &err))!=NULL)
- {
- err = security_socket_getsockopt(sock, level,
-    optname);
+ if (sock) {
+ err = security_socket_getsockopt(sock, level, optname);
  if (err) {
  sockfd_put(sock);
  return err;
@@ -557,7 +554,7 @@ struct compat_group_filter {

 int compat_mc_setsockopt(struct sock *sock, int level, int optname,
  char __user *optval, unsigned int optlen,
- int (*setsockopt)(struct sock *,int,int,char __user *,unsigned int))
+ int (*setsockopt)(struct sock *, int, int, char __user *, unsigned int))
 {
  char __user *koptval = optval;
  int koptlen = optlen;
@@ -640,12 +637,11 @@ int compat_mc_setsockopt(struct sock *sock, int level, int optname,
  }
  return setsockopt(sock, level, optname, koptval, koptlen);
 }
-
 EXPORT_SYMBOL(compat_mc_setsockopt);

 int compat_mc_getsockopt(struct sock *sock, int level, int optname,
  char __user *optval, int __user *optlen,
- int (*getsockopt)(struct sock *,int,int,char __user *,int __user *))
+ int (*getsockopt)(struct sock *, int, int, char __user *, int __user *))
 {
  struct compat_group_filter __user *gf32 = (void *)optval;
  struct group_filter __user *kgf;
@@ -681,7 +677,7 @@ int compat_mc_getsockopt(struct sock *sock, int level, int optname,
      __put_user(interface, &kgf->gf_interface) ||
      __put_user(fmode, &kgf->gf_fmode) ||
      __put_user(numsrc, &kgf->gf_numsrc) ||
-     copy_in_user(&kgf->gf_group,&gf32->gf_group,sizeof(kgf->gf_group)))
+     copy_in_user(&kgf->gf_group, &gf32->gf_group, sizeof(kgf->gf_group)))
  return -EFAULT;

  err = getsockopt(sock, level, optname, (char __user *)kgf, koptlen);
@@ -714,21 +710,22 @@ int compat_mc_getsockopt(struct sock *sock, int level, int optname,
  copylen = numsrc * sizeof(gf32->gf_slist[0]);
  if (copylen > klen)
  copylen = klen;
-         if (copy_in_user(gf32->gf_slist, kgf->gf_slist, copylen))
+ if (copy_in_user(gf32->gf_slist, kgf->gf_slist, copylen))
  return -EFAULT;
  }
  return err;
 }
-
 EXPORT_SYMBOL(compat_mc_getsockopt);

 /* Argument list sizes for compat_sys_socketcall */
 #define AL(x) ((x) * sizeof(u32))
-static unsigned char nas[20]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3),
- AL(3),AL(3),AL(4),AL(4),AL(4),AL(6),
- AL(6),AL(2),AL(5),AL(5),AL(3),AL(3),
- AL(4),AL(5)};
+static unsigned char nas[20] = {
+ AL(0), AL(3), AL(3), AL(3), AL(2), AL(3),
+ AL(3), AL(3), AL(4), AL(4), AL(4), AL(6),
+ AL(6), AL(2), AL(5), AL(5), AL(3), AL(3),
+ AL(4), AL(5)
+};
 #undef AL

 asmlinkage long compat_sys_sendmsg(int fd, struct compat_msghdr __user *msg, unsigned flags)
@@ -827,7 +824,7 @@ asmlinkage long compat_sys_socketcall(int call, u32 __user *args)
    compat_ptr(a[4]), compat_ptr(a[5]));
  break;
  case SYS_SHUTDOWN:
- ret = sys_shutdown(a0,a1);
+ ret = sys_shutdown(a0, a1);
  break;
  case SYS_SETSOCKOPT:
  ret = compat_sys_setsockopt(a0, a1, a[2],
diff --git a/net/socket.c b/net/socket.c
index 367d547..b63c051 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -124,7 +124,7 @@ static int sock_fasync(int fd, struct file *filp, int on);
 static ssize_t sock_sendpage(struct file *file, struct page *page,
       int offset, size_t size, loff_t *ppos, int more);
 static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
-         struct pipe_inode_info *pipe, size_t len,
+ struct pipe_inode_info *pipe, size_t len,
  unsigned int flags);

 /*
@@ -162,7 +162,7 @@ static const struct net_proto_family *net_families[NPROTO] __read_mostly;
  * Statistics counters of the socket lists
  */

-static DEFINE_PER_CPU(int, sockets_in_use) = 0;
+static DEFINE_PER_CPU(int, sockets_in_use);

 /*
  * Support routines.
@@ -309,9 +309,9 @@ static int init_inodecache(void)
 }

 static const struct super_operations sockfs_ops = {
- .alloc_inode = sock_alloc_inode,
- .destroy_inode =sock_destroy_inode,
- .statfs = simple_statfs,
+ .alloc_inode = sock_alloc_inode,
+ .destroy_inode = sock_destroy_inode,
+ .statfs = simple_statfs,
 };

 static int sockfs_get_sb(struct file_system_type *fs_type,
@@ -411,6 +411,7 @@ int sock_map_fd(struct socket *sock, int flags)

  return fd;
 }
+EXPORT_SYMBOL(sock_map_fd);

 static struct socket *sock_from_file(struct file *file, int *err)
 {
@@ -422,7 +423,7 @@ static struct socket *sock_from_file(struct file *file, int *err)
 }

 /**
- * sockfd_lookup -  Go from a file number to its socket slot
+ * sockfd_lookup - Go from a file number to its socket slot
  * @fd: file handle
  * @err: pointer to an error code return
  *
@@ -450,6 +451,7 @@ struct socket *sockfd_lookup(int fd, int *err)
  fput(file);
  return sock;
 }
+EXPORT_SYMBOL(sockfd_lookup);

 static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
 {
@@ -540,6 +542,7 @@ void sock_release(struct socket *sock)
  }
  sock->file = NULL;
 }
+EXPORT_SYMBOL(sock_release);

 int sock_tx_timestamp(struct msghdr *msg, struct sock *sk,
        union skb_shared_tx *shtx)
@@ -586,6 +589,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
  ret = wait_on_sync_kiocb(&iocb);
  return ret;
 }
+EXPORT_SYMBOL(sock_sendmsg);

 int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
     struct kvec *vec, size_t num, size_t size)
@@ -604,6 +608,7 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
  set_fs(oldfs);
  return result;
 }
+EXPORT_SYMBOL(kernel_sendmsg);

 static int ktime2ts(ktime_t kt, struct timespec *ts)
 {
@@ -664,7 +669,6 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
  put_cmsg(msg, SOL_SOCKET,
   SCM_TIMESTAMPING, sizeof(ts), &ts);
 }
-
 EXPORT_SYMBOL_GPL(__sock_recv_timestamp);

 inline void sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
@@ -720,6 +724,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
  ret = wait_on_sync_kiocb(&iocb);
  return ret;
 }
+EXPORT_SYMBOL(sock_recvmsg);

 static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
        size_t size, int flags)
@@ -752,6 +757,7 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
  set_fs(oldfs);
  return result;
 }
+EXPORT_SYMBOL(kernel_recvmsg);

 static void sock_aio_dtor(struct kiocb *iocb)
 {
@@ -774,7 +780,7 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,
 }

 static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
-         struct pipe_inode_info *pipe, size_t len,
+ struct pipe_inode_info *pipe, size_t len,
  unsigned int flags)
 {
  struct socket *sock = file->private_data;
@@ -887,7 +893,7 @@ static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov,
  */

 static DEFINE_MUTEX(br_ioctl_mutex);
-static int (*br_ioctl_hook) (struct net *, unsigned int cmd, void __user *arg) = NULL;
+static int (*br_ioctl_hook) (struct net *, unsigned int cmd, void __user *arg);

 void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *))
 {
@@ -895,7 +901,6 @@ void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *))
  br_ioctl_hook = hook;
  mutex_unlock(&br_ioctl_mutex);
 }
-
 EXPORT_SYMBOL(brioctl_set);

 static DEFINE_MUTEX(vlan_ioctl_mutex);
@@ -907,7 +912,6 @@ void vlan_ioctl_set(int (*hook) (struct net *, void __user *))
  vlan_ioctl_hook = hook;
  mutex_unlock(&vlan_ioctl_mutex);
 }
-
 EXPORT_SYMBOL(vlan_ioctl_set);

 static DEFINE_MUTEX(dlci_ioctl_mutex);
@@ -919,7 +923,6 @@ void dlci_ioctl_set(int (*hook) (unsigned int, void __user *))
  dlci_ioctl_hook = hook;
  mutex_unlock(&dlci_ioctl_mutex);
 }
-
 EXPORT_SYMBOL(dlci_ioctl_set);

 static long sock_do_ioctl(struct net *net, struct socket *sock,
@@ -1047,6 +1050,7 @@ out_release:
  sock = NULL;
  goto out;
 }
+EXPORT_SYMBOL(sock_create_lite);

 /* No kernel lock held - perfect */
 static unsigned int sock_poll(struct file *file, poll_table *wait)
@@ -1147,6 +1151,7 @@ call_kill:
  rcu_read_unlock();
  return 0;
 }
+EXPORT_SYMBOL(sock_wake_async);

 static int __sock_create(struct net *net, int family, int type, int protocol,
   struct socket **res, int kern)
@@ -1265,11 +1270,13 @@ int sock_create(int family, int type, int protocol, struct socket **res)
 {
  return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0);
 }
+EXPORT_SYMBOL(sock_create);

 int sock_create_kern(int family, int type, int protocol, struct socket **res)
 {
  return __sock_create(&init_net, family, type, protocol, res, 1);
 }
+EXPORT_SYMBOL(sock_create_kern);

 SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
 {
@@ -1474,7 +1481,8 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
  goto out;

  err = -ENFILE;
- if (!(newsock = sock_alloc()))
+ newsock = sock_alloc();
+ if (!newsock)
  goto out_put;

  newsock->type = sock->type;
@@ -1861,8 +1869,7 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct msghdr __user *, msg, unsigned, flags)
  if (MSG_CMSG_COMPAT & flags) {
  if (get_compat_msghdr(&msg_sys, msg_compat))
  return -EFAULT;
- }
- else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr)))
+ } else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr)))
  return -EFAULT;

  sock = sockfd_lookup_light(fd, &err, &fput_needed);
@@ -1964,8 +1971,7 @@ static int __sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
  if (MSG_CMSG_COMPAT & flags) {
  if (get_compat_msghdr(msg_sys, msg_compat))
  return -EFAULT;
- }
- else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr)))
+ } else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr)))
  return -EFAULT;

  err = -EMSGSIZE;
@@ -2191,10 +2197,10 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
 /* Argument list sizes for sys_socketcall */
 #define AL(x) ((x) * sizeof(unsigned long))
 static const unsigned char nargs[20] = {
- AL(0),AL(3),AL(3),AL(3),AL(2),AL(3),
- AL(3),AL(3),AL(4),AL(4),AL(4),AL(6),
- AL(6),AL(2),AL(5),AL(5),AL(3),AL(3),
- AL(4),AL(5)
+ AL(0), AL(3), AL(3), AL(3), AL(2), AL(3),
+ AL(3), AL(3), AL(4), AL(4), AL(4), AL(6),
+ AL(6), AL(2), AL(5), AL(5), AL(3), AL(3),
+ AL(4), AL(5)
 };

 #undef AL
@@ -2340,6 +2346,7 @@ int sock_register(const struct net_proto_family *ops)
  printk(KERN_INFO "NET: Registered protocol family %d\n", ops->family);
  return err;
 }
+EXPORT_SYMBOL(sock_register);

 /**
  * sock_unregister - remove a protocol handler
@@ -2366,6 +2373,7 @@ void sock_unregister(int family)

  printk(KERN_INFO "NET: Unregistered protocol family %d\n", family);
 }
+EXPORT_SYMBOL(sock_unregister);

 static int __init sock_init(void)
 {
@@ -2490,13 +2498,13 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
  ifc.ifc_req = NULL;
  uifc = compat_alloc_user_space(sizeof(struct ifconf));
  } else {
- size_t len =((ifc32.ifc_len / sizeof (struct compat_ifreq)) + 1) *
- sizeof (struct ifreq);
+ size_t len = ((ifc32.ifc_len / sizeof(struct compat_ifreq)) + 1) *
+ sizeof(struct ifreq);
  uifc = compat_alloc_user_space(sizeof(struct ifconf) + len);
  ifc.ifc_len = len;
  ifr = ifc.ifc_req = (void __user *)(uifc + 1);
  ifr32 = compat_ptr(ifc32.ifcbuf);
- for (i = 0; i < ifc32.ifc_len; i += sizeof (struct compat_ifreq)) {
+ for (i = 0; i < ifc32.ifc_len; i += sizeof(struct compat_ifreq)) {
  if (copy_in_user(ifr, ifr32, sizeof(struct compat_ifreq)))
  return -EFAULT;
  ifr++;
@@ -2516,9 +2524,9 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
  ifr = ifc.ifc_req;
  ifr32 = compat_ptr(ifc32.ifcbuf);
  for (i = 0, j = 0;
-             i + sizeof (struct compat_ifreq) <= ifc32.ifc_len && j < ifc.ifc_len;
-      i += sizeof (struct compat_ifreq), j += sizeof (struct ifreq)) {
- if (copy_in_user(ifr32, ifr, sizeof (struct compat_ifreq)))
+      i + sizeof(struct compat_ifreq) <= ifc32.ifc_len && j < ifc.ifc_len;
+      i += sizeof(struct compat_ifreq), j += sizeof(struct ifreq)) {
+ if (copy_in_user(ifr32, ifr, sizeof(struct compat_ifreq)))
  return -EFAULT;
  ifr32++;
  ifr++;
@@ -2567,7 +2575,7 @@ static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32
  compat_uptr_t uptr32;
  struct ifreq __user *uifr;

- uifr = compat_alloc_user_space(sizeof (*uifr));
+ uifr = compat_alloc_user_space(sizeof(*uifr));
  if (copy_in_user(uifr, uifr32, sizeof(struct compat_ifreq)))
  return -EFAULT;

@@ -2601,9 +2609,9 @@ static int bond_ioctl(struct net *net, unsigned int cmd,
  return -EFAULT;

  old_fs = get_fs();
- set_fs (KERNEL_DS);
+ set_fs(KERNEL_DS);
  err = dev_ioctl(net, cmd, &kifr);
- set_fs (old_fs);
+ set_fs(old_fs);

  return err;
  case SIOCBONDSLAVEINFOQUERY:
@@ -2710,9 +2718,9 @@ static int compat_sioc_ifmap(struct net *net, unsigned int cmd,
  return -EFAULT;

  old_fs = get_fs();
- set_fs (KERNEL_DS);
+ set_fs(KERNEL_DS);
  err = dev_ioctl(net, cmd, (void __user *)&ifr);
- set_fs (old_fs);
+ set_fs(old_fs);

  if (cmd == SIOCGIFMAP && !err) {
  err = copy_to_user(uifr32, &ifr, sizeof(ifr.ifr_name));
@@ -2734,7 +2742,7 @@ static int compat_siocshwtstamp(struct net *net, struct compat_ifreq __user *uif
  compat_uptr_t uptr32;
  struct ifreq __user *uifr;

- uifr = compat_alloc_user_space(sizeof (*uifr));
+ uifr = compat_alloc_user_space(sizeof(*uifr));
  if (copy_in_user(uifr, uifr32, sizeof(struct compat_ifreq)))
  return -EFAULT;

@@ -2750,20 +2758,20 @@ static int compat_siocshwtstamp(struct net *net, struct compat_ifreq __user *uif
 }

 struct rtentry32 {
- u32    rt_pad1;
+ u32 rt_pad1;
  struct sockaddr rt_dst;         /* target address               */
  struct sockaddr rt_gateway;     /* gateway addr (RTF_GATEWAY)   */
  struct sockaddr rt_genmask;     /* target network mask (IP)     */
- unsigned short  rt_flags;
- short           rt_pad2;
- u32    rt_pad3;
- unsigned char   rt_tos;
- unsigned char   rt_class;
- short           rt_pad4;
- short           rt_metric;      /* +1 for binary compatibility! */
+ unsigned short rt_flags;
+ short rt_pad2;
+ u32 rt_pad3;
+ unsigned char rt_tos;
+ unsigned char rt_class;
+ short rt_pad4;
+ short rt_metric;      /* +1 for binary compatibility! */
  /* char * */ u32 rt_dev;        /* forcing the device at add    */
- u32    rt_mtu;         /* per route MTU/Window         */
- u32    rt_window;      /* Window clamping              */
+ u32 rt_mtu;         /* per route MTU/Window         */
+ u32 rt_window;      /* Window clamping              */
  unsigned short  rt_irtt;        /* Initial RTT                  */
 };

@@ -2793,29 +2801,29 @@ static int routing_ioctl(struct net *net, struct socket *sock,

  if (sock && sock->sk && sock->sk->sk_family == AF_INET6) { /* ipv6 */
  struct in6_rtmsg32 __user *ur6 = argp;
- ret = copy_from_user (&r6.rtmsg_dst, &(ur6->rtmsg_dst),
+ ret = copy_from_user(&r6.rtmsg_dst, &(ur6->rtmsg_dst),
  3 * sizeof(struct in6_addr));
- ret |= __get_user (r6.rtmsg_type, &(ur6->rtmsg_type));
- ret |= __get_user (r6.rtmsg_dst_len, &(ur6->rtmsg_dst_len));
- ret |= __get_user (r6.rtmsg_src_len, &(ur6->rtmsg_src_len));
- ret |= __get_user (r6.rtmsg_metric, &(ur6->rtmsg_metric));
- ret |= __get_user (r6.rtmsg_info, &(ur6->rtmsg_info));
- ret |= __get_user (r6.rtmsg_flags, &(ur6->rtmsg_flags));
- ret |= __get_user (r6.rtmsg_ifindex, &(ur6->rtmsg_ifindex));
+ ret |= __get_user(r6.rtmsg_type, &(ur6->rtmsg_type));
+ ret |= __get_user(r6.rtmsg_dst_len, &(ur6->rtmsg_dst_len));
+ ret |= __get_user(r6.rtmsg_src_len, &(ur6->rtmsg_src_len));
+ ret |= __get_user(r6.rtmsg_metric, &(ur6->rtmsg_metric));
+ ret |= __get_user(r6.rtmsg_info, &(ur6->rtmsg_info));
+ ret |= __get_user(r6.rtmsg_flags, &(ur6->rtmsg_flags));
+ ret |= __get_user(r6.rtmsg_ifindex, &(ur6->rtmsg_ifindex));

  r = (void *) &r6;
  } else { /* ipv4 */
  struct rtentry32 __user *ur4 = argp;
- ret = copy_from_user (&r4.rt_dst, &(ur4->rt_dst),
+ ret = copy_from_user(&r4.rt_dst, &(ur4->rt_dst),
  3 * sizeof(struct sockaddr));
- ret |= __get_user (r4.rt_flags, &(ur4->rt_flags));
- ret |= __get_user (r4.rt_metric, &(ur4->rt_metric));
- ret |= __get_user (r4.rt_mtu, &(ur4->rt_mtu));
- ret |= __get_user (r4.rt_window, &(ur4->rt_window));
- ret |= __get_user (r4.rt_irtt, &(ur4->rt_irtt));
- ret |= __get_user (rtdev, &(ur4->rt_dev));
+ ret |= __get_user(r4.rt_flags, &(ur4->rt_flags));
+ ret |= __get_user(r4.rt_metric, &(ur4->rt_metric));
+ ret |= __get_user(r4.rt_mtu, &(ur4->rt_mtu));
+ ret |= __get_user(r4.rt_window, &(ur4->rt_window));
+ ret |= __get_user(r4.rt_irtt, &(ur4->rt_irtt));
+ ret |= __get_user(rtdev, &(ur4->rt_dev));
  if (rtdev) {
- ret |= copy_from_user (devname, compat_ptr(rtdev), 15);
+ ret |= copy_from_user(devname, compat_ptr(rtdev), 15);
  r4.rt_dev = devname; devname[15] = 0;
  } else
  r4.rt_dev = NULL;
@@ -2828,9 +2836,9 @@ static int routing_ioctl(struct net *net, struct socket *sock,
  goto out;
  }

- set_fs (KERNEL_DS);
+ set_fs(KERNEL_DS);
  ret = sock_do_ioctl(net, sock, cmd, (unsigned long) r);
- set_fs (old_fs);
+ set_fs(old_fs);

 out:
  return ret;
@@ -2993,11 +3001,13 @@ int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen)
 {
  return sock->ops->bind(sock, addr, addrlen);
 }
+EXPORT_SYMBOL(kernel_bind);

 int kernel_listen(struct socket *sock, int backlog)
 {
  return sock->ops->listen(sock, backlog);
 }
+EXPORT_SYMBOL(kernel_listen);

 int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
 {
@@ -3022,24 +3032,28 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
 done:
  return err;
 }
+EXPORT_SYMBOL(kernel_accept);

 int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen,
     int flags)
 {
  return sock->ops->connect(sock, addr, addrlen, flags);
 }
+EXPORT_SYMBOL(kernel_connect);

 int kernel_getsockname(struct socket *sock, struct sockaddr *addr,
   int *addrlen)
 {
  return sock->ops->getname(sock, addr, addrlen, 0);
 }
+EXPORT_SYMBOL(kernel_getsockname);

 int kernel_getpeername(struct socket *sock, struct sockaddr *addr,
   int *addrlen)
 {
  return sock->ops->getname(sock, addr, addrlen, 1);
 }
+EXPORT_SYMBOL(kernel_getpeername);

 int kernel_getsockopt(struct socket *sock, int level, int optname,
  char *optval, int *optlen)
@@ -3056,6 +3070,7 @@ int kernel_getsockopt(struct socket *sock, int level, int optname,
  set_fs(oldfs);
  return err;
 }
+EXPORT_SYMBOL(kernel_getsockopt);

 int kernel_setsockopt(struct socket *sock, int level, int optname,
  char *optval, unsigned int optlen)
@@ -3072,6 +3087,7 @@ int kernel_setsockopt(struct socket *sock, int level, int optname,
  set_fs(oldfs);
  return err;
 }
+EXPORT_SYMBOL(kernel_setsockopt);

 int kernel_sendpage(struct socket *sock, struct page *page, int offset,
      size_t size, int flags)
@@ -3083,6 +3099,7 @@ int kernel_sendpage(struct socket *sock, struct page *page, int offset,

  return sock_no_sendpage(sock, page, offset, size, flags);
 }
+EXPORT_SYMBOL(kernel_sendpage);

 int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg)
 {
@@ -3095,33 +3112,11 @@ int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg)

  return err;
 }
+EXPORT_SYMBOL(kernel_sock_ioctl);

 int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how)
 {
  return sock->ops->shutdown(sock, how);
 }
-
-EXPORT_SYMBOL(sock_create);
-EXPORT_SYMBOL(sock_create_kern);
-EXPORT_SYMBOL(sock_create_lite);
-EXPORT_SYMBOL(sock_map_fd);
-EXPORT_SYMBOL(sock_recvmsg);
-EXPORT_SYMBOL(sock_register);
-EXPORT_SYMBOL(sock_release);
-EXPORT_SYMBOL(sock_sendmsg);
-EXPORT_SYMBOL(sock_unregister);
-EXPORT_SYMBOL(sock_wake_async);
-EXPORT_SYMBOL(sockfd_lookup);
-EXPORT_SYMBOL(kernel_sendmsg);
-EXPORT_SYMBOL(kernel_recvmsg);
-EXPORT_SYMBOL(kernel_bind);
-EXPORT_SYMBOL(kernel_listen);
-EXPORT_SYMBOL(kernel_accept);
-EXPORT_SYMBOL(kernel_connect);
-EXPORT_SYMBOL(kernel_getsockname);
-EXPORT_SYMBOL(kernel_getpeername);
-EXPORT_SYMBOL(kernel_getsockopt);
-EXPORT_SYMBOL(kernel_setsockopt);
-EXPORT_SYMBOL(kernel_sendpage);
-EXPORT_SYMBOL(kernel_sock_ioctl);
 EXPORT_SYMBOL(kernel_sock_shutdown);
+
--
1.7.0.4

14 years agoixgbe: Use netdev_<level>, dev_<level>, pr_<level>
Emil Tantilov [Thu, 3 Jun 2010 16:53:41 +0000 (16:53 +0000)]
ixgbe: Use netdev_<level>, dev_<level>, pr_<level>

This patch is alternative to a previous patch submitted by Joe Perches.

Create common macros e_<level> and e_dev_<level> that use netdev_<level> and
dev_<level> similar to e1000e.
Redefined pr_fmt for driver messages.
Use %pM to display MAC address.
Aligned text to better match the new format.

CC: Joe Perches <joe@perches.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoarp: RCU changes
Eric Dumazet [Thu, 3 Jun 2010 04:09:10 +0000 (04:09 +0000)]
arp: RCU changes

Avoid two atomic ops in arp_fwd_proxy()

Avoid two atomic ops in arp_process()

Valid optims since arp_rcv() is run under rcu_read_lock()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv4: RCU changes in __mkroute_input()
Eric Dumazet [Thu, 3 Jun 2010 04:13:21 +0000 (04:13 +0000)]
ipv4: RCU changes in __mkroute_input()

Avoid two atomic ops on output device refcount

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Thu, 3 Jun 2010 19:30:58 +0000 (12:30 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6

14 years agoixgbe: return IXGBE_ERR_RAR_INDEX when out of range
Jeff Kirsher [Wed, 2 Jun 2010 12:44:05 +0000 (12:44 +0000)]
ixgbe: return IXGBE_ERR_RAR_INDEX when out of range

Based on original patch from Shirley Ma <xma@us.ibm.com>
Return IXGBE_ERR_RAR_INDEX when RAR index is out of range, instead of
returning IXGBE_SUCCESS.

CC: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoact_pedit: access skb->data safely
Changli Gao [Wed, 2 Jun 2010 04:55:02 +0000 (04:55 +0000)]
act_pedit: access skb->data safely

access skb->data safely

we should use skb_header_pointer() and skb_store_bits() to access skb->data to
handle small or non-linear skbs.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/sched/act_pedit.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosfc: Store port number in net_device::dev_id
Ben Hutchings [Wed, 2 Jun 2010 10:39:56 +0000 (10:39 +0000)]
sfc: Store port number in net_device::dev_id

This exposes the port number to userland through sysfs.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoepic100: Test __BIG_ENDIAN instead of (non-existent) CONFIG_BIG_ENDIAN
Roland Dreier [Wed, 2 Jun 2010 10:36:53 +0000 (10:36 +0000)]
epic100: Test __BIG_ENDIAN instead of (non-existent) CONFIG_BIG_ENDIAN

Probably no one has used this driver on big-endian systems, since it was
setting up descriptor swapping if CONFIG_BIG_ENDIAN is set, which it
never is, since that symbol is not mentioned anywhere else in the kernel
source.  Switch this test to a check for __BIG_ENDIAN so it has a chance
at working.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotehuti: return -EFAULT on copy_to_user errors
Dan Carpenter [Thu, 3 Jun 2010 00:05:35 +0000 (00:05 +0000)]
tehuti: return -EFAULT on copy_to_user errors

copy_to_user() returns the number of bytes remaining but we want to
return a negative error code here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoisdn/kcapi: return -EFAULT on copy_from_user errors
Dan Carpenter [Wed, 2 Jun 2010 23:56:13 +0000 (23:56 +0000)]
isdn/kcapi: return -EFAULT on copy_from_user errors

copy_from_user() returns the number of bytes remaining but we should
return -EFAULT here.  The error code gets returned to the user.  Both
old_capi_manufacturer() and capi20_manufacturer() had other places
that already returned -EFAULT so this won't break anything.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: change logical negate to bitwise
Dan Carpenter [Wed, 2 Jun 2010 13:43:15 +0000 (13:43 +0000)]
e1000e: change logical negate to bitwise

The bitwise negate is intended here.  With the logical negate the
condition is always false.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosfc: Get port number from CS_PORT_NUM, not PCI function number
Ben Hutchings [Tue, 1 Jun 2010 11:32:43 +0000 (11:32 +0000)]
sfc: Get port number from CS_PORT_NUM, not PCI function number

A single shared memory region used to communicate with firmware is
mapped into both PCI PFs of the SFC9020 and SFL9021.  Drivers must be
able to identify which port they are addressing in order to use the
correct sub-region.  Currently we use the PCI function number, but the
PCI address may be virtualised.  Use the CS_PORT_NUM register field
defined for just this purpose.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: use __packed annotation
Eric Dumazet [Thu, 3 Jun 2010 10:21:52 +0000 (03:21 -0700)]
net: use __packed annotation

cleanup patch.

Use new __packed annotation in net/ and include/
(except netfilter)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/net: use __packed annotation
Eric Dumazet [Wed, 2 Jun 2010 18:10:09 +0000 (18:10 +0000)]
drivers/net: use __packed annotation

cleanup patch.

Use new __packed annotation in drivers/net/

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agofec: Cleanup PHY probing
Denis Kirjanov [Wed, 2 Jun 2010 09:17:00 +0000 (09:17 +0000)]
fec: Cleanup PHY probing

Cleanup PHY probing: use helpers from phylib

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agofec: convert TX hook to netdev_tx_t
Denis Kirjanov [Wed, 2 Jun 2010 09:15:47 +0000 (09:15 +0000)]
fec: convert TX hook to netdev_tx_t

Convert TX hook return value to netdev_tx_t

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: mac8390 - Sort out memory/MMIO accesses and casts
Geert Uytterhoeven [Wed, 2 Jun 2010 07:36:20 +0000 (07:36 +0000)]
net: mac8390 - Sort out memory/MMIO accesses and casts

commit 5c7fffd0e3b57cb63f50bbd710868f012d67654f ("drivers/net/mac8390.c: Remove
useless memcpy casting") removed too many casts, introducing the following
warnings:

| drivers/net/mac8390.c:248: warning: passing argument 1 of '__builtin_memcpy' makes pointer from integer without a cast
| drivers/net/mac8390.c:253: warning: passing argument 1 of 'word_memcpy_tocard' makes pointer from integer without a cast
| drivers/net/mac8390.c:255: warning: passing argument 2 of 'word_memcpy_fromcard' makes pointer from integer without a cast

Instead of just readding the casts,
  - move all casts inside word_memcpy_{to,from}card(),
  - replace an incorrect memcpy() by memcpy_toio(),
  - add memcmp_withio() as a wrapper around memcmp(),
  - replace an incorrect memcpy_toio() by memcpy_fromio().

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agochelsio: Remove remnants of CONFIG_CHELSIO_T1_COUGAR
Roland Dreier [Wed, 2 Jun 2010 08:04:28 +0000 (08:04 +0000)]
chelsio: Remove remnants of CONFIG_CHELSIO_T1_COUGAR

CONFIG_CHELSIO_T1_COUGAR cannot be set (it appears nowhere in any
Kconfig files), and the code it protects could never build (cspi.h was
never added to the kernel tree).  Therefore it's pretty safe to remove
all vestiges of this dead code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv4: RCU conversion of ip_route_input_slow/ip_route_input_mc
Eric Dumazet [Wed, 2 Jun 2010 19:21:31 +0000 (19:21 +0000)]
ipv4: RCU conversion of ip_route_input_slow/ip_route_input_mc

Avoid two atomic ops on struct in_device refcount per incoming packet,
if slow path taken, (or route cache disabled)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv4: add LINUX_MIB_IPRPFILTER snmp counter
Eric Dumazet [Wed, 2 Jun 2010 12:05:27 +0000 (12:05 +0000)]
ipv4: add LINUX_MIB_IPRPFILTER snmp counter

Christoph Lameter mentioned that packets could be dropped in input path
because of rp_filter settings, without any SNMP counter being
incremented. System administrator can have a hard time to track the
problem.

This patch introduces a new counter, LINUX_MIB_IPRPFILTER, incremented
each time we drop a packet because Reverse Path Filter triggers.

(We receive an IPv4 datagram on a given interface, and find the route to
send an answer would use another interface)

netstat -s | grep IPReversePathFilter
    IPReversePathFilter: 21714

Reported-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipconfig: document DHCP hostname and DNS record
Wu Fengguang [Wed, 2 Jun 2010 16:02:44 +0000 (16:02 +0000)]
ipconfig: document DHCP hostname and DNS record

Now it's possible to update the DNS record for $HOST_NAME with

ip=::::$HOST_NAME::dhcp

CC: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'vhost-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst...
David S. Miller [Wed, 2 Jun 2010 15:26:36 +0000 (08:26 -0700)]
Merge branch 'vhost-net-next' of git://git./linux/kernel/git/mst/vhost

14 years agocls_u32: use skb_header_pointer() to dereference data safely
Changli Gao [Wed, 2 Jun 2010 14:32:42 +0000 (07:32 -0700)]
cls_u32: use skb_header_pointer() to dereference data safely

use skb_header_pointer() to dereference data safely

the original skb->data dereference isn't safe, as there isn't any skb->len or
skb_is_nonlinear() check. skb_header_pointer() is used instead in this patch.
And when the skb isn't long enough, we terminate the function u32_classify()
immediately with -1.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoTCP: tcp_hybla: Fix integer overflow in slow start increment
Daniele Lacamera [Wed, 2 Jun 2010 02:02:04 +0000 (02:02 +0000)]
TCP: tcp_hybla: Fix integer overflow in slow start increment

For large values of rtt, 2^rho operation may overflow u32. Clamp down the increment to 2^16.

Signed-off-by: Daniele Lacamera <root@danielinux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: replace hooks in __netif_receive_skb V5
Jiri Pirko [Tue, 1 Jun 2010 21:52:08 +0000 (21:52 +0000)]
net: replace hooks in __netif_receive_skb V5

What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv6: Refactor update of IPv6 flowi destination address for srcrt (RH) option
Arnaud Ebalard [Tue, 1 Jun 2010 21:35:01 +0000 (21:35 +0000)]
ipv6: Refactor update of IPv6 flowi destination address for srcrt (RH) option

There are more than a dozen occurrences of following code in the
IPv6 stack:

    if (opt && opt->srcrt) {
            struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
            ipv6_addr_copy(&final, &fl.fl6_dst);
            ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
            final_p = &final;
    }

Replace those with a helper. Note that the helper overrides final_p
in all cases. This is ok as final_p was previously initialized to
NULL when declared.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomac8390: raise error logging priority
Finn Thain [Wed, 2 Jun 2010 14:06:34 +0000 (07:06 -0700)]
mac8390: raise error logging priority

Log error conditions using KERN_ERR priority.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipconfig: send host-name in DHCP requests
Wu Fengguang [Sun, 30 May 2010 17:19:53 +0000 (17:19 +0000)]
ipconfig: send host-name in DHCP requests

Normally dhclient can be configured to send the "host-name" option
in DHCP requests to update the client's DNS record. However for an
NFSROOT system, dhclient shall never be called (which may change the
IP addr and therefore lose your root NFS mount connection).

So enable updating the DNS record with kernel parameter

ip=::::$HOST_NAME::dhcp

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoact_nat: fix the wrong checksum when addr isn't in old_addr/mask
Changli Gao [Sat, 29 May 2010 14:26:59 +0000 (14:26 +0000)]
act_nat: fix the wrong checksum when addr isn't in old_addr/mask

fix the wrong checksum when addr isn't in old_addr/mask

For TCP and UDP packets, when addr isn't in old_addr/mask we don't do SNAT or
DNAT, and we should not update layer 4 checksum.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/sched/act_nat.c |    4 ++++
 1 file changed, 4 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet/fec: fix pm to survive to suspend/resume
Eric Bénard [Wed, 2 Jun 2010 13:13:34 +0000 (06:13 -0700)]
net/fec: fix pm to survive to suspend/resume

* in the actual driver, calling fec_stop and fec_enet_init doesn't
allow to have a working network interface at resume (where a
ifconfig down and up is required to recover the interface)
* by using fec_enet_close and fec_enet_open, this patch solves this
problem and handle the case where the link changed between suspend
and resume
* this patch also disable clock at suspend and reenable it at resume

Signed-off-by: Eric Bénard <eric@eukrea.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agokorina: count RX DMA OVR as rx_fifo_error
Phil Sutter [Sat, 29 May 2010 13:23:36 +0000 (13:23 +0000)]
korina: count RX DMA OVR as rx_fifo_error

This way, RX DMA overruns (actually being caused by overrun of the
512byte input FIFO) show up in ifconfig output. The rx_fifo_errors
counter is unused otherwise.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agokorina: use netdev_alloc_skb_ip_align() here, too
Phil Sutter [Sat, 29 May 2010 13:23:35 +0000 (13:23 +0000)]
korina: use netdev_alloc_skb_ip_align() here, too

This patch completes commit 89d71a66c40d629e3b1285def543ab1425558cd5
which missed this spot, as it seems.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agokorina: fix deadlock on RX FIFO overrun
Phil Sutter [Sat, 29 May 2010 13:23:34 +0000 (13:23 +0000)]
korina: fix deadlock on RX FIFO overrun

By calling korina_restart(), the IRQ handler tries to disable the
interrupt it's currently serving. This leads to a deadlock since
disable_irq() waits for any running IRQ handlers to finish before
returning. This patch addresses the issue by turning korina_restart()
into a workqueue task, which is then scheduled when needed.

Reproducing the deadlock is easily done using e.g. GNU netcat to send
large amounts of UDP data to the host running this driver.

Note that the same problem (and fix) applies to TX FIFO underruns, but
apparently these are less easy to trigger.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agopacket_mmap: expose hw packet timestamps to network packet capture utilities
Scott McMillan [Wed, 2 Jun 2010 12:53:56 +0000 (05:53 -0700)]
packet_mmap: expose hw packet timestamps to network packet capture utilities

This patch adds a setting, PACKET_TIMESTAMP, to specify the packet
timestamp source that is exported to capture utilities like tcpdump by
packet_mmap.

PACKET_TIMESTAMP accepts the same integer bit field as
SO_TIMESTAMPING.  However, only the SOF_TIMESTAMPING_SYS_HARDWARE and
SOF_TIMESTAMPING_RAW_HARDWARE values are currently recognized by
PACKET_TIMESTAMP.  SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.

If PACKET_TIMESTAMP is not set, a software timestamp generated inside
the networking stack is used (the behavior before this setting was
added).

Signed-off-by: Scott McMillan <scott.a.mcmillan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agovxge: Fix checkstack warning in vxge_probe()
Prarit Bhargava [Wed, 2 Jun 2010 12:51:19 +0000 (05:51 -0700)]
vxge: Fix checkstack warning in vxge_probe()

Linux 2.6.33 reports this checkstack warning:

drivers/net/vxge/vxge-main.c: In function 'vxge_probe':
drivers/net/vxge/vxge-main.c:4409: warning: the frame size of 1028 bytes is larger than 1024 bytes

This warning does not occur in the latest linux-2.6 or linux-next, however,
when I do a 'make -j32 CONFIG_FRAME_WARN=512' instead of 1024 I see

drivers/net/vxge/vxge-main.c: In function ‘vxge_probe’:
drivers/net/vxge/vxge-main.c:4423: warning: the frame size of 1024 bytes is larger than 512 bytes

This patch moves the large vxge_config struct off the stack.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: CONFIG_NET_NS reduction
Eric Dumazet [Tue, 1 Jun 2010 06:51:19 +0000 (06:51 +0000)]
net: CONFIG_NET_NS reduction

Use read_pnet() and write_pnet() to reduce number of ifdef CONFIG_NET_NS

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoppp: eliminate shadowed variable name
stephen hemminger [Tue, 1 Jun 2010 06:05:46 +0000 (06:05 +0000)]
ppp: eliminate shadowed variable name

Sparse complains about shadowed declaration of skb. So use other
name.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomac8390: propagate error code from request_irq
Finn Thain [Tue, 1 Jun 2010 02:18:32 +0000 (02:18 +0000)]
mac8390: propagate error code from request_irq

Use the request_irq() error code as the return value for mac8390_open().
EAGAIN doesn't make sense for Nubus slot IRQs. Only this driver can claim
this IRQ (until the NIC is removed, which means everything is powered
down).

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocaif: add newlines after declarations in caif_serial.c
Dan Carpenter [Mon, 31 May 2010 21:09:33 +0000 (21:09 +0000)]
caif: add newlines after declarations in caif_serial.c

I added newlines after the declarations in caif_serial.c.  This is
normal kernel style, although I can't see anywhere it's documented.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocaif: remove unneeded variable from caif_net_open()
Dan Carpenter [Mon, 31 May 2010 21:08:55 +0000 (21:08 +0000)]
caif: remove unneeded variable from caif_net_open()

We don't use the "ser" variable so I've removed it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: add additional lock to qdisc to increase throughput
Eric Dumazet [Wed, 2 Jun 2010 12:09:29 +0000 (05:09 -0700)]
net: add additional lock to qdisc to increase throughput

When many cpus compete for sending frames on a given qdisc, the qdisc
spinlock suffers from very high contention.

The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire
the lock, and cannot dequeue packets fast enough, since it must wait for
this lock for each dequeued packet.

One solution to this problem is to force all cpus spinning on a second
lock before trying to get the main lock, when/if they see
__QDISC_STATE_RUNNING already set.

The owning cpu then compete with at most one other cpu for the main
lock, allowing for higher dequeueing rate.

Based on a previous patch from Alexander Duyck. I added the heuristic to
avoid the atomic in fast path, and put the new lock far away from the
cache line used by the dequeue worker. Also try to release the busylock
lock as late as possible.

Tests with following script gave a boost from ~50.000 pps to ~600.000
pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
(A single netperf flow can reach ~800.000 pps on this platform)

for j in `seq 0 3`; do
  for i in `seq 0 7`; do
    netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
  done
done

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: optimize tlb_get_least_loaded_slave
Jiri Pirko [Wed, 19 May 2010 03:26:39 +0000 (03:26 +0000)]
bonding: optimize tlb_get_least_loaded_slave

In the worst case, when the first loop breaks an the end of the slave list,
the slave list is iterated through twice. This patch reduces this
function only to one loop. Also makes it simpler.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: remove unused original_flags struct slave member
Jiri Pirko [Wed, 19 May 2010 01:17:41 +0000 (01:17 +0000)]
bonding: remove unused original_flags struct slave member

This is stored but never restored. So remove this as it is useless.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: move dev_addr cpy to bond_enslave
Jiri Pirko [Wed, 19 May 2010 01:14:29 +0000 (01:14 +0000)]
bonding: move dev_addr cpy to bond_enslave

Move the code that copies slave's mac address in case that's the first slave into
bond_enslave. Ifenslave app does this also but that's not a problem. This is
something that should be done in bond_enslave, and it shound not matter from
where is it called.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet/mpc52xx_phy: Various code cleanups
Wolfram Sang [Wed, 2 Jun 2010 10:45:22 +0000 (03:45 -0700)]
net/mpc52xx_phy: Various code cleanups

- don't free bus->irq (obsoleted by ca816d98170942371535b3e862813b0aba9b7d90)
- don't dispose irqs (should be done in of_mdiobus_register())
- use fec-pointer consistently in transfer()
- use resource_size()
- cosmetic fixes

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: make bonding_store_slaves simpler
Jiri Pirko [Tue, 18 May 2010 05:46:39 +0000 (05:46 +0000)]
bonding: make bonding_store_slaves simpler

This patch makes bonding_store_slaves function nicer and easier to understand.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: remove redundant checks from bonding_store_slaves V2
Jiri Pirko [Tue, 18 May 2010 05:44:53 +0000 (05:44 +0000)]
bonding: remove redundant checks from bonding_store_slaves V2

(it's actually the same as v1)

Remove checks that duplicates similar checks in bond_enslave.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: move slave MTU handling from sysfs V2
Jiri Pirko [Tue, 18 May 2010 05:42:40 +0000 (05:42 +0000)]
bonding: move slave MTU handling from sysfs V2

V1->V2: corrected res/ret use

For some reason, MTU handling (storing, and restoring) is taking  place in
bond_sysfs. The correct place for this code is in bond_enslave, bond_release.
So move it there.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: remove unused variable "found"
Jiri Pirko [Mon, 17 May 2010 03:49:54 +0000 (03:49 +0000)]
bonding: remove unused variable "found"

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: fix conflict between null_or_orig and null_or_bond
John Fastabend [Wed, 12 May 2010 21:31:11 +0000 (21:31 +0000)]
net: fix conflict between null_or_orig and null_or_bond

If a skb is received on an inactive bond that does not meet
the special cases checked for by skb_bond_should_drop it should
only be delivered to exact matches as the comment in
netif_receive_skb() says.

However because null_or_bond could also be null this is not
always true.  This patch renames null_or_bond to orig_or_bond
and initializes it to orig_dev.  This keeps the intent of
null_or_bond to pass frames received on VLAN interfaces stacked
on bonding interfaces without invalidating the statement for
null_or_orig.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: init_vlan should not copy slave or master flags
John Fastabend [Wed, 12 May 2010 21:31:06 +0000 (21:31 +0000)]
net: init_vlan should not copy slave or master flags

The vlan device should not copy the slave or master flags from
the real device. It is not in the bond until added nor is it
a master.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>