GitHub/exynos8895/android_kernel_samsung_universal8895.git
12 years ago tipc: add __read_mostly annotations to several global variables
Ying Xue [Thu, 16 Aug 2012 12:09:12 +0000 (12:09 +0000)]
tipc: add __read_mostly annotations to several global variables

Added to the following:

 - tipc_random
 - tipc_own_addr
 - tipc_max_ports
 - tipc_net_id
 - tipc_remote_management
 - handler_enabled

The above global variables are read often, but written rarely. Use
__read_mostly to prevent them being on the same cacheline as another
variable which is written to often, which would cause cacheline
bouncing.
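
As a rough user-space illustration of the cacheline concern (this is not
the kernel mechanism itself, and not code from this patch), the sketch
below uses C11 alignas to keep a read-mostly value off the cacheline of
a frequently written counter. CACHELINE_SIZE, struct demo_globals and
the helper names are assumptions made for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64 /* assumed; the real value is architecture specific */

struct demo_globals {
	/* Written on every packet: this line bounces between CPUs. */
	alignas(CACHELINE_SIZE) uint64_t rx_counter;
	/* Read often, written rarely: starts on its own cacheline. */
	alignas(CACHELINE_SIZE) uint32_t own_addr;
};

/* Returns 1 when the two offsets fall on different cachelines. */
static int on_different_cachelines(size_t off_a, size_t off_b)
{
	return off_a / CACHELINE_SIZE != off_b / CACHELINE_SIZE;
}

/* Returns 1 when the hot and the read-mostly field do not share a line. */
static int demo_fields_are_separated(void)
{
	return on_different_cachelines(offsetof(struct demo_globals, rx_counter),
				       offsetof(struct demo_globals, own_addr));
}
```

Calling demo_fields_are_separated() from a small main() is enough to
check the layout on a given compiler.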

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago tipc: convert tipc_nametbl_size type from variable to macro
Ying Xue [Thu, 16 Aug 2012 12:09:11 +0000 (12:09 +0000)]
tipc: convert tipc_nametbl_size type from variable to macro

There is nothing changing this variable dynamically, so change
it to a macro to make that more obvious when reading the code.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago tipc: change tipc_net_start routine return value type
Ying Xue [Thu, 16 Aug 2012 12:09:10 +0000 (12:09 +0000)]
tipc: change tipc_net_start routine return value type

Since tipc_net_start() now always returns 0 (success), its return
type should be changed from int to void, which avoids unnecessary
checks of its return value.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago tipc: manually inline single use media_name_valid routine
Ying Xue [Thu, 16 Aug 2012 12:09:09 +0000 (12:09 +0000)]
tipc: manually inline single use media_name_valid routine

After eliminating the mechanism which checks whether all letters
in media name string are within a given character set, the
media_name_valid routine becomes trivial.  It is also only
used once, so it is unnecessary to keep it as a separate function.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago tipc: remove pointless name sanity check and tipc_alphabet array
Ying Xue [Thu, 16 Aug 2012 12:09:08 +0000 (12:09 +0000)]
tipc: remove pointless name sanity check and tipc_alphabet array

There is no real reason to check whether all letters in the given
media name and network interface name are within the character set
defined in the tipc_alphabet array. Even without that check, the
remaining checks in tipc_enable_bearer() ensure that we do not
enable an invalid or illegal bearer.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago tipc: fix lockdep warning during bearer initialization
Ying Xue [Thu, 16 Aug 2012 12:09:07 +0000 (12:09 +0000)]
tipc: fix lockdep warning during bearer initialization

When the lockdep validator is enabled, it will report the below
warning when we enable a TIPC bearer:

[ INFO: possible irq lock inversion dependency detected ]
---------------------------------------------------------
Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(ptype_lock);
                                local_irq_disable();
                                lock(tipc_net_lock);
                                lock(ptype_lock);
   <Interrupt>
   lock(tipc_net_lock);

  *** DEADLOCK ***

the shortest dependencies between 2nd lock and 1st lock:
  -> (ptype_lock){+.+...} ops: 10 {
[...]
SOFTIRQ-ON-W at:
                      [<c1089418>] __lock_acquire+0x528/0x13e0
                      [<c108a360>] lock_acquire+0x90/0x100
                      [<c1553c38>] _raw_spin_lock+0x38/0x50
                      [<c14651ca>] dev_add_pack+0x3a/0x60
                      [<c182da75>] arp_init+0x1a/0x48
                      [<c182dce5>] inet_init+0x181/0x27e
                      [<c1001114>] do_one_initcall+0x34/0x170
                      [<c17f7329>] kernel_init+0x110/0x1b2
                      [<c155b6a2>] kernel_thread_helper+0x6/0x10
[...]
   ... key      at: [<c17e4b10>] ptype_lock+0x10/0x20
   ... acquired at:
    [<c108a360>] lock_acquire+0x90/0x100
    [<c1553c38>] _raw_spin_lock+0x38/0x50
    [<c14651ca>] dev_add_pack+0x3a/0x60
    [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
    [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
    [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
    [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
    [<c148e802>] genl_rcv_msg+0x232/0x280
    [<c148d3f6>] netlink_rcv_skb+0x86/0xb0
    [<c148e5bc>] genl_rcv+0x1c/0x30
    [<c148d144>] netlink_unicast+0x174/0x1f0
    [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
    [<c1456bc1>] sock_aio_write+0x161/0x170
    [<c1135a7c>] do_sync_write+0xac/0xf0
    [<c11360f6>] vfs_write+0x156/0x170
    [<c11361e2>] sys_write+0x42/0x70
    [<c155b0df>] sysenter_do_call+0x12/0x38
[...]
}
  -> (tipc_net_lock){+..-..} ops: 4 {
[...]
    IN-SOFTIRQ-R at:
                     [<c108953a>] __lock_acquire+0x64a/0x13e0
                     [<c108a360>] lock_acquire+0x90/0x100
                     [<c15541cd>] _raw_read_lock_bh+0x3d/0x50
                     [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
                     [<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
                     [<c146a5fa>] __netif_receive_skb+0x22a/0x590
                     [<c146ab0b>] netif_receive_skb+0x2b/0xf0
                     [<c13c43d2>] pcnet32_poll+0x292/0x780
                     [<c146b00a>] net_rx_action+0xfa/0x1e0
                     [<c103a4be>] __do_softirq+0xae/0x1e0
[...]
}

From the log, we can see three different call chains between
CPU0 and CPU1:

Time 0 on CPU0:

  kernel_init()->inet_init()->dev_add_pack()

At time 0, the ptype_lock is held by CPU0 in dev_add_pack();

Time 1 on CPU1:

  tipc_enable_bearer()->enable_bearer()->dev_add_pack()

At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
wants to take ptype_lock to register TIPC protocol handler into the
networking stack.  But the ptype_lock has been taken by dev_add_pack()
on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
busy looping.

Time 2 on CPU0:

  netif_receive_skb()->recv_msg()->tipc_recv_msg()

At time 2, an incoming TIPC packet arrives at CPU0, hence
tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
to hold tipc_net_lock.  At that moment, the following scenario occurs:

On CPU0, the sequence of taking locks is:

  lock(ptype_lock)->lock(tipc_net_lock)

On CPU1, our sequence of taking locks looks like:

  lock(tipc_net_lock)->lock(ptype_lock)

Obviously deadlock may happen in this case.

Note, however, that the deadlock should not occur while the first TIPC
bearer is being enabled.  Until enable_bearer() running on CPU1 has
taken ptype_lock, the TIPC receive handler (i.e. recv_msg()) is not
yet registered via dev_add_pack(), so tipc_recv_msg() cannot be called
by recv_msg() even if a TIPC message arrives on CPU0.  Once a second
TIPC bearer is registered, though, the deadlock can really happen.

To fix it, we push the work of registering the TIPC protocol handler
into workqueue context.  After the change, both paths taking
ptype_lock always run in process context, so the deadlock should
never occur.
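
The idea of deferring the registration can be sketched in user space as
follows. This is only a single-threaded model under stated assumptions:
net_lock_held, register_handler(), queue_work_item() and the other names
are hypothetical stand-ins, not the kernel's workqueue API and not the
code of this patch. It shows the one property that matters here: the
step that needs the second lock never runs while the first is held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these are the kernel's real APIs. */
static int net_lock_held;      /* models tipc_net_lock being held */
static int handler_registered; /* models the effect of dev_add_pack() */

typedef void (*work_fn)(void);
static work_fn pending_work[4];
static size_t pending_count;

/* Takes the "ptype_lock" side; must never run under net_lock. */
static void register_handler(void)
{
	assert(!net_lock_held);
	handler_registered = 1;
}

/* Queue work to run later, outside any lock held by the caller. */
static int queue_work_item(work_fn fn)
{
	if (pending_count == sizeof(pending_work) / sizeof(pending_work[0]))
		return -1;
	pending_work[pending_count++] = fn;
	return 0;
}

/* Models enable_bearer(): called with net_lock held, so it only queues. */
static int enable_bearer_sketch(void)
{
	int ret;

	net_lock_held = 1;
	ret = queue_work_item(register_handler);
	net_lock_held = 0;
	return ret;
}

/* Models the worker: runs queued items with no TIPC lock held. */
static void run_pending_work(void)
{
	size_t i;

	for (i = 0; i < pending_count; i++)
		pending_work[i]();
	pending_count = 0;
}
```

After enable_bearer_sketch() returns, the handler is still unregistered;
it becomes registered only once run_pending_work() has executed.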

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago tipc: optimize the initialization of network device notifier
Ying Xue [Thu, 16 Aug 2012 12:09:06 +0000 (12:09 +0000)]
tipc: optimize the initialization of network device notifier

Ethernet media initialization is only done when TIPC is started or
switched to network mode. So the initialization of the network device
notifier structure can be moved out of this function and done
statically instead.
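
A minimal sketch of what static initialization looks like, assuming a
hypothetical struct demo_notifier in place of the kernel's notifier
structure (the field names and demo_* functions are made up for the
example):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a notifier structure. */
struct demo_notifier {
	int (*notifier_call)(unsigned long event, void *data);
	int priority;
};

static int demo_device_event(unsigned long event, void *data)
{
	(void)data;
	return (int)event;
}

/* Initialized at compile time; no assignment needed in the init path. */
static struct demo_notifier demo_notifier = {
	.notifier_call = demo_device_event,
	.priority = 0,
};

/* The init routine no longer has to fill in the structure. */
static int demo_media_init(void)
{
	return demo_notifier.notifier_call != NULL ? 0 : -1;
}
```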

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago packet: Report fanout status via diag engine
Pavel Emelyanov [Thu, 16 Aug 2012 05:36:48 +0000 (05:36 +0000)]
packet: Report fanout status via diag engine

The reported value is the same as the one returned by the FANOUT
getsockopt option but, unlike it, an absent fanout setup results in an
absent nlattr rather than an nlattr with a zero value. This is done
because a zero fanout report could mean either no fanout or a fanout
with both id and type zero.
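
The difference between "absent" and "present but zero" can be modelled
as below. This is a sketch only: struct demo_report is a hypothetical
stand-in for the netlink message, and the bit layout used to combine id
and type is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical report type; not the kernel's netlink representation. */
struct demo_report {
	int has_fanout;  /* 0: attribute absent, 1: attribute present */
	uint32_t fanout; /* meaningful only when has_fanout is set */
};

/* Combine id and type into one value; the layout is an assumption. */
static uint32_t demo_pack_fanout(uint16_t id, uint16_t type)
{
	return (uint32_t)id | ((uint32_t)type << 16);
}

static struct demo_report demo_fill_report(int fanout_configured,
					   uint16_t id, uint16_t type)
{
	struct demo_report r = { 0, 0 };

	if (fanout_configured) {
		r.has_fanout = 1;
		r.fanout = demo_pack_fanout(id, type);
	}
	return r;
}
```

A reader of the report can then tell a socket with no fanout apart from
one whose fanout id and type are both zero.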

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago packet: Report rings cfg via diag engine
Pavel Emelyanov [Thu, 16 Aug 2012 05:34:22 +0000 (05:34 +0000)]
packet: Report rings cfg via diag engine

One extension bit may result in two nlattrs -- one per ring type.
If some ring type is not configured, then the respective nlattr
will be empty.

The structure reported contains the data that is given to the
corresponding ring setup socket option.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago gre: information leak in ip6_tnl_ioctl()
Dan Carpenter [Thu, 16 Aug 2012 03:14:04 +0000 (03:14 +0000)]
gre: information leak in ip6_tnl_ioctl()

There is a one byte hole between p->hop_limit and p->flowinfo where
stack memory is leaked to the user.  This was introduced in c12b395a46
"gre: Support GRE over IPv6".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
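
For readers unfamiliar with this class of bug, here is a user-space
sketch of a padding hole and of one common remedy, zeroing the whole
structure before filling it. struct demo_parm is a hypothetical layout
(its hole is typically three bytes, not one), and the memset() approach
is an assumption about how such leaks are usually closed, not a
description of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout with a padding hole after the one-byte field. */
struct demo_parm {
	uint8_t hop_limit;
	uint32_t flowinfo; /* usually 4-byte aligned, leaving padding before it */
};

/* Fill a block destined for user space. Zeroing the whole structure
 * first keeps stale stack bytes out of the padding. */
static void demo_fill_parm(struct demo_parm *p, uint8_t hop_limit,
			   uint32_t flowinfo)
{
	memset(p, 0, sizeof(*p));
	p->hop_limit = hop_limit;
	p->flowinfo = flowinfo;
}

/* Returns 1 when no stale byte survives in the hole. */
static int demo_padding_is_zero(void)
{
	struct demo_parm p;
	unsigned char out[sizeof(p)];
	size_t i;

	memset(&p, 0xAA, sizeof(p)); /* simulate stale stack contents */
	demo_fill_parm(&p, 64, 0x12345678u);
	memcpy(out, &p, sizeof(p)); /* what a raw copy to user space would see */

	for (i = sizeof(p.hop_limit); i < offsetof(struct demo_parm, flowinfo); i++)
		if (out[i] != 0)
			return 0;
	return 1;
}
```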
12 years ago xfrm: Use rcu_dereference_bh to dereference pointer protected by rcu_read_lock_bh
Fan Du [Thu, 16 Aug 2012 09:51:25 +0000 (17:51 +0800)]
xfrm: Use rcu_dereference_bh to dereference pointer protected by rcu_read_lock_bh

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: fix bogus if statement in sctp_auth_recv_cid()
Dan Carpenter [Thu, 16 Aug 2012 03:16:19 +0000 (03:16 +0000)]
sctp: fix bogus if statement in sctp_auth_recv_cid()

There is an extra semi-colon here, so we always return 0 instead of
calling __sctp_auth_cid().
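
The effect of such a stray semicolon can be reproduced in a few lines.
The functions below are hypothetical and only mirror the shape of the
bug; the real condition and helper differ. Compilers may warn about the
buggy variant, which is the point of the example.

```c
#include <assert.h>

/* Hypothetical helper standing in for the real lookup. */
static int demo_lookup(int chunk)
{
	return chunk == 7;
}

/* Buggy shape: the ';' ends the if statement, so "return 0" always runs. */
static int demo_recv_cid_buggy(int chunk, int have_asoc)
{
	if (!have_asoc);
		return 0;

	return demo_lookup(chunk);
}

/* Fixed shape: the early return happens only when the test is true. */
static int demo_recv_cid_fixed(int chunk, int have_asoc)
{
	if (!have_asoc)
		return 0;

	return demo_lookup(chunk);
}
```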

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: fix compile issue with disabled CONFIG_NET_NS
Ulrich Weber [Thu, 16 Aug 2012 01:24:49 +0000 (01:24 +0000)]
sctp: fix compile issue with disabled CONFIG_NET_NS

struct seq_net_private has no struct net
if CONFIG_NET_NS is not enabled
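
The pattern behind this kind of build failure, and an accessor-based way
around it, can be sketched as follows. DEMO_NET_NS, struct
demo_seq_private and demo_seq_file_net() are hypothetical names chosen
for the example; whether the actual fix uses an accessor like this is
not stated here.

```c
#include <assert.h>
#include <stddef.h>

struct demo_net { int id; };

static struct demo_net demo_init_net = { 0 };

/* The member exists only when the (hypothetical) DEMO_NET_NS option
 * is defined, so direct accesses break the other configuration. */
struct demo_seq_private {
#ifdef DEMO_NET_NS
	struct demo_net *net;
#endif
	int pos;
};

/* Accessor that compiles in both configurations, so callers never
 * touch the conditional member directly. */
static struct demo_net *demo_seq_file_net(struct demo_seq_private *p)
{
#ifdef DEMO_NET_NS
	return p->net;
#else
	(void)p;
	return &demo_init_net;
#endif
}

/* Returns 1 when the accessor yields the initial namespace. */
static int demo_lookup_works(void)
{
	struct demo_seq_private p = { .pos = 0 };
#ifdef DEMO_NET_NS
	p.net = &demo_init_net;
#endif
	return demo_seq_file_net(&p) == &demo_init_net;
}
```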

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville...
David S. Miller [Wed, 15 Aug 2012 22:26:05 +0000 (15:26 -0700)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next

John W. Linville says:

====================
This is a batch of updates intended for 3.7.  The ath9k, mwifiex,
and b43 drivers get the bulk of the commits this time, with a handful
of other driver bits thrown-in.  It is mostly just minor fixes and
cleanups, etc.

Also included is a Bluetooth pull, with a lot of refactoring.
Gustavo says:

"These are the changes I queued for 3.7. There are many
small fixes/improvements by Andre Guedes. A l2cap channel
refcounting refactor by Jaganath. Bluetooth sockets now
appear in /proc/net, by Masatake Yamato and Sachin Kamat
changes our drivers to use devm_kzalloc()."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago net: remove wrong initialization for snd_wl1
Razvan Ghitulete [Tue, 14 Aug 2012 13:30:20 +0000 (16:30 +0300)]
net: remove wrong initialization for snd_wl1

The field tp->snd_wl1 is initialized twice; the second initialization
seems to be wrong, as it may overwrite any update made in tcp_ack.

Signed-off-by: Razvan Ghitulete <rghitulete@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago xfrm: remove redundant parameter "int dir" in struct xfrm_mgr.acquire
Fan Du [Wed, 15 Aug 2012 02:13:47 +0000 (10:13 +0800)]
xfrm: remove redundant parameter "int dir" in struct xfrm_mgr.acquire

Semantically speaking, xfrm_mgr.acquire is called when the kernel intends to ask
the user space IKE daemon to negotiate SAs with peers. IOW the direction will
*always* be XFRM_POLICY_OUT, so remove int dir for clarity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Wed, 15 Aug 2012 18:29:37 +0000 (14:29 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next into for-davem

12 years ago sctp: fix a compile error in sctp.h
Cong Wang [Wed, 15 Aug 2012 10:18:11 +0000 (18:18 +0800)]
sctp: fix a compile error in sctp.h

I got the following compile error:

In file included from include/net/sctp/checksum.h:46:0,
                 from net/ipv4/netfilter/nf_nat_proto_sctp.c:14:
include/net/sctp/sctp.h: In function ‘sctp_dbg_objcnt_init’:
include/net/sctp/sctp.h:370:88: error: parameter name omitted
include/net/sctp/sctp.h: In function ‘sctp_dbg_objcnt_exit’:
include/net/sctp/sctp.h:371:88: error: parameter name omitted

which is caused by

commit 13d782f6b4fbbaf9d0380a9947deb45a9de46ae7
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Mon Aug 6 08:45:15 2012 +0000

    sctp: Make the proc files per network namespace.

This patch could fix it.
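
For context, "parameter name omitted" is what older C compilers report
when a function definition (as opposed to a prototype) leaves a
parameter unnamed; C standards before C23 require the name. The sketch
below shows a compiled-out stub written so that it builds either way.
DEMO_DEBUG_OBJCNT and the demo_* names are hypothetical and only mirror
the shape of the stubs mentioned in the error output.

```c
#include <assert.h>

struct demo_net { int objcnt_ready; };

#ifdef DEMO_DEBUG_OBJCNT
void demo_dbg_objcnt_init(struct demo_net *net);
#else
/* A definition, unlike a prototype, needs a parameter name before C23:
 *   static inline void demo_dbg_objcnt_init(struct demo_net *) {}
 * is rejected with "parameter name omitted". Naming it fixes the build. */
static inline void demo_dbg_objcnt_init(struct demo_net *net)
{
	(void)net; /* stub: nothing to do when the feature is compiled out */
}
#endif

/* Returns 1 when calling the stub leaves the state untouched. */
static int demo_stub_leaves_state_alone(void)
{
	struct demo_net n = { 0 };

	demo_dbg_objcnt_init(&n);
	return n.objcnt_ready == 0;
}
```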

Cc: David S. Miller <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Make sysctl tunables per net
Eric W. Biederman [Tue, 7 Aug 2012 07:29:57 +0000 (07:29 +0000)]
sctp: Make sysctl tunables per net

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Push struct net down into sctp_verify_ext_param
Eric W. Biederman [Tue, 7 Aug 2012 07:29:08 +0000 (07:29 +0000)]
sctp: Push struct net down into sctp_verify_ext_param

Add struct net as a parameter to sctp_verify_param so it can be passed
to sctp_verify_ext_param where struct net will be needed when the sctp
tunables become per net tunables.

Add struct net as a parameter to sctp_verify_init so struct net can be
passed to sctp_verify_param.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Push struct net down into all of the state machine functions
Eric W. Biederman [Tue, 7 Aug 2012 07:28:09 +0000 (07:28 +0000)]
sctp: Push struct net down into all of the state machine functions

There are a handful of state machine functions, primarily those dealing
with processing INIT packets, where there is neither a valid endpoint nor
a valid association from which to derive a struct net.  Therefore add
struct net * to the parameter list of sctp_state_fn_t and update all of
the state machine functions.
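
The shape of this change can be sketched with a simplified function
pointer type. All names below (demo_state_fn_t, demo_do_sm() and so on)
are hypothetical and much smaller than the real SCTP types; the point is
only that the namespace travels as an explicit argument, so a handler
still works when no endpoint exists.

```c
#include <assert.h>
#include <stddef.h>

struct demo_net { int max_retrans; };
struct demo_endpoint { struct demo_net *net; };

/* Every state function receives the namespace explicitly, since ep may
 * be NULL for some events. */
typedef int (*demo_state_fn_t)(struct demo_net *net,
			       const struct demo_endpoint *ep,
			       int event);

static int demo_handle_init(struct demo_net *net,
			    const struct demo_endpoint *ep, int event)
{
	(void)ep; /* may be NULL for an INIT with no endpoint yet */
	return net->max_retrans + event;
}

/* Dispatcher: forwards the namespace it was given to the handler. */
static int demo_do_sm(struct demo_net *net, const struct demo_endpoint *ep,
		      int event, demo_state_fn_t fn)
{
	return fn(net, ep, event);
}

/* Runs a handler with no endpoint at all. */
static int demo_run_without_endpoint(void)
{
	struct demo_net net = { .max_retrans = 10 };

	return demo_do_sm(&net, NULL, 1, demo_handle_init);
}
```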

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Push struct net down into sctp_in_scope
Eric W. Biederman [Tue, 7 Aug 2012 07:27:02 +0000 (07:27 +0000)]
sctp: Push struct net down into sctp_in_scope

struct net will be needed shortly when the tunables are made per network
namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Push struct net down into sctp_transport_init
Eric W. Biederman [Tue, 7 Aug 2012 07:26:14 +0000 (07:26 +0000)]
sctp: Push struct net down into sctp_transport_init

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Push struct net down to sctp_chunk_event_lookup
Eric W. Biederman [Tue, 7 Aug 2012 07:25:24 +0000 (07:25 +0000)]
sctp: Push struct net down to sctp_chunk_event_lookup

This trickles up through sctp_sm_lookup_event up to sctp_do_sm
and up further into sctp_primitive_NAME before the code reaches
places where struct net can be reliably found.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Add infrastructure for per net sysctls
Eric W. Biederman [Tue, 7 Aug 2012 07:23:59 +0000 (07:23 +0000)]
sctp: Add infrastructure for per net sysctls

Start with an empty sctp_net_table that will be populated as the various
tunable sysctls are made per net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Make the mib per network namespace
Eric W. Biederman [Mon, 6 Aug 2012 08:47:55 +0000 (08:47 +0000)]
sctp: Make the mib per network namespace

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Enable sctp in all network namespaces
Eric W. Biederman [Mon, 6 Aug 2012 08:46:26 +0000 (08:46 +0000)]
sctp: Enable sctp in all network namespaces

- Fix the sctp_af operations to work in all namespaces
- Enable sctp socket creation in all network namespaces.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Make the proc files per network namespace.
Eric W. Biederman [Mon, 6 Aug 2012 08:45:15 +0000 (08:45 +0000)]
sctp: Make the proc files per network namespace.

- Convert all of the files under /proc/net/sctp to be per
  network namespace.

- Don't print anything for /proc/net/sctp/snmp except in
  the initial network namespace as the snmp counters still
  have to be converted to be per network namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Move the percpu sockets counter out of sctp_proc_init
Eric W. Biederman [Mon, 6 Aug 2012 08:44:24 +0000 (08:44 +0000)]
sctp: Move the percpu sockets counter out of sctp_proc_init

The percpu sctp socket counter has nothing at all to do with the sctp
proc files, and having it in the wrong initialization is confusing,
and makes network namespace support a pain.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Make the ctl_sock per network namespace
Eric W. Biederman [Mon, 6 Aug 2012 08:43:06 +0000 (08:43 +0000)]
sctp: Make the ctl_sock per network namespace

- Kill sctp_get_ctl_sock, it is useless now.
- Pass struct net where needed so net->sctp.ctl_sock is accessible.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Make the address lists per network namespace
Eric W. Biederman [Mon, 6 Aug 2012 08:42:04 +0000 (08:42 +0000)]
sctp: Make the address lists per network namespace

- Move the address lists into struct net
- Add per network namespace initialization and cleanup
- Pass around struct net so it is everywhere I need it.
- Rename all of the global variable references into references
  to the variables moved into struct net

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Make the association hashtable handle multiple network namespaces
Eric W. Biederman [Mon, 6 Aug 2012 08:41:13 +0000 (08:41 +0000)]
sctp: Make the association hashtable handle multiple network namespaces

- Use struct net in the hash calculation
- Use sock_net(association.base.sk) in the association lookups.
- On receive calculate the network namespace from skb->dev.
- Pass struct net from receive down to the functions that actually
  do the association lookup.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Make the endpoint hashtable handle multiple network namespaces
Eric W. Biederman [Mon, 6 Aug 2012 08:40:21 +0000 (08:40 +0000)]
sctp: Make the endpoint hashtable handle multiple network namespaces

- Use struct net in the hash calculation
- Use sock_net(endpoint.base.sk) in the endpoint lookups.
- On receive calculate the network namespace from skb->dev.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago sctp: Make the port hash table use struct net in its key.
Eric W. Biederman [Mon, 6 Aug 2012 08:39:38 +0000 (08:39 +0000)]
sctp: Make the port hash table use struct net in its key.

- Add struct net into the port hash table hash calculation
- Add struct net into the struct sctp_bind_bucket so there
  is a memory of which network namespace a port is allocated in.
  No need for a ref count because sctp_bind_bucket only exists
  when there are sockets in the hash table and sockets can not
  change their network namespace, and sockets already ref count
  their network namespace.
- Add struct net into the key comparison when we are testing
  to see if we have found the port hash table entry we are
  looking for.

With these changes lookups in the port hash table become
safe to use in multiple network namespaces.
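
The three bullets above can be sketched with a toy chained hash table.
Everything here is an assumption made for illustration: the table size,
the hash mixing, and the demo_* names are not the SCTP implementation.
What the sketch does show is that the namespace takes part in both the
hash and the key comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_SIZE 64 /* assumed table size, power of two */

struct demo_net { int id; };

struct demo_bind_bucket {
	uint16_t port;
	struct demo_net *net; /* which namespace owns this port */
	struct demo_bind_bucket *next;
};

static struct demo_bind_bucket *demo_table[DEMO_HASH_SIZE];

/* Mix the namespace into the hash; the mixing itself is an assumption. */
static unsigned int demo_port_hash(const struct demo_net *net, uint16_t port)
{
	uintptr_t n = (uintptr_t)net;

	return (unsigned int)((n >> 4) ^ port) & (DEMO_HASH_SIZE - 1);
}

static void demo_insert(struct demo_bind_bucket *b)
{
	unsigned int h = demo_port_hash(b->net, b->port);

	b->next = demo_table[h];
	demo_table[h] = b;
}

/* Both port and namespace must match for an entry to be found. */
static struct demo_bind_bucket *demo_find(struct demo_net *net, uint16_t port)
{
	struct demo_bind_bucket *b;

	for (b = demo_table[demo_port_hash(net, port)]; b; b = b->next)
		if (b->port == port && b->net == net)
			return b;
	return NULL;
}

/* Returns 1 when a port bound in one namespace is invisible in another. */
static int demo_namespaces_are_isolated(void)
{
	struct demo_net a = { 1 }, b = { 2 };
	struct demo_bind_bucket in_a = { 80, &a, NULL };
	int ok;

	demo_insert(&in_a);
	ok = demo_find(&a, 80) == &in_a && demo_find(&b, 80) == NULL;
	demo_table[demo_port_hash(&a, 80)] = NULL; /* drop the stack entry */
	return ok;
}
```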

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago packet: Report socket mclist info via diag module
Pavel Emelyanov [Mon, 13 Aug 2012 05:57:44 +0000 (05:57 +0000)]
packet: Report socket mclist info via diag module

The info is reported as an array of packet_diag_mclist structures. Each
includes not only the directly configured values (index, type, etc), but
also the "count".

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago packet: Report more packet sk info via diag module
Pavel Emelyanov [Mon, 13 Aug 2012 05:55:46 +0000 (05:55 +0000)]
packet: Report more packet sk info via diag module

This reports in one rtattr message all the other scalar values, that can be
set on a packet socket with setsockopt.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago packet: Diag core and basic socket info dumping
Pavel Emelyanov [Mon, 13 Aug 2012 05:53:28 +0000 (05:53 +0000)]
packet: Diag core and basic socket info dumping

The diag module can be built independently from the af_packet.ko one,
just like it's done in unix sockets.

The core dumping message carries the info available at socket creation
time, i.e. family, type and protocol (in the same byte order as shown in
the proc file).

The socket inode number and cookie are reserved for future per-socket info
retrieval. Per-protocol filtering is also reserved for the future by
requiring the sdiag_protocol to be zero.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago packet: Introduce net/packet/internal.h header
Pavel Emelyanov [Mon, 13 Aug 2012 05:49:39 +0000 (05:49 +0000)]
packet: Introduce net/packet/internal.h header

The diag module will need to access some private packet_sock data, so
move it to a header in advance. This file will be shared between
af_packet.c and diag.c.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago net: ipv4: fib_trie: Don't unnecessarily search for already found fib leaf
Igor Maravic [Mon, 13 Aug 2012 08:26:08 +0000 (10:26 +0200)]
net: ipv4: fib_trie: Don't unnecessarily search for already found fib leaf

We've already found the leaf, so don't search for it again. The same goes for the fib leaf info.

Signed-off-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago Replace rwlock on xfrm_policy_afinfo with rcu
Priyanka Jain [Sun, 12 Aug 2012 21:22:29 +0000 (21:22 +0000)]
Replace rwlock on xfrm_policy_afinfo with rcu

xfrm_policy_afinfo is a read-mostly data structure.
Writes to xfrm_policy_afinfo are done only at
configuration time.
So the rwlock can be safely replaced with RCU.

Using RCU improves performance.
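
To give a feel for why lock-free reads help a read-mostly pointer, here
is a user-space analogy built on C11 atomics. It is deliberately not
RCU: it models only the publish and read halves, and omits the
grace-period handling that real RCU needs before an old object may be
freed. struct demo_afinfo and the demo_* functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_afinfo { int family; };

/* Single published pointer; readers take no lock. */
static _Atomic(struct demo_afinfo *) demo_afinfo_slot;

/* Writer side (rare): publish with release ordering so that a reader
 * who sees the pointer also sees the initialized contents. */
static void demo_register(struct demo_afinfo *info)
{
	atomic_store_explicit(&demo_afinfo_slot, info, memory_order_release);
}

/* Reader side (frequent): a single acquire load, no lock traffic. */
static struct demo_afinfo *demo_get(void)
{
	return atomic_load_explicit(&demo_afinfo_slot, memory_order_acquire);
}

/* Publishes an entry, reads it back, then unpublishes it. */
static int demo_roundtrip(void)
{
	static struct demo_afinfo v4 = { 2 };
	int family;

	demo_register(&v4);
	family = demo_get()->family;
	demo_register(NULL);
	return family;
}
```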

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago gre: Support GRE over IPv6
xeb@mail.ru [Fri, 10 Aug 2012 00:51:50 +0000 (00:51 +0000)]
gre: Support GRE over IPv6

GRE over IPv6 implementation.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago net: remove netdev_bonding_change()
Amerigo Wang [Thu, 9 Aug 2012 22:14:57 +0000 (22:14 +0000)]
net: remove netdev_bonding_change()

I don't see any benefit in using netdev_bonding_change() over
calling call_netdevice_notifiers() directly.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago net: move and rename netif_notify_peers()
Amerigo Wang [Thu, 9 Aug 2012 22:14:56 +0000 (22:14 +0000)]
net: move and rename netif_notify_peers()

I believe net/core/dev.c is a better place for netif_notify_peers(),
because other net event notify functions also stay in this file.

And rename it to netdev_notify_peers().

Cc: David S. Miller <davem@davemloft.net>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago p54: fix powerpc gcc warnings
Christian Lamparter [Sat, 11 Aug 2012 11:09:20 +0000 (13:09 +0200)]
p54: fix powerpc gcc warnings

My commit "p54: parse output power table" introduced
the following compiler warnings for powerpc-allmodconfig

eeprom.c: In function 'p54_get_maxpower':
eeprom.c:291 warning: comparison of distinct pointer types lacks a cast
eeprom.c:292 warning: comparison of distinct pointer types lacks a cast
eeprom.c:293 warning: comparison of distinct pointer types lacks a cast
eeprom.c:294 warning: comparison of distinct pointer types lacks a cast

This patch fixes those by using max_t(u16, ...),
which forces a type cast.
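
As background, the kernel's type-checking max() compares the addresses
of its two operands so that mismatched types show up as the pointer
warning quoted above, while max_t() casts both operands to a named type
first. The sketch below is a simplified stand-in, not the kernel macro:
DEMO_MAX_T evaluates its arguments more than once, and
demo_max_power() is a hypothetical caller.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a max_t-style macro: both operands are cast
 * to the named type before comparing. Unlike the kernel's version,
 * this sketch evaluates its arguments more than once. */
#define DEMO_MAX_T(type, x, y) \
	((type)(x) > (type)(y) ? (type)(x) : (type)(y))

/* Mixed signed/unsigned operands: the cast makes the comparison
 * well defined in one type. */
static uint16_t demo_max_power(uint16_t current, int16_t table_value)
{
	return DEMO_MAX_T(uint16_t, current, table_value);
}
```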

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago mac80211: fix unnecessary beacon update after peering status change
Marco Porsch [Wed, 8 Aug 2012 05:58:43 +0000 (07:58 +0200)]
mac80211: fix unnecessary beacon update after peering status change

ieee80211_bss_info_change_notify is called every time a peer link is established
or closed, because the accepting_plinks flag in the meshconf IE *might* have changed.

With this patch the corresponding functions return the BSS_CHANGED_BEACON flag when a beacon update is necessary.

Also it makes mesh_accept_plinks_update the common place to update the accepting_plinks flag.
mesh_accept_plinks_update is called upon plink change and also periodically from ieee80211_mesh_housekeeping.
Thus, it also picks up changes of local->num_sta.
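
The "return a changed flag only when something changed" pattern can be
sketched as follows. struct demo_mesh, DEMO_CHANGED_BEACON and the rule
used to decide whether links are accepted are all assumptions for the
example, not the mac80211 code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CHANGED_BEACON 0x1u /* hypothetical flag value */

struct demo_mesh {
	int accepting_plinks;
	int free_slots;
};

/* Recompute the flag and report whether the beacon needs refreshing. */
static uint32_t demo_accept_plinks_update(struct demo_mesh *m)
{
	int accepting = m->free_slots > 0;

	if (accepting == m->accepting_plinks)
		return 0; /* nothing changed: no beacon update */

	m->accepting_plinks = accepting;
	return DEMO_CHANGED_BEACON;
}

/* Returns 1 when only the real state flip is reported. */
static int demo_only_real_changes_are_reported(void)
{
	struct demo_mesh m = { .accepting_plinks = 1, .free_slots = 3 };
	uint32_t first, second, third;

	first = demo_accept_plinks_update(&m);  /* still accepting */
	m.free_slots = 0;
	second = demo_accept_plinks_update(&m); /* flips to not accepting */
	third = demo_accept_plinks_update(&m);  /* unchanged again */
	return first == 0 && second == DEMO_CHANGED_BEACON && third == 0;
}
```

A caller would OR the returned flags together and trigger a beacon
update only when the result is non-zero.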

Signed-off-by: Marco Porsch <marco.porsch@etit.tu-chemnitz.de>
Acked-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago brcmsmac: document firmware dependencies
Jeff Mahoney [Mon, 6 Aug 2012 19:17:26 +0000 (15:17 -0400)]
brcmsmac: document firmware dependencies

The brcmsmac driver requests firmware but doesn't document the
dependency. This means that software that analyzes the modules to
determine if firmware is needed won't detect it.

Specifically, (at least) openSUSE won't install the kernel-firmware
package if no hardware requires it.

This patch adds the MODULE_FIRMWARE directives.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: tune rc_stats to display only valid rates
Rajkumar Manoharan [Fri, 10 Aug 2012 11:17:30 +0000 (16:47 +0530)]
ath9k: tune rc_stats to display only valid rates

This makes rc_stats simpler and eases debugging.

Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: Trim rate table
Sujith Manoharan [Fri, 10 Aug 2012 11:17:23 +0000 (16:47 +0530)]
ath9k: Trim rate table

Remove ctrl_rate, cw40index, sgi_index, ht_index and calculate
the rate index for TX status from the valid_rate_index that
is populated at initialization time.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: Remove MIMO-PS specific code
Sujith Manoharan [Fri, 10 Aug 2012 11:17:16 +0000 (16:47 +0530)]
ath9k: Remove MIMO-PS specific code

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: Cleanup TX status API
Sujith Manoharan [Fri, 10 Aug 2012 11:17:09 +0000 (16:47 +0530)]
ath9k: Cleanup TX status API

Calculate the final rate index inside ath_rc_tx_status().

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: Bail out properly before calculating rate index
Sujith Manoharan [Fri, 10 Aug 2012 11:17:03 +0000 (16:47 +0530)]
ath9k: Bail out properly before calculating rate index

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: Fix RTS/CTS rate selection
Sujith Manoharan [Fri, 10 Aug 2012 11:16:57 +0000 (16:46 +0530)]
ath9k: Fix RTS/CTS rate selection

The current method of assigning the RTS/CTS rate is completely
broken for HT mode and breaks P2P operation. Fix this by using
the basic_rates provided to the driver by mac80211. For now,
choose the lowest supported basic rate for HT frames.
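
Picking the lowest basic rate from a bitmap can be sketched like this.
The sketch assumes that bit i of basic_rates marks rate index i as a
basic rate and that lower indexes mean lower rates; the function name
is hypothetical and this is not the ath9k code.

```c
#include <assert.h>
#include <stdint.h>

/* basic_rates is treated as a bitmap where bit i set means rate index i
 * is a basic rate; lower indexes are assumed to be lower rates. */
static int demo_lowest_basic_rate(uint32_t basic_rates)
{
	int i;

	for (i = 0; i < 32; i++)
		if (basic_rates & (UINT32_C(1) << i))
			return i;
	return -1; /* no basic rate advertised */
}
```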

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: Do not set IEEE80211_TX_RC_USE_SHORT_PREAMBLE
Sujith Manoharan [Fri, 10 Aug 2012 11:16:50 +0000 (16:46 +0530)]
ath9k: Do not set IEEE80211_TX_RC_USE_SHORT_PREAMBLE

mac80211 does it for us.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: Unify valid rate calculation routines
Sujith Manoharan [Fri, 10 Aug 2012 11:16:44 +0000 (16:46 +0530)]
ath9k: Unify valid rate calculation routines

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years ago ath9k: Remove ath_rc_set_valid_rate_idx
Sujith Manoharan [Fri, 10 Aug 2012 11:16:37 +0000 (16:46 +0530)]
ath9k: Remove ath_rc_set_valid_rate_idx

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath9k: Change rateset calculation
Sujith Manoharan [Fri, 10 Aug 2012 11:16:31 +0000 (16:46 +0530)]
ath9k: Change rateset calculation

Commit "ath9k: Change rate control to use legacy rate as last MRR"
resulted in the mixing of HT/legacy rates in a single rateset,
which is undesirable. Revert this behavior.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath9k: Cleanup index retrieval routines
Sujith Manoharan [Fri, 10 Aug 2012 11:16:24 +0000 (16:46 +0530)]
ath9k: Cleanup index retrieval routines

Trim API and remove unused variables.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath9k: Cleanup ath_rc_setvalid_htrates
Sujith Manoharan [Fri, 10 Aug 2012 11:16:18 +0000 (16:46 +0530)]
ath9k: Cleanup ath_rc_setvalid_htrates

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath9k: Cleanup ath_rc_setvalid_rates
Sujith Manoharan [Fri, 10 Aug 2012 11:16:11 +0000 (16:46 +0530)]
ath9k: Cleanup ath_rc_setvalid_rates

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath9k: Cleanup RC init API
Sujith Manoharan [Fri, 10 Aug 2012 11:16:04 +0000 (16:46 +0530)]
ath9k: Cleanup RC init API

A reference to the rate table is stored inside the
private structure, so there is no need to pass "rate_table"
around.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath9k: Simplify rate table initialization
Sujith Manoharan [Fri, 10 Aug 2012 11:15:52 +0000 (16:45 +0530)]
ath9k: Simplify rate table initialization

Remove various local variables that duplicate information
already stored in mac80211.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: use GFP_ATOMIC under spin lock
Dan Carpenter [Thu, 9 Aug 2012 06:57:57 +0000 (09:57 +0300)]
mwifiex: use GFP_ATOMIC under spin lock

We're holding the sta_list_spinlock here so we can't sleep.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: notify cfg80211 about MIC failures
Amitkumar Karwar [Thu, 9 Aug 2012 02:02:56 +0000 (19:02 -0700)]
mwifiex: notify cfg80211 about MIC failures

Call cfg80211_michael_mic_failure() handler when there is a MIC error
event from firmware.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: fix 'smatch' warning in preparing key_material cmd
Bing Zhao [Thu, 9 Aug 2012 02:01:52 +0000 (19:01 -0700)]
mwifiex: fix 'smatch' warning in preparing key_material cmd

The key length can be 32 bytes for TKIP and 16 bytes for AES_CMAC.
'smatch' warns about the memcpy that uses the key_len variable to copy
data to a 16-byte buffer. Use a fixed length to avoid the warning.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agossb: check for flash presentence
Rafał Miłecki [Wed, 8 Aug 2012 17:37:04 +0000 (19:37 +0200)]
ssb: check for flash presentence

We cannot assume parallel flash is always present: there are boards
with *serial* flash and probably some without flash at all.
Define some related bits while at it.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agob43legacy: fix logic in GPIO init
Rafał Miłecki [Wed, 8 Aug 2012 17:10:16 +0000 (19:10 +0200)]
b43legacy: fix logic in GPIO init

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agob43: N-PHY: fix 0x2057 radio calib copy/paste mistake
Rafał Miłecki [Wed, 8 Aug 2012 17:10:15 +0000 (19:10 +0200)]
b43: N-PHY: fix 0x2057 radio calib copy/paste mistake

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agobcma: add (mostly) NAND defines
Rafał Miłecki [Wed, 8 Aug 2012 17:10:14 +0000 (19:10 +0200)]
bcma: add (mostly) NAND defines

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: fix powerpc64-linux- compilation warnings
Bing Zhao [Tue, 7 Aug 2012 23:08:08 +0000 (16:08 -0700)]
mwifiex: fix powerpc64-linux- compilation warnings

These warnings can be detected by using powerpc64-linux toolchain
(gcc-4.6.3-nolibc).

  CC [M]  drivers/net/wireless/mwifiex/sta_event.o
drivers/net/wireless/mwifiex/sta_event.c: In function 'mwifiex_process_sta_event':
drivers/net/wireless/mwifiex/sta_event.c:388:4: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  CC [M]  drivers/net/wireless/mwifiex/uap_event.o
drivers/net/wireless/mwifiex/uap_event.c: In function 'mwifiex_process_uap_event':
drivers/net/wireless/mwifiex/uap_event.c:258:11: warning: comparison of distinct pointer types lacks a cast [enabled by default]

Use min_t() instead of min() to fix the warnings.
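
As a rough illustration of the fix above, the sketch below uses simplified
user-space approximations of min() and min_t() (GNU C extensions, not the
kernel's exact definitions). min() compares the addresses of two temporaries,
so arguments of different types trigger the "distinct pointer types" warning;
min_t() casts both sides to one named type first. The helpers same_type_min()
and bounded_len() are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Simplified approximation: the pointer comparison is a compile-time
 * type check and warns when x and y have different types. */
#define min(x, y) ({                    \
	__typeof__(x) _min1 = (x);      \
	__typeof__(y) _min2 = (y);      \
	(void)(&_min1 == &_min2);       \
	_min1 < _min2 ? _min1 : _min2; })

/* Simplified approximation: both operands are converted to 'type',
 * so no type check is needed and no warning is produced. */
#define min_t(type, x, y) ({            \
	type __min1 = (x);              \
	type __min2 = (y);              \
	__min1 < __min2 ? __min1 : __min2; })

/* Hypothetical helper: both arguments share a type, so min() is fine. */
static size_t same_type_min(size_t a, size_t b)
{
	return min(a, b);
}

/* Hypothetical helper: the arguments differ in type, so min() would warn;
 * min_t() names the comparison type explicitly. */
static size_t bounded_len(unsigned short reported_len, size_t buf_size)
{
	return min_t(size_t, reported_len, buf_size);
}
```

The trade-off is that min_t() silences the check, so the chosen type must be
wide enough for both operands.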

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoiwlegacy: clean up suspend/resume
Johannes Berg [Tue, 7 Aug 2012 19:46:44 +0000 (21:46 +0200)]
iwlegacy: clean up suspend/resume

There's no need to export the il_pci_suspend
and il_pci_resume functions since they're only
referenced from il_pm_ops. The latter can also
be defined using SIMPLE_DEV_PM_OPS instead of
open-coding it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: fix code mis-alignment after the if statement
Fengguang Wu [Tue, 7 Aug 2012 02:26:53 +0000 (10:26 +0800)]
mwifiex: fix code mis-alignment after the if statement

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath5k: Put power_level where it belongs and rename it
Nick Kossifidis [Sun, 5 Aug 2012 19:35:36 +0000 (22:35 +0300)]
ath5k: Put power_level where it belongs and rename it

Move power_level into the ah_txpower struct with the rest of the tx power
info, and rename it to txp_requested so the name makes more sense.

v2: make sure we don't memset it to zero on reset

Signed-off-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath5k: Preserve tx power level requested from above on phy_init
Nick Kossifidis [Sun, 5 Aug 2012 19:35:35 +0000 (22:35 +0300)]
ath5k: Preserve tx power level requested from above on phy_init

By using cur_pwr on phy_init we re-use the power level previously set by the
driver, not the one we got from above.

Signed-off-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath5k: Fix range scaling when setting rate power table
Nick Kossifidis [Sun, 5 Aug 2012 19:35:34 +0000 (22:35 +0300)]
ath5k: Fix range scaling when setting rate power table

rates[i] is unsigned but txp_offset can be negative for newer parts
with PDADC table. We cover the case when rates[i] + txp_offset > 63
but we must also cover the case when it's < 0, or else rates[i] will
wrap around.
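
A minimal sketch of the two-sided clamp described above, assuming a signed
intermediate and the upper bound of 63 mentioned in the message. The function
name scale_rate_power() and the integer widths are hypothetical, not taken
from the driver.

```c
#include <stdint.h>

#define TXP_MAX 63  /* upper bound, taken from "> 63" in the message */

/* Hypothetical helper: add a possibly negative offset to an unsigned
 * rate power value without letting the result wrap around. */
static uint16_t scale_rate_power(uint16_t rate, int16_t txp_offset)
{
	/* Do the arithmetic in a wider signed type first. */
	int32_t scaled = (int32_t)rate + txp_offset;

	if (scaled > TXP_MAX)
		scaled = TXP_MAX;   /* the case that was already covered */
	if (scaled < 0)
		scaled = 0;         /* the case added by this fix */

	return (uint16_t)scaled;
}
```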

Signed-off-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoath5k: Use correct value for min_pwr and cur_pwr
Nick Kossifidis [Sun, 5 Aug 2012 19:35:33 +0000 (22:35 +0300)]
ath5k: Use correct value for min_pwr and cur_pwr

Make sure we don't store the table offsets for min and cur power levels,
store the 0.25dB values instead. This way we don't clamp the tx power level
to max (because now cur_pwr holds the 0.25dB value, not the table offset) after
re-using cur_pwr on reset.

Signed-off-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agobcma: make some functions static
Hauke Mehrtens [Sun, 5 Aug 2012 14:54:41 +0000 (16:54 +0200)]
bcma: make some functions static

These functions and structs are not used in any other file and their
prototypes are in no header file, so make them static, which lets the
compiler optimize them better.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agop54: parse output power table
Christian Lamparter [Sat, 28 Jul 2012 00:57:51 +0000 (02:57 +0200)]
p54: parse output power table

For the upcoming tpc changes, the driver needs
to provide sensible max output values for each
supported channel.

And while the eeprom always had an output_limit
table, which defines the upper limit for each
frequency and modulation, it was never really
useful for anything... until now.

Note: For anyone wondering about what your card
is calibrated for: check "iw list".
* 2412 MHz [1] (18.0 dBm)
* 2437 MHz [6] (19.0 dBm)
[...]
* 5180 MHz [36] (18.0 dBm)
* 5260 MHz [52] (17.0 dBm) (radar detection)
* 5680 MHz [136] (19.0 dBm) (radar detection)
(for a Dell Wireless 1450 USB Adapter)

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth...
John W. Linville [Fri, 10 Aug 2012 19:13:12 +0000 (15:13 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

12 years agohyperv: Add comments for the extended buffer after RNDIS message
Haiyang Zhang [Thu, 9 Aug 2012 08:04:18 +0000 (08:04 +0000)]
hyperv: Add comments for the extended buffer after RNDIS message

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: Loopback ifindex is constant now
Pavel Emelyanov [Wed, 8 Aug 2012 21:53:36 +0000 (21:53 +0000)]
net: Loopback ifindex is constant now

As pointed out, there are places that access net->loopback_dev->ifindex,
and once ifindex generation is made per-net this value becomes a constant
equal to 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: Make ifindex generation per-net namespace
Pavel Emelyanov [Wed, 8 Aug 2012 21:53:19 +0000 (21:53 +0000)]
net: Make ifindex generation per-net namespace

Strictly speaking this is only _really_ required for checkpoint-restore to
make loopback device always have the same index.

This change appears to be safe wrt the "ifindex should be unique per-system"
concept, as all the ifindex usage is either already made per net namespace
or is explicitly limited to init_net only.

There are two cool side effects of this. The first one -- ifindices of
devices in container are always small, regardless of how many containers
we've started (and re-started) so far. The second one -- we can speed
up the loopback ifindex access as shown in the next patch.

v2: Place ifindex right after dev_base_seq : avoid two holes and use the
    same cache line, dirtied in list_netdevice()/unlist_netdevice()

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoveth: Allow to create peer link with given ifindex
Pavel Emelyanov [Wed, 8 Aug 2012 21:53:03 +0000 (21:53 +0000)]
veth: Allow to create peer link with given ifindex

The ifinfomsg is in there (thanks kaber@ for foreseeing this a long time ago),
so take the given ifindex and register the netdev with it.

Ben noticed that this code path previously ignored ifmp->ifi_index and
userland could be passing in garbage. Thus it may now fail occasionally
because the value clashes with an existing interface.

To address this it's assumed that if the caller specifies the ifindex for
the veth master device, then it's aware of this possibility and should
explicitly specify (or set to 0 for auto-assignment) the peer's ifindex as
well. With this the compatibility with old tools not setting ifindex is
preserved.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: Allow to create links with given ifindex
Pavel Emelyanov [Wed, 8 Aug 2012 21:52:46 +0000 (21:52 +0000)]
net: Allow to create links with given ifindex

Currently the RTM_NEWLINK results in -EOPNOTSUPP if the ifinfomsg->ifi_index
is not zero. I propose to allow requesting ifindices on link creation. This
is required by the checkpoint-restore to correctly restore a net namespace
(i.e. -- a container).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: Dont use ifindices in hash fns
Pavel Emelyanov [Wed, 8 Aug 2012 21:52:28 +0000 (21:52 +0000)]
net: Dont use ifindices in hash fns

Eric noticed that once there are devices with equal indices, some
hash functions that use them will become less effective than they could be.
Fix this in advance by mixing the net_device address into the hash value
instead of the device index.

This is true for arp and ndisc hash fns. The netlabel, can and llc ones
are also ifindex-based, but those three are init_net-only, thus will not
be affected.

Many thanks to David and Eric for the hash32_ptr implementation!
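
To illustrate the idea in user space, the sketch below folds a pointer value
into 32 bits and mixes it into a hash in place of a device index. It is an
approximation written for this note: hash32_ptr_sketch() and
neigh_hash_sketch() are hypothetical names, and the mixing step is an
assumption rather than the exact arp/ndisc code.

```c
#include <stdint.h>

/* Fold a pointer into 32 bits; on 64-bit targets the upper half is
 * XORed into the lower half so both halves contribute. */
static uint32_t hash32_ptr_sketch(const void *ptr)
{
	uintptr_t val = (uintptr_t)ptr;

#if UINTPTR_MAX > 0xffffffffu
	val ^= val >> 32;
#endif
	return (uint32_t)val;
}

/* Hypothetical neighbour hash: two devices with the same ifindex in
 * different namespaces still have different addresses, so their
 * entries no longer land in the same bucket by construction. */
static uint32_t neigh_hash_sketch(uint32_t key, const void *dev, uint32_t rnd)
{
	return (key ^ hash32_ptr_sketch(dev)) * rnd;
}
```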

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotime: jiffies_delta_to_clock_t() helper to the rescue
Eric Dumazet [Wed, 8 Aug 2012 21:13:53 +0000 (21:13 +0000)]
time: jiffies_delta_to_clock_t() helper to the rescue

Various /proc/net files sometimes report crazy timer values, expressed
in clock_t units.

This happens when an expired timer delta (expires - jiffies) is passed
to jiffies_to_clock_t().

This function has an overflow in :

return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ);

commit cbbc719fccdb8cb (time: Change jiffies_to_clock_t() argument type
to unsigned long) only got around the problem.

As we can't output negative values in /proc/net/tcp without breaking
various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper
that caps the negative delta to a 0 value.
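
A user-space sketch of the capping wrapper proposed above. The HZ and USER_HZ
values are assumptions chosen for the example, and the conversion is a
simplified stand-in for jiffies_to_clock_t(), not the kernel's div_u64()
expression; the _sketch names are hypothetical.

```c
#include <stdint.h>

#define HZ      1000  /* assumed kernel tick rate */
#define USER_HZ 100   /* assumed clock_t rate exported to user space */

/* Simplified conversion taking an unsigned tick count. */
static uint64_t jiffies_to_clock_t_sketch(unsigned long x)
{
	return x / (HZ / USER_HZ);
}

/* An expired timer gives a negative delta (expires - jiffies). Passing
 * it straight to the unsigned conversion yields a huge value, so cap
 * the delta at 0 first. */
static uint64_t jiffies_delta_to_clock_t_sketch(long delta)
{
	return jiffies_to_clock_t_sketch(delta > 0 ? (unsigned long)delta : 0);
}
```

With these assumed rates, a delta of 250 jiffies converts to 25 clock ticks,
while a delta of -5 reports 0 instead of a very large number.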

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: hank <pyu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agofib: use __fls() on non null argument
Eric Dumazet [Tue, 7 Aug 2012 10:45:47 +0000 (10:45 +0000)]
fib: use __fls() on non null argument

__fls(x) is a bit faster than fls(x), provided we know x is non-zero.

As Ben Hutchings pointed out, fls(x) = __fls(x) + 1
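
The relation can be checked with a small user-space sketch. It relies on the
GCC/Clang builtins __builtin_clz() and __builtin_clzl(), which is an
assumption about the toolchain; fls_sketch() and fls_nz_sketch() are
hypothetical stand-ins for fls() and __fls().

```c
#include <assert.h>
#include <limits.h>

/* 1-based position of the most significant set bit; 0 when x == 0. */
static int fls_sketch(unsigned int x)
{
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x) : 0;
}

/* 0-based position of the most significant set bit. The result is
 * undefined for 0, which is why the caller must know x is non-zero
 * and why the zero check can be skipped. */
static unsigned long fls_nz_sketch(unsigned long x)
{
	assert(x != 0);
	return sizeof(x) * CHAR_BIT - 1 - (unsigned long)__builtin_clzl(x);
}
```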

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: output path optimizations
Eric Dumazet [Tue, 7 Aug 2012 02:19:56 +0000 (02:19 +0000)]
net: output path optimizations

1) Avoid dirtying neighbour's confirmed field.

  TCP workloads hit this cache line for each incoming ACK.
  Let's write n->confirmed only if jiffies has changed.

2) Optimize neigh_hh_output() for the common Ethernet case, where
   hh_len is less than 16 bytes. Replace the memcpy() call
   by two inlined 64bit load/stores on x86_64.

Bench results using udpflood test, with -C option (MSG_CONFIRM flag
added to sendto(), to reproduce the n->confirmed dirtying on UDP)

24 threads doing 1.000.000 UDP sendto() on dummy device, 4 runs.

before : 2.247s, 2.235s, 2.247s, 2.318s
after  : 1.884s, 1.905s, 1.891s, 1.895s
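
Point 1 above amounts to a compare-before-store. A minimal sketch, assuming a
stripped-down structure with only the timestamp field; struct neigh_sketch
and confirm_neigh() are hypothetical names, not the kernel's.

```c
/* Hypothetical stand-in for the neighbour entry. */
struct neigh_sketch {
	unsigned long confirmed;  /* tick of the last confirmation */
};

/* Store the timestamp only when it differs. Within the same tick the
 * cache line is read but never dirtied, so other CPUs keep their copy.
 * Returns 1 if a store happened, 0 otherwise. */
static int confirm_neigh(struct neigh_sketch *n, unsigned long now)
{
	if (n->confirmed == now)
		return 0;

	n->confirmed = now;
	return 1;
}
```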

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodocumentation: dt: bindings: cpsw: fixing the examples for directly using it in dts...
Mugunthan V N [Mon, 6 Aug 2012 05:05:58 +0000 (05:05 +0000)]
documentation: dt: bindings: cpsw: fixing the examples for directly using it in dts file

Fix the cpsw device tree example so that it can be copied and pasted
into a dts file and used directly.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodrivers: net: ethernet: davince_mdio: device tree implementation
Mugunthan V N [Mon, 6 Aug 2012 05:05:57 +0000 (05:05 +0000)]
drivers: net: ethernet: davince_mdio: device tree implementation

Add a device tree implementation for the davinci mdio driver.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotcp: ecn: dont delay ACKS after CE
Eric Dumazet [Mon, 6 Aug 2012 11:04:43 +0000 (11:04 +0000)]
tcp: ecn: dont delay ACKS after CE

While playing with CoDel and ECN marking, I discovered a
non optimal behavior of receiver of CE (Congestion Encountered)
segments.

In pathological cases, sender has reduced its cwnd to low values,
and receiver delays its ACK (by 40 ms).

While RFC 3168 6.1.3 (The TCP Receiver) doesn't explicitly recommend
sending immediate ACKS, we believe it's better not to delay ACKS, because
a CE segment should give the same signal as a dropped segment, and it's
quite important to reduce RTT to give ECE/CWR signals as fast as
possible.

Note we already call tcp_enter_quickack_mode() from TCP_ECN_check_ce()
if we receive a retransmit, for the same reason.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: tcp: GRO should be ECN friendly
Eric Dumazet [Sun, 5 Aug 2012 22:34:50 +0000 (22:34 +0000)]
net: tcp: GRO should be ECN friendly

While doing TCP ECN tests, I discovered GRO was reordering packets if it
receives one packet with CE set, while previous packets in same NAPI run
have ECT(0) for the same flow :

09:25:25.857620 IP (tos 0x2,ECT(0), ttl 64, id 27893, offset 0, flags
[DF], proto TCP (6), length 4396)
    172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq
233801:238145, ack 1, win 115, options [nop,nop,TS val 3397779 ecr
1990627], length 4344

09:25:25.857626 IP (tos 0x3,CE, ttl 64, id 27892, offset 0, flags [DF],
proto TCP (6), length 1500)
    172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq
232353:233801, ack 1, win 115, options [nop,nop,TS val 3397779 ecr
1990627], length 1448

09:25:25.857638 IP (tos 0x0, ttl 64, id 34581, offset 0, flags [DF],
proto TCP (6), length 64)
    172.30.42.13.44139 > 172.30.42.19.54550: Flags [.], cksum 0xac8f
(incorrect -> 0xca69), ack 232353, win 1271, options [nop,nop,TS val
1990627 ecr 3397779,nop,nop,sack 1 {233801:238145}], length 0

We have two problems here :

1) GRO reorders packets

  If NIC gave packet1, then packet2, which happen to be from "different
flows"  GRO feeds stack with packet2, then packet1. I have yet to
understand how to solve this problem.

2) GRO is not ECN friendly

Delivering packets out of order makes TCP stack not as fast as it could
be.

In this patch I suggest we make the tos test not part of the 'same_flow'
determination, but part of the 'should flush' logic

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: reorganize IP MIB values
Eric Dumazet [Sat, 4 Aug 2012 20:33:59 +0000 (20:33 +0000)]
net: reorganize IP MIB values

Reduce IP latencies by placing hot MIB IP fields in a single cache line.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: avoid reloads in SNMP_UPD_PO_STATS
Eric Dumazet [Sat, 4 Aug 2012 20:26:13 +0000 (20:26 +0000)]
net: avoid reloads in SNMP_UPD_PO_STATS

Avoid two instructions to reload dev->nd_net->mib.ip_statistics pointer,
by using a temp variable, in ip_rcv(), ip_output() paths for example.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agomwifiex: add AES_CMAC support in key_material cmd
Ying Luo [Sat, 4 Aug 2012 01:06:14 +0000 (18:06 -0700)]
mwifiex: add AES_CMAC support in key_material cmd

The sequence counter will be sent to firmware via key_material
command.

Signed-off-by: Ying Luo <luoy@marvell.com>
Signed-off-by: Stone Piao <piaoyun@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: pass key_params pointer in mwifiex_set_encode
Ying Luo [Sat, 4 Aug 2012 01:06:13 +0000 (18:06 -0700)]
mwifiex: pass key_params pointer in mwifiex_set_encode

'cipher' and 'seq' coming from cfg80211 add_key handler will be
parsed in mwifiex_set_encode() to handle AES_CMAC cipher suite.

Signed-off-by: Ying Luo <luoy@marvell.com>
Signed-off-by: Stone Piao <piaoyun@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: rename wapi_rxpn to pn
Ying Luo [Sat, 4 Aug 2012 01:06:12 +0000 (18:06 -0700)]
mwifiex: rename wapi_rxpn to pn

This array was used for wapi_rxpn only. Now it will be used for
AES_CMAC as well. So give it a generic name.

Signed-off-by: Ying Luo <luoy@marvell.com>
Signed-off-by: Stone Piao <piaoyun@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: cleanup TX/RX BA tables for uAP
Avinash Patil [Sat, 4 Aug 2012 01:06:11 +0000 (18:06 -0700)]
mwifiex: cleanup TX/RX BA tables for uAP

Cleanup TX/RX BA tables when AP receives deauthentication from
associated station. During BSS_IDLE event, all wmm queues, BA
streams created for AP interface are deleted.

Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: add 11n Block Ack support for uAP
Avinash Patil [Sat, 4 Aug 2012 01:06:10 +0000 (18:06 -0700)]
mwifiex: add 11n Block Ack support for uAP

This patch adds support for handling BA request and BA setup
events for AP interface.

RA list is marked as either 11n enabled or disabled from station's
capabilities in association request. BA setup is initiated only
after some specific number of packets for particular RA list are
transmitted.

Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: support RX AMSDU aggregation for uAP
Avinash Patil [Sat, 4 Aug 2012 01:06:09 +0000 (18:06 -0700)]
mwifiex: support RX AMSDU aggregation for uAP

This patch adds support for reception and decoding of AMSDU
aggregation frames for AP interface.
Patch also adds support for handling AMSDU aggregation event.

Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
12 years agomwifiex: improve uAP RX handling
Avinash Patil [Sat, 4 Aug 2012 01:06:08 +0000 (18:06 -0700)]
mwifiex: improve uAP RX handling

1. Separate file for uAP RX handling.
2. If received packet is broadcast/multicast, send it to kernel
   as well as requeue it back to uAP TX queue.
3. If received packet is for associated STA (intra-BSS), requeue
   it back to uAP TX queue.
4. In all other cases (packets for AP or inter-BSS packets),
   pass packet to kernel to handle it accordingly.
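
The forwarding rules in points 2 to 4 can be summarised as a small decision
function. This is only a sketch of the rules as listed; the enum, the flag
names and uap_rx_action() are hypothetical and do not come from the driver.

```c
#include <stdbool.h>

/* Hypothetical action flags for a received packet. */
enum rx_action {
	RX_TO_KERNEL     = 1 << 0,  /* hand the packet to the network stack */
	RX_REQUEUE_TO_TX = 1 << 1,  /* put the packet back on the uAP TX queue */
};

static unsigned int uap_rx_action(bool is_bcast_or_mcast,
				  bool dest_is_associated_sta)
{
	/* Rule 2: broadcast/multicast goes both ways. */
	if (is_bcast_or_mcast)
		return RX_TO_KERNEL | RX_REQUEUE_TO_TX;

	/* Rule 3: intra-BSS unicast is bridged back to the TX queue. */
	if (dest_is_associated_sta)
		return RX_REQUEUE_TO_TX;

	/* Rule 4: everything else is for the AP or another BSS. */
	return RX_TO_KERNEL;
}
```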

Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>