GitHub/MotorolaMobilityLLC/kernel-slsi.git
11 years ago qlcnic: Support VLAN id config.
Rajesh Borundia [Fri, 19 Apr 2013 07:01:12 +0000 (07:01 +0000)]
qlcnic: Support VLAN id config.

o Add support for VLAN id configuration per VF using
  iproute2 tool.
o VLAN ids 1-4094 are treated as PVID by the PF and
  Guest VLAN tagging is not allowed by default.
o PVID is disabled when the VLAN id is set to 0.
o Guest VLAN tagging is allowed when the VLAN id is set to 4095.
o Only one Guest VLAN id is supported.
o VLAN id can be changed only when the VF driver is not loaded.
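
The id ranges listed above can be sketched as a small classifier. This is
only an illustration of the rules in this message, not the driver code; the
enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the value ranges follow the commit message. */
enum vf_vlan_mode {
	VF_VLAN_NONE,       /* id 0: PVID disabled */
	VF_VLAN_PVID,       /* id 1-4094: PF applies this id as the PVID */
	VF_VLAN_GUEST_TAG,  /* id 4095: guest VLAN tagging allowed */
	VF_VLAN_INVALID     /* anything above 4095 */
};

/* Map a VLAN id given for a VF to the behaviour described above. */
static enum vf_vlan_mode vf_vlan_classify(uint16_t vlan_id)
{
	if (vlan_id == 0)
		return VF_VLAN_NONE;
	if (vlan_id <= 4094)
		return VF_VLAN_PVID;
	if (vlan_id == 4095)
		return VF_VLAN_GUEST_TAG;
	return VF_VLAN_INVALID;
}
```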

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago qlcnic: Support MAC address, Tx rate config.
Rajesh Borundia [Fri, 19 Apr 2013 07:01:11 +0000 (07:01 +0000)]
qlcnic: Support MAC address, Tx rate config.

o Add support for MAC address and Tx rate configuration
  per VF via iproute2 tool.
o Tx rate change is allowed while the guest is running
  and the VF driver is loaded.
o MAC address change is allowed only when VF driver
  is not loaded.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago qlcnic: VF reset recovery implementation.
Rajesh Borundia [Fri, 19 Apr 2013 07:01:10 +0000 (07:01 +0000)]
qlcnic: VF reset recovery implementation.

o Implement recovery mechanism for VF to recover from
  adapter resets.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago qlcnic: VF FLR implementation.
Rajesh Borundia [Fri, 19 Apr 2013 07:01:09 +0000 (07:01 +0000)]
qlcnic: VF FLR implementation.

o FLR from Hypervisor - When hypervisor issues a VF FLR request,
  adapter notifies the parent PF driver of the FLR request for PF
  driver to perform any cleanup on behalf of that VF.
o FLR from VF Driver - VF driver may initiate a VF FLR request,
  if VF state needs to be cleaned up before a re-initialization.
  VF re-initialization during kdump is an example.
o PF driver cleans up all resources allocated on behalf of a VF,
  on VF FLR notifications from the adapter or from the VF driver.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago qlcnic: Change 82xx adapter VLAN id endian type.
Rajesh Borundia [Fri, 19 Apr 2013 07:01:08 +0000 (07:01 +0000)]
qlcnic: Change 82xx adapter VLAN id endian type.

o 82xx adapter requires VLAN id in little endian format.
  Instead of passing vlan id parameter as __le16, pass the
  parameter as u16 and use cpu_to_le16 at appropriate places.
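
A minimal userspace sketch of the idea: callers keep the VLAN id in CPU byte
order and the conversion to little endian happens only where the request is
filled in. The struct and function names are hypothetical, and the byte-wise
store stands in for the kernel's cpu_to_le16().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request layout; only the byte order of the id matters. */
struct demo_vlan_req {
	uint8_t vlan_id_le[2];  /* VLAN id in little endian, as the adapter expects */
};

/* Callers pass the id in CPU byte order; conversion happens only here. */
static void demo_fill_vlan_req(struct demo_vlan_req *req, uint16_t vlan_id)
{
	assert(req != NULL);
	req->vlan_id_le[0] = (uint8_t)(vlan_id & 0xff);  /* low byte first */
	req->vlan_id_le[1] = (uint8_t)(vlan_id >> 8);    /* high byte second */
}
```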

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago Merge branch 'netlink-mmap'
David S. Miller [Fri, 19 Apr 2013 19:37:09 +0000 (15:37 -0400)]
Merge branch 'netlink-mmap'

Patrick McHardy says:

====================
The following patches contain an implementation of memory mapped I/O for
netlink. The implementation is modelled after AF_PACKET memory mapped I/O
with a few differences:

- In order to perform memory mapped I/O to userspace, the kernel allocates
  skbs with the data area pointing to the data area of the mapped frames.
  All netlink subsystems assume a linear data area, so for the sake of
  simplicity, the mapped data area is not attached to the paged area but
  to skb->data. This requires introduction of a special skb allocation
  function that just allocates an skb head without the data area. Since this
  is a quite rare use case, I introduced a new function based on __alloc_skb
  instead of splitting it up into head and data allocation. The alternative
  would be to introduce an __alloc_skb_head and __alloc_skb_data function,
  which would actually be useful for a specific error case in memory mapped
  netlink, but would require a couple of extra instructions for the common
  skb allocation case, so it doesn't really seem worth it.

  In order to get the destination memory area for skb->data before message
  construction, memory mapped netlink I/O needs to look up the destination
  socket during allocation instead of during transmission because the
  ring is owned by the receiving socket/process. A special skb allocation
  function (netlink_alloc_skb) taking the destination pid as an argument is
  used for this. All subsystems that want to support memory mapped I/O need
  to use this function; automatic fallback to the receive queue happens
  for unconverted subsystems. Dumps automatically use memory mapped I/O if
  the receiving socket has enabled it.

  The visible effect of looking up the destination socket during allocation
  instead of transmission is that message ordering in userspace might
  change in case allocation and transmission aren't performed atomically.
  This usually doesn't matter since most subsystems have a BKL-like lock
  like the rtnl mutex. To my knowledge the only existing case where it
  might matter is nfnetlink_queue combined with the recently introduced
  batched verdicts, but a) that subsystem already includes sequence
  numbers which allow userspace to reorder messages in case it cares to
  (the reordering window is also quite small) and b) with memory mapped
  transmission, batching can be performed in a subsystem-independent
  manner.

- AF_NETLINK contains flow control for database dumps. With regular I/O,
  dump continuations are triggered based on the socket's receive queue space
  and by recvmsg() calls. Since with memory mapped I/O there are no
  recvmsg() calls under normal operation, this is done in netlink_poll(),
  under the assumption that userspace has processed all pending frames
  before invoking poll(), thus the ring is expected to have room for new
  messages. Dumps currently don't benefit as much as they could from
  memory mapped I/O because each single continuation requires a poll()
  call. A more aggressive approach seems like a good idea to me, especially
  in case the socket is not subscribed to any multicast groups (IOW only
  receiving explicitly requested data).

Besides that, the memory mapped netlink implementation extends the states
defined by AF_PACKET between userspace and the kernel by a SKIP status, this
is intended for the case that userspace wants to queue frames (specifically
when using nfnetlink_queue, an IDS and stream reassembly, requested by
Eric Leblond) for a longer period of time. The kernel skips over all frames
marked with SKIP when looking for unused frames and only fails when not finding
a free frame or when having skipped the entire ring.
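
As an illustration of the SKIP handling described above, a ring scan could
look roughly like the following. This is a sketch under assumptions, not the
code from the patches: the status names, the flat status array and the helper
name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values for this sketch. */
enum demo_frame_status {
	DEMO_UNUSED,  /* free for the kernel to fill */
	DEMO_VALID,   /* filled, waiting for userspace */
	DEMO_SKIP     /* held by userspace, kernel must step over it */
};

/*
 * Starting at *head, step over SKIP frames and return the index of the first
 * UNUSED frame, advancing *head past it. Return -1 if a frame that is neither
 * UNUSED nor SKIP is found, or if every frame in the ring was skipped.
 */
static int demo_ring_next_unused(const enum demo_frame_status *ring,
				 size_t nframes, size_t *head)
{
	size_t i, pos;

	assert(ring != NULL && head != NULL && nframes > 0);
	pos = *head % nframes;
	for (i = 0; i < nframes; i++) {
		if (ring[pos] == DEMO_UNUSED) {
			*head = (pos + 1) % nframes;
			return (int)pos;
		}
		if (ring[pos] != DEMO_SKIP)
			return -1;          /* no free frame at this position */
		pos = (pos + 1) % nframes;  /* SKIP: try the next frame */
	}
	return -1;                          /* skipped the entire ring */
}
```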

Also noteworthy is memory mapped sendmsg: the kernel performs validation
of messages before accepting and processing them. In order to prevent
userspace from changing the message contents after validation, the
kernel checks that the ring is only mapped once and the file descriptor
is not shared (in order to avoid having userspace set up another mapping
after the first mentioned check). If either condition does not hold, the
message is copied to an allocated skb and processed as with regular I/O.
I'd especially appreciate review of this part since I'm not really versed
in memory, file and process management.

The remaining interesting details are included in the changelogs of the
individual patches and the documentation, so I won't repeat them here.

As an example, nfnetlink_queue is converted to support memory mapped
I/O. Other subsystems that would probably benefit are nfnetlink_log,
audit and maybe iSCSI, not sure.

Following are some numbers collected by Florian Westphal based on a
slightly older version, which included an experimental patch for the
nfnetlink_queue ordering issue.

===

Test hardware is a 12-core machine
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
ixgbe interfaces are used (i.e., multiqueue nics).
irqs are distributed across the cpus.

I've made several tests.

The simple one consists of 3GBit UDP traffic, packets are 1500 bytes
in size (i.e., no fragmentation), with a single nfqueue
and the test client programs in libmnl examples directory.
Packets are sent from one /24 net to another /24 net, i.e.
there are a few hundred flows active at any given time.

I've also tested with snort, but I disabled all rules.
6Gbit UDP traffic is generated in the snort case, and
6 nfqueues are used (i.e., 6 snorts run in parallel).

I've tested with 3 different kernels, all based on 3.7.1.
- 3.7.1, without the mmap patches
- 3.7.1, with Patrick's mmap patches
- 3.7.1, with mmap patches and extended spinlock to ensure packet ids are
  monotonically increasing and cannot be re-ordered.  This is what we
  currently ship in our product.

  [ the spinlock that is extended is the per nfqueue spinlock, it will
    be held from the time the netlink skb is allocated until the netlink
    skb is sent to userspace:

    http://1984.lsi.us.es/git/nf-next/commit/?h=mmap-netlink3&id=b8eb19c46650fef4e9e4fe53f367f99bbf72afc9
  ]

snort is normally used in "batch mode", i.e., after processing 25 packets
a single "batch verdict" is sent to accept the packets seen so far.
"mmap snort" means RX_RING + sendmsg(), i.e. TX_RING is not used at this
time (except where noted below).

One reason is that snort has a reload thread, so kernel needs to copy;
also in the snort case no payload rewrite takes place, so compared
to the rx path the tx path is cheap.

Results:

3.7.1, without mmap patches, i.e. recv()+sendmsg() for everyone
nfq-queue:           1.7 gbit out
snort-recv-batch-25  5.1 gbit out
snort-recv-no-batch  3.1 gbit out

3.7.1 + mmap + without extended spinlocked section
nfq-queue:           1.7 gbit out (recv/sendmsg)
nfq-queue-mmap:      2.4 gbit out
snort-mmap-batch-25  5.6 gbit out  (warning: since ids can be
                                        re-ordered, this version is "broken").
snort-recv-batch-25  5.1 gbit out
snort-mmap-no-batch  4.6 gbit out (i.e., one verdict per packet)

Kernel 3.7.1 + mmap + extended spinlock section:
nfq-queue:           1.4 gbit out
nfq-queue-mmap:      2.3 gbit out
snort:               5.6 gbit out

Conclusions:
- The "extended spinlocked section" hurts performance in the
  single queue case; with 6 snorts there is no measurable slowdown.
- I tried to re-write the mmap-snort to work without batch verdicts, but
  results were not very encouraging:

kernel 3.7.1 + mmap (without extended spinlocked section):

snort-mmap-batch-25      5.6 gbit out (what we currently ship)
snort-recv-batch-25      5.1 gbit out (without using mmap)
snort-mmap-batch-1       4.6 gbit out (with mmap but without batch verdicts)
snort-mmap-txring-25     5.2 gbit out (with mmap but without batch verdicts)
snort-mmap-txring-1      4.6 gbit out (with mmap but without batch verdicts)

The difference between the last two is that in the txring-25 case, we
put a verdict into the tx ring after every packet, but will only
invoke sendmsg(, NULL, 0) after processing 25 packets.  So the only
difference is the number of sendmsg calls/context switches.

So, i.o.w, kernel 3.7.1 + mmap + the extra locking crap is faster
than 3.7.1 + mmap-without-extra-locking and single-verdict-per packet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago nfnetlink: add support for memory mapped netlink
Patrick McHardy [Wed, 17 Apr 2013 06:47:09 +0000 (06:47 +0000)]
nfnetlink: add support for memory mapped netlink

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netfilter: rename netlink related "pid" variables to "portid"
Patrick McHardy [Wed, 17 Apr 2013 06:47:08 +0000 (06:47 +0000)]
netfilter: rename netlink related "pid" variables to "portid"

Get rid of the confusing mix of pid and portid and use portid consistently
for all netlink related socket identities.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: add documentation for memory mapped I/O
Patrick McHardy [Wed, 17 Apr 2013 06:47:07 +0000 (06:47 +0000)]
netlink: add documentation for memory mapped I/O

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: add RX/TX-ring support to netlink diag
Patrick McHardy [Wed, 17 Apr 2013 06:47:06 +0000 (06:47 +0000)]
netlink: add RX/TX-ring support to netlink diag

Based on AF_PACKET.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: add flow control for memory mapped I/O
Patrick McHardy [Wed, 17 Apr 2013 06:47:05 +0000 (06:47 +0000)]
netlink: add flow control for memory mapped I/O

Add flow control for memory mapped RX. Since user-space usually doesn't
invoke recvmsg() when using memory mapped I/O, flow control is performed
in netlink_poll(). Dumps are allowed to continue if at least half of the
ring frames are unused.
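
The "at least half of the ring frames are unused" rule can be sketched as a
simple predicate. The function name and the way the counts are passed in are
assumptions made for this example; the patch itself derives them from the
ring state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if a dump may continue: at least half of the ring frames are
 * unused. Written without division so odd ring sizes do not round down.
 */
static bool demo_dump_may_continue(size_t unused_frames, size_t total_frames)
{
	assert(unused_frames <= total_frames);
	if (total_frames == 0)
		return false;
	return unused_frames >= total_frames - unused_frames;
}
```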

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: implement memory mapped recvmsg()
Patrick McHardy [Wed, 17 Apr 2013 06:47:04 +0000 (06:47 +0000)]
netlink: implement memory mapped recvmsg()

Add support for mmap'ed recvmsg(). To allow the kernel to construct messages
into the mapped area, a dataless skb is allocated and the data pointer is
set to point into the ring frame. This means frames will be delivered to
userspace in order of allocation instead of order of transmission. This
usually doesn't matter since the order is either not determinable by
userspace or message creation/transmission is serialized. The only case
where this can have a visible difference is nfnetlink_queue. Userspace
can't assume mmap'ed messages have ordered IDs anymore and needs to check
this if using batched verdicts.

For non-mapped sockets, nothing changes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: implement memory mapped sendmsg()
Patrick McHardy [Wed, 17 Apr 2013 06:47:03 +0000 (06:47 +0000)]
netlink: implement memory mapped sendmsg()

Add support for mmap'ed sendmsg() to netlink. Since the kernel validates
received messages before processing them, the code makes sure userspace
can't modify the message contents after invoking sendmsg(). To do that
only a single mapping of the TX ring is allowed to exist and the socket
must not be shared. If either of these two conditions does not hold, it
falls back to copying.
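
The decision described above boils down to two conditions. A rough sketch,
with hypothetical names and with the mapping count and sharing state passed
in as plain parameters rather than read from kernel structures:

```c
#include <assert.h>
#include <stdbool.h>

enum demo_tx_path {
	DEMO_TX_ZERO_COPY,  /* process the message directly from the TX ring */
	DEMO_TX_COPY        /* copy into an allocated skb first */
};

/*
 * Zero-copy is only safe if userspace cannot modify the frame after
 * validation: exactly one mapping of the ring and an unshared socket.
 */
static enum demo_tx_path demo_choose_tx_path(unsigned int ring_mappings,
					     bool socket_shared)
{
	if (ring_mappings == 1 && !socket_shared)
		return DEMO_TX_ZERO_COPY;
	return DEMO_TX_COPY;
}
```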

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: add mmap'ed netlink helper functions
Patrick McHardy [Wed, 17 Apr 2013 06:47:02 +0000 (06:47 +0000)]
netlink: add mmap'ed netlink helper functions

Add helper functions for looking up mmap'ed frame headers, reading and
writing their status, allocating skbs with mmap'ed data areas and a poll
function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: mmaped netlink: ring setup
Patrick McHardy [Wed, 17 Apr 2013 06:47:01 +0000 (06:47 +0000)]
netlink: mmaped netlink: ring setup

Add support for mmap'ed RX and TX ring setup and teardown based on the
af_packet.c code. The following patches will use this to add the real
mmap'ed receive and transmit functionality.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: add netlink_skb_set_owner_r()
Patrick McHardy [Wed, 17 Apr 2013 06:47:00 +0000 (06:47 +0000)]
netlink: add netlink_skb_set_owner_r()

For mmap'ed I/O a netlink specific skb destructor needs to be invoked
after the final kfree_skb() to clean up state. This doesn't work currently
since the skb's ownership is transferred to the receiving socket using
skb_set_owner_r(), which orphans the skb, thereby invoking the destructor
prematurely.

Since netlink doesn't account skbs to the originating socket, there's no
need to orphan the skb. Add a netlink specific skb_set_owner_r() variant
that does not orphan the skb and use a netlink specific destructor to
call sock_rfree().
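
The premature destructor call can be illustrated with a toy model. This is
not kernel code: struct toy_skb and all toy_* functions are hypothetical
stand-ins, and the model leaves out the receive accounting the real helpers
perform. It only shows why orphaning before the final free runs the cleanup
too early.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an skb: only an owner and a destructor callback. */
struct toy_skb {
	void *owner;
	void (*destructor)(struct toy_skb *skb);
	int cleanup_calls;  /* counts how often the mmap cleanup ran */
};

static void toy_mmap_cleanup(struct toy_skb *skb)
{
	skb->cleanup_calls++;
}

/* Orphaning runs the current destructor right away and detaches the owner. */
static void toy_orphan(struct toy_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->owner = NULL;
}

/* Generic-style owner transfer: orphans first, so cleanup fires too early. */
static void toy_set_owner_r(struct toy_skb *skb, void *new_owner)
{
	toy_orphan(skb);
	skb->owner = new_owner;
}

/* Netlink-style variant: change the owner but leave the destructor pending. */
static void toy_netlink_set_owner_r(struct toy_skb *skb, void *new_owner)
{
	skb->owner = new_owner;
}

/* Final free: this is where the destructor is supposed to run. */
static void toy_free(struct toy_skb *skb)
{
	assert(skb != NULL);
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
}
```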

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: don't orphan skb in netlink_trim()
Patrick McHardy [Wed, 17 Apr 2013 06:46:59 +0000 (06:46 +0000)]
netlink: don't orphan skb in netlink_trim()

Netlink doesn't account skbs to the sending socket, so there's no
need to orphan the skb before trimming it.

Removing the skb_orphan() call is required for mmap'ed netlink, which uses
a netlink specific skb destructor that must not be invoked before the
final freeing of the skb.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago net: add function to allocate sk_buff head without data area
Patrick McHardy [Wed, 17 Apr 2013 06:46:58 +0000 (06:46 +0000)]
net: add function to allocate sk_buff head without data area

Add a function to allocate a sk_buff head without any data. This will
be used by memory mapped netlink to attach data from the mmap'ed area
to the skb.

Additionally change skb_release_all() to check whether the skb has a
data area to allow the skb destructor to clear the data pointer in case
only a head has been allocated.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: rename ssk to sk in struct netlink_skb_params
Patrick McHardy [Wed, 17 Apr 2013 06:46:57 +0000 (06:46 +0000)]
netlink: rename ssk to sk in struct netlink_skb_params

Memory mapped netlink needs to store the receiving userspace socket
when sending from the kernel to userspace. Rename 'ssk' to 'sk' to
avoid confusion.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago netlink: add symbolic value for congested state
Patrick McHardy [Wed, 17 Apr 2013 06:46:56 +0000 (06:46 +0000)]
netlink: add symbolic value for congested state

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago Merge branch '8021ad'
David S. Miller [Fri, 19 Apr 2013 18:46:27 +0000 (14:46 -0400)]
Merge branch '8021ad'

Patrick McHardy says:

====================
The following patches add support for 802.1ad (provider tagging) to the
VLAN driver. The patchset consists of the following parts:

- renaming of the NETIF_F_HW_VLAN feature flags to indicate that they only
  operate on CTAGs

- preparation for 802.1ad VLAN filtering offload by adding a proto argument
  to the rx_{add,kill}_vid net_device_ops callbacks

- preparation of the VLAN code to support multiple protocols by making the
  protocol used for tagging a property of the VLAN device and converting
  the device lookup functions accordingly

- second step of preparation of the VLAN code by making the packet tagging
  functions take a protocol argument

- introduction of 802.1ad support in the VLAN code, consisting mainly of
  checking for ETH_P_8021AD in a couple of places and adapting the netdevice
  offload feature checks to take the protocol into account

- announcement of STAG offloading capabilities in a couple of drivers for
  virtual network devices

The patchset is based on net-next.git and has been tested with single and
double tagging with and without HW acceleration (for CTAGs).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago net: vlan: announce STAG offload capability in some drivers
Patrick McHardy [Fri, 19 Apr 2013 02:04:32 +0000 (02:04 +0000)]
net: vlan: announce STAG offload capability in some drivers

- macvlan: propagate STAG filtering capabilities from underlying device
- ifb: announce STAG tagging support in addition to CTAG tagging support
- veth: announce STAG tagging/stripping support in addition to CTAG support

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago net: vlan: add 802.1ad support
Patrick McHardy [Fri, 19 Apr 2013 02:04:31 +0000 (02:04 +0000)]
net: vlan: add 802.1ad support

Add support for 802.1ad VLAN devices. This mainly consists of checking for
ETH_P_8021AD in addition to ETH_P_8021Q in a couple of places and checking
offloading capabilities based on the protocol in use.
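
A hedged sketch of what such protocol-aware checks amount to. The EtherType
values 0x8100 and 0x88A8 are the standard ones for 802.1Q and 802.1ad; the
DEMO_* names, the feature bits and both helpers are hypothetical and only
mirror the idea, not the kernel's actual flags or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ETH_P_8021Q   0x8100  /* 802.1Q customer tag (CTAG) */
#define DEMO_ETH_P_8021AD  0x88A8  /* 802.1ad service tag (STAG) */

/* Hypothetical per-protocol offload feature bits. */
#define DEMO_F_HW_CTAG_TX  (1u << 0)
#define DEMO_F_HW_STAG_TX  (1u << 1)

/* Accept both VLAN protocols instead of 802.1Q only. */
static bool demo_is_vlan_proto(uint16_t ethertype)
{
	return ethertype == DEMO_ETH_P_8021Q || ethertype == DEMO_ETH_P_8021AD;
}

/* Check the offload capability that matches the tagging protocol. */
static bool demo_hw_can_insert_tag(unsigned int features, uint16_t proto)
{
	switch (proto) {
	case DEMO_ETH_P_8021Q:
		return (features & DEMO_F_HW_CTAG_TX) != 0;
	case DEMO_ETH_P_8021AD:
		return (features & DEMO_F_HW_STAG_TX) != 0;
	default:
		return false;
	}
}
```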

Configuration is done using "ip link":

# ip link add link eth0 eth0.1000 \
type vlan proto 802.1ad id 1000
# ip link add link eth0.1000 eth0.1000.1000 \
type vlan proto 802.1q id 1000

52:54:00:12:34:56 > 92:b1:54:28:e4:8c, ethertype 802.1Q (0x8100), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
    20.1.0.2 > 20.1.0.1: ICMP echo request, id 3003, seq 8, length 64
92:b1:54:28:e4:8c > 52:54:00:12:34:56, ethertype 802.1Q-QinQ (0x88a8), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 47944, offset 0, flags [none], proto ICMP (1), length 84)
    20.1.0.1 > 20.1.0.2: ICMP echo reply, id 3003, seq 8, length 64

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago net: vlan: add protocol argument to packet tagging functions
Patrick McHardy [Fri, 19 Apr 2013 02:04:30 +0000 (02:04 +0000)]
net: vlan: add protocol argument to packet tagging functions

Add a protocol argument to the VLAN packet tagging functions. In case of HW
tagging, we need that protocol available in the ndo_start_xmit functions,
so it is stored in a new field in the skb. The new field fits into a hole
(on 64 bit) and doesn't increase the skb's size.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago net: vlan: prepare for 802.1ad support
Patrick McHardy [Fri, 19 Apr 2013 02:04:29 +0000 (02:04 +0000)]
net: vlan: prepare for 802.1ad support

Make the encapsulation protocol value a property of VLAN devices and change
the device lookup functions to take the protocol value into account.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago net: vlan: prepare for 802.1ad VLAN filtering offload
Patrick McHardy [Fri, 19 Apr 2013 02:04:28 +0000 (02:04 +0000)]
net: vlan: prepare for 802.1ad VLAN filtering offload

Change the rx_{add,kill}_vid callbacks to take a protocol argument in
preparation of 802.1ad support. The protocol argument used so far is
always htons(ETH_P_8021Q).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*
Patrick McHardy [Fri, 19 Apr 2013 02:04:27 +0000 (02:04 +0000)]
net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*

Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
service provider tagging (STAGs) and require the distinction for hardware not
supporting acceleration of both.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago Merge branch 'intel'
David S. Miller [Fri, 19 Apr 2013 18:19:07 +0000 (14:19 -0400)]
Merge branch 'intel'

Jeff Kirsher says:

====================
This series contains updates to ixgbe and igb.

The ixgbe changes contain 2 patches from the community, one of which is a
fix from akepner for an issue where netif_running() in shutdown was
not done under rtnl_lock.  The other community fix from Joe Perches
cleans up #ifdef CONFIG_DEBUG_FS which is no longer necessary.  The
last ixgbe patch, from Jacob Keller, adds support for WoL on 82599
SFP+ LOM.

The remaining patches are against igb, 10 of which were previously
submitted in a pull request where changes were requested.

The following igb patches:
 igb: Support for 100base-fx SFP
 igb: Support to read and export SFF-8472/8079 data
are v2 based on feedback from Dan Carpenter and Ben Hutchings in
the previous pull request.

The largest set of changes is in my patch to clean up code comments
and whitespace to align the igb driver with the networking style of
code comments.  While cleaning up the code comments, I fixed several
other whitespace/checkpatch.pl code formatting issues.

Other notable igb patches make EEE-capable devices query the PHY to
determine what the link partner is advertising, add support for
i354 devices and add support for spoofchk config.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago igb: Add support for i354 devices
Carolyn Wyborny [Thu, 18 Apr 2013 22:21:30 +0000 (22:21 +0000)]
igb: Add support for i354 devices

This patch adds base support for new i354 devices.  Loopback test is
unsupported for this release.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: add support for spoofchk config
Lior Levy [Sun, 3 Mar 2013 20:27:48 +0000 (20:27 +0000)]
igb: add support for spoofchk config

Add support for spoofchk configuration per VF via iproute2 tool.

Signed-off-by: Lior Levy <lior.levy@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Enable EEE LP advertisement
Matthew Vick [Thu, 21 Feb 2013 03:32:52 +0000 (03:32 +0000)]
igb: Enable EEE LP advertisement

On EEE-capable devices, query the PHY to determine what the link partner is
advertising.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Fix code comments and whitespace
Jeff Kirsher [Sat, 23 Feb 2013 07:29:56 +0000 (07:29 +0000)]
igb: Fix code comments and whitespace

Aligns the multi-line code comments with the desired style for the
networking tree.  Also cleaned up whitespace issues found during the
cleanup of code comments (i.e. remove unnecessary blank lines,
use tabs where possible, properly wrap lines and keep strings on a
single line)

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
11 years ago igb: Fix sparse warnings on function pointers
Akeem G. Abodunrin [Sat, 16 Feb 2013 07:09:06 +0000 (07:09 +0000)]
igb: Fix sparse warnings on function pointers

This patch fixes sparse warnings on function pointers that are not
defined as static.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Use rx/tx_itr_setting when setting up initial value of itr
Alexander Duyck [Tue, 12 Feb 2013 02:31:01 +0000 (02:31 +0000)]
igb: Use rx/tx_itr_setting when setting up initial value of itr

It turns out that the InterruptThrottleRate module parameter was only
having the effect of locking the ITR at the starting ITR value. This was
because the values stored in rx_itr_setting and tx_itr_setting were being
ignored when configuring the initial itr_val of the q_vector.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Pull adapter out of main path in igb_xmit_frame_ring
Alexander Duyck [Thu, 7 Feb 2013 08:55:46 +0000 (08:55 +0000)]
igb: Pull adapter out of main path in igb_xmit_frame_ring

We only need the adapter pointer in the case of ptp.  As such we can pull the
adapter out of the main path and place it inside the if statement to avoid
the temptation of accessing the adapter pointer in the fast path.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Mask off check of frag_off as we only want fragment offset
Alexander Duyck [Fri, 1 Feb 2013 08:56:47 +0000 (08:56 +0000)]
igb: Mask off check of frag_off as we only want fragment offset

We were incorrectly checking the entire frag_off field when we only wanted the
fragment offset.  As a result we were not pulling in TCP headers when the DF
(don't fragment) flag was set.

To correct that we will now check for frag off using the IP_OFFSET mask.
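
To make the difference concrete, here is a small sketch of the two checks.
The flag and mask values are the standard IPv4 ones (DF 0x4000, MF 0x2000,
offset mask 0x1FFF); the DEMO_* names and helpers are hypothetical, and the
value is taken in host byte order for simplicity, whereas the driver works on
the header field in network byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_IP_DF      0x4000  /* don't fragment flag */
#define DEMO_IP_MF      0x2000  /* more fragments flag */
#define DEMO_IP_OFFSET  0x1FFF  /* fragment offset, in 8-byte units */

/* Old-style check: any bit set, including DF, looks like a fragment. */
static bool demo_frag_check_whole_field(uint16_t frag_off)
{
	return frag_off != 0;
}

/* Masked check: only a non-zero fragment offset counts. */
static bool demo_frag_check_offset_only(uint16_t frag_off)
{
	return (frag_off & DEMO_IP_OFFSET) != 0;
}
```

With only DF set (0x4000) the first check reports a fragment while the masked
check does not, which is the case the commit message describes.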

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: random code and comments fix
Akeem G. Abodunrin [Tue, 29 Jan 2013 10:15:31 +0000 (10:15 +0000)]
igb: random code and comments fix

This patch fixes code and comments as identified in the driver.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Implement support to power sfp cage and turn on I2C
Akeem G. Abodunrin [Tue, 29 Jan 2013 10:15:26 +0000 (10:15 +0000)]
igb: Implement support to power sfp cage and turn on I2C

Based on original patch from Aurélien Guillaume <footplus@gmail.com>
This patch adds support to turn on I2C, with sfp cage powered.

CC: Aurélien Guillaume <footplus@gmail.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Support to read and export SFF-8472/8079 data
Akeem G. Abodunrin [Thu, 11 Apr 2013 06:36:35 +0000 (06:36 +0000)]
igb: Support to read and export SFF-8472/8079 data

This patch adds support to read and export SFF-8472/8079 (SFP data)
over i2c, through Ethtool.

v2: Changed implementation to accommodate any offset within SFF module
    length boundary.

Reported-by: Aurélien Guillaume <footplus@gmail.com>
CC: Aurélien Guillaume <footplus@gmail.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Support for 100base-fx SFP
Akeem G. Abodunrin [Fri, 5 Apr 2013 16:49:06 +0000 (16:49 +0000)]
igb: Support for 100base-fx SFP

This patch adds support for 100base-fx SFP and reports proper link speed/duplex
via Ethtool.

v2: fix smatch warnings

CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago ixgbe: Remove unnecessary #ifdef CONFIG_DEBUG_FS tests
Joe Perches [Fri, 12 Apr 2013 17:12:54 +0000 (17:12 +0000)]
ixgbe: Remove unnecessary #ifdef CONFIG_DEBUG_FS tests

Add some empty static inlines instead to make
the code more readable.
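
The pattern can be sketched as follows. The config macro and function names
are hypothetical and only illustrate the technique: when the feature is
compiled out, empty static inline stubs take the place of the real functions,
so call sites need no #ifdef.

```c
#include <assert.h>

/* Hypothetical config switch; left undefined here, as if the option were off. */
/* #define DEMO_CONFIG_DEBUG_FS 1 */

static int demo_dbg_calls;  /* lets the example observe whether real hooks ran */

#ifdef DEMO_CONFIG_DEBUG_FS
static inline void demo_dbg_adapter_init(void) { demo_dbg_calls++; }
static inline void demo_dbg_adapter_exit(void) { demo_dbg_calls++; }
#else
/* Empty stubs: callers stay unconditional and the calls do nothing. */
static inline void demo_dbg_adapter_init(void) {}
static inline void demo_dbg_adapter_exit(void) {}
#endif

/* Call site without any preprocessor conditionals. */
static void demo_probe_and_remove(void)
{
	demo_dbg_adapter_init();
	demo_dbg_adapter_exit();
}
```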

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago ixgbe: Add support for WoL on 82599 SFP+ LOM
Jacob Keller [Wed, 3 Apr 2013 04:41:37 +0000 (04:41 +0000)]
ixgbe: Add support for WoL on 82599 SFP+ LOM

This patch adds software support for WoL for the 82599 SFP+ LOM device
(ID 0x8976).

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: in shutdown, do netif_running() under rtnl_lock
akepner [Wed, 13 Mar 2013 14:54:58 +0000 (14:54 +0000)]
ixgbe: in shutdown, do netif_running() under rtnl_lock

During shutdown it's possible for __dev_close() (which holds
rtnl_lock) to clear the __LINK_STATE_START bit, and for ixgbe
to then read that bit (without holding rtnl_lock) and, seeing the
device as not running, fail to free irqs, etc. The result is a crash
like this:

------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:313!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 1
Pid: 5910, comm: reboot Tainted: P           ----------------   2.6.32 #1 empty
RIP: 0010:[<ffffffff81305c2b>]  [<ffffffff81305c2b>] free_msi_irqs+0x11b/0x130
RSP: 0018:ffff880185c9bc88  EFLAGS: 00010282
RAX: ffff880219f58bc0 RBX: ffff88021ac53b00 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: 000000000000004a
RBP: ffff880185c9bcc8 R08: 0000000000000002 R09: 0000000000000106
R10: 0000000000000000 R11: 0000000000000006 R12: ffff88021e524778
R13: 0000000000000001 R14: ffff88021e524000 R15: 0000000000000000
FS:  00007f90821b7700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f90818bd010 CR3: 0000000132c64000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process reboot (pid: 5910, threadinfo ffff880185c9a000, task ffff88021bf04a80)
Stack:
 ffff880185c9bc98 000000018130529d ffff880185c9bcc8 ffff88021e524000
<0> 0000000000000004 ffff88021948c700 0000000000000000 ffff880185c9bda7
<0> ffff880185c9bce8 ffffffff81305cbd ffff880185c9bce8 ffff88021948c700
Call Trace:
 [<ffffffff81305cbd>] pci_disable_msix+0x3d/0x50
 [<ffffffffa00501d5>] ixgbe_reset_interrupt_capability+0x65/0x90 [ixgbe]
 [<ffffffffa00512f6>] ixgbe_clear_interrupt_scheme+0xb6/0xd0 [ixgbe]
 [<ffffffffa005330b>] __ixgbe_shutdown+0x5b/0x200 [ixgbe]
 [<ffffffffa00534ca>] ixgbe_shutdown+0x1a/0x60 [ixgbe]
 [<ffffffff812f6c7c>] pci_device_shutdown+0x2c/0x50
 [<ffffffff813727fb>] device_shutdown+0x4b/0x160
 [<ffffffff8107d98c>] kernel_restart_prepare+0x2c/0x40
 ehci timer_action, mod_timer io_watchdog
 [<ffffffff8107d9e6>] kernel_restart+0x16/0x60
 [<ffffffff8107dbfd>] sys_reboot+0x1ad/0x200
 [<ffffffff811676cf>] ? __d_free+0x3f/0x60
 [<ffffffff81167748>] ? d_free+0x58/0x60
 [<ffffffff8116f7c0>] ? mntput_no_expire+0x30/0x100
 [<ffffffff81152b11>] ? __fput+0x191/0x200
 [<ffffffff816565fe>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8100b132>] system_call_fastpath+0x16/0x1b
Code: 4c 89 ef e8 98 8c e3 ff 4d 39 f4 48 8b 43 10 75 cf 48 83 c4 18 5b 41 5c
41 5d 41 5e 41 5f c9 c3 49 8b 7d 20 e8 07 5a d3 ff eb c9 <0f> 0b 0f 1f 00 eb fb
66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
 ehci timer_action, mod_timer io_watchdog
RIP  [<ffffffff81305c2b>] free_msi_irqs+0x11b/0x130
 RSP <ffff880185c9bc88>
---[ end trace 27de882a0fe75593 ]---

(This was seen on a pretty old kernel/driver, but looks like
the same bug is still possible.)

Signed-off-by: <akepner@riverbed.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Thu, 18 Apr 2013 19:00:59 +0000 (15:00 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ixgbe only.

v2- Dropped the following 2 patches from the series:
 ixgbe: Support using build_skb in the case that jumbo frames are disabled
 ixgbe: walk pci-e bus to find minimum width

Ben Hutchings found a bug with Alex's patch, so that patch was dropped
permanently.  Jacob's "walk PCIe bus" patch is being re-worked for
a more generic solution so that other drivers can benefit.

In the remaining patches...
Alex provides a fix where we were incorrectly checking the entire frag_off
field when we only wanted the fragment offset.  Alex also cleans up
the check for PAGE_SIZE, since the default configuration allocates 32K
for all buffers.

Emil provides a change to the calculation of eerd so that it is consistent
between the read and write functions by using | instead of +.

Jacob adds support for displaying PCIe Gen3 link speed, which was
previously missing from the ixgbe driver.  He also provides a patch
to clean up ixgbe_get_bus_info_generic to call some conversion
functions, which are used also in another patch provided by Jacob.
Jacob modifies the driver to enable certain devices (which have an
internal switch) to read from the physical slot rather than reading
data from the internal switch.

Don provides a couple of fixes (which are more appropriate for net-next),
one of which resolves an issue where ixgbe was only turning on the laser
when the adapter was up which caused issues for those who wanted to
access the MNG firmware while the port was in a down state.  The other
fix is for WoL when currently linked at 1G.  Lastly Don bumps the driver
version to keep the in-kernel driver up to date with the current functionality.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: introduce TCPSpuriousRtxHostQueues SNMP counter
Eric Dumazet [Thu, 18 Apr 2013 06:52:51 +0000 (06:52 +0000)]
tcp: introduce TCPSpuriousRtxHostQueues SNMP counter

Host queues (Qdisc + NIC) can hold packets so long that TCP can
eventually retransmit a packet before the first transmit even left
the host.

It's not clear right now if we could avoid this in the first place:

- We could arm RTO timer not at the time we enqueue packets, but
  at the time we TX complete them (tcp_wfree())

- Cancel the sending of the new copy of the packet if prior one
  is still in queue.

This patch adds instrumentation so that we can at least see how
often this problem happens.

TCPSpuriousRtxHostQueues SNMP counter is incremented every time
we detect the fast clone is not yet freed in tcp_transmit_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofec: Remove unneeded asm header files
Fabio Estevam [Thu, 18 Apr 2013 02:54:39 +0000 (02:54 +0000)]
fec: Remove unneeded asm header files

There is nothing in the driver that requires <asm/coldfire.h> and
<asm/mcfsim.h>.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoixgbe: bump version number
Don Skidmore [Fri, 1 Mar 2013 07:09:43 +0000 (07:09 +0000)]
ixgbe: bump version number

Bump the version number to reflect the corresponding functionality in the
out-of-tree driver.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: Fix 1G link WoL
Don Skidmore [Thu, 28 Feb 2013 08:08:44 +0000 (08:08 +0000)]
ixgbe: Fix 1G link WoL

We reset during the shutdown path which will reset AUTOC register.  This
would change LMS to 10G.  If we were currently linked at 1G we will lose
link, which is a bad thing if we wanted WoL to work.  For the fix I needed
to know if WoL is supported so I created a new bool in the ixgbe_hw struct.
If this is set we will not allow the reset to change the current LMS value
in AUTOC.
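
A minimal sketch of the "keep the current LMS value" step, assuming a
3-bit link-mode-select field at bits 15:13 of the register. The field
position, struct demo_hw and the demo_* names are assumptions made for
illustration, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 3-bit link mode select (LMS) field at bits 15:13. */
#define DEMO_AUTOC_LMS_SHIFT 13
#define DEMO_AUTOC_LMS_MASK  (UINT32_C(0x7) << DEMO_AUTOC_LMS_SHIFT)

struct demo_hw {
    bool wol_supported;    /* hypothetical counterpart of the new bool */
};

/* Return the AUTOC value to write back after a reset.  When WoL is
 * supported, the LMS bits saved before the reset replace whatever the
 * reset put there; all other bits come from the post-reset value. */
uint32_t demo_autoc_after_reset(const struct demo_hw *hw,
                                uint32_t saved_autoc, uint32_t reset_autoc)
{
    assert(hw != NULL);
    if (!hw->wol_supported)
        return reset_autoc;
    return (reset_autoc & ~DEMO_AUTOC_LMS_MASK) |
           (saved_autoc & DEMO_AUTOC_LMS_MASK);
}
```

Only the masked field is carried over, so the rest of the register still
reflects the reset.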

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: fix MNG FW support when adapter not up
Don Skidmore [Thu, 21 Feb 2013 03:00:04 +0000 (03:00 +0000)]
ixgbe: fix MNG FW support when adapter not up

We were only turning the laser on when the adapter was up.  This
causes issues for those who wanted to access the MNG FW while the
port was in a down state.  This patch makes sure the laser is turned
on in probe and remains on even after the port is brought down.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: enable devices with internal switch to read pci parent
Jacob Keller [Tue, 9 Apr 2013 07:20:09 +0000 (07:20 +0000)]
ixgbe: enable devices with internal switch to read pci parent

This patch modifies the driver to enable certain devices, which have an internal
switch, to read data from the physical slot rather than reading data from the
internal switch. The internal switch will always report the same PCI width and
speed, which is not useful compared to knowing the width and speed of the slot
the physical card is plugged into.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: create conversion functions from link_status to bus/speed
Jacob Keller [Fri, 15 Feb 2013 09:18:15 +0000 (09:18 +0000)]
ixgbe: create conversion functions from link_status to bus/speed

This patch cleans up ixgbe_get_bus_info_generic to call some conversion
functions, which are also used in a follow-on patch that needs to convert
the link_status PCIe config values into ixgbe's internal enum
representations.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: Enable support for recognizing PCI-e Gen3 link speed
Jacob Keller [Fri, 15 Feb 2013 09:18:10 +0000 (09:18 +0000)]
ixgbe: Enable support for recognizing PCI-e Gen3 link speed

This patch adds support for displaying PCIe Gen3 link speed, which was
previously missing from the driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: Drop check for PAGE_SIZE from ixgbe_xmit_frame_ring
Alexander Duyck [Sat, 9 Feb 2013 01:19:55 +0000 (01:19 +0000)]
ixgbe: Drop check for PAGE_SIZE from ixgbe_xmit_frame_ring

The check for PAGE_SIZE is pointless now that the default configuration is to
allocate 32K for all buffers.  Since the Tx descriptor limit is 16K we can
just drop the check and always compare the descriptors to the maximum size
supported.
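
As a rough illustration of comparing against the maximum size only: with
a 16K per-descriptor limit, the number of descriptors a buffer needs is a
rounded-up division. The demo_* names below are hypothetical and the
sketch is not the driver's macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16K of data per Tx descriptor, as stated in the commit message. */
#define DEMO_MAX_TXD_PWR      14
#define DEMO_MAX_DATA_PER_TXD ((size_t)1 << DEMO_MAX_TXD_PWR)

/* Number of descriptors needed for len bytes: ceil(len / 16K). */
size_t demo_txd_use_count(size_t len)
{
    /* Guard the rounding addition against wrap-around. */
    assert(len <= SIZE_MAX - (DEMO_MAX_DATA_PER_TXD - 1));
    return (len + DEMO_MAX_DATA_PER_TXD - 1) / DEMO_MAX_DATA_PER_TXD;
}
```

A 32K buffer therefore always costs two descriptors, whatever the page
size is.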

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: don't do arithmetic operations on bitmasks
Emil Tantilov [Tue, 5 Feb 2013 09:43:26 +0000 (09:43 +0000)]
ixgbe: don't do arithmetic operations on bitmasks

Make the calculation of eerd consistent between the read and write functions
by using | instead of + for IXGBE_EEPROM_RW_REG_START
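
A small sketch of why | is the safer operator for flag bits. The shift
and bit values are assumed for illustration and the demo_* names are
hypothetical: as long as the start bit is clear, | and + give the same
result, but + carries into the address field if the bit is already set.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: address above bit 1, "start" flag in bit 0. */
#define DEMO_RW_ADDR_SHIFT 2
#define DEMO_RW_REG_START  UINT32_C(1)

/* Build the register value with bitwise OR. */
uint32_t demo_eerd_or(uint32_t offset)
{
    assert(offset <= (UINT32_MAX >> DEMO_RW_ADDR_SHIFT));
    return (offset << DEMO_RW_ADDR_SHIFT) | DEMO_RW_REG_START;
}

/* Build the register value with addition. */
uint32_t demo_eerd_add(uint32_t offset)
{
    assert(offset <= (UINT32_MAX >> DEMO_RW_ADDR_SHIFT));
    return (offset << DEMO_RW_ADDR_SHIFT) + DEMO_RW_REG_START;
}

/* Setting the flag on an existing value: OR is idempotent ... */
uint32_t demo_set_start_or(uint32_t eerd)
{
    return eerd | DEMO_RW_REG_START;
}

/* ... while addition changes the value when the flag is already set. */
uint32_t demo_set_start_add(uint32_t eerd)
{
    return eerd + DEMO_RW_REG_START;
}
```

For offset 0x10 both builders return 0x41; setting the flag again keeps
0x41 with | but yields 0x42 with +.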

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: Mask off check of frag_off as we only want fragment offset
Alexander Duyck [Fri, 1 Feb 2013 08:56:41 +0000 (08:56 +0000)]
ixgbe: Mask off check of frag_off as we only want fragment offset

We were incorrectly checking the entire frag_off field when we only wanted the
fragment offset.  As a result we were not pulling in TCP headers when the DF
(Don't Fragment) flag was set.

To correct that we will now check for frag off using the IP_OFFSET mask.
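
The difference between the two checks can be sketched as below. The mask
values match the IPv4 header layout (3 flag bits above a 13-bit offset);
the demo_* names are hypothetical, and frag_off is taken in host byte
order to keep the example short, whereas the header field itself is in
network byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_IP_DF     UINT16_C(0x4000)  /* don't-fragment flag */
#define DEMO_IP_MF     UINT16_C(0x2000)  /* more-fragments flag */
#define DEMO_IP_OFFSET UINT16_C(0x1FFF)  /* 13-bit fragment offset */

/* Old check: any non-zero bit, including the flags, counts. */
bool demo_frag_check_whole_field(uint16_t frag_off)
{
    return frag_off != 0;
}

/* New check: only a non-zero fragment offset counts. */
bool demo_frag_check_offset_only(uint16_t frag_off)
{
    return (frag_off & DEMO_IP_OFFSET) != 0;
}

/* Worked cases: an unfragmented packet with the flag set, and a later
 * fragment at offset 185 (in 8-byte units). */
void demo_frag_examples(void)
{
    assert(demo_frag_check_whole_field(DEMO_IP_DF));
    assert(!demo_frag_check_offset_only(DEMO_IP_DF));
    assert(demo_frag_check_offset_only(DEMO_IP_MF | 185));
}
```

A packet that only has the flag set is no longer mistaken for a later
fragment, so its TCP header is still pulled in.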

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoMerge branch 'tipc-ipoib'
David S. Miller [Wed, 17 Apr 2013 18:18:43 +0000 (14:18 -0400)]
Merge branch 'tipc-ipoib'

Patrick McHardy says:

====================
The following patchset adds support for running TIPC over InfiniBand.
The patchset consists of three parts (+ a minor fix for the ethernet media
type):

- Preparation: removal of the unused str2addr callback and move of the
  bcast_addr from struct tipc_media to struct tipc_bearer. This is necessary
  because InfiniBand doesn't have a fixed broadcast address like ethernet,
  so it needs to be initialized with the device's broadcast address when
  the bearer is enabled

- Introduction of a TIPC InfiniBand media type. A new media type is needed
  to deal with the different address sizes

- Support for ETH_P_TIPC in IPoIB

Since the last posting I've addressed all feedback I received and rebased
to the current net-next tree.

I consider these patches ready for merging. Since they mainly affect TIPC
code, I'd propose to have them either go through the TIPC tree or through
Dave directly (not sure how TIPC patches are managed).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoIPoIB: add support for TIPC protocol
Patrick McHardy [Wed, 17 Apr 2013 06:18:29 +0000 (06:18 +0000)]
IPoIB: add support for TIPC protocol

Support TIPC in the IPoIB driver. Since IPoIB now keeps track of its own
neighbour entries and doesn't require the packet to have a dst_entry
anymore, the only necessary changes are to:

- not drop multicast TIPC packets because of the unknown ethernet type
- handle unicast TIPC packets similar to IPv4/IPv6 unicast packets

in ipoib_start_xmit().

An alternative would be to remove all ethertype limitations since they're
not necessary anymore, all TIPC needs to know about is ARP and RARP since
it wants to always perform "path find", even if a path is already known.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: add InfiniBand media type
Patrick McHardy [Wed, 17 Apr 2013 06:18:28 +0000 (06:18 +0000)]
tipc: add InfiniBand media type

Add InfiniBand media type based on the ethernet media type.

The only real difference is that in case of InfiniBand, we need the entire
20 bytes of space reserved for media addresses, so the TIPC media type ID is
not explicitly stored in the packet payload.

Sample output of tipc-config:

# tipc-config -v -addr -netid -nt=all -p -m -b -n -ls

node address: <10.1.4>
current network id: 4711
Type       Lower      Upper      Port Identity              Publication Scope
0          167776257  167776257  <10.1.1:1855512577>        1855512578  cluster
           167776260  167776260  <10.1.4:1216454657>        1216454658  zone
1          1          1          <10.1.4:1216479235>        1216479236  node
Ports:
1216479235: bound to {1,1}
1216454657: bound to {0,167776260}
Media:
eth
ib
Bearers:
ib:ib0
Nodes known:
<10.1.1>: up
Link <broadcast-link>
  Window:20 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 dups:0
  Congestion bearer:0 link:0  Send queue max:0 avg:0

Link <10.1.4:ib0-10.1.1:ib0>
  ACTIVE  MTU:2044  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:80 fragments:0/0 bundles:0/0
  TX packets:40 fragments:0/0 bundles:0/0
  TX profile sample:22 packets  average:54 octets
  0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
  RX states:410 probes:213 naks:0 defs:0 dups:0
  TX states:410 probes:197 naks:0 acks:0 dups:0
  Congestion bearer:0 link:0  Send queue max:1 avg:0

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: set skb->protocol in eth_media packet transmission
Patrick McHardy [Wed, 17 Apr 2013 06:18:27 +0000 (06:18 +0000)]
tipc: set skb->protocol in eth_media packet transmission

The skb->protocol field is used by packet classifiers and for AF_PACKET
cooked format, TIPC needs to set it properly.

Fixes packet classification and ethertype of 0x0000 in cooked captures:

Out 20:c9:d0:43:12:d9 ethertype Unknown (0x0000), length 56:
0x0000:  5b50 0028 0000 30d4 0100 1000 0100 1001  [P.(..0.........
0x0010:  0000 03e8 0000 0001 20c9 d043 12d9 0000  ...........C....
0x0020:  0000 0000 0000 0000                      ........

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: move bcast_addr from struct tipc_media to struct tipc_bearer
Patrick McHardy [Wed, 17 Apr 2013 06:18:26 +0000 (06:18 +0000)]
tipc: move bcast_addr from struct tipc_media to struct tipc_bearer

Some network protocols, like InfiniBand, don't have a fixed broadcast
address but one that depends on the configuration. Move the bcast_addr
to struct tipc_bearer and initialize it with the broadcast address of
the network device when the bearer is enabled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: remove unused str2addr media callback
Patrick McHardy [Wed, 17 Apr 2013 06:18:25 +0000 (06:18 +0000)]
tipc: remove unused str2addr media callback

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cdc_ether: silence sparse __CHECK_ENDIAN__ warning
Bjørn Mork [Tue, 16 Apr 2013 22:12:13 +0000 (22:12 +0000)]
net: cdc_ether: silence sparse __CHECK_ENDIAN__ warning

Remove warning introduced by commit 418fc57 ("usbnet: cdc-ether: apply
usbnet_link_change"):

   CHECK   .../drivers/net/usb/cdc_ether.c
 .../drivers/net/usb/cdc_ether.c:409:46: warning: incorrect type in argument 2 (different base types)
 .../drivers/net/usb/cdc_ether.c:409:46:    expected bool [unsigned] [usertype] <noident>
 .../drivers/net/usb/cdc_ether.c:409:46:    got restricted __le16 [usertype] wValue

Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: sctp_ulpq: remove 'malloced' struct member
Daniel Borkmann [Tue, 16 Apr 2013 11:07:17 +0000 (11:07 +0000)]
net: sctp: sctp_ulpq: remove 'malloced' struct member

The structure sctp_ulpq is embedded into sctp_association and never
separately allocated, also ulpq->malloced is always 0, so that
kfree() is never called. Therefore, remove this code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: sctp_bind_addr: remove dead code
Daniel Borkmann [Tue, 16 Apr 2013 11:07:16 +0000 (11:07 +0000)]
net: sctp: sctp_bind_addr: remove dead code

The sctp_bind_addr structure has a 'malloced' member that is
always set to 0, thus in sctp_bind_addr_free() the kfree()
part can never be called. This part is embedded into
sctp_ep_common anyway and never allocated.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: sctp_transport: remove unused variable
Daniel Borkmann [Tue, 16 Apr 2013 11:07:15 +0000 (11:07 +0000)]
net: sctp: sctp_transport: remove unused variable

sctp_transport's member 'malloced' is set to 1, never evaluated
and the structure is kfreed anyway. So just remove it.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: outqueue: simplify sctp_outq_uncork function
Daniel Borkmann [Tue, 16 Apr 2013 11:07:14 +0000 (11:07 +0000)]
net: sctp: outqueue: simplify sctp_outq_uncork function

Just a minor edit to simplify the function. No need for this
error variable here.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: sctp_outq: remove 'malloced' from its struct
Daniel Borkmann [Tue, 16 Apr 2013 11:07:12 +0000 (11:07 +0000)]
net: sctp: sctp_outq: remove 'malloced' from its struct

sctp_outq is embedded into sctp_association, and thus never
kmalloced in any way. Also, malloced is always 0, thus kfree()
is never called. Therefore, remove that dead piece of code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: sctp_inq: remove dead code
Daniel Borkmann [Tue, 16 Apr 2013 11:07:11 +0000 (11:07 +0000)]
net: sctp: sctp_inq: remove dead code

sctp_inq is never kmalloced, since it's integrated into sctp_ep_common
and only initialized from eps and assocs. Therefore, remove the dead
code from there.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: sctp_ssnmap: remove 'malloced' element from struct
Daniel Borkmann [Tue, 16 Apr 2013 11:07:10 +0000 (11:07 +0000)]
net: sctp: sctp_ssnmap: remove 'malloced' element from struct

sctp_ssnmap_init() can only be called from sctp_ssnmap_new()
where malloced is always set to 1. Thus, when we call
sctp_ssnmap_free() the test for map->malloced always evaluates
to true.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
David S. Miller [Wed, 17 Apr 2013 17:30:32 +0000 (13:30 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch

Jesse Gross says:

====================
A number of improvements for net-next/3.10.

Highlights include:

 * Properly exposing linux/openvswitch.h to userspace after the uapi
   changes.

 * Simplification of locking. It immediately makes things simpler to
   reason about and avoids holding RTNL mutex for longer than
   necessary. In the near future it will also enable tunnel
   registration and more fine-grained locking.

 * Miscellaneous cleanups and simplifications.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoatl1: Protect atl1_suspend with CONFIG_PM_SLEEP
Fabio Estevam [Tue, 16 Apr 2013 21:35:00 +0000 (18:35 -0300)]
atl1: Protect atl1_suspend with CONFIG_PM_SLEEP

commit 7b7a2bbb690 (atl1: Remove unneeded PM_OPS definitions) removed the
definition of atl1_suspend for the !CONFIG_PM_SLEEP case.

So only call atl1_suspend() when CONFIG_PM_SLEEP is defined and fix the
following build error from randconfig:

drivers/net/ethernet/atheros/atlx/atl1.c: In function 'atl1_shutdown':
drivers/net/ethernet/atheros/atlx/atl1.c:2888:2: error: implicit declaration of function 'atl1_suspend' [-Werror=implicit-function-declaration]

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
David S. Miller [Tue, 16 Apr 2013 20:43:39 +0000 (16:43 -0400)]
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull-request for net-next/master. It consists of a patch by
Oliver Hartkopp. In this patch he cleans up the sja1000 header file by
using a common prefix for all sja1000 defines.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofec: Use SIMPLE_DEV_PM_OPS
Fabio Estevam [Tue, 16 Apr 2013 08:17:46 +0000 (08:17 +0000)]
fec: Use SIMPLE_DEV_PM_OPS

Using SIMPLE_DEV_PM_OPS can make the code smaller and simpler.

Also change CONFIG_PM to CONFIG_PM_SLEEP.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agopch_gbe: minor: report the actual error on MTU change
Veaceslav Falico [Tue, 16 Apr 2013 05:28:12 +0000 (05:28 +0000)]
pch_gbe: minor: report the actual error on MTU change

If we can't _up() after changing the MTU, report the actual error instead
of -ENOMEM. Always reporting -ENOMEM can be really misleading because
pch_gbe is usually used in scenarios where the amount of memory is really
small, so it hides the real cause.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovxlan: Allow setting destination to unicast address.
Atzm Watanabe [Tue, 16 Apr 2013 02:50:52 +0000 (02:50 +0000)]
vxlan: Allow setting destination to unicast address.

This patch allows setting the VXLAN destination to a unicast address,
so that VXLAN can be used as a peer-to-peer tunnel without
multicast.

v4: generalize struct vxlan_dev, "gaddr" is replaced with vxlan_rdst.
    "GROUP" attribute is replaced with "REMOTE".
    Both changes are based on David Stevens's comments.

v3: move the new attribute REMOTE to the end of the enum list,
    based on Stephen Hemminger's comments.

v2: use a new attribute REMOTE instead of GROUP, based on
    Cong Wang's comments.

Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agopacket: minor: add generic tpacket_uhdr to access packet headers
Daniel Borkmann [Tue, 16 Apr 2013 01:57:46 +0000 (01:57 +0000)]
packet: minor: add generic tpacket_uhdr to access packet headers

There is no need to add a dozen unions each time at the start
of the function. So, do this once and use it instead. Thus, we
can remove some duplicate code and make it more readable.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosctp: Add buffer utilization fields to /proc/net/sctp/assocs
Dilip Daya [Tue, 16 Apr 2013 01:39:07 +0000 (01:39 +0000)]
sctp: Add buffer utilization fields to /proc/net/sctp/assocs

This patch adds the following fields to /proc/net/sctp/assocs output:

- sk->sk_wmem_alloc as "wmema" (transmit queue bytes committed)
- sk->sk_wmem_queued as "wmemq" (persistent queue size)
- sk->sk_sndbuf as "sndbuf" (size of send buffer in bytes)
- sk->sk_rcvbuf as "rcvbuf" (size of receive buffer in bytes)

When small DATA chunks containing 136 bytes data are sent the TX_QUEUE
(assoc->sndbuf_used) reaches a maximum of 40.9% of sk_sndbuf value when
peer.rwnd = 0. This was diagnosed from sk_wmem_alloc value reaching maximum
value of sk_sndbuf.

TX_QUEUE (assoc->sndbuf_used), sk_wmem_alloc and sk_wmem_queued values are
incremented in sctp_set_owner_w() for outgoing data chunks. Having access to
the above values in /proc/net/sctp/assocs will provide a better understanding
of SCTP buffer management.

With patch applied, example output when peer.rwnd = 0

where:
    ASSOC ffff880132298000 is sender
          ffff880125343000 is receiver

 ASSOC           SOCK            STY SST ST  HBKT ASSOC-ID TX_QUEUE RX_QUEUE \
ffff880132298000 ffff880124a0a0c0 2   1   3  29325    1      214656        0 \
ffff880125343000 ffff8801237d7700 2   1   3  36210    2           0   524520 \

UID   INODE LPORT  RPORT LADDRS <-> RADDRS       HBINT   INS  OUTS \
  0   25108 3455   3456  *10.4.8.3 <-> *10.5.8.3  7500     2     2 \
  0   27819 3456   3455  *10.5.8.3 <-> *10.4.8.3  7500     2     2 \

MAXRT T1X T2X RTXC   wmema   wmemq  sndbuf  rcvbuf
    4   0   0   72  525633  440320  524288  524288
    4   0   0    0       1       0  524288  524288
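
The 40.9% figure can be re-derived from the sender's row in the example
output (TX_QUEUE 214656, wmema 525633, sndbuf 524288). The helper below
is a hypothetical sketch that works in tenths of a percent to stay in
integer arithmetic.

```c
#include <assert.h>

/* Ratio used/total in tenths of a percent, rounded down. */
unsigned int demo_permille(unsigned long long used, unsigned long long total)
{
    assert(total != 0);
    return (unsigned int)((used * 1000ULL) / total);
}
```

demo_permille(214656, 524288) gives 409, i.e. 40.9% of sndbuf, while
demo_permille(525633, 524288) gives 1002, i.e. wmema has already reached
the sndbuf limit.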

Signed-off-by: Dilip Daya <dilip.daya@hp.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotlan: cancel work at remove path
Devendra Naga [Tue, 16 Apr 2013 01:30:38 +0000 (01:30 +0000)]
tlan: cancel work at remove path

The work is scheduled from the interrupt handler but is not
cancelled when the driver is unloaded, which leaves the work
item on the global workqueue. Call cancel_work_sync() when the
driver is removed (rmmod'ed).

Cc: Sriram <srk@ti.com>
Cc: Cyril Chemparathy <cyril@ti.com>
Cc: Vinay Hegde <vinay.hegde@ti.com>
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoneighbour: Convert NEIGH_PRINTK to neigh_dbg
Joe Perches [Mon, 15 Apr 2013 15:17:19 +0000 (15:17 +0000)]
neighbour: Convert NEIGH_PRINTK to neigh_dbg

Update debugging messages to a more current style.

Emit these debugging messages at KERN_DEBUG instead
of KERN_DEFAULT.

Add and use neigh_dbg(level, fmt, ...) macro
Add dynamic_debug capability via pr_debug
Convert embedded function names to "%s: ", __func__
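
A userspace sketch of the pattern, assuming a compile-time threshold.
DEMO_NEIGH_DEBUG, demo_neigh_dbg and the other demo_* names are
hypothetical, and snprintf() into a buffer stands in for pr_debug() so
the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compile-time debug threshold. */
#define DEMO_NEIGH_DEBUG 1

/* Last emitted message; stands in for the kernel log. */
static char demo_log[256];

/* Emit only when the message level is within the threshold.  The first
 * variadic argument is the format string. */
#define demo_neigh_dbg(level, ...)                              \
do {                                                            \
    if ((level) <= DEMO_NEIGH_DEBUG)                            \
        snprintf(demo_log, sizeof(demo_log), __VA_ARGS__);      \
} while (0)

/* The function name comes from __func__ instead of being embedded in
 * the format string. */
int demo_neigh_update(int state)
{
    demo_neigh_dbg(1, "%s: state %d\n", __func__, state);
    demo_neigh_dbg(2, "%s: verbose detail\n", __func__); /* above threshold */
    return state;
}

/* Helper to compare the last emitted message with an expected string. */
int demo_log_is(const char *expected)
{
    assert(expected != NULL);
    return strcmp(demo_log, expected) == 0;
}
```

With the threshold at 1, the level-2 message is dropped and the buffer
holds the level-1 line prefixed with the calling function's name.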

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoat86rf230: change irq handling to prevent lockups with edge type irq
Sascha Herrmann [Sun, 14 Apr 2013 22:33:29 +0000 (22:33 +0000)]
at86rf230: change irq handling to prevent lockups with edge type irq

Implemented separate irq handling for edge and level type interrupt
configuration. For edge type interrupts calls to disable_irq_nosync()
and enable_irq() are removed. The at86rf230 resets the irq line only
after the irq status register is read. Disabling the irq can lock the
driver in situations where an irq is set by the radio while the driver
is still reading the frame buffer.

With irq_type configuration set to 0 the original behavior is
preserved.

Additionally, the irq filter register is set to filter out all unused
interrupts and the irq status register is read in the probe
function to clear the irq line.

Signed-off-by: Sascha Herrmann <sascha@ps.nvbi.de>
Conflicts:
drivers/net/ieee802154/at86rf230.c
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoat86rf230: add irq type configuration option
Sascha Herrmann [Sun, 14 Apr 2013 22:33:28 +0000 (22:33 +0000)]
at86rf230: add irq type configuration option

Add option to at86rf230 platform data to configure the type of the
interrupt used by the driver. The irq polarity of the device will
be configured accordingly.

Signed-off-by: Sascha Herrmann <sascha@ps.nvbi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoks8851: Remove unneeded PM_OPS definitions
Fabio Estevam [Tue, 16 Apr 2013 09:28:31 +0000 (09:28 +0000)]
ks8851: Remove unneeded PM_OPS definitions

SIMPLE_DEV_PM_OPS macro can handle !CONFIG_PM_SLEEP case nicely, so there is no
need to define PM_OPS for both CONFIG_PM_SLEEP and !CONFIG_PM_SLEEP cases.

Remove the unneeded definitions.

Cc: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoxgmac: Remove unneeded PM_OPS definitions
Fabio Estevam [Tue, 16 Apr 2013 09:28:30 +0000 (09:28 +0000)]
xgmac: Remove unneeded PM_OPS definitions

SIMPLE_DEV_PM_OPS macro can handle !CONFIG_PM_SLEEP case nicely, so there is no
need to define PM_OPS for both CONFIG_PM_SLEEP and !CONFIG_PM_SLEEP cases.

Remove the unneeded definitions.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotg3: Remove unneeded PM_OPS definitions
Fabio Estevam [Tue, 16 Apr 2013 09:28:29 +0000 (09:28 +0000)]
tg3: Remove unneeded PM_OPS definitions

SIMPLE_DEV_PM_OPS macro can handle !CONFIG_PM_SLEEP case nicely, so there is no
need to define PM_OPS for both CONFIG_PM_SLEEP and !CONFIG_PM_SLEEP cases.

Remove the unneeded definitions.

Cc: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoatl1: Remove unneeded PM_OPS definitions
Fabio Estevam [Tue, 16 Apr 2013 09:28:28 +0000 (09:28 +0000)]
atl1: Remove unneeded PM_OPS definitions

SIMPLE_DEV_PM_OPS macro can handle !CONFIG_PM_SLEEP case nicely, so there is no
need to define PM_OPS for both CONFIG_PM_SLEEP and !CONFIG_PM_SLEEP cases.

Remove the unneeded definitions.

Cc: Jay Cliburn <jcliburn@gmail.com>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocan: mcp251x: Remove unneeded PM_OPS definitions
Fabio Estevam [Tue, 16 Apr 2013 09:28:27 +0000 (09:28 +0000)]
can: mcp251x: Remove unneeded PM_OPS definitions

SIMPLE_DEV_PM_OPS macro can handle !CONFIG_PM_SLEEP case nicely, so there is no
need to define PM_OPS for both CONFIG_PM_SLEEP and !CONFIG_PM_SLEEP cases.

Remove the unneeded definitions.

Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocan: sja1000: use common prefix for all sja1000 defines
Oliver Hartkopp [Sat, 13 Apr 2013 19:35:49 +0000 (21:35 +0200)]
can: sja1000: use common prefix for all sja1000 defines

This is a follow up patch to:

    f901b6b can: sja1000: fix define conflict on SH

That patch fixed a define conflict between the SH architecture and the sja1000
driver, by adding a prefix to one macro only. This patch consistently renames
the prefix of the SJA1000 controller registers from "REG_" to "SJA1000_".

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agoopenvswitch: Use generic struct pcpu_tstats.
Pravin B Shelar [Mon, 15 Apr 2013 20:30:37 +0000 (13:30 -0700)]
openvswitch: Use generic struct pcpu_tstats.

Rather than defining an ovs-specific stats struct (vport_percpu_stats),
we can use the existing pcpu_tstats to achieve exactly the same functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Simplify datapath locking.
Pravin B Shelar [Mon, 15 Apr 2013 20:23:03 +0000 (13:23 -0700)]
openvswitch: Simplify datapath locking.

Currently OVS uses a combination of the genl and rtnl locks to protect
datapath state.  This was done because of networking stack locking.
But it has complicated the locking, and there are a few lock ordering
issues with new tunneling protocols.
The following patch simplifies locking by introducing a new ovs mutex,
which is now used to protect the entire ovs state.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoMerge branch 'sync_multiple'
David S. Miller [Mon, 15 Apr 2013 20:10:53 +0000 (16:10 -0400)]
Merge branch 'sync_multiple'

Vlad Yasevich says:

====================
The current dev_[uc|mc]_addr_sync() API correctly syncs the
addresses only to the first device.  Any subsequent call to sync will
not do anything, since the synched variable will already be set.  This
variable is used as an optimization to skip over addresses that have
already been synched.

There are some devices (e.g. team) that attempt to sync to more than one
device.  There is other work in progress that needs this to work correctly.

This short series introduces dev_[uc|mc]_addr_synch_multiple(), which
allows multiple calls to sync to multiple different devices.  The original
API is left alone and still has the limitation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoteam: Use new sync_multiple api to sync device addresses.
Vlad Yasevich [Mon, 15 Apr 2013 09:54:26 +0000 (09:54 +0000)]
team: Use new sync_multiple api to sync device addresses.

The team driver attempts to sync addresses to each of the port
devices; however, the current api does not actually perform the sync
for any device after the first one.  Switch to the new api,
which syncs the addresses to all ports.

CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api
Vlad Yasevich [Mon, 15 Apr 2013 09:54:25 +0000 (09:54 +0000)]
net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api

The current implementation of dev_uc_sync/unsync() assumes that there is
a strict 1-to-1 relationship between the source and destination of the sync.
In other words, once an address has been synced to a destination device, it
will not be synced to any other device through the sync API.
However, there are some virtual devices that aggregate a number of lower
devices and need to sync addresses to all of them.  The current
API falls short there.

This patch introduces a new dev_uc_sync_multiple() api that can be called
in the above circumstances and allows sync to work for every invocation.

CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
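
The limitation described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: all `sketch_*` names and the fixed-size list are hypothetical, and a plain `synced` flag stands in for whatever bookkeeping the real address lists use. It shows why a per-address "already synced" flag stops the second destination from receiving anything, and how a variant that checks the destination instead works on every invocation.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_ADDRS 8
#define SKETCH_ALEN 6

struct sketch_addr {
	unsigned char mac[SKETCH_ALEN];
	bool synced;		/* set once the address was pushed somewhere */
};

struct sketch_list {
	struct sketch_addr e[SKETCH_MAX_ADDRS];
	int count;
};

/* True if the list already holds this address. */
bool sketch_has(const struct sketch_list *l, const unsigned char *mac)
{
	for (int i = 0; i < l->count; i++)
		if (memcmp(l->e[i].mac, mac, SKETCH_ALEN) == 0)
			return true;
	return false;
}

/* Append an address; fails when the list is full. */
bool sketch_add(struct sketch_list *l, const unsigned char *mac)
{
	if (l->count >= SKETCH_MAX_ADDRS)
		return false;
	memcpy(l->e[l->count].mac, mac, SKETCH_ALEN);
	l->e[l->count].synced = false;
	l->count++;
	return true;
}

/* 1-to-1 model: addresses already marked synced are skipped, so only
 * the first destination ever receives them. Returns addresses copied. */
int sketch_sync(struct sketch_list *to, struct sketch_list *from)
{
	int copied = 0;

	for (int i = 0; i < from->count; i++) {
		if (from->e[i].synced)
			continue;
		if (sketch_add(to, from->e[i].mac)) {
			from->e[i].synced = true;
			copied++;
		}
	}
	return copied;
}

/* 1-to-many model: the decision is made per destination, so every
 * lower device can be brought up to date. Returns addresses copied. */
int sketch_sync_multiple(struct sketch_list *to, struct sketch_list *from)
{
	int copied = 0;

	for (int i = 0; i < from->count; i++) {
		if (sketch_has(to, from->e[i].mac))
			continue;
		if (sketch_add(to, from->e[i].mac)) {
			from->e[i].synced = true;
			copied++;
		}
	}
	return copied;
}
```

In this model, calling `sketch_sync()` for a second port copies nothing once the first port has been synced, while `sketch_sync_multiple()` copies every address the second port is still missing.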
11 years agonet: sctp: minor: make sctp_ep_common's member 'dead' a bool
Daniel Borkmann [Mon, 15 Apr 2013 03:27:18 +0000 (03:27 +0000)]
net: sctp: minor: make sctp_ep_common's member 'dead' a bool

Since dead only holds two states (0,1), make it a bool instead
of a 'char', which is more appropriate for its purpose.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: remove sctp_ep_common struct member 'malloced'
Daniel Borkmann [Mon, 15 Apr 2013 03:27:17 +0000 (03:27 +0000)]
net: sctp: remove sctp_ep_common struct member 'malloced'

There is actually no need to keep this member in the structure, because
after init it is always 1 anyway, so kfree is always called. This seems to
be an ancient leftover from the very initial implementation from 2.5
times. Only when the initialization of an association fails do we leave
base.malloced as 0, but we nevertheless kfree it in the error path in
sctp_association_new().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosis900: check for DMA map errors
Denis Kirjanov [Sun, 14 Apr 2013 21:11:29 +0000 (21:11 +0000)]
sis900: check for DMA map errors

The first backtrace appears on tx path with DMA mapping operations debug
enabled.

[  345.637919] ------------[ cut here ]------------
[  345.637971] WARNING: at lib/dma-debug.c:937 check_unmap+0x4df/0x910()
[  345.637977] Hardware name: System Name
[  345.637987] sis900 0000:00:01.1: DMA-API: device driver failed to check map error[device address=0x000000000d4aed02] [size=60 bytes] [mapped as single]
[  345.637993] Modules linked in: bridge stp llc dmfe sundance 3c59x sis900
[  345.638022] Pid: 0, comm: swapper Not tainted 3.9.0-rc6+ #4
[  345.638028] Call Trace:
[  345.638042]  [<c122097f>] ? check_unmap+0x4df/0x910
[  345.638059]  [<c102b19c>] warn_slowpath_common+0x7c/0xa0
[  345.638070]  [<c122097f>] ? check_unmap+0x4df/0x910
[  345.638081]  [<c102b23e>] warn_slowpath_fmt+0x2e/0x30
[  345.638092]  [<c122097f>] check_unmap+0x4df/0x910
[  345.638107]  [<c100bfeb>] ? save_stack_trace+0x2b/0x50
[  345.638120]  [<c107238e>] ? mark_lock+0x31e/0x5d0
[  345.638132]  [<c1072b2c>] ? __lock_acquire+0x4ec/0x7d0
[  345.638143]  [<c1220f6d>] debug_dma_unmap_page+0x6d/0x80
[  345.638166]  [<cf834dec>] sis900_interrupt+0x49c/0x860 [sis900]
[  345.638195]  [<c1094b73>] handle_irq_event_percpu+0x43/0x1c0
[  345.638206]  [<c1094d1e>] ? handle_irq_event+0x2e/0x60
[  345.638217]  [<c1094d27>] handle_irq_event+0x37/0x60
[  345.638235]  [<c10973f0>] ? irq_set_chip_data+0x40/0x40
[  345.638246]  [<c1097442>] handle_level_irq+0x52/0xa0
[  345.638251]  <IRQ>  [<c1003629>] ? do_IRQ+0x39/0xa0
[  345.638293]  [<c1484631>] ? common_interrupt+0x31/0x36
[  345.638347]  [<d08c2c52>] ? br_flood_forward+0x12/0x20 [bridge]
[  345.638364]  [<d08c2d40>] ? br_dev_queue_push_xmit+0x60/0x60 [bridge]
[  345.638381]  [<d08c3b2b>] ? br_handle_frame_finish+0x25b/0x280 [bridge]
[  345.638399]  [<d08c3ce3>] ? br_handle_frame+0x193/0x290 [bridge]
[  345.638416]  [<d08c3b50>] ? br_handle_frame_finish+0x280/0x280 [bridge]
[  345.638431]  [<c13b3c87>] ? __netif_receive_skb_core+0x1d7/0x710
[  345.638442]  [<c13b3b19>] ? __netif_receive_skb_core+0x69/0x710
[  345.638454]  [<c13b41e1>] ? __netif_receive_skb+0x21/0x70
[  345.638464]  [<c13b42b5>] ? process_backlog+0x85/0x130
[  345.638476]  [<c13b4bbb>] ? net_rx_action+0xfb/0x1d0
[  345.638497]  [<c1032768>] ? __do_softirq+0xa8/0x1f0
[  345.638527]  [<c147daad>] ? _raw_spin_unlock+0x1d/0x20
[  345.638538]  [<c10038c0>] ? handle_irq+0x20/0xd0
[  345.638550]  [<c1032f27>] ? irq_exit+0x97/0xa0
[  345.638560]  [<c1003632>] ? do_IRQ+0x42/0xa0
[  345.638580]  [<c104d003>] ? hrtimer_start+0x23/0x30
[  345.638580]  [<c1484631>] ? common_interrupt+0x31/0x36
[  345.638580]  [<c1008703>] ? default_idle+0x33/0xc0
[  345.638580]  [<c10086ac>] ? cpu_idle+0x4c/0x70
[  345.638580]  [<c14787e0>] ? rest_init+0xa0/0xb0
[  345.638580]  [<c1478740>] ? reciprocal_value+0x50/0x50
[  345.638580]  [<c16b5bcf>] ? start_kernel+0x28f/0x320
[  345.638580]  [<c16b54e0>] ? repair_env_string+0x60/0x60
[  345.638580]  [<c16b5269>] ? i386_start_kernel+0x39/0xa0
[  345.638580] ---[ end trace a244264b69b8a7ae ]---
[  345.638580] Mapped at:
[  345.638580]  [<c1221c65>] debug_dma_map_page+0x65/0x110
[  345.638580]  [<cf8355a9>] sis900_start_xmit+0x129/0x210 [sis900]
[  345.638580]  [<c13b2527>] dev_hard_start_xmit+0x1b7/0x530
[  345.638580]  [<c13cc32e>] sch_direct_xmit+0x8e/0x280
[  345.638580]  [<c13b4e39>] dev_queue_xmit+0x1a9/0x5b0

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
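
The warning in the backtrace above says the driver used a mapping without checking whether the mapping succeeded. The general pattern of the fix can be sketched in userspace as follows. Everything here is a hypothetical mock (`sketch_map_single`, `sketch_mapping_error`, the slot and stats structs); it is not the sis900 code or the kernel DMA API, only the shape of "map, check, drop on failure".

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_dma_addr_t;

/* Hypothetical sentinel for a failed mapping. */
#define SKETCH_DMA_ERROR ((sketch_dma_addr_t)~0ULL)

/* Mock mapper: fails for NULL or empty buffers to simulate an error. */
sketch_dma_addr_t sketch_map_single(const void *buf, size_t len)
{
	if (buf == NULL || len == 0)
		return SKETCH_DMA_ERROR;
	return (sketch_dma_addr_t)(uintptr_t)buf;
}

/* Mock counterpart of a "did the mapping fail?" check. */
int sketch_mapping_error(sketch_dma_addr_t addr)
{
	return addr == SKETCH_DMA_ERROR;
}

struct sketch_tx_slot {
	sketch_dma_addr_t bufptr;
	size_t len;
	int in_use;
};

struct sketch_stats {
	unsigned long tx_dropped;
};

/* Returns 0 when the frame was queued, -1 when it was dropped because
 * the mapping failed. The descriptor is only filled after the check. */
int sketch_start_xmit(struct sketch_tx_slot *slot, struct sketch_stats *stats,
		      const void *data, size_t len)
{
	sketch_dma_addr_t addr = sketch_map_single(data, len);

	if (sketch_mapping_error(addr)) {
		stats->tx_dropped++;
		slot->in_use = 0;
		return -1;
	}

	slot->bufptr = addr;
	slot->len = len;
	slot->in_use = 1;
	return 0;
}
```

The design choice worth noting is the ordering: the address is validated before it is written into the descriptor, so a failed mapping is never handed to the hardware or unmapped later.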
11 years agonet/macb: fix error return code in macb_probe()
Nicolas Ferre [Sun, 14 Apr 2013 22:04:33 +0000 (22:04 +0000)]
net/macb: fix error return code in macb_probe()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Original-idea-by: <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
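
The class of bug fixed above is easy to show in isolation. The sketch below is hypothetical and does not reproduce macb_probe(): the `irq` parameter and the choice of -ENXIO are assumptions made for illustration. It contrasts an error path that jumps to the cleanup label with `err` still 0 against one that sets a negative error code first.

```c
#include <errno.h>

/* Buggy shape: the failure is detected, but err was never set, so the
 * caller sees 0 and believes the probe succeeded. */
int sketch_probe_buggy(int irq)
{
	int err = 0;

	if (irq < 0)
		goto err_out;

	return 0;

err_out:
	return err;
}

/* Fixed shape: assign a negative error code before taking the error
 * path, matching what the other failure branches return. */
int sketch_probe_fixed(int irq)
{
	int err = 0;

	if (irq < 0) {
		err = -ENXIO;
		goto err_out;
	}

	return 0;

err_out:
	return err;
}
```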
11 years agovxlan: don't bypass encapsulation for multi- and broadcasts
Mike Rapoport [Sat, 13 Apr 2013 23:21:51 +0000 (23:21 +0000)]
vxlan: don't bypass encapsulation for multi- and broadcasts

The multicast and broadcast packets may have RTCF_LOCAL set in rt_flags
and therefore will be sent out bypassing encapsulation. This breaks
delivery of packets sent to the vxlan multicast group.
Disabling encapsulation bypass for multicasts and broadcasts fixes the
issue.

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
Tested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Tested-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
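
The condition described above can be sketched as a pair of predicates. This is an illustration, not the vxlan source: the function names are hypothetical, and the `SKETCH_RTCF_*` values are assumed to mirror the route flag bits from the kernel's in_route.h rather than taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's RTCF_* route flags. */
#define SKETCH_RTCF_BROADCAST 0x10000000u
#define SKETCH_RTCF_MULTICAST 0x20000000u
#define SKETCH_RTCF_LOCAL     0x80000000u

/* Before: any route marked local bypasses encapsulation, which also
 * catches multicast and broadcast routes that carry the local flag. */
bool sketch_bypass_before(uint32_t rt_flags)
{
	return (rt_flags & SKETCH_RTCF_LOCAL) != 0;
}

/* After: bypass only for local routes that are neither multicast nor
 * broadcast, so group traffic is still encapsulated and sent out. */
bool sketch_bypass_after(uint32_t rt_flags)
{
	return (rt_flags & SKETCH_RTCF_LOCAL) &&
	       !(rt_flags & (SKETCH_RTCF_BROADCAST | SKETCH_RTCF_MULTICAST));
}
```

For a plain local unicast route both predicates agree; they differ only when the local flag is combined with the multicast or broadcast flag, which is the case the commit message describes.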
11 years agonet: tcp_memcontrol: minor: remove unused variable
Daniel Borkmann [Sun, 14 Apr 2013 08:29:12 +0000 (08:29 +0000)]
net: tcp_memcontrol: minor: remove unused variable

Commit 10b96f7306e5 (``tcp_memcontrol: remove a redundant statement
in tcp_destroy_cgroup()'') says ``We read the value but make no use
of it.'', but forgot to remove the variable declaration as well. This
was a follow-up commit of 3f1346193 (``memcg: decrement static keys
at real destroy time'') that removed the read of variable 'val'.

This fixes therefore:

  CC      net/ipv4/tcp_memcontrol.o
net/ipv4/tcp_memcontrol.c: In function ‘tcp_destroy_cgroup’:
net/ipv4/tcp_memcontrol.c:67:6: warning: unused variable ‘val’ [-Wunused-variable]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sock: make sock_tx_timestamp void
Daniel Borkmann [Sun, 14 Apr 2013 08:08:13 +0000 (08:08 +0000)]
net: sock: make sock_tx_timestamp void

Currently, sock_tx_timestamp() always returns 0. The comment that
describes the sock_tx_timestamp() function wrongly says that it
returns an error when an invalid argument is passed (from commit
20d4947353be, ``net: socket infrastructure for SO_TIMESTAMPING'').
Make the function void, so that we can also remove all the unneeded
if conditions that check for such a _non-existent_ error case in the
output path.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
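
A minimal sketch of the interface change described above, using hypothetical names and flag values (`sketch_sock`, `SKETCH_*`) rather than the real socket structures: once the helper can no longer fail, it returns void and the caller's error branch disappears.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define SKETCH_SOCK_TIMESTAMPING_TX 0x1u
#define SKETCH_SKBTX_TSTAMP         0x1u

struct sketch_sock {
	unsigned int flags;
};

/* The helper only translates socket flags into tx flags; there is no
 * failure case, so there is nothing to return. */
void sketch_tx_timestamp(const struct sketch_sock *sk, uint8_t *tx_flags)
{
	*tx_flags = 0;
	if (sk->flags & SKETCH_SOCK_TIMESTAMPING_TX)
		*tx_flags |= SKETCH_SKBTX_TSTAMP;
}

/* Caller in the output path. With an int-returning helper this would
 * have needed "err = ...; if (err) return err;", a branch that could
 * never be taken. */
int sketch_sendmsg(const struct sketch_sock *sk, uint8_t *tx_flags)
{
	sketch_tx_timestamp(sk, tx_flags);
	return 0;
}
```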
11 years agovxlan: use htonl when snooping for loopback address
Mike Rapoport [Sat, 13 Apr 2013 23:21:39 +0000 (23:21 +0000)]
vxlan: use htonl when snooping for loopback address

Currently "bridge fdb show dev vxlan0" lists loopback address as
"1.0.0.127". Using htonl(INADDR_LOOPBACK) rather than passing it
directly to vxlan_snoop fixes the problem.

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
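
The "1.0.0.127" symptom above is a byte-order mismatch, and it can be reproduced in userspace. The sketch below uses the standard htonl() and INADDR_LOOPBACK; the helper name `sketch_format_be32` is hypothetical. It prints the four bytes of a value in the order they sit in memory, which is how code expecting a network-order address would read it.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* INADDR_LOOPBACK */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format an address that is assumed to already be in network byte
 * order: the bytes are emitted in memory order, first byte first. */
void sketch_format_be32(uint32_t be_addr, char *buf, size_t len)
{
	unsigned char b[4];

	memcpy(b, &be_addr, sizeof(b));
	snprintf(buf, len, "%u.%u.%u.%u", b[0], b[1], b[2], b[3]);
}
```

Passing `htonl(INADDR_LOOPBACK)` yields "127.0.0.1" on any host. Passing `INADDR_LOOPBACK` directly, which is in host byte order, yields the reversed "1.0.0.127" on a little-endian machine, matching the output quoted in the commit message.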