GitHub/moto-9609/android_kernel_motorola_exynos9610.git
12 years agovhost-net: reduce vq polling on tx zerocopy
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:55 +0000 (09:16 +0000)]
vhost-net: reduce vq polling on tx zerocopy

It seems that to avoid deadlocks it is enough to poll vq before
we are going to use the last buffer.  This is faster than
c70aa540c7a9f67add11ad3161096fb95233aa2e.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agovhost-net: select tx zero copy dynamically
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:51 +0000 (09:16 +0000)]
vhost-net: select tx zero copy dynamically

Even when vhost-net is in zero-copy transmit mode,
net core might still decide to copy the skb later
which is somewhat slower than a copy in user
context: data copy overhead is added to the cost of
page pin/unpin. The result is that enabling tx zero copy
option leads to higher CPU utilization for guest to guest
and guest to host traffic.

To fix this, suppress zero copy tx after a given number of
packets triggered late data copy. Re-enable periodically
to detect workload changes.
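
The heuristic described above can be sketched in a few lines of C. This is
an illustration only, not the code from the patch: the struct, the function
names and both thresholds (ZC_WINDOW_PACKETS, ZC_ERR_SHIFT) are hypothetical
and chosen just to show the shape of the idea, which is to count late copies
per window, suppress zero copy while the count is too high, and reset the
counters periodically so zero copy gets another chance.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tuning values, chosen for illustration only. */
#define ZC_WINDOW_PACKETS 1024u /* reset counters after this many packets */
#define ZC_ERR_SHIFT      6u    /* tolerate about 1 late copy per 64 packets */

struct zc_stats {
	unsigned int tx_packets;   /* packets sent in the current window */
	unsigned int tx_late_copy; /* packets the stack ended up copying */
};

/* Decide whether the next packet should attempt zero copy. */
static bool zc_select(const struct zc_stats *s)
{
	assert(s);
	return (s->tx_packets >> ZC_ERR_SHIFT) >= s->tx_late_copy;
}

/* Account one transmitted packet; late_copy is the reported completion status. */
static void zc_account(struct zc_stats *s, bool late_copy)
{
	assert(s);
	if (late_copy)
		s->tx_late_copy++;
	/* Start a new window periodically so workload changes are noticed. */
	if (++s->tx_packets >= ZC_WINDOW_PACKETS) {
		s->tx_packets = 0;
		s->tx_late_copy = 0;
	}
}
```

With these assumed values, a single late copy disables zero copy until 64
packets have been sent in the window, and every 1024 packets the counters
start from scratch.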

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agovhost: move -net specific code out
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:46 +0000 (09:16 +0000)]
vhost: move -net specific code out

Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agovhost: track zero copy failures using DMA length
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:42 +0000 (09:16 +0000)]
vhost: track zero copy failures using DMA length

This will be used to disable zerocopy when error rate
is high.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agovhost-net: cleanup macros for DMA status tracking
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:37 +0000 (09:16 +0000)]
vhost-net: cleanup macros for DMA status tracking

Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotun: report orphan frags errors to zero copy callback
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:32 +0000 (09:16 +0000)]
tun: report orphan frags errors to zero copy callback

When tun transmits a zero copy skb, it orphans the frags
which might need to allocate extra memory, in atomic context.
If that fails, notify ubufs callback before freeing the skb
as a hint that device should disable zerocopy mode.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoskb: api to report errors for zero copy skbs
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:28 +0000 (09:16 +0000)]
skb: api to report errors for zero copy skbs

Orphaning frags for zero copy skbs needs to allocate data in atomic
context so it has a chance to fail. If it does we currently discard
the skb which is safe, but we don't report anything to the caller,
so it can not recover by e.g. disabling zero copy.

Add an API to free skb reporting such errors: this is used
by tun in case orphaning frags fails.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoskb: report completion status for zero copy skbs
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:22 +0000 (09:16 +0000)]
skb: report completion status for zero copy skbs

Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Fri, 2 Nov 2012 22:45:35 +0000 (18:45 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to igb, ixgbe and e1000.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agovlan: use IS_ENABLED()
Amerigo Wang [Mon, 29 Oct 2012 17:22:28 +0000 (17:22 +0000)]
vlan: use IS_ENABLED()

#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)

can be replaced by

#if IS_ENABLED(CONFIG_FOO)
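
To see why the two forms are equivalent, here is a userspace re-creation of
the preprocessor trick. It is a sketch modeled on the idea behind
include/linux/kconfig.h, not a copy of it: the DEMO_* helper macros and the
CONFIG_DEMO_* options are made up for this example, and kernel code should
simply include the real header.

```c
#include <assert.h>

/* If the argument expands to 1, the placeholder inserts an extra "0,"
 * argument, which shifts a literal 1 into the second position. */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND(ignored, val, ...) val
#define DEMO_DEFINED3(arg1_or_junk) DEMO_SECOND(arg1_or_junk 1, 0, 0)
#define DEMO_DEFINED2(val) DEMO_DEFINED3(DEMO_PLACEHOLDER_##val)
#define DEMO_DEFINED(x) DEMO_DEFINED2(x)

#define IS_BUILTIN(option) DEMO_DEFINED(option)
#define IS_MODULE(option)  DEMO_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Hypothetical configuration: one built-in option, one modular option. */
#define CONFIG_DEMO_BUILTIN 1
#define CONFIG_DEMO_LOADABLE_MODULE 1
/* CONFIG_DEMO_ABSENT is deliberately left undefined. */

static int demo_builtin_enabled(void)
{
#if IS_ENABLED(CONFIG_DEMO_BUILTIN)
	return 1;
#else
	return 0;
#endif
}

static int demo_module_enabled(void)
{
#if IS_ENABLED(CONFIG_DEMO_LOADABLE)
	return 1;
#else
	return 0;
#endif
}

static int demo_absent_enabled(void)
{
	/* The macro also works as an ordinary C expression. */
	return IS_ENABLED(CONFIG_DEMO_ABSENT) ? 1 : 0;
}
```

Unlike the defined() form, the macro expands to a plain 0 or 1, so it can be
used both in #if and in a regular if () where the compiler still type-checks
the disabled branch.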

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoipv6: use IS_ENABLED()
Amerigo Wang [Mon, 29 Oct 2012 16:23:10 +0000 (16:23 +0000)]
ipv6: use IS_ENABLED()

#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)

can be replaced by

#if IS_ENABLED(CONFIG_FOO)

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agortnl/ipv4: use netconf msg to advertise rp_filter status
Nicolas Dichtel [Mon, 29 Oct 2012 04:53:27 +0000 (04:53 +0000)]
rtnl/ipv4: use netconf msg to advertise rp_filter status

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoppp: make ppp_get_stats64 static
stephen hemminger [Mon, 29 Oct 2012 08:34:02 +0000 (08:34 +0000)]
ppp: make ppp_get_stats64 static

This was picked up by sparse.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoFEC: Add time stamping code and a PTP hardware clock
Frank Li [Tue, 30 Oct 2012 18:25:31 +0000 (18:25 +0000)]
FEC: Add time stamping code and a PTP hardware clock

This patch adds a driver for the FEC(MX6) that offers time
stamping and a PTP hardware clock. Because FEC\ENET(MX6)
hardware frequency adjustment is complex, we have implemented
this in software by changing the multiplication factor of the
timecounter.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoARM: imx6q: Set enet tx reference clk from anatop to support 1588
Frank Li [Tue, 30 Oct 2012 18:25:22 +0000 (18:25 +0000)]
ARM: imx6q: Set enet tx reference clk from anatop to support 1588

Set GRP1 BIT21 ENET_CLK_SEL:
  Enet tx reference clk from internal clock from anatop
  (loopback through pad), this clock also sent out to external PHY

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoARM: dts: imx6q: Add ENET PTP clock pin and clock source
Frank Li [Tue, 30 Oct 2012 18:24:57 +0000 (18:24 +0000)]
ARM: dts: imx6q: Add ENET PTP clock pin and clock source

Add ENET 1588 clock input pin
MX6Q_PAD_GPIO_16__ENET_ANATOP_ETHERNET_REF_OUT
and anatop PLL8 clock source for ENET

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: fec: move fec_enet_private to header file
Frank Li [Tue, 30 Oct 2012 18:24:49 +0000 (18:24 +0000)]
net: fec: move fec_enet_private to header file

A new file, fec_ptp.c, will use fec_enet_private to support 1588 PTP.
Move this structure to the common header file fec.h.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoveth: allow changing the mac address while interface is up
Hannes Frederic Sowa [Tue, 30 Oct 2012 16:22:01 +0000 (16:22 +0000)]
veth: allow changing the mac address while interface is up

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpsw: support the HWTSTAMP ioctl and the CPTS
Richard Cochran [Mon, 29 Oct 2012 08:45:20 +0000 (08:45 +0000)]
cpsw: support the HWTSTAMP ioctl and the CPTS

This patch hooks into the CPTS code and adds support for the HWTSTAMP
ioctl. The patch includes code for the CPSW version found in the dm814x
even though the background device tree support for this board is still
missing.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpts: specify the input clock frequency via DT
Richard Cochran [Mon, 29 Oct 2012 08:45:19 +0000 (08:45 +0000)]
cpts: specify the input clock frequency via DT

This patch adds a way to configure the CPTS input clock scaling factors
via the device tree.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpsw: add a DT field for the active time stamping port
Richard Cochran [Mon, 29 Oct 2012 08:45:18 +0000 (08:45 +0000)]
cpsw: add a DT field for the active time stamping port

Because time stamping on both external ports of the switch simultaneously
is positively useless from the application's point of view, this patch
provides a DT configuration method to choose the active port.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpsw: add a DT field for the cpts offset
Richard Cochran [Mon, 29 Oct 2012 08:45:17 +0000 (08:45 +0000)]
cpsw: add a DT field for the cpts offset

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpts: introduce time stamping code and a PTP hardware clock.
Richard Cochran [Mon, 29 Oct 2012 08:45:16 +0000 (08:45 +0000)]
cpts: introduce time stamping code and a PTP hardware clock.

This patch adds a driver for the CPTS that offers time
stamping and a PTP hardware clock. Because some of the
CPTS hardware variants (like the am335x) do not support
frequency adjustment, we have implemented this in software
by changing the multiplication factor of the timecounter.
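
As a rough illustration of adjusting frequency in software, the sketch below
scales a cycles-to-nanoseconds multiplier by a parts-per-billion value. The
function names and the example multiplier/shift pair are assumptions made
for this example; the driver itself works on the kernel's timecounter and
cyclecounter structures, which are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a nominal multiplier by ppb parts per billion (may be negative). */
static uint32_t adjust_mult(uint32_t nominal_mult, int32_t ppb)
{
	int neg = ppb < 0;
	uint64_t abs_ppb = neg ? (uint64_t)(-(int64_t)ppb) : (uint64_t)ppb;
	uint64_t diff = (uint64_t)nominal_mult * abs_ppb / 1000000000ULL;

	return neg ? nominal_mult - (uint32_t)diff
		   : nominal_mult + (uint32_t)diff;
}

/* Convert raw counter cycles to nanoseconds: ns = (cycles * mult) >> shift.
 * The caller must keep cycles * mult within 64 bits. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	return (cycles * mult) >> shift;
}
```

For example, with mult = 0x80000000 and shift = 29 (4 ns per cycle, i.e. an
assumed 250 MHz input clock), a request of +1000000 ppb raises the
multiplier by 0.1%, so the same number of cycles yields slightly more
nanoseconds and the clock runs faster without touching the hardware.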

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpsw: support both silicon versions
Richard Cochran [Mon, 29 Oct 2012 08:45:15 +0000 (08:45 +0000)]
cpsw: support both silicon versions

This patch fixes the cpsw driver to operate correctly with both the
dm814x and the am335x versions of the switch hardware.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpsw: remember the silicon version
Richard Cochran [Mon, 29 Oct 2012 08:45:14 +0000 (08:45 +0000)]
cpsw: remember the silicon version

This patch lets the CPSW driver remember the version number in order to
support the two different variants already in the wild.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpsw: add missing fields to the CPSW_SS register bank.
Richard Cochran [Mon, 29 Oct 2012 08:45:13 +0000 (08:45 +0000)]
cpsw: add missing fields to the CPSW_SS register bank.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocpsw: rename register banks to match the reference manual
Richard Cochran [Mon, 29 Oct 2012 08:45:12 +0000 (08:45 +0000)]
cpsw: rename register banks to match the reference manual

The code mixes up the CPSW_SS and the CPSW_WR register naming. This patch
changes the names to conform to the published Technical Reference Manual
from TI, in order to make working on the code less confusing.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodrivers: net: ethernet: cpsw: add multicast address to ALE table
Mugunthan V N [Mon, 29 Oct 2012 08:45:11 +0000 (08:45 +0000)]
drivers: net: ethernet: cpsw: add multicast address to ALE table

Adding multicast address to ALE table via netdev ops to subscribe, transmit
or receive multicast frames to and from the network

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: add pinctrl consumer support
Jean-Christophe PLAGNIOL-VILLARD [Wed, 31 Oct 2012 06:04:59 +0000 (06:04 +0000)]
net/macb: add pinctrl consumer support

If no pinctrl is available, just report a warning, as some architectures
may not need to do anything.

Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
[nicolas.ferre@atmel.com: adapt the error path, remove unneeded headers]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: Offset first RX buffer by two bytes
Havard Skinnemoen [Wed, 31 Oct 2012 06:04:58 +0000 (06:04 +0000)]
net/macb: Offset first RX buffer by two bytes

Make the ethernet frame payload word-aligned, possibly making the
memcpy into the skb a bit faster. This will be even more important
after we eliminate the copy altogether.

Also eliminate the redundant RX_OFFSET constant -- it has the same
definition and purpose as NET_IP_ALIGN.
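
The arithmetic behind the two-byte offset can be checked with a tiny sketch.
The DEMO_* constants are stand-ins defined for this example (the kernel's
own NET_IP_ALIGN can be overridden per architecture), and the buffer start
is assumed to be word-aligned.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ETH_HLEN     14u /* Ethernet header length in bytes */
#define DEMO_NET_IP_ALIGN 2u  /* padding placed in front of the frame */

/* Offset of the IP header from the start of a word-aligned RX buffer. */
static size_t ip_header_offset(size_t rx_pad)
{
	return rx_pad + DEMO_ETH_HLEN;
}

/* True if the offset falls on a 4-byte boundary. */
static int is_word_aligned(size_t offset)
{
	return (offset % 4u) == 0;
}
```

Without padding the payload starts at byte 14, which is not a multiple of
four; with two bytes of padding it starts at byte 16.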

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: adapt to newer kernel]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: better manage tx errors
Nicolas Ferre [Wed, 31 Oct 2012 06:04:57 +0000 (06:04 +0000)]
net/macb: better manage tx errors

Handle all TX errors, not only underruns. TX error management is
deferred to a dedicated workqueue.
Reinitialize the TX ring after treating all remaining frames, and
restart the controller when everything has been cleaned up properly.
Napi is not stopped during this task as the driver only handles
napi for RX for now.
With this sequence, we do not need a special check during the xmit
method as the packets will be caught by TX disable during workqueue
execution.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: ethtool interface: add register dump feature
Nicolas Ferre [Wed, 31 Oct 2012 06:04:56 +0000 (06:04 +0000)]
net/macb: ethtool interface: add register dump feature

Add macb_get_regs() ethtool function and its helper function:
macb_get_regs_len().
The version field is deduced from the IP revision which gives the
"MACB or GEM" information. An additional version field is reserved.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: clean up ring buffer logic
Havard Skinnemoen [Wed, 31 Oct 2012 06:04:55 +0000 (06:04 +0000)]
net/macb: clean up ring buffer logic

Instead of masking head and tail every time we increment them, just let them
wrap through UINT_MAX and mask them when subscripting. Add simple accessor
functions to do the subscripting properly to minimize the chances of messing
this up.

This makes the code slightly smaller, and hopefully faster as well.  Also,
doing the ring buffer management this way will simplify things a lot when
making the ring sizes configurable in the future.
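
A minimal sketch of the free-running index technique follows. It is not the
driver's code: the demo_ring type, its helpers and the ring size are
hypothetical, and the space calculation here uses every slot, which may
differ from what the driver does. The point is that head and tail are never
masked when incremented, only when used as a subscript, and that unsigned
subtraction gives the fill level even across the UINT_MAX wrap.

```c
#include <assert.h>

#define DEMO_RING_SIZE 8u /* must be a power of two */

struct demo_ring {
	unsigned int head; /* next slot to fill; free-running */
	unsigned int tail; /* next slot to reclaim; free-running */
	int slots[DEMO_RING_SIZE];
};

/* Mask an index only at the point where it is used as a subscript. */
static unsigned int ring_wrap(unsigned int index)
{
	return index & (DEMO_RING_SIZE - 1u);
}

static int *ring_slot(struct demo_ring *r, unsigned int index)
{
	return &r->slots[ring_wrap(index)];
}

/* Unsigned subtraction stays correct when head wraps past UINT_MAX. */
static unsigned int ring_used(const struct demo_ring *r)
{
	return r->head - r->tail;
}

static unsigned int ring_avail(const struct demo_ring *r)
{
	return DEMO_RING_SIZE - ring_used(r);
}

/* Returns 0 on success, -1 if the ring is full. */
static int ring_push(struct demo_ring *r, int value)
{
	assert(r);
	if (ring_avail(r) == 0)
		return -1;
	*ring_slot(r, r->head) = value;
	r->head++;
	return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int ring_pop(struct demo_ring *r, int *value)
{
	assert(r && value);
	if (ring_used(r) == 0)
		return -1;
	*value = *ring_slot(r, r->tail);
	r->tail++;
	return 0;
}
```

Because only ring_wrap() knows the ring size, making the size configurable
later means changing one helper rather than every increment.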

Available number of descriptors in ring buffer function by David Laight.

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: split patch in topics, adapt to newer kernel]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: tx status is more than 8 bits now
Nicolas Ferre [Wed, 31 Oct 2012 06:04:54 +0000 (06:04 +0000)]
net/macb: tx status is more than 8 bits now

On some revisions of GEM, the TSR status register has more information.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: remove macb_get_drvinfo()
Nicolas Ferre [Wed, 31 Oct 2012 06:04:53 +0000 (06:04 +0000)]
net/macb: remove macb_get_drvinfo()

This function has little meaning so remove it altogether and
let ethtool core fill in the fields automatically.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: change debugging messages
Havard Skinnemoen [Wed, 31 Oct 2012 06:04:52 +0000 (06:04 +0000)]
net/macb: change debugging messages

Convert some noisy netdev_dbg() statements to netdev_vdbg(). Defining
DEBUG will no longer fill up the logs; VERBOSE_DEBUG still does.
Add one more verbose debug for ISR status.

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: split patch in topics, add ISR status]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: memory barriers cleanup
Havard Skinnemoen [Wed, 31 Oct 2012 06:04:51 +0000 (06:04 +0000)]
net/macb: memory barriers cleanup

Remove a couple of unneeded barriers and document the remaining ones.

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: split patch in topics]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/macb: Add support for Gigabit Ethernet mode
Patrice Vilchez [Wed, 31 Oct 2012 06:04:50 +0000 (06:04 +0000)]
net/macb: Add support for Gigabit Ethernet mode

Add Gigabit Ethernet mode to GEM cadence IP and enable RGMII connection.

Signed-off-by: Patrice Vilchez <patrice.vilchez@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotime: remove the timecompare code.
Richard Cochran [Wed, 31 Oct 2012 06:27:25 +0000 (06:27 +0000)]
time: remove the timecompare code.

This patch removes the timecompare code from the kernel. The top five
reasons to do this are:

1. There are no more users of this code.
2. The original idea was a bit weak.
3. The original author has disappeared.
4. The code was not general purpose but tuned to a particular hardware.
5. There are better ways to accomplish clock synchronization.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobfin_mac: offer a PTP Hardware Clock.
Richard Cochran [Wed, 31 Oct 2012 06:27:24 +0000 (06:27 +0000)]
bfin_mac: offer a PTP Hardware Clock.

The BF518 has a PTP time unit that works in a similar way to other MAC
based clocks, like gianfar, ixp46x, and igb. This patch adds support for
using the blackfin as a PHC. Although the blackfin hardware does offer a
few ancillary features, this patch implements only the basic operations.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobfin_mac: replace sys time stamps with raw ones instead.
Richard Cochran [Wed, 31 Oct 2012 06:27:23 +0000 (06:27 +0000)]
bfin_mac: replace sys time stamps with raw ones instead.

This patch replaces the sys time stamps and timecompare code with simple
raw hardware time stamps in nanosecond resolution. The only tricky bit is
to find a PTP Hardware Clock period slower than the input clock period
and a power of two.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobfin_mac: only advertise hardware time stamped when enabled.
Richard Cochran [Wed, 31 Oct 2012 06:27:22 +0000 (06:27 +0000)]
bfin_mac: only advertise hardware time stamped when enabled.

The hardware time stamping code is a compile time option for the blackfin.
When it is not enabled, the driver should fall back to the standard
ethtool reply to the get_ts_info query.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoptp: add an ioctl to compare PHC time with system time
Richard Cochran [Wed, 31 Oct 2012 06:19:07 +0000 (06:19 +0000)]
ptp: add an ioctl to compare PHC time with system time

This patch adds an ioctl for PTP Hardware Clock (PHC) devices that allows
user space to measure the time offset between the PHC and the system
clock. Rather than hard coding any kind of estimation algorithm into the
kernel, this patch takes the more flexible approach of just delivering
an array of raw clock readings. In that way, the user space clock servo
may be adapted to new and different hardware clocks.
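
One way user space might turn such an array of raw readings into an offset
estimate is sketched below. Everything here is an assumption made for
illustration: the readings are taken to be plain nanosecond values in the
order system, PHC, system, PHC, ..., system, and the selection rule (trust
the sample with the shortest system-clock interval) is just one simple
choice a clock servo could make. The function name is hypothetical and the
ioctl itself is not called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * readings holds 2 * n_samples + 1 nanosecond values in the assumed order
 * sys, phc, sys, phc, ..., sys. Returns the estimated (PHC - system) offset
 * and optionally reports the read delay of the chosen sample.
 */
static int64_t estimate_phc_offset(const int64_t *readings, size_t n_samples,
				   int64_t *delay_out)
{
	int64_t best_delay = INT64_MAX;
	int64_t best_offset = 0;
	size_t i;

	assert(readings && n_samples > 0);

	for (i = 0; i < n_samples; i++) {
		int64_t sys_before = readings[2 * i];
		int64_t phc = readings[2 * i + 1];
		int64_t sys_after = readings[2 * i + 2];
		int64_t delay = sys_after - sys_before;

		/* The tightest bracket gives the least uncertain estimate. */
		if (delay < best_delay) {
			best_delay = delay;
			best_offset = phc - (sys_before + delay / 2);
		}
	}

	if (delay_out)
		*delay_out = best_delay;
	return best_offset;
}
```

Keeping this logic in user space is what lets it be swapped for a median, a
filter or anything else without changing the kernel interface.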

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoptp: Enable clock drivers along with associated net/PHY drivers
Ben Hutchings [Wed, 31 Oct 2012 15:33:52 +0000 (15:33 +0000)]
ptp: Enable clock drivers along with associated net/PHY drivers

Where a PTP clock driver is associated with a net or PHY driver, it
should be enabled automatically whenever that driver is enabled.
Therefore:

- Make PTP clock drivers select rather than depending on PTP_1588_CLOCK
- Remove separate boolean options for PTP clock drivers that are built
  as part of net driver modules.  (This also fixes cases where the PTP
  subsystem is wrongly forced to be built-in.)
- Set 'default y' for PTP clock drivers that depend on specific net
  drivers but are built separately

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoptp: Make PTP_1588_CLOCK select rather than depend on PPS
Ben Hutchings [Wed, 31 Oct 2012 15:32:44 +0000 (15:32 +0000)]
ptp: Make PTP_1588_CLOCK select rather than depend on PPS

PTP hardware clock drivers that select PTP_1588_CLOCK must currently
also select PPS.  For those drivers that don't, the user must enable
PPS, then enable PTP_1588_CLOCK, then the driver.  Simplify things for
developers and users by putting this selection in one place.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agopps, ptp: Remove dependencies on EXPERIMENTAL
Ben Hutchings [Wed, 31 Oct 2012 15:31:29 +0000 (15:31 +0000)]
pps, ptp: Remove dependencies on EXPERIMENTAL

These are now established subsystems, and we want drivers to be able
to select PPS and PTP_1588_CLOCK without depending on EXPERIMENTAL.
Further, the use of EXPERIMENTAL is now deprecated in general.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agosk-filter: Add ability to get socket filter program (v2)
Pavel Emelyanov [Thu, 1 Nov 2012 02:01:48 +0000 (02:01 +0000)]
sk-filter: Add ability to get socket filter program (v2)

The SO_ATTACH_FILTER option is currently set-only. I propose adding the
ability to get the filter by accepting SO_ATTACH_FILTER in getsockopt. To
make this easier on the eyes, an SO_GET_FILTER alias for it is declared.
This ability is required by the checkpoint-restore project to be able to
save the full state of a socket.

There are two issues with getting filter back.

First, the kernel modifies sock_filter->code on filter load, thus in
order to return the filter element back to the user we have to decode it
into user-visible constants. Fortunately the modification in question
is reversible.

Second, the BPF_S_ALU_DIV_K code modifies the command argument k to
speed up the run-time division by doing kernel_k = reciprocal(user_k).
The bad news is that different user_k values may result in the same
kernel_k, so we can't get the original user_k back. The good news is that
we don't have to. What we need to do is calculate a user2_k such that

  reciprocal(user2_k) == reciprocal(user_k) == kernel_k

i.e. if it is loaded back, the recompiled value will be exactly the same
as it was. That said, user2_k can be calculated like this

  user2_k = reciprocal(kernel_k)

with an exception, that if kernel_k == 0, then user2_k == 1.
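
The round trip can be tried out in user space. The sketch below assumes
that reciprocal(k) is ceil(2^32 / k) truncated to 32 bits, which is how the
kernel helper of that era is usually described; treat that definition, and
the restore_user_k() name, as assumptions of this example rather than as
the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: ceil(2^32 / k), truncated to 32 bits; k must be nonzero. */
static uint32_t reciprocal(uint32_t k)
{
	assert(k != 0);
	return (uint32_t)(((1ULL << 32) + (k - 1u)) / k);
}

/* Compute a user2_k that compiles to the same kernel_k when loaded again. */
static uint32_t restore_user_k(uint32_t kernel_k)
{
	/* reciprocal(1) truncates to 0, so 0 can only have come from k == 1. */
	return kernel_k == 0 ? 1u : reciprocal(kernel_k);
}
```

Under that assumption, user_k = 3000000000 compiles to kernel_k = 2 and is
read back as user2_k = 2147483648, which differs from the original but
compiles to 2 again, so the restored filter behaves identically.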

The optlen argument is treated like this -- when zero, kernel returns
the amount of sock_fprog elements in filter, otherwise it should be
large enough for the sock_fprog array.

changes since v1:
* Declared SO_GET_FILTER in all arch headers
* Added decode of vlan-tag codes

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotuntap: choose the txq based on rxq
Jason Wang [Wed, 31 Oct 2012 19:46:02 +0000 (19:46 +0000)]
tuntap: choose the txq based on rxq

This patch implements a simple multiqueue flow steering policy for tun/tap:
tx follows rx. The idea is simple: choose the txq based on the rxq the flow
arrived on. Flows are identified through the rxhash of an skb, and the hash
to queue mapping is recorded in an hlist with an ageing timer to retire
mappings. A mapping is created when tun receives a packet from userspace,
and it is queried in .ndo_select_queue().
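
A simplified model of such a hash-to-queue table is sketched below. It is
not the tun driver's data structure: it uses one entry per bucket instead of
an hlist chain, checks the age lazily at lookup time instead of running a
timer, and all names and constants (flow_table, FLOW_BUCKETS, FLOW_MAX_AGE)
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_BUCKETS  16u /* hypothetical table size, power of two */
#define FLOW_MAX_AGE  30u /* hypothetical expiry, in arbitrary time units */
#define FLOW_NO_QUEUE (-1)

struct flow_entry {
	uint32_t rxhash;  /* flow identifier taken from the packet */
	int queue;        /* queue the flow was last seen on */
	uint32_t updated; /* time of the last update */
	int valid;
};

struct flow_table {
	struct flow_entry slot[FLOW_BUCKETS];
};

/* Record that a packet of this flow arrived from userspace on `queue`. */
static void flow_record(struct flow_table *t, uint32_t rxhash, int queue,
			uint32_t now)
{
	struct flow_entry *e;

	assert(t);
	e = &t->slot[rxhash & (FLOW_BUCKETS - 1u)];
	e->rxhash = rxhash;
	e->queue = queue;
	e->updated = now;
	e->valid = 1;
}

/* Pick the transmit queue for a flow, or FLOW_NO_QUEUE if unknown or stale. */
static int flow_lookup(struct flow_table *t, uint32_t rxhash, uint32_t now)
{
	struct flow_entry *e;

	assert(t);
	e = &t->slot[rxhash & (FLOW_BUCKETS - 1u)];
	if (!e->valid || e->rxhash != rxhash)
		return FLOW_NO_QUEUE;
	if (now - e->updated > FLOW_MAX_AGE) {
		e->valid = 0; /* retire the stale mapping */
		return FLOW_NO_QUEUE;
	}
	return e->queue;
}
```

When the lookup returns FLOW_NO_QUEUE, the caller needs some fallback
policy for picking a queue; that part is left out of the sketch.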

I ran a concurrent TCP_CRR test and didn't see any mapping manipulation
helpers in perf top, so the overhead can be neglected.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotuntap: add ioctl to attach or detach a file form tuntap device
Jason Wang [Wed, 31 Oct 2012 19:46:01 +0000 (19:46 +0000)]
tuntap: add ioctl to attach or detach a file form tuntap device

Sometimes userspace may need to activate or deactivate a queue. This can be
done by detaching a file from, or attaching it to, the tuntap device.

This patch introduces a new ioctl, TUNSETQUEUE, which can be used to do
this. The flag IFF_ATTACH_QUEUE is introduced to do the attaching, while
IFF_DETACH_QUEUE is introduced to do the detaching.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotuntap: multiqueue support
Jason Wang [Wed, 31 Oct 2012 19:46:00 +0000 (19:46 +0000)]
tuntap: multiqueue support

This patch converts tun/tap to a multiqueue device and exposes the queues
as multiple file descriptors to userspace. Internally, each tun_file is
treated as a queue, and an array of pointers to tun_file structures is
stored in the tun_struct device, so multiple tun_files can be attached to
the device as multiple queues.

When choosing a txq, we first try to identify the flow through its rxhash;
if the skb has none, we fall back to the recorded rxq and use that to choose
the transmit queue. This policy may be changed in the future.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotuntap: introduce multiqueue flags
Jason Wang [Wed, 31 Oct 2012 19:45:59 +0000 (19:45 +0000)]
tuntap: introduce multiqueue flags

Add flags to be used when creating a multiqueue tuntap device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotuntap: RCUify dereferencing between tun_struct and tun_file
Jason Wang [Wed, 31 Oct 2012 19:45:58 +0000 (19:45 +0000)]
tuntap: RCUify dereferencing between tun_struct and tun_file

RCU is introduced in this patch to synchronize the dereferences between
tun_struct and tun_file. All tun_{get|put} calls are replaced with RCU; the
dereference from one to the other must be done under the rtnl lock or inside
an RCU read-side critical section.

This is needed for the following patches, since one of the goals of
multiqueue tuntap is to allow adding or removing queues during workload.
Without RCU, the control path would hold tx locks when adding or removing
queues (which may cause some delay), and it is hard to change the number of
queues without stopping the net device. With the help of RCU, there is also
no need for tun_file to hold a refcnt on tun_struct.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotuntap: move socket to tun_file
Jason Wang [Wed, 31 Oct 2012 19:45:57 +0000 (19:45 +0000)]
tuntap: move socket to tun_file

Currently tuntap uses the socket receive queue as its tx queue. To
implement multiple tx queues for tuntap and allow adding and removing
queues during workload, the first step is to move the socket related
structures to tun_file. Then multiple fds/sockets can be attached to the
tuntap device.

This patch removes tun_sock and moves socket related structures from
tun_sock or tun_struct to tun_file. Two exceptions are tap_filter and
sock_fprog, which are still kept in tun_struct since they are used to filter
packets for the net device rather than per transmit queue (at least I see no
requirement for that). After these changes, the socket is created and
destroyed during file open and close (instead of device creation and
destruction), and the socket structures can be dereferenced from tun_file
instead of from the tun_struct structure itself.

For a persistent device, since we purge during detaching and wouldn't queue
any packets when no interface is attached, there is no behaviour change
before and after this patch, so the changes are transparent to userspace.
To keep attributes such as sndbuf, socket filter and vnet header, those are
re-initialized after a new interface is attached to a persistent device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotuntap: log the unsigned informaiton with %u
Jason Wang [Wed, 31 Oct 2012 19:45:56 +0000 (19:45 +0000)]
tuntap: log the unsigned informaiton with %u

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoe1000: fix concurrent accesses to PHY from watchdog and ethtool
Maxime Bizon [Sat, 20 Oct 2012 14:53:40 +0000 (14:53 +0000)]
e1000: fix concurrent accesses to PHY from watchdog and ethtool

The e1000 driver currently does not protect concurrent accesses to the PHY
from both the ethtool callbacks, and from the e1000_watchdog function. This
patch adds a new spinlock which is used by e1000_{read,write}_phy_reg in
order to serialize concurrent accesses to the PHY.

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoigb: Fix EEPROM writes via ethtool on i210
Carolyn Wyborny [Wed, 24 Oct 2012 03:56:21 +0000 (03:56 +0000)]
igb: Fix EEPROM writes via ethtool on i210

This patch fixes a problem where the driver would crash when trying to
write a word to the EEPROM on i210 devices.

Reported-by: Ekman Tsang <Ekman.Tsang@riverbed.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoigb: Add function to read i211's invm version
Carolyn Wyborny [Tue, 23 Oct 2012 13:04:37 +0000 (13:04 +0000)]
igb: Add function to read i211's invm version

The version field in the i211's one-time programmable memory (invm) differs
from the other fields stored there.  This patch adds a function to read the
invm version and store it for output via ethtool.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoigb: Remove workaround for EEE configuration on i210/I211
Carolyn Wyborny [Fri, 19 Oct 2012 05:31:43 +0000 (05:31 +0000)]
igb: Remove workaround for EEE configuration on i210/I211

This patch removes a workaround that was needed on pre-release hardware.
Released hardware should not have this setting, but any devices that do
will get a warning message instead.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: fix default setting of TXDCTL.WTHRESH
Emil Tantilov [Wed, 24 Oct 2012 08:12:10 +0000 (08:12 +0000)]
ixgbe: fix default setting of TXDCTL.WTHRESH

The q_vector->itr check in ixgbe_configure_tx_ring() was done prior to it
being set, which resulted in TXDCTL.WTHRESH always being set to 1 on driver
load, while subsequent resets would set it to 8.

This patch moves the setting of q_vector->itr in ixgbe_alloc_q_vector() to
make sure that TXDCTL.WTHRESH is set to 8 by default.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: fix uninitialized event.type in ixgbe_ptp_check_pps_event
Jacob Keller [Sat, 13 Oct 2012 05:00:06 +0000 (05:00 +0000)]
ixgbe: fix uninitialized event.type in ixgbe_ptp_check_pps_event

This patch fixes a bug in ixgbe_ptp_check_pps_event where the type was
uninitialized and could cause unknown event outcomes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Wed, 31 Oct 2012 18:25:33 +0000 (14:25 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ixgbe, ixgbevf, igbvf, igb and
networking core (bridge).  Most notable is the addition of support
for local link multicast addresses in SR-IOV mode to the networking
core.

Also note, the ixgbe patches "ixgbe: Add support for pipeline reset" and
"ixgbe: Fix return value from macvlan filter function" are revised based
on community feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoethernet: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
Joe Perches [Sat, 27 Oct 2012 22:05:48 +0000 (22:05 +0000)]
ethernet: Convert dev_printk(KERN_<LEVEL> to dev_<level>(

dev_<level> calls take less code than dev_printk(KERN_<LEVEL>
and reducing object size is good.
Coalesce formats for easier grep.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agox86: bpf_jit_comp: add vlan tag support
Eric Dumazet [Sat, 27 Oct 2012 02:26:22 +0000 (02:26 +0000)]
x86: bpf_jit_comp: add vlan tag support

This patch is a follow-up for patch "net: filter: add vlan tag access"
to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ani Sinha <ani@aristanetworks.com>
Cc: Daniel Borkmann <danborkmann@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: filter: add vlan tag access
Eric Dumazet [Sat, 27 Oct 2012 02:26:17 +0000 (02:26 +0000)]
net: filter: add vlan tag access

BPF filters lack the ability to access skb->vlan_tci.

This patch adds two new ancillary accessors:

SKF_AD_VLAN_TAG         (44) mapped to vlan_tx_tag_get(skb)

SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)

This allows libpcap/tcpdump to use a kernel filter instead of
having to fall back to accepting all packets and then filtering
them in user space.
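
As a rough illustration (not taken from the patch), a classic BPF program
that accepts only VLAN-tagged packets could load the new ancillary field as
sketched below. The names vlan_only and vlan_only_len() are made up for this
example; the macros and constants are the ones exported by <linux/filter.h>
on kernels that carry this change.

```c
#include <linux/filter.h>
#include <stddef.h>

/* Hypothetical filter: accept a packet only if it carries a VLAN tag. */
static struct sock_filter vlan_only[] = {
    /* A = "VLAN tag present" ancillary value, at SKF_AD_OFF + 48 */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT),
    /* if (A == 0) skip one instruction (to the drop); else fall through */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, 0xffffffff),  /* accept the whole packet */
    BPF_STMT(BPF_RET | BPF_K, 0),           /* drop */
};

/* Number of instructions, as needed for struct sock_fprog.len. */
static size_t vlan_only_len(void)
{
    return sizeof(vlan_only) / sizeof(vlan_only[0]);
}
```

Such a program would normally be wrapped in a struct sock_fprog and attached
to a socket with setsockopt(SO_ATTACH_FILTER).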

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Ani Sinha <ani@aristanetworks.com>
Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/cadence: depend on HAS_IOMEM
Joachim Eastwood [Sat, 27 Oct 2012 02:10:23 +0000 (02:10 +0000)]
net/cadence: depend on HAS_IOMEM

Fixes the following build failure on S390:
  In file included from drivers/net/ethernet/cadence/at91_ether.c:35:0:
   drivers/net/ethernet/cadence/macb.h: In function 'macb_is_gem':
   drivers/net/ethernet/cadence/macb.h:563:2: error: implicit declaration of function '__raw_readl' [-Werror=implicit-function-declaration]
   drivers/net/ethernet/cadence/at91_ether.c: In function 'update_mac_address':
   drivers/net/ethernet/cadence/at91_ether.c:119:2: error: implicit declaration of function '__raw_writel' [-Werror=implicit-function-declaration]
   cc1: some warnings being treated as errors

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonetxen: explicity handle pause autoneg parameter
Flavio Leitner [Fri, 26 Oct 2012 14:39:53 +0000 (14:39 +0000)]
netxen: explicity handle pause autoneg parameter

The hardware doesn't support controlling pause frames autoneg, so
report that back correctly to userspace.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotcp: make tcp_clear_md5_list static
stephen hemminger [Fri, 26 Oct 2012 14:31:40 +0000 (14:31 +0000)]
tcp: make tcp_clear_md5_list static

Trivial. Only used in one file.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: compute skb->rxhash if nic hash may be 3-tuple
Willem de Bruijn [Fri, 26 Oct 2012 11:52:08 +0000 (11:52 +0000)]
net: compute skb->rxhash if nic hash may be 3-tuple

Network device drivers can communicate a Toeplitz hash in skb->rxhash,
but devices differ in their hashing capabilities. All compute a 5-tuple
hash for TCP over IPv4, but for other connection-oriented protocols,
they may compute only a 3-tuple. This breaks RPS load balancing, e.g.,
for TCP over IPv6 flows. Additionally, for GRE and other tunnels,
the kernel computes a 5-tuple hash over the inner packet if possible,
but devices do not.

This patch recomputes the rxhash in software in all cases where it
cannot be certain that a 5-tuple was computed. Device drivers can avoid
recomputation by setting the skb->l4_rxhash flag.
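
The rule described above can be modelled in user space roughly as follows.
This is only a sketch: struct pkt, soft_hash5() and pkt_get_rxhash() are
hypothetical stand-ins for the skb fields and kernel helpers, and the mixing
function is a placeholder, not the hash the kernel uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the skb fields discussed in the text. */
struct pkt {
    uint32_t saddr, daddr;  /* addresses */
    uint16_t sport, dport;  /* L4 ports */
    uint8_t  proto;         /* IP protocol */
    uint32_t rxhash;        /* hash, possibly supplied by the NIC */
    bool     l4_rxhash;     /* true if rxhash is known to cover the 5-tuple */
};

/* Placeholder FNV-style mixer over the 5-tuple (NOT the kernel's hash). */
static uint32_t soft_hash5(const struct pkt *p)
{
    uint32_t words[4] = {
        p->saddr, p->daddr,
        ((uint32_t)p->sport << 16) | p->dport, p->proto
    };
    uint32_t h = 2166136261u;

    for (int i = 0; i < 4; i++) {
        h ^= words[i];
        h *= 16777619u;
    }
    return h;
}

/* Trust the stored hash only when the driver vouched for a 5-tuple. */
static uint32_t pkt_get_rxhash(struct pkt *p)
{
    if (!p->l4_rxhash) {
        p->rxhash = soft_hash5(p);
        p->l4_rxhash = true;
    }
    return p->rxhash;
}
```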

Recomputing adds cycles to each packet when RPS is enabled or the
packet arrives over a tunnel. A comparison of 200x TCP_STREAM between
two servers running unmodified netnext with rxhash computation
in hardware vs software (using ethtool -K eth0 rxhash [on|off]) shows
how much time is spent in __skb_get_rxhash in this worst case:

     0.03%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.03%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.05%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash

With 200x TCP_RR it increases to

     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash

I considered having the patch explicitly skip recomputation when it knows
that it will not improve the hash (TCP over IPv4), but that conditional
complicates code without saving many cycles in practice, because it has
to take place after the flow dissector.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodlink: dl2k: use the module_pci_driver macro
Devendra Naga [Fri, 26 Oct 2012 09:29:00 +0000 (09:29 +0000)]
dlink: dl2k: use the module_pci_driver macro

Use the module_pci_driver macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agorealtek: r8169: use module_pci_driver macro
Devendra Naga [Fri, 26 Oct 2012 09:27:42 +0000 (09:27 +0000)]
realtek: r8169: use module_pci_driver macro

Use the module_pci_driver macro to make the code simpler
by eliminating the module_init and module_exit calls.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
David S. Miller [Wed, 31 Oct 2012 17:52:52 +0000 (13:52 -0400)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

included changes:
- some code cleanups and minor fixes (3 of them were reported by Coverity)
- 'struct hard_iface' re-shaping to improve multi-protocol support
- ECTP packets silent drop
- transfer the WIFI flag on clients in case of roaming

12 years agoqla3xxx: remove unused variable in ql_process_mac_tx_intr()
Wei Yongjun [Fri, 26 Oct 2012 05:30:31 +0000 (05:30 +0000)]
qla3xxx: remove unused variable in ql_process_mac_tx_intr()

The variable retval is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoqla3xxx: use module_pci_driver to simplify the code
Wei Yongjun [Fri, 26 Oct 2012 05:02:30 +0000 (05:02 +0000)]
qla3xxx: use module_pci_driver to simplify the code

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agosmsc95xx: add wol support for more frame types
Steve Glendinning [Fri, 26 Oct 2012 03:43:56 +0000 (03:43 +0000)]
smsc95xx: add wol support for more frame types

This patch adds support for wol wakeup on unicast, broadcast,
multicast and arp frames.

The wakeup filter code isn't pretty, but it works.

Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/ipv4/ipconfig: add device address to a KERN_INFO message
Claudio Fontana [Thu, 25 Oct 2012 01:15:43 +0000 (01:15 +0000)]
net/ipv4/ipconfig: add device address to a KERN_INFO message

Adds a "hwaddr" to the "IP-Config: Complete" KERN_INFO message
with the dev_addr of the device selected for auto configuration.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbe: add setlink, getlink support to ixgbe and ixgbevf
John Fastabend [Wed, 24 Oct 2012 08:13:09 +0000 (08:13 +0000)]
ixgbe: add setlink, getlink support to ixgbe and ixgbevf

This adds support for the net device ops to manage the embedded
hardware bridge on ixgbe devices. With this patch the bridge
mode can be toggled between VEB and VEPA to support stacking
macvlan devices or using the embedded switch without any SW
component in 802.1Qbg/br environments.

Additionally, this adds source address pruning to the ixgbevf
driver to prune any frames sent back from a reflective relay on
the switch. This is required because the existing hardware does
not support this. Without it, frames carrying the VF's own src mac
get pushed into the stack, which is invalid per the 802.1Qbg VEPA
definition.
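
The pruning check itself is simple. A minimal user-space sketch, assuming a
plain untagged Ethernet header and using the made-up names MAC_LEN and
should_prune() (the driver's actual code is not shown in this log):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6  /* bytes in an Ethernet MAC address */

/*
 * Ethernet header layout: destination MAC, source MAC, ethertype.
 * A frame reflected back by the relay carries our own address as source.
 */
static bool should_prune(const uint8_t *frame, size_t len,
                         const uint8_t own_mac[MAC_LEN])
{
    if (len < 2 * MAC_LEN + 2)
        return false;  /* too short for a header; left to other checks */
    return memcmp(frame + MAC_LEN, own_mac, MAC_LEN) == 0;
}
```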

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: set and query VEB/VEPA bridge mode via PF_BRIDGE
John Fastabend [Wed, 24 Oct 2012 08:13:03 +0000 (08:13 +0000)]
net: set and query VEB/VEPA bridge mode via PF_BRIDGE

Hardware switches may support enabling and disabling the
loopback switch which puts the device in a VEPA mode defined
in the IEEE 802.1Qbg specification. In this mode frames are
not switched in the hardware but sent directly to the external switch.
SR-IOV capable NICs will likely support this mode; I am
aware of at least two such devices. Also I am told (but don't
have any of this hardware available) that there are devices
that only support VEPA modes. In these cases it is important
at a minimum to be able to query these attributes.

This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
anticipating bridge attributes that may be common for both embedded
bridges and software bridges this adds a flags attribute
IFLA_BRIDGE_FLAGS currently used to determine if the command or event
is being generated to/from an embedded bridge or software bridge.
Finally, the event generation is pulled out of the bridge module and
into rtnetlink proper.

For example using the macvlan driver in VEPA mode on top of
an embedded switch requires putting the embedded switch into
a VEPA mode to get the expected results.

        --------  --------
        | VEPA |  | VEPA |       <-- macvlan vepa edge relays
        --------  --------
           |        |
           |        |
        ------------------
        |      VEPA      |       <-- embedded switch in NIC
        ------------------
                |
                |
        -------------------
        | external switch |      <-- shiny new physical
        -------------------          switch with VEPA support

A packet sent from the macvlan VEPA at the top could be
looped back by the embedded switch and never seen by the
external switch. So in order for this to work the embedded
switch needs to be set in the VEPA state via the above
described commands.

By making these attributes nested in IFLA_AF_SPEC we allow
future extensions to be made as needed.

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: create generic bridge ops
John Fastabend [Wed, 24 Oct 2012 08:12:57 +0000 (08:12 +0000)]
net: create generic bridge ops

The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
currently embedded in the ./net/bridge module. This prohibits
them from being used by other bridging devices. One example
is hardware that has embedded bridging components.

In order to use these nlmsg types more generically this patch
adds two net_device_ops hooks. One to set link bridge attributes
and another to dump the current bridge attributes.

ndo_bridge_setlink()
ndo_bridge_getlink()

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoigb: Fix sparse warning in igb_ptp_rx_pktstamp
Alexander Duyck [Tue, 23 Oct 2012 00:01:04 +0000 (00:01 +0000)]
igb: Fix sparse warning in igb_ptp_rx_pktstamp

This change fixes a sparse warning triggered by us casting the timestamp in
the packet as a u64 instead of as a __le64.  This change corrects that in
order to resolve the sparse warning.
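
For context, sparse's __le64 annotation marks data that is stored
little-endian, and kernel code converts it with le64_to_cpu(). A portable
user-space analogue of that conversion might look like the sketch below;
le64_decode() is a hypothetical helper, not a function from the driver.

```c
#include <stdint.h>

/*
 * Decode 8 little-endian bytes into a host-order value. This gives the
 * same result on little- and big-endian hosts, unlike a plain cast of
 * the buffer to a 64-bit integer.
 */
static uint64_t le64_decode(const uint8_t b[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```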

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoigb: Update firmware version info for ethtool output.
Carolyn Wyborny [Thu, 18 Oct 2012 07:16:19 +0000 (07:16 +0000)]
igb: Update firmware version info for ethtool output.

There are multiple places in our device nvm where the version is stored.
This update fixes some output errors with some types of images and
refactors the way the version data is gathered and stored for ethtool output.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoigb: Enable auto-crossover during forced operation on 82580 and above.
Matthew Vick [Tue, 16 Oct 2012 07:44:45 +0000 (07:44 +0000)]
igb: Enable auto-crossover during forced operation on 82580 and above.

Newer devices supported by igb can support auto-crossover detection in
forced operation modes. Enable this in the driver, rather than clobbering
this functionality in forced operation.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoigbvf: Check for error on dma_map_single call
Greg Rose [Fri, 21 Sep 2012 00:21:39 +0000 (00:21 +0000)]
igbvf: Check for error on dma_map_single call

Ignoring the return value from a call to the kernel dma_map API functions
can cause data corruption and system instability.  Check the return value
and take appropriate action.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbevf: Do not forward LLDP type frames
Greg Rose [Tue, 2 Oct 2012 00:50:52 +0000 (00:50 +0000)]
ixgbevf: Do not forward LLDP type frames

The driver should not forward LLDP type frames.  Inspect the ether type and
do not send if it is an LLDP ethertype frame.
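
A check of this kind can be sketched in user space as follows. The helper
name is_lldp_frame() and the macro LLDP_ETHERTYPE are illustrative; 0x88CC
is the ethertype assigned to LLDP. The sketch assumes an untagged frame,
so a VLAN-tagged LLDP frame would need the tag skipped first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LLDP_ETHERTYPE 0x88CC  /* ethertype assigned to LLDP */

/* Return true if an untagged Ethernet frame carries the LLDP ethertype. */
static bool is_lldp_frame(const uint8_t *frame, size_t len)
{
    uint16_t ethertype;

    if (len < 14)
        return false;  /* shorter than an Ethernet header */

    /* Bytes 12 and 13 hold the ethertype in network (big-endian) order. */
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    return ethertype == LLDP_ETHERTYPE;
}
```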

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: reduce PTP rx path overhead
Jiri Benc [Thu, 25 Oct 2012 18:12:05 +0000 (18:12 +0000)]
ixgbe: reduce PTP rx path overhead

Hardware timestamping code caused a performance regression in the ixgbe driver
when timestamping is not enabled. The culprit is an IXGBE_READ_REG call in the
Rx path which is executed for every received skb. This call is not needed when
timestamping is disabled or for non-PTP packets.

netperf results:

The ixgbe side of the connection was acting as a server, the netperf command
line on the other side was:
netperf -H 192.168.1.23 -T0,0 -t UDP_STREAM -l 20

The values below mean throughput as reported by netperf (local/remote), for
3 runs, with timestamping not enabled.

3.7.0-rc1+ with CONFIG_IXGBE_PTP off:
5373.83 / 3329.32
5721.88 / 3033.89
5653.42 / 3112.38

3.7.0-rc1+ with CONFIG_IXGBE_PTP on:
5233.64 / 1226.85
5448.67 / 1039.32
5421.36 / 1095.66

Patched 3.7.0-rc1+ with CONFIG_IXGBE_PTP on:
5594.72 / 2942.53
5428.95 / 3110.16
5343.56 / 3200.48

Reported-by: Jesper Brouer <jbrouer@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Andy Gospodarek <gospo@redhat.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: add/update descriptor maps in comments
Josh Hay [Wed, 26 Sep 2012 05:59:36 +0000 (05:59 +0000)]
ixgbe: add/update descriptor maps in comments

Adds/updates ASCII descriptor maps for 82598 and 82599 Tx/Rx descriptors.
Current descriptor maps were out of date for 82598 and incorrect for
82599.

Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: Do not decrement budget in ixgbe_clean_rx_irq
Alexander Duyck [Tue, 25 Sep 2012 00:29:37 +0000 (00:29 +0000)]
ixgbe: Do not decrement budget in ixgbe_clean_rx_irq

This change makes it so that we compare the total_rx_packets cleaned to budget
instead of decrementing budget.  The advantage to this approach is that budget
can now be const and we only end up modifying total_rx_packets instead of
modifying both it and budget.
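
The loop shape being described is roughly the following. This is a toy
model, not the driver code: struct rx_ring and clean_rx() are invented for
the example and "processing" a packet just decrements a counter.

```c
/* Toy receive ring: only tracks how many packets are waiting. */
struct rx_ring {
    int pending;
};

/*
 * Clean up to 'budget' packets. Because the loop compares the running
 * total against budget instead of decrementing it, budget can be const
 * and only total_rx_packets is modified.
 */
static int clean_rx(struct rx_ring *ring, const int budget)
{
    int total_rx_packets = 0;

    while (total_rx_packets < budget && ring->pending > 0) {
        ring->pending--;  /* stand-in for processing one descriptor */
        total_rx_packets++;
    }
    return total_rx_packets;
}
```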

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: Return success or failure on VF MAC filter set
Greg Rose [Tue, 25 Sep 2012 02:25:30 +0000 (02:25 +0000)]
ixgbe: Return success or failure on VF MAC filter set

When setting a MAC filter for the VF the function should return a success
or failure code, not the index of the new filter.  It causes spurious NACK
returns to the VF driver.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: clean up the condition for turning on/off the laser
Emil Tantilov [Thu, 20 Sep 2012 03:33:56 +0000 (03:33 +0000)]
ixgbe: clean up the condition for turning on/off the laser

This patch simplifies the check for calling the en/disable_tx_laser() function
pointer. The pointer is only set on parts that can use it.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agonet, ixgbe: handle link local multicast addresses in SR-IOV mode
John Fastabend [Tue, 18 Sep 2012 00:01:12 +0000 (00:01 +0000)]
net, ixgbe: handle link local multicast addresses in SR-IOV mode

In SR-IOV mode the PF driver acts as the uplink port and is
used to send control packets e.g. lldpad, stp, etc.

   eth0.1     eth0.2     eth0
   VF         VF         PF
   |          |          |   <-- stand-in for uplink
   |          |          |
  --------------------------
  |  Embedded Switch       |
  --------------------------
              |
             MAC   <-- uplink

But the embedded switch is set up to forward multicast addresses
to all interfaces, both VFs and PF, and onto the physical link.
This results in reserved MAC addresses used by control protocols
being forwarded over the switch onto the VF.

In the LLDP case the PF sends an LLDPDU and it is currently
being forwarded to all the VFs who then see the PF as a peer.
This is incorrect.

This patch adds the multicast addresses to the RAR table in the
hardware to prevent this behavior.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: Fix return value from macvlan filter function
Greg Rose [Tue, 30 Oct 2012 00:40:02 +0000 (00:40 +0000)]
ixgbe: Fix return value from macvlan filter function

The function to set the macvlan filter should return success or failure
instead of the index of the filter.  The message processing function was
misinterpreting the index as a non-zero return code indicating failure and
NACKing MAC filter set messages that actually succeeded.
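
The failure mode can be reproduced with a small model. Everything here is
hypothetical (the table, the function names and the -1 error value); it only
illustrates why returning a slot index to a caller that treats any non-zero
value as failure produces spurious NACKs.

```c
#define MAX_FILTERS 4

struct filter_table {
    int used[MAX_FILTERS];
};

/* Old behaviour: return the index of the slot taken, or -1 if full. */
static int set_filter_index(struct filter_table *t)
{
    for (int i = 0; i < MAX_FILTERS; i++) {
        if (!t->used[i]) {
            t->used[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Fixed behaviour: return 0 on success, a negative value on failure. */
static int set_filter(struct filter_table *t)
{
    return set_filter_index(t) < 0 ? -1 : 0;
}

/* Message handler convention: any non-zero return value means NACK. */
static int msg_is_nack(int retval)
{
    return retval != 0;
}
```

With the old function only the first filter (index 0) is acknowledged; every
later successful set is reported as a NACK.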

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: Add support for pipeline reset
Don Skidmore [Wed, 24 Oct 2012 06:19:01 +0000 (06:19 +0000)]
ixgbe: Add support for pipeline reset

Calling the ixgbe_reset_pipeline_82599 function will ensure a full pipeline
reset on all 82599 devices.  This is necessary to avoid possible link issues.
Since this patch accomplishes this by modifying AUTOC.LMS we need to wrap
all AUTOC writes when LESM is enabled.

v2- fix LMS behaviour based on feedback by Martin Josefsson

CC: Martin Josefsson <gandalf@mjufs.se>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agobatman-adv: add kernel-doc for enum batadv_dbg_level
Antonio Quartulli [Mon, 27 Aug 2012 09:45:37 +0000 (11:45 +0200)]
batman-adv: add kernel-doc for enum batadv_dbg_level

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
12 years agobatman-adv: pass the WIFI flag from the local to global entry
Antonio Quartulli [Mon, 27 Aug 2012 09:44:43 +0000 (11:44 +0200)]
batman-adv: pass the WIFI flag from the local to global entry

In case of client roaming, a new global entry is added while a corresponding
local one is still present. In this case the node can safely pass the WIFI flag
from the local to the global entry.

This change is required for AP isolation to work correctly in case of
roaming: if a generic WIFI client C roams from node A to B, A adds a global
entry for C without adding any WIFI flag. The latter will be set only later,
once A has received C's advertisement from B. During this period AP isolation
(if enabled) would not work correctly because C is not marked as WIFI, which
allows it to communicate with other WIFI clients.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
12 years agobatman-adv: properly convert flag into a boolean value
Antonio Quartulli [Mon, 27 Aug 2012 07:35:54 +0000 (09:35 +0200)]
batman-adv: properly convert flag into a boolean value

In order to properly convert a bitwise AND to a boolean value, the whole
expression must be prefixed with "!!".
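
One way the missing "!!" can bite is when the masked value is stored in a
narrower type. The flag value and function names below are assumptions made
for illustration; the actual batman-adv flags and types are not shown in
this log.

```c
#include <stdint.h>

#define EX_FLAG_WIFI 0x100  /* hypothetical flag bit above the low byte */

/* Masked value stored in 8 bits: 0x100 is truncated to 0. */
static uint8_t flag_truncated(uint16_t flags)
{
    return (uint8_t)(flags & EX_FLAG_WIFI);
}

/* "!!" first collapses any non-zero value to 1, so nothing is lost. */
static uint8_t flag_normalized(uint16_t flags)
{
    return !!(flags & EX_FLAG_WIFI);
}
```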

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
12 years agobatman-adv: check for more space before accessing the skb
Antonio Quartulli [Sun, 26 Aug 2012 21:25:59 +0000 (23:25 +0200)]
batman-adv: check for more space before accessing the skb

In batadv_check_unicast_ttvn() the code accesses both the unicast header and the
Ethernet header in the payload. For this reason pskb_may_pull() must be invoked
to check for the required space.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
12 years agobatman-adv: print packets re-routing on DBG_TT and ratelimit it
Antonio Quartulli [Sat, 25 Aug 2012 23:05:56 +0000 (01:05 +0200)]
batman-adv: print packets re-routing on DBG_TT and ratelimit it

To simplify TranslationTable debugging it is better to print the packet
rerouting message on the DBG_TT log level. In this way a developer interested in
packets rerouting doesn't need to filter it out of the whole ROUTES log.

Moreover, since this message will appear for each rerouted message, it is now
"ratelimited".

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
12 years agobatman-adv: properly store the roaming time
Antonio Quartulli [Fri, 24 Aug 2012 15:54:07 +0000 (17:54 +0200)]
batman-adv: properly store the roaming time

When a new global entry is added because of roaming, the roam_at field must
be initialized with the current time. This value is later used to purge
the entry on timeout (if nobody claims it). Currently, however, the roam_at
field is set to zero in this situation, leading to an immediate purging of
the related entry.
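
The effect of a zero timestamp can be seen in a simplified timeout check.
This sketch uses plain millisecond counters and a made-up helper,
roam_entry_expired(); the kernel code works on jiffies and its actual
timeout value is not given here.

```c
#include <stdbool.h>
#include <stdint.h>

/* An entry expires once more than timeout_ms has passed since roam_at_ms. */
static bool roam_entry_expired(uint64_t now_ms, uint64_t roam_at_ms,
                               uint64_t timeout_ms)
{
    return now_ms - roam_at_ms > timeout_ms;
}
```

On a system that has been up longer than the timeout, an entry whose
timestamp was left at zero is already expired at the first check, while one
stamped with the current time survives for the full timeout.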

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
12 years agobatman-adv: don't allow ECTP traffic on batman-adv
Simon Wunderlich [Sun, 19 Aug 2012 18:10:09 +0000 (20:10 +0200)]
batman-adv: don't allow ECTP traffic on batman-adv

We have seen this break networks when used with bridge loop
avoidance. As we can't see any benefit from sending these ancient frames
via our mesh, we just drop them.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
12 years agobatman-adv: Only increase refcounter once for alternate router
Sven Eckelmann [Mon, 20 Aug 2012 08:26:49 +0000 (10:26 +0200)]
batman-adv: Only increase refcounter once for alternate router

The test whether we can use a router for alternating bonding should only be
done once because it is already known that it is still usable and will not be
deleted from the list soon.

This patch addresses Coverity #712285: Unchecked return value

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
12 years agobatman-adv: Check return value of try_module_get
Sven Eckelmann [Mon, 20 Aug 2012 21:37:26 +0000 (23:37 +0200)]
batman-adv: Check return value of try_module_get

New operations should not be started when they need an increased module
reference counter and try_module_get failed.

This patch addresses Coverity #712284: Unchecked return value

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>