Bruce Allan [Fri, 17 Aug 2012 06:18:02 +0000 (06:18 +0000)]
e1000e: cleanup - remove inapplicable comment
Early Receive has been disabled in the driver so this comment is no longer
applicable.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 17 Aug 2012 06:17:57 +0000 (06:17 +0000)]
e1000e: cleanup strict checkpatch check
CHECK: multiple assignments should be avoided
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 17 Aug 2012 06:18:07 +0000 (06:18 +0000)]
e1000e: cleanup strict checkpatch MEMORY_BARRIER checks
Add comments to memory barriers per strict checkpatch.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 17 Aug 2012 06:17:51 +0000 (06:17 +0000)]
e1000e: use correct type for read of 32-bit register
The POEMB register is 32 bits, not 16.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Timur Tabi [Wed, 29 Aug 2012 08:08:03 +0000 (08:08 +0000)]
net/fsl_pq_mdio: add support for the Fman 1G MDIO controller
The MDIO controller on the Frame Manager (Fman) is compatible with the
QE and Gianfar MDIO controllers, but we don't care about the TBI because
the Ethernet drivers (FMD) take care of programming it.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 29 Aug 2012 08:08:02 +0000 (08:08 +0000)]
net/fsl-pq-mdio: coalesce multiple memory allocations into one
Take advantage of the new mdiobus_alloc_size() function to combine three
different memory allocations into one. This also simplies the error
handling.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 29 Aug 2012 08:08:01 +0000 (08:08 +0000)]
net/fsl_pq_mdio: streamline probing of MDIO nodes
Make the device tree probe function more data-driven, so that it no longer
searches the 'compatible' property more than once. The of_device_id[] array
allows for per-entry private data, so we use that to store details about each
type of node that the driver supports. This removes the need to check the
'compatible' property inside the probe function.
The driver supports four types on MDIO devices:
1) Gianfar MDIO nodes that only map the MII registers
2) Gianfar MDIO nodes that map the full MDIO register set
3) eTSEC2 MDIO nodes (which map the full MDIO register set)
4) QE MDIO nodes (which map only the MII registers)
Gianfar, eTSEC2, and QE have different mappings for the TBIPA register, which
is needed to initialize the TBI PHY. In addition, the QE needs a special
hack because of the way the device tree is ordered.
All of this information is encapsulated in the fsl_pq_mdio_data structure,
so when an MDIO node is probed, per-device data and functions are used
to determine how to initialize the device.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 29 Aug 2012 08:08:00 +0000 (08:08 +0000)]
net/fsl_pq_mdio: various small fixes
1) Replace printk with dev_err
2) Fix some whitespace mistakes
3) Rename "ofdev" to "pdev", since it's a platform_device now
4) Fix an inadvertent compound statement by replacing commas with semicolons
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 29 Aug 2012 08:07:59 +0000 (08:07 +0000)]
net/fsl_pq_mdio: merge some functions together
A few small functions were called only by other functions in the same
file, so merge them together. One function, for example, was calculating
the device address even though the caller was doing the same thing.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 29 Aug 2012 08:07:58 +0000 (08:07 +0000)]
net/fsl_pq_mdio: trim #include statements
Remove several unnecessary #include statements.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 29 Aug 2012 08:07:57 +0000 (08:07 +0000)]
net/freescale: do not export any functions from fsl_pq_mdio.c
None of the functions in fsl_pq_mdio.c are used by any other source file,
so there's no point in exporting them. Merge the header file into the
source file, make all the functions static, remove any EXPORT_SYMBOL
statements, and delete any #include "fsl_pq_mdio.h" statements.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Tue, 28 Aug 2012 20:37:44 +0000 (20:37 +0000)]
be2net: modify log msg for lack of privilege error
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 28 Aug 2012 20:37:43 +0000 (20:37 +0000)]
be2net: fixup malloc/free of adapter->pmac_id
Free was missing and kcalloc() is better placed in be_ctrl_init()
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Tue, 28 Aug 2012 20:37:42 +0000 (20:37 +0000)]
be2net: fix FW default for VF tx-rate
BE3 FW initializes VF tx-rate to 100Mbps. Fix this to 10Gbps.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Tue, 28 Aug 2012 20:37:41 +0000 (20:37 +0000)]
be2net: fix max VFs reported by HW
BE3 FW allocates VF resources for upto 30 VFs per PF while a max value of 32
may be reported via PCI config space. Fix this in the driver.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 28 Aug 2012 20:37:40 +0000 (20:37 +0000)]
be2net: create RSS rings even in multi-channel configs
Changes from commit df505e were incorrectly over-written by commit
10ef9ab.
Fixing the same.
Change log of the original fix:
Currently RSS rings are not created in a multi-channel config.
RSS rings can be created on one (out of four) interfaces per port in a
multi-channel config. Doing this insulates the driver from a FW bug wherin
multi-channel config is wrongly reported even when not enabled. This also
helps performance in a multi-channel config, as one interface per port gets
RSS rings.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alex.bluesman.smirnov@gmail.com [Sun, 26 Aug 2012 05:10:11 +0000 (05:10 +0000)]
drivers/ieee802154: move ieee802154 drivers to net folder
The IEEE 802.15.4 standard represents a networking protocol. I don't
exactly know why drivers for this protocol are stored into the root
'driver' folder, but better will be to store them with other
networking stuff. Currently there are only 3 drivers available for
IEEE 802.15.4 stack, so lets do it now with the smallest overhead.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alex.bluesman.smirnov@gmail.com [Sun, 26 Aug 2012 05:10:10 +0000 (05:10 +0000)]
drivers/ieee802154/at86rf230: replace the code under _init and _exit by macro
The code under _init and _exit functions is similar to the code of
module_spi_driver macro, which is a wrapper to the module_driver macro,
so use it instead.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Devendra Naga <develkernel412222@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Sat, 25 Aug 2012 22:47:57 +0000 (22:47 +0000)]
netlink: add minlen validation for the new signed types
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sat, 25 Aug 2012 22:18:35 +0000 (22:18 +0000)]
drivers/net/ethernet/tundra/tsi108_eth.c: delete double assignment
Delete successive assignments to the same location.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression i;
@@
*i = ...;
i = ...;
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Fri, 24 Aug 2012 17:22:53 +0000 (17:22 +0000)]
forcedeth: prevent TX timeouts after reboot
This complements patch "net-forcedeth: fix TX timeout caused by TX
pause on down link" which ensures that a lock-up sequence is not sent
to the NIC. Present patch ensures that if a NIC is already locked-up,
the driver will recover from it when initializing the device.
It does the equivalent of the following recovery sequence:
- write NVREG_TX_PAUSEFRAME_ENABLE_V1 to eth1's register
NvRegTxPauseFrame
- write NVREG_XMITCTL_START to eth1's register
NvRegTransmitterControl
- write 0 to eth1's register NvRegTransmitterControl
(this is at the heart of the "unbricking" sequence mentioned in patch
"net-forcedeth: fix TX timeout caused by TX pause on down link")
Tested:
- hardware is MCP55 device id 10de:0373 (rev a3), dual-port
- reboot a kernel without any of patches mentioned
- freeze the NIC (details on description for commit "net-forcedeth:
fix TX timeout caused by TX pause on down link")
- wait 5mn until ping hangs & TX timeout in dmesg
- reboot on kernel with present patch
- host is immediatly operational, no TX timeout
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Fri, 24 Aug 2012 17:22:52 +0000 (17:22 +0000)]
forcedeth: fix TX timeout caused by TX pause on down link
On some dual-port forcedeth devices such as MCP55 10de:0373 (rev a3),
when autoneg & TX pause are enabled while port is connected but
interface is down, the NIC will eventually freeze (TX timeouts,
network unreachable).
This patch ensures that TX pause is not configured in hardware when
interface is down. The TX pause request will be honored when interface
is later configured.
Tested:
- hardware is MCP55 device id 10de:0373 (rev a3), dual-port
- eth0 connected and UP, eth1 connected but DOWN
- without this patch, following sequence would brick NIC:
ifconfig eth0 down
ifconfig eth1 up
ifconfig eth1 down
ethtool -A eth1 autoneg off rx on tx off
ifconfig eth1 up
ifconfig eth1 down
ethtool -A eth1 autoneg on rx on tx on
ifconfig eth1 up
ifconfig eth1 down
ifup eth0
sleep 120 # or longer
ethtool eth1
Just in case, sequence to un-brick:
ifconfig eth0 down
ethtool -A eth1 autoneg off rx on tx off
ifconfig eth1 up
ifconfig eth1 down
ifup eth0
- with this patch: no TX timeout after "bricking" sequence above
Details:
- The following register accesses have been identified as the ones
causing the NIC to freeze in "bricking" sequence above:
- write NVREG_TX_PAUSEFRAME_ENABLE_V1 to eth1's register NvRegTxPauseFrame
- write NVREG_MISC1_PAUSE_TX | NVREG_MISC1_FORCE to eth1's register NvRegMisc1
- write 0 to eth1's register NvRegTransmitterControl
This is what this patch avoids.
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Fri, 24 Aug 2012 17:22:51 +0000 (17:22 +0000)]
forcedeth: fix buffer overflow
Found by manual code inspection.
Tested: compile, reboot, ethtool -d ethX
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Fri, 24 Aug 2012 09:10:53 +0000 (09:10 +0000)]
netdev/phy: add MDIO bus multiplexer driven by a memory-mapped device
Add support for an MDIO bus multiplexer controlled by a simple memory-mapped
device, like an FPGA. The device must be memory-mapped and contain only
8-bit registers (which keeps things simple).
Tested on a Freescale P5020DS board which uses the "PIXIS" FPGA attached
to the localbus.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Srinivas Kandagatla [Fri, 24 Aug 2012 01:59:17 +0000 (01:59 +0000)]
of/mdio-gpio: Simplify the way device tree support is implemented.
This patch cleans up the way device tree support is added in mdio-gpio
driver. I found lot of code duplication which is not necessary.
Also strangely a new platform driver was also introduced for device tree
support. All this forced me to do this cleanup patch.
After this patch, the driver probe checks the of_node pointer to get the
data from device tree.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Srinivas Kandagatla [Fri, 24 Aug 2012 01:58:59 +0000 (01:58 +0000)]
of/mdio: Add dummy functions in of_mdio.h.
This patch adds dummy functions in of_mdio.h, so that driver need not
ifdef there code with CONFIG_OF.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 24 Aug 2012 01:47:26 +0000 (01:47 +0000)]
netpoll: provide an IP ident in UDP frames
Let's fill IP header ident field with a meaningful value,
it might help some setups.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Thu, 23 Aug 2012 15:36:55 +0000 (15:36 +0000)]
net: dev: fix the incorrect hold of net namespace's lo device
When moving a net device from one net namespace to another
net namespace,dev_change_net_namespace calls NETDEV_DOWN
event,so the original net namespace's dst entries which
beloned to this net device will be put into dst_garbage
list.
then dev_change_net_namespace will set this net device's
net to the new net namespace.
If we unregister this net device's driver, this will trigger
the NETDEV_UNREGISTER_FINAL event, dst_ifdown will be called,
and get this net device's dst entries from dst_garbage list,
put these entries' dev to the new net namespace's lo device.
It's not what we want,actually we need these dst entries hold
the original net namespace's lo device,this incorrect device
holding will trigger emg message like below.
unregister_netdevice: waiting for lo to become free. Usage count = 1
so we should call NETDEV_UNREGISTER_FINAL event in
dev_change_net_namespace too,in order to make sure dst entries
already in the dst_garbage list, we need rcu_barrier before we
call NETDEV_UNREGISTER_FINAL event.
With help form Eric Dumazet.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Aug 2012 22:54:37 +0000 (18:54 -0400)]
Merge branch 'for-next' of git://git./linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Aug 2012 20:35:43 +0000 (16:35 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next
Ben Hutchings says:
====================
1. Change the TX path to stop queues earlier and avoid returning
NETDEV_TX_BUSY.
2. Remove some inefficiencies in soft-TSO.
3. Fix various bugs involving device state transitions and/or reset
scheduling by error handlers.
4. Take advantage of my previous change to operstate initialisation.
5. Miscellaneous cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Aug 2012 19:18:03 +0000 (15:18 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
This is a batch of updates intended for 3.7. The bulk of it is
mac80211 changes, including some mesh work from Thomas Pederson and
some multi-channel work from Johannes. A variety of driver updates
and other bits are scattered in there as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Fri, 24 Aug 2012 17:04:38 +0000 (18:04 +0100)]
sfc: Fix the initial device operstate
Following commit
8f4cccb ('net: Set device operstate at registration
time') it is now correct and preferable to set the carrier off before
registering a device.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Thu, 2 Aug 2012 00:39:38 +0000 (01:39 +0100)]
sfc: Assign efx and efx->type as early as possible in efx_pci_probe()
We also stop clearing *efx in efx_init_struct(). This is safe because
alloc_etherdev_mq() already clears it for us.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:50:57 +0000 (20:50 +0100)]
sfc: Remove bogus comment about MTU change and RX buffer overrun
RX DMA is limited by the length specified in each descriptor and not
by the MAC. Over-length frames may get into the RX FIFO regardless of
the MAC settings, due to a hardware bug, but they will be truncated by
the packet DMA engine and reported as such in the completion event.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:50:54 +0000 (20:50 +0100)]
sfc: Remove overly paranoid locking assertions from netdev operations
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:50:52 +0000 (20:50 +0100)]
sfc: Fix reset vs probe/remove/PM races involving efx_nic::state
We try to defer resets while the device is not READY, but we're not
doing this quite correctly. In particular, changes to efx_nic::state
are documented as serialised by the RTNL lock, but they aren't.
1. We check whether a reset was requested during probe (suggesting
broken hardware) before we allow requested resets to be scheduled.
This leaves a window where a requested reset would be deferred
indefinitely.
2. Although we cancel the reset work item during device removal,
there are still later operations that can cause it to be scheduled
again. We need to check the state before scheduling it.
3. Since the state can change between scheduling and running of
the work item, we still need to check it there, and we need to
do so *after* acquiring the RTNL lock which serialises state
changes.
4. We must cancel the reset work item during device removal, if the
state could ever have been READY. This wasn't done in some of the
failure paths from efx_pci_probe(). Move the cancellation to
efx_pci_remove_main().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:48:36 +0000 (20:48 +0100)]
sfc: Improve log messages in case we abort probe due to a pending reset
The current informational message doesn't properly explain what
happens, and could also appear if we defer a reset during
suspend/resume.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 19:46:41 +0000 (20:46 +0100)]
sfc: Never try to stop and start a NIC that is disabled
efx_change_mtu() and efx_realloc_channels() each stop and start much
of the NIC, even if it has been disabled. Since efx_start_all() is a
no-op when the NIC is disabled, this is probably harmless in the case
of efx_change_mtu(), but efx_realloc_channels() also reenables
interrupts which could be a bad thing to do.
Change efx_start_all() and efx_start_interrupts() to assert that the
NIC is not disabled, but make efx_stop_interrupts() do nothing if the
NIC is disabled (since it is already stopped), consistent with
efx_stop_all().
Update comments for efx_start_all() and efx_stop_all() to describe
their purpose and preconditions more accurately.
Add a common function to check and log if the NIC is disabled, and use
it in efx_net_open(), efx_change_mtu() and efx_realloc_channels().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 18:35:52 +0000 (19:35 +0100)]
sfc: Hold RTNL lock (only) when calling efx_stop_interrupts()
Interrupt state should be consistently guarded by the RTNL lock once
the net device is registered.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 18:35:47 +0000 (19:35 +0100)]
sfc: Keep disabled NICs quiescent during suspend/resume
Currently we ignore and clear the disabled state.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 18:35:39 +0000 (19:35 +0100)]
sfc: Hold the RTNL lock for more of the suspend/resume cycle
I don't think these PM functions can race with userland net device
operations, but it's much easier to reason about locking if state is
consistently guarded by the same lock.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 27 Jul 2012 18:31:16 +0000 (19:31 +0100)]
sfc: Change state names to be clearer, and comment them
STATE_INIT and STATE_FINI are equivalent and represent incompletely
initialised states; combine them as STATE_UNINIT.
Rename STATE_RUNNING to STATE_READY, to avoid confusion with
netif_running() and IFF_RUNNING.
The comments do not quite match current usage, but this will be
corrected in subsequent fixes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Fri, 22 Jun 2012 01:44:01 +0000 (02:44 +0100)]
sfc: Stash header offsets for TSO in struct tso_state
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Tue, 19 Jun 2012 19:03:41 +0000 (20:03 +0100)]
sfc: Replace tso_state::full_packet_space with ip_base_len
We only use tso_state::full_packet_space to calculate the IPv4 tot_len
or IPv6 payload_len, not to set tso_state::packet_space. Replace it
with an ip_base_len field holding the value of tot_len or payload_len
before including the TCP payload, which is much more useful when
constructing the new headers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Thu, 17 May 2012 17:40:54 +0000 (18:40 +0100)]
sfc: Simplify TSO header buffer allocation
TSO header buffers contain a control structure immediately followed by
the packet headers, and are kept on a free list when not in use. This
complicates buffer management and tends to result in cache read misses
when we recycle such buffers (particularly if DMA-coherent memory
requires caches to be disabled).
Replace the free list with a simple mapping by descriptor index. We
know that there is always a payload descriptor between any two
descriptors with TSO header buffers, so we can allocate only one
such buffer for each two descriptors.
While we're at it, use a standard error code for allocation failure,
not -1.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Tue, 22 May 2012 00:27:58 +0000 (01:27 +0100)]
sfc: Stop TX queues before they fill up
We now have a definite upper bound on the number of descriptors per
skb; use that to stop the queue when the next packet might not fit.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Thu, 17 May 2012 19:52:20 +0000 (20:52 +0100)]
sfc: Refactor struct efx_tx_buffer to use a flags field
Add a flags field to struct efx_tx_buffer, replacing the
continuation and map_single booleans.
Since a single descriptor cannot be both a TSO header and the last
descriptor for an skb, unionise efx_tx_buffer::{skb,tsoh} and add
flags for validity of these fields.
Clear all flags in free buffers (whereas previously the continuation
flag would be set).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Jiri Pirko [Thu, 23 Aug 2012 03:26:53 +0000 (03:26 +0000)]
team: do not allow to add VLAN challenged port when vlan is used
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 23 Aug 2012 03:26:52 +0000 (03:26 +0000)]
vlan: add helper which can be called to see if device is used by vlan
also, remove unused vlan_info definition from header
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 23 Aug 2012 03:26:51 +0000 (03:26 +0000)]
team: don't print warn message on -ESRCH during event send
When no one is listening on NL socket, -ESRCH is returned and warning
message is printed. This message is confusing people and in fact has no
meaning. So do not print it in this case.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 22 Aug 2012 21:28:40 +0000 (21:28 +0000)]
w5300: using eth_hw_addr_random() for random MAC and set device flag
Using eth_hw_addr_random() to generate a random Ethernet address
(MAC) to be used by a net device and set addr_assign_type.
Not need to duplicating its implementation.
spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 22 Aug 2012 21:28:19 +0000 (21:28 +0000)]
w5100: using eth_hw_addr_random() for random MAC and set device flag
Using eth_hw_addr_random() to generate a random Ethernet address
(MAC) to be used by a net device and set addr_assign_type.
Not need to duplicating its implementation.
spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 22 Aug 2012 20:49:33 +0000 (20:49 +0000)]
wimax/i2400m: use is_zero_ether_addr() instead of memcmp()
Using is_zero_ether_addr() instead of directly use
memcmp() to determine if the ethernet address is all
zeros.
spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 20 Aug 2012 21:16:51 +0000 (22:16 +0100)]
net: Set device operstate at registration time
The operstate of a device is initially IF_OPER_UNKNOWN and is updated
asynchronously by linkwatch after each change of carrier state
reported by the driver. The default carrier state of a net device is
on, and this will never be changed on drivers that do not support
carrier detection, thus the operstate remains IF_OPER_UNKNOWN.
For devices that do support carrier detection, the driver must set the
carrier state to off initially, then poll the hardware state when the
device is opened. However, we must not activate linkwatch for a
unregistered device, and commit
b473001 ('net: Do not fire linkwatch
events until the device is registered.') ensured that we don't. But
this means that the operstate for many devices that support carrier
detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.
The same issue exists with the dormant state.
The proper initialisation sequence, avoiding a race with opening of
the device, is:
rtnl_lock();
rc = register_netdevice(dev);
if (rc)
goto out_unlock;
netif_carrier_off(dev); /* or netif_dormant_on(dev) */
rtnl_unlock();
but it seems silly that this should have to be repeated in so many
drivers. Further, the operstate seen immediately after opening the
device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
linkwatch.
Commit
22604c8 ('net: Fix for initial link state in 2.6.28') attempted
to fix this by setting the operstate synchronously, but it was
reverted as it could lead to deadlock.
This initialises the operstate synchronously at registration time
only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Mon, 20 Aug 2012 09:26:39 +0000 (09:26 +0000)]
net/fsl: introduce Freescale 10G MDIO driver
Similar to fsl_pq_mdio.c, this driver is for the 10G MDIO controller on
Freescale Frame Manager Ethernet controllers.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 20 Aug 2012 07:59:10 +0000 (07:59 +0000)]
cls_cgroup: Allow classifier cgroups to have their classid reset to 0
The network classifier cgroup initalizes each cgroups instance classid value to
0. However, the sock_update_classid function only updates classid's in sockets
if the tasks cgroup classid is not zero, and if it differs from the current
classid. The later check is to prevent cache line dirtying, but the former is
detrimental, as it prevents resetting a classid for a cgroup to 0. While this
is not a common action, it has administrative usefulness (if the admin wants to
disable classification of a certain group temporarily for instance).
Easy fix, just remove the zero check. Tested successfully by myself
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Fri, 24 Aug 2012 16:25:30 +0000 (12:25 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next into for-davem
David S. Miller [Fri, 24 Aug 2012 15:30:38 +0000 (11:30 -0400)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- a set of codestyle rearrangements/fixes
- new feature to early detect new joining (mesh-unaware) clients
- a minor fix for the gw-feature
- substitution of shift operations with the BIT() macro
- reorganization of the main batman-adv structure (struct batadv_priv)
- some more (very) minor cleanups and fixes
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rami Rosen [Thu, 23 Aug 2012 02:55:41 +0000 (02:55 +0000)]
packet: fix broken build.
This patch fixes a broken build due to a missing header:
...
CC net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...
The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.
See commit
0fa7fa98dbcc2789409ed24e885485e645803d7f:
packet: Protect packet sk list with mutex (v2) patch,
Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 22 Aug 2012 21:50:59 +0000 (21:50 +0000)]
net: reinstate rtnl in call_netdevice_notifiers()
Eric Biederman pointed out that not holding RTNL while calling
call_netdevice_notifiers() was racy.
This patch is a direct transcription his feedback
against commit
0115e8e30d6fc (net: remove delay at device dismantle)
Thanks Eric !
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Thu, 23 Aug 2012 13:49:42 +0000 (09:49 -0400)]
Merge branch 'for-john' of git://git./linux/kernel/git/jberg/mac80211-next
Sven Eckelmann [Sun, 19 Aug 2012 19:48:25 +0000 (21:48 +0200)]
batman-adv: Start new development cycle
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Thu, 5 Jul 2012 21:38:30 +0000 (23:38 +0200)]
batman-adv: change interface_rx to get orig node
In order to understand where a broadcast packet is coming from and use
this information to detect not yet announced clients, this patch modifies the
interface_rx() function by passing a new argument: the orig node
corresponding to the node that originated the received packet (if known).
This new argument if not NULL for broadcast packets only (other packets does not
have source field).
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Thu, 5 Jul 2012 21:38:29 +0000 (23:38 +0200)]
batman-adv: detect not yet announced clients
With the current TT mechanism a new client joining the network is not
immediately able to communicate with other hosts because its MAC address has not
been announced yet. This situation holds until the first OGM containing its
joining event will be spread over the mesh network.
This behaviour can be acceptable in networks where the originator interval is a
small value (e.g. 1sec) but if that value is set to an higher time (e.g. 5secs)
the client could suffer from several malfunctions like DHCP client timeouts,
etc.
This patch adds an early detection mechanism that makes nodes in the network
able to recognise "not yet announced clients" by means of the broadcast packets
they emitted on connection (e.g. ARP or DHCP request). The added client will
then be confirmed upon receiving the OGM claiming it or purged if such OGM
is not received within a fixed amount of time.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Sven Eckelmann [Sun, 8 Jul 2012 16:33:51 +0000 (18:33 +0200)]
batman-adv: Reduce accumulated length of simple statements
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Sven Eckelmann [Sun, 8 Jul 2012 15:13:15 +0000 (17:13 +0200)]
batman-adv: Don't break statements after assignment operator
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Sven Eckelmann [Sun, 8 Jul 2012 14:32:09 +0000 (16:32 +0200)]
batman-adv: Use BIT(x) macro to calculate bit positions
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Martin Hundebøll [Thu, 5 Jul 2012 09:34:28 +0000 (11:34 +0200)]
batman-adv: Drop tt queries with foreign dest
When enabling promiscuous mode, tt queries for other hosts might be
received. Before this patch, "foreign" tt queries were processed like
any other query and thus forwarded to its destination again and thereby
causing a loop.
This patch adds a check to drop foreign tt queries.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Martin Hundebøll [Thu, 5 Jul 2012 09:34:27 +0000 (11:34 +0200)]
batman-adv: Move batadv_check_unicast_packet()
batadv_check_unicast_packet() is needed in batadv_recv_tt_query(), so
move the former to before the latter.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Sven Eckelmann [Sun, 15 Jul 2012 20:26:51 +0000 (22:26 +0200)]
batman-adv: Split batadv_priv in sub-structures for features
The structure batadv_priv grows everytime a new feature is introduced. It gets
hard to find the parts of the struct that belongs to a specific feature. This
becomes even harder by the fact that not every feature uses a prefix in the
member name.
The variables for bridge loop avoidence, gateway handling, translation table
and visualization server are moved into separate structs that are included in
the bat_priv main struct.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Sun, 1 Jul 2012 20:51:55 +0000 (22:51 +0200)]
batman-adv: check batadv_orig_hash_add_if() return code
If this call fails, some of the orig_nodes spaces may have been
resized for the increased number of interface, and some may not.
If we would just continue with the larger number of interfaces,
this would lead to access to not allocated memory later.
We better check the return code, and don't add the interface if
no memory is available. OTOH, keeping some of the orig_nodes
with too much memory allocated should hurt no one (except for
a few too many bytes allocated).
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Sun, 1 Jul 2012 17:07:31 +0000 (19:07 +0200)]
batman-adv: fix typos in comments
the word millisecond is misspelled in several comments. This patch fixes it.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Sun, 1 Jul 2012 12:09:12 +0000 (14:09 +0200)]
batman-adv: add reference counting for type batadv_tt_orig_list_entry
The batadv_tt_orig_list_entry structure didn't have any refcounting mechanism so
far. This patch introduces it and makes the structure being usable in much more
complex context.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Jonathan Corbet [Sat, 30 Jun 2012 16:49:13 +0000 (10:49 -0600)]
batman-adv: remove a misleading comment
As much as I'm happy to see LWN links sprinkled through the kernel by the
dozen, this one in particular reflects a very old state of reality; the
associated comment is now incorrect. So just delete it.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Sat, 23 Jun 2012 09:47:53 +0000 (11:47 +0200)]
batman-adv: convert remaining packet counters to per_cpu_ptr() infrastructure
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Sat, 23 Jun 2012 10:34:18 +0000 (12:34 +0200)]
batman-adv: rename bridge loop avoidance claim types
for consistency reasons within the code and with the documentation,
we should always call it "claim" and "unclaim".
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Sat, 23 Jun 2012 10:34:17 +0000 (12:34 +0200)]
batman-adv: correct comments in bridge loop avoidance
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Mon, 18 Jun 2012 16:39:26 +0000 (18:39 +0200)]
batman-adv: Add the backbone gateway list to debugfs
This is especially useful if there are no claims yet, but we still want
to know which gateways are using bridge loop avoidance in the network.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Tue, 21 Aug 2012 22:42:40 +0000 (00:42 +0200)]
batman-adv: move function arguments on one line
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Pavel Emelyanov [Tue, 21 Aug 2012 01:06:47 +0000 (01:06 +0000)]
packet: Protect packet sk list with mutex (v2)
Change since v1:
* Fixed inuse counters access spotted by Eric
In patch
eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.
[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [<
ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [<
ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [<
ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693] #3: (rcu_read_lock){.+.+..}, at: [<
ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(
Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:
* sklist
* prot inuse counters
The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allan, Bruce W [Mon, 20 Aug 2012 04:55:29 +0000 (04:55 +0000)]
mdio: translation of MMD EEE registers to/from ethtool settings
The helper functions which translate IEEE MDIO Manageable Device (MMD)
Energy-Efficient Ethernet (EEE) registers 3.20, 7.60 and 7.61 to and from
the comparable ethtool supported/advertised settings will be needed by
drivers other than those in PHYLIB (e.g. e1000e in a follow-on patch).
In the same fashion as similar translation functions in linux/mii.h, move
these functions from the PHYLIB core to the linux/mdio.h header file so the
code will not have to be duplicated in each driver needing MMD-to-ethtool
(and vice-versa) translations. The function and some variable names have
been renamed to be more descriptive.
Not tested on the only hardware that currently calls the related functions,
stmmac, because I don't have access to any. Has been compile tested and
the translations have been tested on a locally modified version of e1000e.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
danborkmann@iogearbox.net [Mon, 20 Aug 2012 03:34:03 +0000 (03:34 +0000)]
af_packet: use define instead of constant
Instead of using a hard-coded value for the status variable, it would make
the code more readable to use its destined define from linux/if_packet.h.
Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Sun, 19 Aug 2012 21:44:08 +0000 (21:44 +0000)]
rds: Don't disable BH on BH context
Since we have already in BH context when *_write_space(),
*_data_ready() as well as *_state_change() are called, it's
unnecessary to disable BH.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Eaglesham [Tue, 21 Aug 2012 20:43:35 +0000 (20:43 +0000)]
bonding: support for IPv6 transmit hashing
Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver. In addition, bounds checking has been added to all
transmit hashing functions.
The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.
The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.
Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.
Tested-by: John Eaglesham <linux@8192.net>
Signed-off-by: John Eaglesham <linux@8192.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 19 Aug 2012 03:47:30 +0000 (03:47 +0000)]
ipv6: gre: fix ip6gre_err()
ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8,
not 40 as expected), and should take into account 'offset' parameter.
Also uses pskb_may_pull() to cope with some fragged skbs
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 19 Aug 2012 10:31:48 +0000 (12:31 +0200)]
xfrm: fix RCU bugs
This patch reverts commit
56892261ed1a (xfrm: Use rcu_dereference_bh to
deference pointer protected by rcu_read_lock_bh), and fixes bugs
introduced in commit
418a99ac6ad ( Replace rwlock on xfrm_policy_afinfo
with rcu )
1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH
2) We must defer some writes after the synchronize_rcu() call or a reader
can crash dereferencing NULL pointer.
3) Now we use the xfrm_policy_afinfo_lock spinlock only from process
context, we no longer need to block BH in xfrm_policy_register_afinfo()
and xfrm_policy_unregister_afinfo()
4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in
xfrm_policy_unregister_afinfo()
5) Remove a forward inline declaration (xfrm_policy_put_afinfo()),
and also move xfrm_policy_get_afinfo() declaration.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@windriver.com>
Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 22 Aug 2012 17:19:46 +0000 (17:19 +0000)]
net: remove delay at device dismantle
I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.
These call_rcu() were posted by rt_free(struct rtable *rt) calls.
We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.
As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.
To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.
Use NETDEV_UNREGISTER_FINAL with care !
Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL
Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.
With help from Gao feng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Aug 2012 01:48:21 +0000 (18:48 -0700)]
Merge git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:
====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:
* Remove unnecessary RTNL locking now that we have support
for namespace in nf_conntrack, from Patrick McHardy.
* Cleanup to eliminate unnecessary goto in the initialization
path of several Netfilter tables, from Jean Sacren.
* Another cleanup from Wu Fengguang, this time to PTR_RET instead
of if IS_ERR then return PTR_ERR.
* Use list_for_each_entry_continue_rcu in nf_iterate, from
Michael Wang.
* Add pmtu_disc sysctl option to disable PMTU in their tunneling
transmitter, from Julian Anastasov.
* Generalize application protocol registration in IPVS and modify
IPVS FTP helper to use it, from Julian Anastasov.
* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
helper for NAT support, from Julian Anastasov.
* Add logic to update PMTU for IPIP packets in IPVS, again
from Julian Anastasov.
* A couple of sparse warning fixes for IPVS and Netfilter from
Claudiu Ghioc and Patrick McHardy respectively.
Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 22 Aug 2012 21:23:43 +0000 (14:23 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
This series contains updates to ethtool.h, e1000, e1000e, and igb to
implement MDI/MDIx control.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 22 Aug 2012 21:21:38 +0000 (14:21 -0700)]
Merge git://git./linux/kernel/git/davem/net
John W. Linville [Wed, 22 Aug 2012 18:15:47 +0000 (14:15 -0400)]
Merge branch 'for-john' of git://git./linux/kernel/git/iwlwifi/iwlwifi-next
Jean Sacren [Sun, 19 Aug 2012 15:11:32 +0000 (15:11 +0000)]
netfilter: remove unnecessary goto statement for error recovery
Usually it's a good practice to use goto statement for error recovery
when initializing the module. This approach could be an overkill if:
1) there is only one fail case;
2) success and failure use the same return statement.
For a cleaner approach, remove the unnecessary goto statement and
directly implement error recovery.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Michael Wang [Thu, 16 Aug 2012 18:33:39 +0000 (18:33 +0000)]
netfilter: replace list_for_each_continue_rcu with new interface
This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to allow removing
list_for_each_continue_rcu().
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Sujith Manoharan [Wed, 22 Aug 2012 08:51:07 +0000 (14:21 +0530)]
mac80211: Fix AP mode regression
Commit mac80211: avoid using synchronize_rcu in ieee80211_set_probe_resp
changed the return value when the probe response template is not present.
Revert to the earlier value of 1 - this fixes AP mode for drivers like
ath9k.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linus Torvalds [Wed, 22 Aug 2012 00:22:22 +0000 (17:22 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge fixes from Andrew Morton.
Random drivers and some VM fixes.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (17 commits)
mm: compaction: Abort async compaction if locks are contended or taking too long
mm: have order > 0 compaction start near a pageblock with free pages
rapidio/tsi721: fix unused variable compiler warning
rapidio/tsi721: fix inbound doorbell interrupt handling
drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode
mm: correct page->pfmemalloc to fix deactivate_slab regression
drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes
mm/compaction.c: fix deferring compaction mistake
drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources
string: do not export memweight() to userspace
hugetlb: update hugetlbpage.txt
checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO
mm: hugetlbfs: correctly populate shared pmd
cciss: fix incorrect scsi status reporting
Documentation: update mount option in filesystem/vfat.txt
mm: change nr_ptes BUG_ON to WARN_ON
cs5535-clockevt: typo, it's MFGPT, not MFPGT
Linus Torvalds [Tue, 21 Aug 2012 23:54:38 +0000 (16:54 -0700)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"For bug fixes, at soc_camera, si470x, uvcvideo, iguanaworks IR driver,
radio_shark Kbuild fixes, and at the V4L2 core (radio fixes)."
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] media: soc_camera: don't clear pix->sizeimage in JPEG mode
[media] media: mx2_camera: Fix clock handling for i.MX27
[media] video: mx2_camera: Use clk_prepare_enable/clk_disable_unprepare
[media] video: mx1_camera: Use clk_prepare_enable/clk_disable_unprepare
[media] media: mx3_camera: buf_init() add buffer state check
[media] radio-shark2: Only compile led support when CONFIG_LED_CLASS is set
[media] radio-shark: Only compile led support when CONFIG_LED_CLASS is set
[media] radio-shark*: Call cancel_work_sync from disconnect rather then release
[media] radio-shark*: Remove work-around for dangling pointer in usb intfdata
[media] Add USB dependency for IguanaWorks USB IR Transceiver
[media] Add missing logging for rangelow/high of hwseek
[media] VIDIOC_ENUM_FREQ_BANDS fix
[media] mem2mem_testdev: fix querycap regression
[media] si470x: v4l2-compliance fixes
[media] DocBook: Remove a spurious character
[media] uvcvideo: Reset the bytesused field when recycling an erroneous buffer
Linus Torvalds [Tue, 21 Aug 2012 23:46:08 +0000 (16:46 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking update from David Miller:
"A couple weeks of bug fixing in there. The largest chunk is all the
broken crap Amerigo Wang found in the netpoll layer."
1) netpoll and it's users has several serious bugs:
a) uses GFP_KERNEL with locks held
b) interfaces requiring interrupts disabled are called with them
enabled
c) and vice versa
d) VLAN tag demuxing, as per all other RX packet input paths, is not
applied
All from Amerigo Wang.
2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
good, from Neal Cardwell.
3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials
when the user doesn't specify one explicitly during sendmsg().
Instead we attach an empty (zero) SCM credential block which is
definitely not what we want. Fix from Eric Dumazet.
4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
from Ben Hutchings.
5) inet_csk_route_child_sock() checks wrong inet options pointer, fix
from Christoph Paasch.
6) When AF_PACKET is used for transmit, packet loopback doesn't behave
properly when a socket fanout is enabled, from Eric Leblond.
7) On bluetooth l2cap channel create failure, we leak the socket, from
Jaganath Kanakkassery.
8) Fix all the netprio file handling bugs found by Al Viro, from John
Fastabend.
9) Several error return and NULL deref bug fixes in networking drivers
from Julia Lawall.
10) A large smattering of struct padding et al. kernel memory leaks to
userspace found of Mathias Krause.
11) Conntrack expections in netfilter can access an uninitialized timer,
fix from Pablo Neira Ayuso.
12) Several netfilter SIP tracker bug fixes from Patrick McHardy.
13) IPSEC ipv6 routes are not initialized correctly all the time,
resulting in an OOPS in inet_putpeer(). Also from Patrick McHardy.
14) Bridging does rcu_dereference() outside of RCU protected area, from
Stephen Hemminger.
15) Fix routing cache removal performance regression when looking up
output routes that have a local destination. From Zheng Yan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
af_netlink: force credentials passing [CVE-2012-3520]
ipv4: fix ip header ident selection in __ip_make_skb()
ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
tcp: fix possible socket refcount problem
net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
net/core/dev.c: fix kernel-doc warning
netconsole: remove a redundant netconsole_target_put()
net: ipv6: fix oops in inet_putpeer()
net/stmmac: fix issue of clk_get for Loongson1B.
caif: Do not dereference NULL in chnl_recv_cb()
af_packet: don't emit packet on orig fanout group
drivers/net/irda: fix error return code
drivers/net/wan/dscc4.c: fix error return code
drivers/net/wimax/i2400m/fw.c: fix error return code
smsc75xx: add missing entry to MAINTAINERS
net: qmi_wwan: new devices: UML290 and K5006-Z
net: sh_eth: Add eth support for R8A7779 device
netdev/phy: skip disabled mdio-mux nodes
dt: introduce for_each_available_child_of_node, of_get_next_available_child
net: netprio: fix cgrp create and write priomap race
...
Mel Gorman [Tue, 21 Aug 2012 23:16:17 +0000 (16:16 -0700)]
mm: compaction: Abort async compaction if locks are contended or taking too long
Jim Schutt reported a problem that pointed at compaction contending
heavily on locks. The workload is straight-forward and in his own words;
The systems in question have 24 SAS drives spread across 3 HBAs,
running 24 Ceph OSD instances, one per drive. FWIW these servers
are dual-socket Intel 5675 Xeons w/48 GB memory. I've got ~160
Ceph Linux clients doing dd simultaneously to a Ceph file system
backed by 12 of these servers.
Early in the test everything looks fine
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
31 15 0 287216 576
38606628 0 0 2 1158 2 14 1 3 95 0 0
27 15 0 225288 576
38583384 0 0 18
2222016 203357 134876 11 56 17 15 0
28 17 0 219256 576
38544736 0 0 11
2305932 203141 146296 11 49 23 17 0
6 18 0 215596 576
38552872 0 0 7
2363207 215264 166502 12 45 22 20 0
22 18 0 226984 576
38596404 0 0 3
2445741 223114 179527 12 43 23 22 0
and then it goes to pot
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
163 8 0 464308 576
36791368 0 0 11 22210 866 536 3 13 79 4 0
207 14 0 917752 576
36181928 0 0 712
1345376 134598 47367 7 90 1 2 0
123 12 0 685516 576
36296148 0 0 429
1386615 158494 60077 8 84 5 3 0
123 12 0 598572 576
36333728 0 0 1107
1233281 147542 62351 7 84 5 4 0
622 7 0 660768 576
36118264 0 0 557
1345548 151394 59353 7 85 4 3 0
223 11 0 283960 576
36463868 0 0 46
1107160 121846 33006 6 93 1 1 0
Note that system CPU usage is very high blocks being written out has
dropped by 42%. He analysed this with perf and found
perf record -g -a sleep 10
perf report --sort symbol --call-graph fractal,5
34.63% [k] _raw_spin_lock_irqsave
|
|--97.30%-- isolate_freepages
| compaction_alloc
| unmap_and_move
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_slowpath
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| do_page_fault
| page_fault
| |
| |--87.39%-- skb_copy_datagram_iovec
| | tcp_recvmsg
| | inet_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call
| | __recv
| | |
| | --100.00%-- (nil)
| |
| --12.61%-- memcpy
--2.70%-- [...]
There was other data but primarily it is all showing that compaction is
contended heavily on the zone->lock and zone->lru_lock.
commit [
b2eef8c0: mm: compaction: minimise the time IRQs are disabled
while isolating pages for migration] noted that it was possible for
migration to hold the lru_lock for an excessive amount of time. Very
broadly speaking this patch expands the concept.
This patch introduces compact_checklock_irqsave() to check if a lock
is contended or the process needs to be scheduled. If either condition
is true then async compaction is aborted and the caller is informed.
The page allocator will fail a THP allocation if compaction failed due
to contention. This patch also introduces compact_trylock_irqsave()
which will acquire the lock only if it is not contended and the process
does not need to schedule.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 21 Aug 2012 23:16:15 +0000 (16:16 -0700)]
mm: have order > 0 compaction start near a pageblock with free pages
Commit
7db8889ab05b ("mm: have order > 0 compaction start off where it
left") introduced a caching mechanism to reduce the amount work the free
page scanner does in compaction. However, it has a problem. Consider
two process simultaneously scanning free pages
C
Process A M S F
|---------------------------------------|
Process B M FS
C is zone->compact_cached_free_pfn
S is cc->start_pfree_pfn
M is cc->migrate_pfn
F is cc->free_pfn
In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.
Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.
Process A moves to "end_of_zone - one_pageblock" and runs this check
if (cc->order > 0 && (!cc->wrapped ||
zone->compact_cached_free_pfn >
cc->start_free_pfn))
pfn = min(pfn, zone->compact_cached_free_pfn);
compact_cached_free_pfn is above where it started so the free scanner
skips almost the entire space it should have scanned. When there are
multiple processes compacting it can end in a situation where the entire
zone is not being scanned at all. Further, it is possible for two
processes to ping-pong update to compact_cached_free_pfn which is just
random.
Overall, the end result wrecks allocation success rates.
There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.
First, it gets rid of the skip logic because it's not clear that it
matters if two free scanners happen to be in the same block but with
racing updates it's too easy for it to skip over blocks it should not.
Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.
If a scanner has wrapped, it updates compact_cached_free_pfn to the end
of the zone. When a wrapped scanner isolates a page, it updates
compact_cached_free_pfn to point to the highest pageblock it
can isolate pages from.
If a scanner has not wrapped when it has finished isolated pages it
checks if compact_cached_free_pfn is pointing to the end of the
zone. If so, the value is updated to point to the highest
pageblock that pages were isolated from. This value will not
be updated again until a free page scanner wraps and resets
compact_cached_free_pfn.
This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 21 Aug 2012 23:16:12 +0000 (16:16 -0700)]
rapidio/tsi721: fix unused variable compiler warning
Fix unused variable compiler warning when built with CONFIG_RAPIDIO_DEBUG
option off.
This patch is applicable to kernel versions starting from v3.2
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>