Linus Torvalds [Wed, 20 Nov 2013 22:41:47 +0000 (14:41 -0800)]
Revert "mm: create a separate slab for page->ptl allocation"
This reverts commit
ea1e7ed33708c7a760419ff9ded0a6cb90586a50.
Al points out that while the commit *does* actually create a separate
slab for the page->ptl allocation, that slab is never actually used, and
the code continues to use kmalloc/kfree.
Damien Wyart points out that the original patch did have the conversion
to use kmem_cache_alloc/free, so it got lost somewhere on its way to me.
Revert the half-arsed attempt that didn't do anything. If we really do
want the special slab (remember: this is all relevant just for debug
builds, so it's not necessarily all that critical) we might as well redo
the patch fully.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 20 Nov 2013 22:25:39 +0000 (14:25 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs bits and pieces from Al Viro:
"Assorted bits that got missed in the first pull request + fixes for a
couple of coredump regressions"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fold try_to_ascend() into the sole remaining caller
dcache.c: get rid of pointless macros
take read_seqbegin_or_lock() and friends to seqlock.h
consolidate simple ->d_delete() instances
gfs2: endianness misannotations
dump_emit(): use __kernel_write(), not vfs_write()
dump_align(): fix the dumb braino
Al Viro [Wed, 20 Nov 2013 22:16:36 +0000 (22:16 +0000)]
Wrong page freed on preallocate_pmds() failure exit
Note that pmds[i] is simply uninitialized at that point...
Granted, it's very hard to hit (you need split page locks *and*
kmalloc(sizeof(spinlock_t), GFP_KERNEL) failing), but the code is
obviously bogus.
Introduced by commit
09ef4939850a ("x86: add missed
pgtable_pmd_page_ctor/dtor calls for preallocated pmds")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 20 Nov 2013 21:25:04 +0000 (13:25 -0800)]
Merge tag 'pm+acpi-2-3.13-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull more ACPI and power management updates from Rafael Wysocki:
- ACPI-based device hotplug fixes for issues introduced recently and a
fix for an older error code path bug in the ACPI PCI host bridge
driver
- Fix for recently broken OMAP cpufreq build from Viresh Kumar
- Fix for a recent hibernation regression related to s2disk
- Fix for a locking-related regression in the ACPI EC driver from
Puneet Kumar
- System suspend error code path fix related to runtime PM and runtime
PM documentation update from Ulf Hansson
- cpufreq's conservative governor fix from Xiaoguang Chen
- New processor IDs for intel_idle and turbostat and removal of an
obsolete Kconfig option from Len Brown
- New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and
ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg
- Removal of several ACPI video DMI blacklist entries that are not
necessary any more from Aaron Lu
- Rework of the ACPI companion representation in struct device and code
cleanup related to that change from Rafael J Wysocki, Lan Tianyu and
Jarkko Nikula
- Fixes for assigning names to ACPI-enumerated I2C and SPI devices from
Jarkko Nikula
* tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed()
ACPI / PCI root: Clear driver_data before failing enumeration
ACPI / hotplug: Fix PCI host bridge hot removal
ACPI / hotplug: Fix acpi_bus_get_device() return value check
cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs()
ACPI / video: clean up DMI table for initial black screen problem
ACPI / EC: Ensure lock is acquired before accessing ec struct members
PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()
ACPI / AC: Remove struct acpi_device pointer from struct acpi_ac
spi: Use stable dev_name for ACPI enumerated SPI slaves
i2c: Use stable dev_name for ACPI enumerated I2C slaves
ACPI: Provide acpi_dev_name accessor for struct acpi_device device name
ACPI / bind: Use (put|get)_device() on ACPI device objects too
ACPI: Eliminate the DEVICE_ACPI_HANDLE() macro
ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node
cpufreq: OMAP: Fix compilation error 'r & ret undeclared'
PM / Runtime: Fix error path for prepare
PM / Runtime: Update documentation around probe|remove|suspend
cpufreq: conservative: set requested_freq to policy max when it is over policy max
...
Linus Torvalds [Wed, 20 Nov 2013 21:20:24 +0000 (13:20 -0800)]
Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine changes from Vinod Koul:
"This brings for slave dmaengine:
- Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as
dmaengine can only transfer and not verify validaty of dma
transfers
- Bunch of fixes across drivers:
- cppi41 driver fixes from Daniel
- 8 channel freescale dma engine support and updated bindings from
Hongbo
- msx-dma fixes and cleanup by Markus
- DMAengine updates from Dan:
- Bartlomiej and Dan finalized a rework of the dma address unmap
implementation.
- In the course of testing 1/ a collection of enhancements to
dmatest fell out. Notably basic performance statistics, and
fixed / enhanced test control through new module parameters
'run', 'wait', 'noverify', and 'verbose'. Thanks to Andriy and
Linus [Walleij] for their review.
- Testing the raid related corner cases of 1/ triggered bugs in
the recently added 16-source operation support in the ioatdma
driver.
- Some minor fixes / cleanups to mv_xor and ioatdma"
* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits)
dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers
dma: mv_xor: Remove unneeded NULL address check
ioat: fix ioat3_irq_reinit
ioat: kill msix_single_vector support
raid6test: add new corner case for ioatdma driver
ioatdma: clean up sed pool kmem_cache
ioatdma: fix selection of 16 vs 8 source path
ioatdma: fix sed pool selection
ioatdma: Fix bug in selftest after removal of DMA_MEMSET.
dmatest: verbose mode
dmatest: convert to dmaengine_unmap_data
dmatest: add a 'wait' parameter
dmatest: add basic performance metrics
dmatest: add support for skipping verification and random data setup
dmatest: use pseudo random numbers
dmatest: support xor-only, or pq-only channels in tests
dmatest: restore ability to start test at module load and init
dmatest: cleanup redundant "dmatest: " prefixes
dmatest: replace stored results mechanism, with uniform messages
Revert "dmatest: append verify result to results"
...
Linus Torvalds [Wed, 20 Nov 2013 21:06:20 +0000 (13:06 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
"Normally I'd defer my initial for-linus pull request until after the
merge window, but a race was uncovered in the virtio-blk conversion to
blk-mq that could cause hangs. So here's a small collection of fixes
for you to pull:
- The fix for the virtio-blk IO hang reported by Dave Chinner, from
Shaohua and myself.
- Add the Insert blktrace event for blk-mq. This makes 'btt' happy
when it is doing it's state transition analysis.
- Ensure that blk-mq has disk/partition stats enabled by default,
instead of making it opt-in.
- A fix for __bio_add_page() and large sector counts"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: add blktrace insert event trace
virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations
blk-mq: ensure that we set REQ_IO_STAT so diskstats work
bio: fix argument of __bio_add_page() for max_sectors > 0xffff
Linus Torvalds [Wed, 20 Nov 2013 21:05:25 +0000 (13:05 -0800)]
Merge tag 'md/3.13' of git://neil.brown.name/md
Pull md update from Neil Brown:
"Mostly optimisations and obscure bug fixes.
- raid5 gets less lock contention
- raid1 gets less contention between normal-io and resync-io during
resync"
* tag 'md/3.13' of git://neil.brown.name/md:
md/raid5: Use conf->device_lock protect changing of multi-thread resources.
md/raid5: Before freeing old multi-thread worker, it should flush them.
md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE.
UAPI: include <asm/byteorder.h> in linux/raid/md_p.h
raid1: Rewrite the implementation of iobarrier.
raid1: Add some macros to make code clearly.
raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array when reconfiguring the array.
raid1: Add a field array_frozen to indicate whether raid in freeze state.
md: Convert use of typedef ctl_table to struct ctl_table
md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes.
md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread.
md: fix some places where mddev_lock return value is not checked.
raid5: Retry R5_ReadNoMerge flag when hit a read error.
raid5: relieve lock contention in get_active_stripe()
raid5: relieve lock contention in get_active_stripe()
wait: add wait_event_cmd()
md/raid5.c: add proper locking to error path of raid5_start_reshape.
md: fix calculation of stacking limits on level change.
raid5: Use slow_path to release stripe when mddev->thread is null
Jens Axboe [Wed, 20 Nov 2013 01:59:10 +0000 (18:59 -0700)]
blk-mq: add blktrace insert event trace
We need it to make 'btt' from blktrace happy, otherwise
we are missing one state transition.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 20 Nov 2013 01:57:24 +0000 (18:57 -0700)]
virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations
It isn't safe to call it without holding the vblk->vq_lock.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Fixed another condition of virtqueue_kick() not holding the lock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mahesh Rajashekhara [Thu, 31 Oct 2013 08:31:02 +0000 (14:01 +0530)]
aacraid: prevent invalid pointer dereference
It appears that driver runs into a problem here if fibsize is too small
because we allocate user_srbcmd with fibsize size only but later we
access it until user_srbcmd->sg.count to copy it over to srbcmd.
It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this
structure already includes one sg element and this is not needed for
commands without data. So, we would recommend to add the following
(instead of test for fibsize == 0).
Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com>
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 19 Nov 2013 23:50:47 +0000 (15:50 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Mostly these are fixes for fallout due to merge window changes, as
well as cures for problems that have been with us for a much longer
period of time"
1) Johannes Berg noticed two major deficiencies in our genetlink
registration. Some genetlink protocols we passing in constant
counts for their ops array rather than something like
ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
using fixed IDs for their multicast groups.
We have to retain these fixed IDs to keep existing userland tools
working, but reserve them so that other multicast groups used by
other protocols can not possibly conflict.
In dealing with these two problems, we actually now use less state
management for genetlink operations and multicast groups.
2) When configuring interface hardware timestamping, fix several
drivers that simply do not validate that the hwtstamp_config value
is one the driver actually supports. From Ben Hutchings.
3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.
4) In dev_forward_skb(), set the skb->protocol in the right order
relative to skb_scrub_packet(). From Alexei Starovoitov.
5) Bridge erroneously fails to use the proper wrapper functions to make
calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
Makita.
6) When detaching a bridge port, make sure to flush all VLAN IDs to
prevent them from leaking, also from Toshiaki Makita.
7) Put in a compromise for TCP Small Queues so that deep queued devices
that delay TX reclaim non-trivially don't have such a performance
decrease. One particularly problematic area is 802.11 AMPDU in
wireless. From Eric Dumazet.
8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
here. Fix from Eric Dumzaet, reported by Dave Jones.
9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.
10) When computing mergeable buffer sizes, virtio-net fails to take the
virtio-net header into account. From Michael Dalton.
11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
bumping, this one has been with us for a while. From Eric Dumazet.
12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
Hugne.
13) 6lowpan bit used for traffic classification was wrong, from Jukka
Rissanen.
14) macvlan has the same issue as normal vlans did wrt. propagating LRO
disabling down to the real device, fix it the same way. From Michal
Kubecek.
15) CPSW driver needs to soft reset all slaves during suspend, from
Daniel Mack.
16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.
17) The xen-netfront RX buffer refill timer isn't properly scheduled on
partial RX allocation success, from Ma JieYue.
18) When ipv6 ping protocol support was added, the AF_INET6 protocol
initialization cleanup path on failure was borked a little. Fix
from Vlad Yasevich.
19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
blocks we can do the wrong thing with the msg_name we write back to
userspace. From Hannes Frederic Sowa. There is another fix in the
works from Hannes which will prevent future problems of this nature.
20) Fix route leak in VTI tunnel transmit, from Fan Du.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
genetlink: make multicast groups const, prevent abuse
genetlink: pass family to functions using groups
genetlink: add and use genl_set_err()
genetlink: remove family pointer from genl_multicast_group
genetlink: remove genl_unregister_mc_group()
hsr: don't call genl_unregister_mc_group()
quota/genetlink: use proper genetlink multicast APIs
drop_monitor/genetlink: use proper genetlink multicast APIs
genetlink: only pass array to genl_register_family_with_ops()
tcp: don't update snd_nxt, when a socket is switched from repair mode
atm: idt77252: fix dev refcnt leak
xfrm: Release dst if this dst is improper for vti tunnel
netlink: fix documentation typo in netlink_set_err()
be2net: Delete secondary unicast MAC addresses during be_close
be2net: Fix unconditional enabling of Rx interface options
net, virtio_net: replace the magic value
ping: prevent NULL pointer dereference on write to msg_name
bnx2x: Prevent "timeout waiting for state X"
bnx2x: prevent CFC attention
bnx2x: Prevent panic during DMAE timeout
...
Linus Torvalds [Tue, 19 Nov 2013 23:50:03 +0000 (15:50 -0800)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
"Two merge window fallout build fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: merge fix
sparc64: fix build regession
Linus Torvalds [Tue, 19 Nov 2013 23:49:31 +0000 (15:49 -0800)]
Merge tag 'please-pull-fixia64' of git://git./linux/kernel/git/aegl/linux
Pull ia64 fix from Tony Luck:
"Unbreak ia64 build by avoiding circular dependency"
* tag 'please-pull-fixia64' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
kernel/bounds: avoid circular dependencies in generated headers
Kirill A. Shutemov [Mon, 18 Nov 2013 08:47:27 +0000 (10:47 +0200)]
kernel/bounds: avoid circular dependencies in generated headers
<linux/spinlock.h> has heavy dependencies on other header files.
It triggers circular dependencies in generated headers on IA64, at
least:
CC kernel/bounds.s
In file included from /home/space/kas/git/public/linux/arch/ia64/include/asm/thread_info.h:9:0,
from include/linux/thread_info.h:54,
from include/asm-generic/preempt.h:4,
from arch/ia64/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:18,
from include/linux/spinlock.h:50,
from kernel/bounds.c:14:
/home/space/kas/git/public/linux/arch/ia64/include/asm/asm-offsets.h:1:35: fatal error: generated/asm-offsets.h: No such file or directory
compilation terminated.
Let's replace <linux/spinlock.h> with <linux/spinlock_types.h>, it's
enough to find out size of spinlock_t.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-and-Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
David S. Miller [Tue, 19 Nov 2013 21:39:42 +0000 (16:39 -0500)]
Merge branch 'genetlink_mcast'
Johannes Berg says:
====================
genetlink: clean up multicast group APIs
The generic netlink multicast group registration doesn't have to
be dynamic, and can thus be simplified just like I did with the
ops. This removes some complexity in registration code.
Additionally, two users of generic netlink already use multicast
groups in a wrong way, add workarounds for those two to keep the
userspace API working, but at the same time make them not clash
with other users of multicast groups as might happen now.
While making it all a bit easier, also prevent such abuse by adding
checks to the APIs so each family can only use the groups it owns.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:39 +0000 (15:19 +0100)]
genetlink: make multicast groups const, prevent abuse
Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.
This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.
At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:38 +0000 (15:19 +0100)]
genetlink: pass family to functions using groups
This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:37 +0000 (15:19 +0100)]
genetlink: add and use genl_set_err()
Add a static inline to generic netlink to wrap netlink_set_err()
to make it easier to use here - use it in openvswitch (the only
generic netlink user of netlink_set_err()).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:36 +0000 (15:19 +0100)]
genetlink: remove family pointer from genl_multicast_group
There's no reason to have the family pointer there since it
can just be passed internally where needed, so remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:35 +0000 (15:19 +0100)]
genetlink: remove genl_unregister_mc_group()
There are no users of this API remaining, and we'll soon
change group registration to be static (like ops are now)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:34 +0000 (15:19 +0100)]
hsr: don't call genl_unregister_mc_group()
There's no need to unregister the multicast group if the
generic netlink family is registered immediately after.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:33 +0000 (15:19 +0100)]
quota/genetlink: use proper genetlink multicast APIs
The quota code is abusing the genetlink API and is using
its family ID as the multicast group ID, which is invalid
and may belong to somebody else (and likely will.)
Make the quota code use the correct API, but since this
is already used as-is by userspace, reserve a family ID
for this code and also reserve that group ID to not break
userspace assumptions.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:32 +0000 (15:19 +0100)]
drop_monitor/genetlink: use proper genetlink multicast APIs
The drop monitor code is abusing the genetlink API and is
statically using the generic netlink multicast group 1, even
if that group belongs to somebody else (which it invariably
will, since it's not reserved.)
Make the drop monitor code use the proper APIs to reserve a
group ID, but also reserve the group id 1 in generic netlink
code to preserve the userspace API. Since drop monitor can
be a module, don't clear the bit for it on unregistration.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 19 Nov 2013 14:19:31 +0000 (15:19 +0100)]
genetlink: only pass array to genl_register_family_with_ops()
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.
The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Tue, 19 Nov 2013 18:10:06 +0000 (22:10 +0400)]
tcp: don't update snd_nxt, when a socket is switched from repair mode
snd_nxt must be updated synchronously with sk_send_head. Otherwise
tp->packets_out may be updated incorrectly, what may bring a kernel panic.
Here is a kernel panic from my host.
[ 103.043194] BUG: unable to handle kernel NULL pointer dereference at
0000000000000048
[ 103.044025] IP: [<
ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
...
[ 146.301158] Call Trace:
[ 146.301158] [<
ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0
Before this panic a tcp socket was restored. This socket had sent and
unsent data in the write queue. Sent data was restored in repair mode,
then the socket was switched from reapair mode and unsent data was
restored. After that the socket was switched back into repair mode.
In that moment we had a socket where write queue looks like this:
snd_una snd_nxt write_seq
|_________|________|
|
sk_send_head
After a second switching from repair mode the state of socket was
changed:
snd_una snd_nxt, write_seq
|_________ ________|
|
sk_send_head
This state is inconsistent, because snd_nxt and sk_send_head are not
synchronized.
Bellow you can find a call trace, how packets_out can be incremented
twice for one skb, if snd_nxt and sk_send_head are not synchronized.
In this case packets_out will be always positive, even when
sk_write_queue is empty.
tcp_write_wakeup
skb = tcp_send_head(sk);
tcp_fragment
if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
tcp_adjust_pcount(sk, skb, diff);
tcp_event_new_data_sent
tp->packets_out += tcp_skb_pcount(skb);
I think update of snd_nxt isn't required, when a socket is switched from
repair mode. Because it's initialized in tcp_connect_init. Then when a
write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
so it's always is in consistent state.
I have checked, that the bug is not reproduced with this patch and
all tests about restoring tcp connections work fine.
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Tue, 19 Nov 2013 10:09:27 +0000 (18:09 +0800)]
atm: idt77252: fix dev refcnt leak
init_card() calls dev_get_by_name() to get a network deceive. But it
doesn't decrease network device reference count after the device is
used.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fan.du [Tue, 19 Nov 2013 08:53:28 +0000 (16:53 +0800)]
xfrm: Release dst if this dst is improper for vti tunnel
After searching rt by the vti tunnel dst/src parameter,
if this rt has neither attached to any transformation
nor the transformation is not tunnel oriented, this rt
should be released back to ip layer.
otherwise causing dst memory leakage.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Tue, 19 Nov 2013 20:18:13 +0000 (21:18 +0100)]
Merge branch 'acpi-hotplug'
* acpi-hotplug:
PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
Mika Westerberg [Tue, 19 Nov 2013 14:15:35 +0000 (16:15 +0200)]
PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
Commit
bd950799d951 (PCI: acpiphp: Convert to dynamic debug) removed users
of acpiphp_debug variable and the variable itself but the declaration was
left in the header file. Drop this unused declaration.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Johannes Berg [Tue, 19 Nov 2013 09:35:40 +0000 (10:35 +0100)]
netlink: fix documentation typo in netlink_set_err()
The parameter is just 'group', not 'groups', fix the documentation typo.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 19 Nov 2013 19:44:15 +0000 (11:44 -0800)]
Merge tag 'arc-v3.13-rc1-part2' of git://git./linux/kernel/git/vgupta/arc
Pull second set of ARC changes from Vineet Gupta:
- Support for Perf from Mischa
- Enabling GPIO/Pinctrl drivers for Abilis TB10x platform
- New defconfig for buildroot
* tag 'arc-v3.13-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [plat-arcfpga] Add defconfig without initramfs location
ARC: perf: ARC 700 PMU doesn't support sampling events
ARC: Add documentation on DT binding for ARC700 PMU
ARC: Add perf support for ARC700 cores
ARC: [TB10x] Updates for GPIO and pinctrl
Linus Torvalds [Tue, 19 Nov 2013 19:43:21 +0000 (11:43 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull second set of s390 patches from Martin Schwidefsky:
"The handling of the PCI hotplug notifications has been improved, the
zfcp dumper can now detect the HSA size dynamically and the default
install kernel has been changed to the compressed bzImage. And two
bug-fixes for scm and 3720"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: implement hotplug notifications
s390/scm_block: do not hide eadm subchannel dependency
s390/sclp: Consolidate early sclp init calls to sclp_early_detect()
s390/sclp: Move early code from sclp_cmd.c to sclp_early.c
s390/sclp: Determine HSA size dynamically for zfcpdump
s390/sclp: Move declarations for sclp_sdias into separate header file
s390/pci: implement pcibios_remove_bus
s390/pci: improve handling of bus resources
s390/3270: fix missing device_destroy() call
s390/boot: Install bzImage as default kernel image
Linus Torvalds [Tue, 19 Nov 2013 19:42:32 +0000 (11:42 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/rw/uml
Pull UML changes from Richard Weinberger:
"This pile contains a nice defconfig cleanup, a rewritten stack
unwinder and various cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Remove unused declarations from <as-layout.h>
um: remove used STDIO_CONSOLE Kconfig param
um/vdso: add .gitignore for a couple of targets
arch/um: make it work with defconfig and x86_64
um: Make kstack_depth_to_print conform to arch/x86
um: Get rid of thread_struct->saved_task
um: Make stack trace reliable against kernel mode faults
um: Rewrite show_stack()
Linus Torvalds [Tue, 19 Nov 2013 18:48:19 +0000 (10:48 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"A modular build fix for certain .config's"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Export 'boot_cpu_physical_apicid' to modules
Linus Torvalds [Tue, 19 Nov 2013 18:40:00 +0000 (10:40 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq cleanups from Ingo Molnar:
"This is a multi-arch cleanup series from Thomas Gleixner, which we
kept to near the end of the merge window, to not interfere with
architecture updates.
This series (motivated by the -rt kernel) unifies more aspects of IRQ
handling and generalizes PREEMPT_ACTIVE"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt: Make PREEMPT_ACTIVE generic
sparc: Use preempt_schedule_irq
ia64: Use preempt_schedule_irq
m32r: Use preempt_schedule_irq
hardirq: Make hardirq bits generic
m68k: Simplify low level interrupt handling code
genirq: Prevent spurious detection for unconditionally polled interrupts
Jens Axboe [Tue, 19 Nov 2013 16:25:07 +0000 (09:25 -0700)]
blk-mq: ensure that we set REQ_IO_STAT so diskstats work
If disk stats are enabled on the queue, a request needs to
be marked with REQ_IO_STAT for accounting to be active on
that request. This fixes an issue with virtio-blk not
showing up in /proc/diskstats after the conversion to
blk-mq.
Add QUEUE_FLAG_MQ_DEFAULT, setting stats and same cpu-group
completion on by default.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
majianpeng [Thu, 14 Nov 2013 04:16:20 +0000 (15:16 +1100)]
md/raid5: Use conf->device_lock protect changing of multi-thread resources.
When we change group_thread_cnt from sysfs entry, it can OOPS.
The kernel messages are:
[ 135.299021] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 135.299073] IP: [<
ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.299107] PGD 0
[ 135.299122] Oops: 0000 [#1] SMP
[ 135.299144] Modules linked in: netconsole e1000e ptp pps_core
[ 135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24
[ 135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
[ 135.299255] task:
ffff8800b9638f80 ti:
ffff8800b77a4000 task.ti:
ffff8800b77a4000
[ 135.299283] RIP: 0010:[<
ffffffff815188ab>] [<
ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.299323] RSP: 0018:
ffff8800b77a5c48 EFLAGS:
00010002
[ 135.299344] RAX:
ffff880037bb5c70 RBX:
0000000000000000 RCX:
0000000000000008
[ 135.299371] RDX:
ffff880037bb5cb8 RSI:
0000000000000001 RDI:
ffff880037bb5c00
[ 135.299398] RBP:
ffff8800b77a5d08 R08:
0000000000000001 R09:
0000000000000000
[ 135.299425] R10:
ffff8800b77a5c98 R11:
00000000ffffffff R12:
ffff880037bb5c00
[ 135.299452] R13:
0000000000000000 R14:
0000000000000000 R15:
ffff880037bb5c70
[ 135.299479] FS:
0000000000000000(0000) GS:
ffff88013fd80000(0000) knlGS:
0000000000000000
[ 135.299510] CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
[ 135.299532] CR2:
0000000000000000 CR3:
0000000001c0b000 CR4:
00000000000407e0
[ 135.299559] Stack:
[ 135.299570]
ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300
[ 135.299611]
000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8
[ 135.299654]
000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98
[ 135.299696] Call Trace:
[ 135.299711] [<
ffffffff8107383e>] ? __wake_up+0x4e/0x70
[ 135.299733] [<
ffffffff81518f88>] raid5d+0x4c8/0x680
[ 135.299756] [<
ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0
[ 135.299781] [<
ffffffff81524c9f>] md_thread+0x11f/0x170
[ 135.299804] [<
ffffffff81069cd0>] ? wake_up_bit+0x40/0x40
[ 135.299826] [<
ffffffff81524b80>] ? md_rdev_init+0x110/0x110
[ 135.299850] [<
ffffffff81069656>] kthread+0xc6/0xd0
[ 135.299871] [<
ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 135.299899] [<
ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[ 135.299923] [<
ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90
[ 135.300005] RIP [<
ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.300005] RSP <
ffff8800b77a5c48>
[ 135.300005] CR2:
0000000000000000
[ 135.300005] ---[ end trace
504854e5bb7562ed ]---
[ 135.300005] Kernel panic - not syncing: Fatal exception
This is because raid5d() can be running when the multi-thread
resources are changed via system. We see need to provide locking.
mddev->device_lock is suitable, but we cannot simple call
alloc_thread_groups under this lock as we cannot allocate memory
while holding a spinlock.
So change alloc_thread_groups() to allocate and return the data
structures, then raid5_store_group_thread_cnt() can take the lock
while updating the pointers to the data structures.
This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x
stable series.
Fixes:
b721420e8719131896b009b11edbbd27
Cc: stable@vger.kernel.org (3.12)
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
majianpeng [Thu, 14 Nov 2013 04:16:19 +0000 (15:16 +1100)]
md/raid5: Before freeing old multi-thread worker, it should flush them.
When changing group_thread_cnt from sysfs entry, the kernel can oops.
The kernel messages are:
[ 740.961389] BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
[ 740.961444] IP: [<
ffffffff81062570>] process_one_work+0x30/0x500
[ 740.961476] PGD
b9013067 PUD
b651e067 PMD 0
[ 740.961503] Oops: 0000 [#1] SMP
[ 740.961525] Modules linked in: netconsole e1000e ptp pps_core
[ 740.961577] CPU: 0 PID: 3683 Comm: kworker/u8:5 Not tainted 3.12.0+ #23
[ 740.961602] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
[ 740.961646] task:
ffff88013abe0000 ti:
ffff88013a246000 task.ti:
ffff88013a246000
[ 740.961673] RIP: 0010:[<
ffffffff81062570>] [<
ffffffff81062570>] process_one_work+0x30/0x500
[ 740.961708] RSP: 0018:
ffff88013a247e08 EFLAGS:
00010086
[ 740.961730] RAX:
ffff8800b912b400 RBX:
ffff88013a61e680 RCX:
ffff8800b912b400
[ 740.961757] RDX:
ffff8800b912b600 RSI:
ffff8800b912b600 RDI:
ffff88013a61e680
[ 740.961782] RBP:
ffff88013a247e48 R08:
ffff88013a246000 R09:
000000000002c09d
[ 740.961808] R10:
000000000000010f R11:
0000000000000000 R12:
ffff88013b00cc00
[ 740.961833] R13:
0000000000000000 R14:
ffff88013b00cf80 R15:
ffff88013a61e6b0
[ 740.961861] FS:
0000000000000000(0000) GS:
ffff88013fc00000(0000) knlGS:
0000000000000000
[ 740.961893] CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
[ 740.962001] CR2:
00000000000000b8 CR3:
00000000b24fe000 CR4:
00000000000407f0
[ 740.962001] Stack:
[ 740.962001]
0000000000000008 ffff8800b912b600 ffff88013b00cc00 ffff88013a61e680
[ 740.962001]
ffff88013b00cc00 ffff88013b00cc18 ffff88013b00cf80 ffff88013a61e6b0
[ 740.962001]
ffff88013a247eb8 ffffffff810639c6 0000000000012a80 ffff88013a247fd8
[ 740.962001] Call Trace:
[ 740.962001] [<
ffffffff810639c6>] worker_thread+0x206/0x3f0
[ 740.962001] [<
ffffffff810637c0>] ? manage_workers+0x2c0/0x2c0
[ 740.962001] [<
ffffffff81069656>] kthread+0xc6/0xd0
[ 740.962001] [<
ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 740.962001] [<
ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[ 740.962001] [<
ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 740.962001] Code: 89 e5 41 57 41 56 41 55 45 31 ed 41 54 53 48 89 fb 48 83 ec 18 48 8b 06 4c 8b 67 48 48 89 c1 30 c9 a8 04 4c 0f 45 e9 80 7f 58 00 <49> 8b 45 08 44 8b b0 00 01 00 00 78 0c 41 f6 44 24 10 04 0f 84
[ 740.962001] RIP [<
ffffffff81062570>] process_one_work+0x30/0x500
[ 740.962001] RSP <
ffff88013a247e08>
[ 740.962001] CR2:
0000000000000008
[ 740.962001] ---[ end trace
39181460000748de ]---
[ 740.962001] Kernel panic - not syncing: Fatal exception
This can happen if there are some stripes left, fewer than MAX_STRIPE_BATCH.
A worker is queued to handle them.
But before calling raid5_do_work, raid5d handles those
stripes making conf->active_stripe = 0.
So mddev_suspend() can return.
We might then free old worker resources before the queued
raid5_do_work() handled them. When it runs, it crashes.
raid5d() raid5_store_group_thread_cnt()
queue_work mddev_suspend()
handle_strips
active_stripe=0
free(old worker resources)
process_one_work
raid5_do_work
To avoid this, we should only flush the worker resources before freeing them.
This fixes a bug introduced in 3.12 so is suitable for the 3.12.x
stable series.
Cc: stable@vger.kernel.org (3.12)
Fixes:
b721420e8719131896b009b11edbbd27
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
majianpeng [Thu, 14 Nov 2013 04:16:19 +0000 (15:16 +1100)]
md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE.
For R5_ReadNoMerge,it mean this bio can't merge with other bios or
request.It used REQ_FLUSH to achieve this. But REQ_NOMERGE can do the
same work.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Aurelien Jarno [Thu, 14 Nov 2013 04:16:19 +0000 (15:16 +1100)]
UAPI: include <asm/byteorder.h> in linux/raid/md_p.h
linux/raid/md_p.h is using conditionals depending on endianess and fails
with an error if neither of __BIG_ENDIAN, __LITTLE_ENDIAN or
__BYTE_ORDER are defined, but it doesn't include any header which can
define these constants. This make this header unusable alone.
This patch adds a #include <asm/byteorder.h> at the beginning of this
header to make it usable alone. This is needed to compile klibc on MIPS.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Fri, 15 Nov 2013 06:55:02 +0000 (14:55 +0800)]
raid1: Rewrite the implementation of iobarrier.
There is an iobarrier in raid1 because of contention between normal IO and
resync IO. It suspends all normal IO when resync/recovery happens.
However if normal IO is out side the resync window, there is no contention.
So this patch changes the barrier mechanism to only block IO that
could contend with the resync that is currently happening.
We partition the whole space into five parts.
|---------|-----------|------------|----------------|-------|
start next_resync start_next_window end_window
start + RESYNC_WINDOW = next_resync
next_resync + NEXT_NORMALIO_DISTANCE = start_next_window
start_next_window + NEXT_NORMALIO_DISTANCE = end_window
Firstly we introduce some concepts:
1 - RESYNC_WINDOW: For resync, there are 32 resync requests at most at the
same time. A sync request is RESYNC_BLOCK_SIZE(64*1024).
So the RESYNC_WINDOW is 32 * RESYNC_BLOCK_SIZE, that is 2MB.
2 - NEXT_NORMALIO_DISTANCE: the distance between next_resync
and start_next_window. It also indicates the distance between
start_next_window and end_window.
It is currently 3 * RESYNC_WINDOW_SIZE but could be tuned if
this turned out not to be optimal.
3 - next_resync: the next sector at which we will do sync IO.
4 - start: a position which is at most RESYNC_WINDOW before
next_resync.
5 - start_next_window: a position which is NEXT_NORMALIO_DISTANCE
beyond next_resync. Normal-io after this position doesn't need to
wait for resync-io to complete.
6 - end_window: a position which is 2 * NEXT_NORMALIO_DISTANCE beyond
next_resync. This also doesn't need to wait, but is counted
differently.
7 - current_window_requests: the count of normalIO between
start_next_window and end_window.
8 - next_window_requests: the count of normalIO after end_window.
NormalIO will be partitioned into four types:
NormIO1: the end sector of bio is smaller or equal the start
NormIO2: the start sector of bio larger or equal to end_window
NormIO3: the start sector of bio larger or equal to
start_next_window.
NormIO4: the location between start_next_window and end_window
|--------|-----------|--------------------|----------------|-------------|
| start | next_resync | start_next_window | end_window |
NormIO1 NormIO4 NormIO4 NormIO3 NormIO2
For NormIO1, we don't need any io barrier.
For NormIO4, we used a similar approach to the original iobarrier
mechanism. The normalIO and resyncIO must be kept separate.
For NormIO2/3, we add two fields to struct r1conf: "current_window_requests"
and "next_window_requests". They indicate the count of active
requests in the two window.
For these, we don't wait for resync io to complete.
For resync action, if there are NormIO4s, we must wait for it.
If not, we can proceed.
But if resync action reaches start_next_window and
current_window_requests > 0 (that is there are NormIO3s), we must
wait until the current_window_requests becomes zero.
When current_window_requests becomes zero, start_next_window also
moves forward. Then current_window_requests will replaced by
next_window_requests.
There is a problem which when and how to change from NormIO2 to
NormIO3. Only then can sync action progress.
We add a field in struct r1conf "start_next_window".
A: if start_next_window == MaxSector, it means there are no NormIO2/3.
So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
B: if current_window_requests == 0 && next_window_requests != 0, it
means start_next_window move to end_window
There is another problem which how to differentiate between
old NormIO2(now it is NormIO3) and NormIO2.
For example, there are many bios which are NormIO2 and a bio which is
NormIO3. NormIO3 firstly completed, so the bios of NormIO2 became NormIO3.
We add a field in struct r1bio "start_next_window".
This is used to record the position conf->start_next_window when the call
to wait_barrier() is made in make_request().
In allow_barrier(), we check the conf->start_next_window.
If r1bio->stat_next_window == conf->start_next_window, it means
there is no transition between NormIO2 and NormIO3.
If r1bio->start_next_window != conf->start_next_window, it mean
there was a transition between NormIO2 and NormIO3. There can only
have been one transition. So it only means the bio is old NormIO2.
For one bio, there may be many r1bio's. So we make sure
all the r1bio->start_next_window are the same value.
If we met blocked_dev in make_request(), it must call allow_barrier
and wait_barrier. So the former and the later value of
conf->start_next_window will be change.
If there are many r1bio's with differnet start_next_window,
for the relevant bio, it depend on the last value of r1bio.
It will cause error. To avoid this, we must wait for previous r1bios
to complete.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Thu, 14 Nov 2013 04:16:18 +0000 (15:16 +1100)]
raid1: Add some macros to make code clearly.
In a subsequent patch, we'll use some const parameters.
Using macros will make the code clearly.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Thu, 14 Nov 2013 04:16:18 +0000 (15:16 +1100)]
raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array when reconfiguring the array.
We used to use raise_barrier to suspend normal IO while we reconfigure
the array. However raise_barrier will soon only suspend some normal
IO, not all. So we need something else.
Change it to use freeze_array.
But freeze_array not only suspends normal io, it also suspends
resync io.
For the place where call raise_barrier for reconfigure, it isn't a
problem.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Thu, 14 Nov 2013 04:16:18 +0000 (15:16 +1100)]
raid1: Add a field array_frozen to indicate whether raid in freeze state.
Because the following patch will rewrite the content between normal IO
and resync IO. So we used a parameter to indicate whether raid is in freeze
array.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Joe Perches [Thu, 14 Nov 2013 04:16:18 +0000 (15:16 +1100)]
md: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 14 Nov 2013 04:16:17 +0000 (15:16 +1100)]
md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes.
When raid5 recovery hits a fresh badblock, this badblock will flagged as unack
badblock until md_update_sb() is called.
But md_stop will take reconfig lock which means raid5d can't call
md_update_sb() in md_check_recovery(), the badblock will always
be unack, so raid5d thread enters an infinite loop and md_stop_write()
can never stop sync_thread. This causes deadlock.
To solve this, when STOP_ARRAY ioctl is issued and sync_thread is
running, we need set md->recovery FROZEN and INTR flags and wait for
sync_thread to stop before we (re)take reconfig lock.
This requires that raid5 reshape_request notices MD_RECOVERY_INTR
(which it probably should have noticed anyway) and stops waiting for a
metadata update in that case.
Reported-by: Jianpeng Ma <majianpeng@gmail.com>
Reported-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 19 Nov 2013 01:02:01 +0000 (12:02 +1100)]
md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread.
We currently use kthread_should_stop() in various places in the
sync/reshape code to abort early.
However some places set MD_RECOVERY_INTR but don't immediately call
md_reap_sync_thread() (and we will shortly get another one).
When this happens we are relying on md_check_recovery() to reap the
thread and that only happen when it finishes normally.
So MD_RECOVERY_INTR must lead to a normal finish without the
kthread_should_stop() test.
So replace all relevant tests, and be more careful when the thread is
interrupted not to acknowledge that latest step in a reshape as it may
not be fully committed yet.
Also add a test on MD_RECOVERY_INTR in the 'is_mddev_idle' loop
so we don't wait have to wait for the speed to drop before we can abort.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 14 Nov 2013 06:54:51 +0000 (17:54 +1100)]
md: fix some places where mddev_lock return value is not checked.
Sometimes we need to lock and mddev and cannot cope with
failure due to interrupt.
In these cases we should use mutex_lock, not mutex_lock_interruptible.
Signed-off-by: NeilBrown <neilb@suse.de>
Bian Yu [Thu, 14 Nov 2013 04:16:17 +0000 (15:16 +1100)]
raid5: Retry R5_ReadNoMerge flag when hit a read error.
Because of block layer merge, one bio fails will cause other bios
which belongs to the same request fails, so raid5_end_read_request
will record all these bios as badblocks.
If retry request with R5_ReadNoMerge flag to avoid bios merge,
badblocks can only record sector which is bad exactly.
test:
hdparm --yes-i-know-what-i-am-doing --make-bad-sector 300000 /dev/sdb
mdadm -C /dev/md0 -l5 -n3 /dev/sd[bcd] --assume-clean
mdadm /dev/md0 -f /dev/sdd
mdadm /dev/md0 -r /dev/sdd
mdadm --zero-superblock /dev/sdd
mdadm /dev/md0 -a /dev/sdd
1. Without this patch:
cat /sys/block/md0/md/rd*/bad_blocks
299776 256
299776 256
2. With this patch:
cat /sys/block/md0/md/rd*/bad_blocks
300000 8
300000 8
Signed-off-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Shaohua Li [Thu, 14 Nov 2013 04:16:17 +0000 (15:16 +1100)]
raid5: relieve lock contention in get_active_stripe()
track empty inactive list count, so md_raid5_congested() can use it to make
decision.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Al Viro [Tue, 19 Nov 2013 01:20:43 +0000 (01:20 +0000)]
seq_file: always clear m->count when we free m->buf
Once we'd freed m->buf, m->count should become zero - we have no valid
contents reachable via m->buf.
Reported-by: Charley (Hao Chuan) Chu <charley.chu@broadcom.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Tue, 19 Nov 2013 00:07:08 +0000 (01:07 +0100)]
Merge branch 'pm-sleep'
* pm-sleep:
PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()
Rafael J. Wysocki [Tue, 19 Nov 2013 00:07:00 +0000 (01:07 +0100)]
Merge branch 'pm-cpufreq'
* pm-cpufreq:
cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs()
cpufreq: OMAP: Fix compilation error 'r & ret undeclared'
cpufreq: conservative: set requested_freq to policy max when it is over policy max
Rafael J. Wysocki [Tue, 19 Nov 2013 00:06:49 +0000 (01:06 +0100)]
Merge branch 'pm-runtime'
* pm-runtime:
PM / Runtime: Fix error path for prepare
PM / Runtime: Update documentation around probe|remove|suspend
Rafael J. Wysocki [Tue, 19 Nov 2013 00:06:38 +0000 (01:06 +0100)]
Merge branch 'pm-tools'
* pm-tools:
tools / power turbostat: Support Silvermont
Rafael J. Wysocki [Tue, 19 Nov 2013 00:06:28 +0000 (01:06 +0100)]
Merge branch 'pm-cpuidle'
* pm-cpuidle:
intel_idle: Support Intel Atom Processor C2000 Product Family
Rafael J. Wysocki [Tue, 19 Nov 2013 00:06:17 +0000 (01:06 +0100)]
Merge branch 'acpi-video'
* acpi-video:
ACPI / video: clean up DMI table for initial black screen problem
Rafael J. Wysocki [Tue, 19 Nov 2013 00:06:06 +0000 (01:06 +0100)]
Merge branch 'acpi-ec'
* acpi-ec:
ACPI / EC: Ensure lock is acquired before accessing ec struct members
Rafael J. Wysocki [Tue, 19 Nov 2013 00:05:46 +0000 (01:05 +0100)]
Merge branch 'acpi-hotplug'
* acpi-hotplug:
ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed()
ACPI / PCI root: Clear driver_data before failing enumeration
ACPI / hotplug: Fix PCI host bridge hot removal
ACPI / hotplug: Fix acpi_bus_get_device() return value check
Rafael J. Wysocki [Mon, 18 Nov 2013 13:18:47 +0000 (14:18 +0100)]
ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed()
Before commit
6931007cc90b (ACPI / scan: Start matching drivers
after trying scan handlers) the match_driver flag for all devices
was set in acpi_add_single_object(), but now it is set by
acpi_bus_device_attach() which is not called for the "fixed"
devices added by acpi_bus_scan_fixed(). This means that
flags.match_driver is never set for those devices now, so make
acpi_bus_scan_fixed() set it before calling device_attach().
Fixes:
6931007cc90b (ACPI / scan: Start matching drivers after trying scan handlers)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 13 Nov 2013 23:54:17 +0000 (00:54 +0100)]
ACPI / PCI root: Clear driver_data before failing enumeration
If a PCI host bridge cannot be enumerated due to an error in
pci_acpi_scan_root(), its ACPI device object's driver_data field
has to be cleared by acpi_pci_root_add() before freeing the
object pointed to by that field, or some later acpi_pci_find_root()
checks that should fail may succeed and cause quite a bit of
confusion to ensue.
Fix acpi_pci_root_add() to clear device->driver_data before
returning an error code as appropriate.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Rafael J. Wysocki [Wed, 13 Nov 2013 23:54:08 +0000 (00:54 +0100)]
ACPI / hotplug: Fix PCI host bridge hot removal
Since the PCI host bridge scan handler does not set hotplug.enabled,
the check of it in acpi_bus_device_eject() effectively prevents the
root bridge hot removal from working after commit
a3b1b1ef78cd
(ACPI / hotplug: Merge device hot-removal routines). However, that
check is not necessary, because the other acpi_bus_device_eject()
users, acpi_hotplug_notify_cb and acpi_eject_store(), do the same
check by themselves before executing that function.
For this reason, remove the scan handler check from
acpi_bus_device_eject() to make PCI hot bridge hot removal work
again.
Fixes:
a3b1b1ef78cd (ACPI / hotplug: Merge device hot-removal routines)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Rafael J. Wysocki [Wed, 13 Nov 2013 23:54:00 +0000 (00:54 +0100)]
ACPI / hotplug: Fix acpi_bus_get_device() return value check
Since acpi_bus_get_device() returns a plain int and not acpi_status,
ACPI_FAILURE() should not be used for checking its return value. Fix
that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Linus Torvalds [Mon, 18 Nov 2013 23:56:13 +0000 (15:56 -0800)]
Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog changes from Wim Van Sebroeck:
- addition of MOXA ART watchdog driver (moxart_wdt)
- addition of CSR SiRFprimaII and SiRFatlasVI watchdog driver
(sirfsoc_wdt)
- addition of ralink watchdog driver (rt2880_wdt)
- various fixes and cleanups (__user annotation, ioctl return codes,
removal of redundant of_match_ptr, removal of unnecessary
amba_set_drvdata(), use allocated buffer for usb_control_msg, ...)
- removal of MODULE_ALIAS_MISCDEV statements
- watchdog related DT bindings
- first set of improvements on the w83627hf_wdt driver
* git://www.linux-watchdog.org/linux-watchdog: (26 commits)
watchdog: w83627hf: Use helper functions to access superio registers
watchdog: w83627hf: Enable watchdog device only if not already enabled
watchdog: w83627hf: Enable watchdog only once
watchdog: w83627hf: Convert to watchdog infrastructure
watchdog: omap_wdt: raw read and write endian fix
watchdog: sirf: don't depend on dummy value of CLOCK_TICK_RATE
watchdog: pcwd_usb: overflow in usb_pcwd_send_command()
watchdog: rt2880_wdt: fix return value check in rt288x_wdt_probe()
watchdog: watchdog_core: Fix a trivial typo
watchdog: dw: Enable OF support for DW watchdog timer
watchdog: Get rid of MODULE_ALIAS_MISCDEV statements
watchdog: ts72xx_wdt: Propagate return value from timeout_to_regval
watchdog: pcwd_usb: Use allocated buffer for usb_control_msg
watchdog: sp805_wdt: Remove unnecessary amba_set_drvdata()
watchdog: sirf: add watchdog driver of CSR SiRFprimaII and SiRFatlasVI
watchdog: Remove redundant of_match_ptr
watchdog: ts72xx_wdt: cleanup return codes in ioctl
documentation/devicetree: Move DT bindings from gpio to watchdog
watchdog: add ralink watchdog driver
watchdog: Add MOXA ART watchdog driver
...
Linus Torvalds [Mon, 18 Nov 2013 23:50:07 +0000 (15:50 -0800)]
Merge branch 'i2c/for-next' of git://git./linux/kernel/git/wsa/linux
Pull i2c changes from Wolfram Sang:
- new drivers for exynos5, bcm kona, and st micro
- bigger overhauls for drivers mxs and rcar
- typical driver bugfixes, cleanups, improvements
- got rid of the superfluous 'driver' member in i2c_client struct This
touches a few drivers in other subsystems. All acked.
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits)
i2c: bcm-kona: fix error return code in bcm_kona_i2c_probe()
i2c: i2c-eg20t: do not print error message in syslog if no ACK received
i2c: bcm-kona: Introduce Broadcom I2C Driver
i2c: cbus-gpio: Fix device tree binding
i2c: wmt: add missing clk_disable_unprepare() on error
i2c: designware: add new ACPI IDs
i2c: i801: Add Device IDs for Intel Wildcat Point-LP PCH
i2c: exynos5: Remove incorrect clk_disable_unprepare
i2c: i2c-st: Add ST I2C controller
i2c: exynos5: add High Speed I2C controller driver
i2c: rcar: fixup rcar type naming
i2c: scmi: remove some bogus NULL checks
i2c: sh_mobile & rcar: Enable the driver on all ARM platforms
i2c: sh_mobile: Convert to clk_prepare/unprepare
i2c: mux: gpio: use reg value for i2c_add_mux_adapter
i2c: mux: gpio: use gpio_set_value_cansleep()
i2c: Include linux/of.h header
i2c: mxs: Fix PIO mode on i.MX23
i2c: mxs: Rework the PIO mode operation
i2c: mxs: distinguish i.MX23 and i.MX28 based I2C controller
...
Linus Torvalds [Mon, 18 Nov 2013 23:36:04 +0000 (15:36 -0800)]
Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband
Pull infiniband/rdma updates from Roland Dreier:
- Re-enable flow steering verbs with new improved userspace ABI
- Fixes for slow connection due to GID lookup scalability
- IPoIB fixes
- Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
- Further improvements to SRP error handling
- Add new transport type for Cisco usNIC
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
IB/core: Re-enable create_flow/destroy_flow uverbs
IB/core: extended command: an improved infrastructure for uverbs commands
IB/core: Remove ib_uverbs_flow_spec structure from userspace
IB/core: Use a common header for uverbs flow_specs
IB/core: Make uverbs flow structure use names like verbs ones
IB/core: Rename 'flow' structs to match other uverbs structs
IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
IB/ucma: Convert use of typedef ctl_table to struct ctl_table
IB/cm: Convert to using idr_alloc_cyclic()
IB/mlx5: Fix page shift in create CQ for userspace
IB/mlx4: Fix device max capabilities check
IB/mlx5: Fix list_del of empty list
IB/mlx5: Remove dead code
IB/core: Encorce MR access rights rules on kernel consumers
IB/mlx4: Fix endless loop in resize CQ
RDMA/cma: Remove unused argument and minor dead code
RDMA/ucma: Discard events for IDs not yet claimed by user space
IB/core: Add Cisco usNIC rdma node and transport types
RDMA/nes: Remove self-assignment from nes_query_qp()
IB/srp: Report receive errors correctly
...
Linus Torvalds [Mon, 18 Nov 2013 23:35:09 +0000 (15:35 -0800)]
Merge tag 'for-v3.13' of git://git.infradead.org/battery-2.6
Pull battery updates from Anton Vorontsov:
"Highlights:
- A new driver for TI BQ24735 Battery Chargers, courtesy of NVidia.
- Device tree bindings for TWL4030 chips.
- Random fixes and cleanups"
* tag 'for-v3.13' of git://git.infradead.org/battery-2.6:
pm2301-charger: Remove unneeded NULL checks
twl4030_charger: Add devicetree support
power_supply: Fix documentation for TEMP_*ALERT* properties
max17042_battery: Support regmap to access device's registers
max17042_battery: Use SIMPLE_DEV_PM_OPS
charger-manager : Replace kzalloc to devm_kzalloc and remove uneccessary code
bq2415x_charger: Fix max battery regulation voltage
tps65090-charger: Use "IS_ENABLED(CONFIG_OF)" for DT code
tps65090-charger: Drop devm_free_irq of devm_ allocated irq
power_supply: Add support for bq24735 charger
pm2301-charger: Staticize pm2xxx_charger_die_therm_mngt
pm2301-charger: Check return value of regulator_enable
ab8500-charger: Remove redundant break
ab8500-charger: Check return value of regulator_enable
isp1704_charger: Fix driver to work with changes introduced in v3.5
Linus Torvalds [Mon, 18 Nov 2013 23:10:05 +0000 (15:10 -0800)]
Merge branch 'topic/kbuild-fixes-for-next' of git://git./linux/kernel/git/mchehab/linux-media
Pull media build fixes from Mauro Carvalho Chehab:
"A series of patches that fix compilation on non-x86 archs.
While most of them are just build fixes, there are some fixes for real
bugs, as there are a number of drivers using dynamic stack allocation.
A few of those might be considered a security risk, if the i2c-dev
module is loaded, as someone could be sending very long I2C data that
could potentially overflow the Kernel stack. Ok, as using /dev/i2c-*
devnodes usually requires root on usual distros, and exploiting it
would require a DVB board or USB stick, the risk is not high"
* 'topic/kbuild-fixes-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (28 commits)
[media] platform drivers: Fix build on frv arch
[media] lirc_zilog: Don't use dynamic static allocation
[media] mxl111sf: Don't use dynamic static allocation
[media] af9035: Don't use dynamic static allocation
[media] af9015: Don't use dynamic static allocation
[media] dw2102: Don't use dynamic static allocation
[media] dibusb-common: Don't use dynamic static allocation
[media] cxusb: Don't use dynamic static allocation
[media] v4l2-async: Don't use dynamic static allocation
[media] cimax2: Don't use dynamic static allocation
[media] tuner-xc2028: Don't use dynamic static allocation
[media] tuners: Don't use dynamic static allocation
[media] av7110_hw: Don't use dynamic static allocation
[media] stv090x: Don't use dynamic static allocation
[media] stv0367: Don't use dynamic static allocation
[media] stb0899_drv: Don't use dynamic static allocation
[media] dvb-frontends: Don't use dynamic static allocation
[media] dvb-frontends: Don't use dynamic static allocation
[media] s5h1420: Don't use dynamic static allocation
[media] uvc/lirc_serial: Fix some warnings on parisc arch
...
Stephen Rothwell [Mon, 18 Nov 2013 03:19:33 +0000 (14:19 +1100)]
sparc64: merge fix
After merging the final tree, today's linux-next build (sparc64 defconfig)
failed like this:
arch/sparc/mm/init_64.c: In function 'pte_alloc_one':
arch/sparc/mm/init_64.c:2568:9: error: unused variable 'pte' [-Werror=unused-variable]
Caused by the merge between commit
37b3a8ff3e08 ("sparc64: Move from 4MB
to 8MB huge pages") and commit
1ae9ae5f7df7 ("sparc: handle
pgtable_page_ctor() fail") (I had the following merge fix in linux-next,
but it didn't seem to propagate upstream - may have forgotten to point it
out :-().
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 18 Nov 2013 23:08:02 +0000 (15:08 -0800)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
"This series include:
- a new Remote Controller driver for ST SoC with the corresponding DT
bindings
- a new frontend (cx24117)
- a new I2C camera flash driver (lm3560)
- a new mem2mem driver for TI SoC (ti-vpe)
- support for Raphael r828d added to r820t driver
- some improvements on buffer allocation at VB2 core
- usual driver fixes and improvements
PS this time, we have a smaller number of patches. While it is hard
to pinpoint to the reasons, I believe that it is mainly due to:
1) there are several patch series ready, but depending on DT review.
I decided to grant some extra time for DT maintainers to look on
it, as they're expecting to have more time with the changes agreed
during ARM mini-summit and KS. If they can't review in time for
3.14, I'll review myself and apply for the next merge window.
2) I suspect that having both LinuxCon EU and LinuxCon NA happening
during the same merge window affected the development
productivity, as several core media developers participated on
both events"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (151 commits)
[media] media: st-rc: Add ST remote control driver
[media] gpio-ir-recv: Include linux/of.h header
[media] tvp7002: Include linux/of.h header
[media] tvp514x: Include linux/of.h header
[media] ths8200: Include linux/of.h header
[media] adv7343: Include linux/of.h header
[media] v4l: Fix typo in v4l2_subdev_get_try_crop()
[media] media: i2c: add driver for dual LED Flash, lm3560
[media] rtl28xxu: add 15f4:0131 Astrometa DVB-T2
[media] rtl28xxu: add RTL2832P + R828D support
[media] rtl2832: add new tuner R828D
[media] r820t: add support for R828D
[media] media/i2c: ths8200: fix build failure with gcc 4.5.4
[media] Add support for KWorld UB435-Q V2
[media] staging/media: fix msi3101 build errors
[media] ddbridge: Remove casting the return value which is a void pointer
[media] ngene: Remove casting the return value which is a void pointer
[media] dm1105: remove unneeded not-null test
[media] sh_mobile_ceu_camera: remove deprecated IRQF_DISABLED
[media] media: rcar_vin: Add preliminary r8a7790 support
...
Kirill A. Shutemov [Mon, 18 Nov 2013 09:44:09 +0000 (11:44 +0200)]
sparc64: fix build regession
Commit
ea1e7ed33708 triggers build regression on sparc64.
include/linux/mm.h:1391:2: error: implicit declaration of function 'pgtable_cache_init' [-Werror=implicit-function-declaration]
arch/sparc/include/asm/pgtable_64.h:978:13: error: conflicting types for 'pgtable_cache_init' [-Werror]
It happens due headers include loop:
<linux/mm.h> -> <asm/pgtable.h> -> <asm/pgtable_64.h> ->
<asm/tlbflush.h> -> <asm/tlbflush_64.h> -> <linux/mm.h>
Let's drop <linux/mm.h> include from asm/tlbflush_64.h.
Build tested with allmodconfig.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 18 Nov 2013 22:51:52 +0000 (14:51 -0800)]
Merge branch 'linux_next' of git://git./linux/kernel/git/mchehab/linux-edac
Pull EDAC driver updates from Mauro Carvalho Chehab:
- sb_edac: add support for Ivy Bridge support
- cell_edac: add a missing of_node_put() call
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
cell_edac: fix missing of_node_put
sb_edac: add support for Ivy Bridge
sb_edac: avoid decoding the same error multiple times
sb_edac: rename mci_bind_devs()
sb_edac: enable multiple PCI id tables to be used
sb_edac: rework sad_pkg
sb_edac: allow different interleave lists
sb_edac: allow different dram_rule arrays
sb_edac: isolate TOHM retrieval
sb_edac: rename pci_br
sb_edac: isolate TOLM retrieval
sb_edac: make RANK_CFG_A value part of sbridge_info
Linus Torvalds [Mon, 18 Nov 2013 22:50:17 +0000 (14:50 -0800)]
Merge tag 'edac_for_3.13' of git://git./linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"Following up on last week's discussion, here's my part of the EDAC
pile, highlights in the signed tag.
The last two patches have a date from just now because I've just
applied them to the tree after Johannes sent them to me earlier. I
decided to forward them now because they're trivial.
There's a third one for MPC85xx which adds PCIe error interrupt
support but since it is not so trivial and hasn't seen any linux-next
time, I'm deferring it to 3.14
EDAC update highlights:
- Support for Calxeda ECX-2000 memory controller, from Robert Richter
- Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob
Herring and Robert Richter
- New maintainer for Freescale's MPC85xx EDAC driver: Johannes
Thumshirn"
* tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
edac/85xx: Remove mpc85xx_pci_err_remove
EDAC: Add edac-mpc85xx driver to MAINTAINERS
edac, highbank: Moving error injection to sysfs for edac
edac, highbank: Add MAINTAINERS entry
edac: Unify reporting of device info for device, mc and pci
edac, highbank: Improve and unify naming
edac, highbank: Add Calxeda ECX-2000 support
ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi
edac, highbank: Fix interrupt setup of mem and l2 controller
Linus Torvalds [Mon, 18 Nov 2013 22:47:30 +0000 (14:47 -0800)]
Merge tag 'mmc-updates-for-3.13-rc1' of git://git./linux/kernel/git/cjb/mmc
Pull MMC updates from Chris Ball:
"MMC highlights for 3.13:
Core:
- Improve runtime PM support, remove mmc_{suspend,resume}_host().
- Add MMC_CAP_RUNTIME_RESUME, for delaying MMC resume until we're
outside of the resume sequence (in runtime_resume) to decrease
system resume time.
Drivers:
- dw_mmc: Support HS200 mode.
- sdhci-eshdc-imx: Support SD3.0 SDR clock tuning, DDR on IMX6.
- sdhci-pci: Add support for Intel Clovertrail and Merrifield"
* tag 'mmc-updates-for-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (108 commits)
mmc: wbsd: Silence compiler warning
mmc: core: Silence compiler warning in __mmc_switch
mmc: sh_mmcif: Convert to clk_prepare|unprepare
mmc: sh_mmcif: Convert to PM macros when defining dev_pm_ops
mmc: dw_mmc: exynos: Revert the sdr_timing assignment
mmc: sdhci: Avoid needless loop while handling SDIO interrupts in sdhci_irq
mmc: core: Add MMC_CAP_RUNTIME_RESUME to resume at runtime_resume
mmc: core: Improve runtime PM support during suspend/resume for sd/mmc
mmc: core: Remove redundant mmc_power_up|off at runtime callbacks
mmc: Don't force card to active state when entering suspend/shutdown
MIPS: db1235: Don't use MMC_CLKGATE
mmc: core: Remove deprecated mmc_suspend|resume_host APIs
mmc: mmci: Move away from using deprecated APIs
mmc: via-sdmmc: Move away from using deprecated APIs
mmc: tmio: Move away from using deprecated APIs
mmc: sh_mmcif: Move away from using deprecated APIs
mmc: sdricoh_cs: Move away from using deprecated APIs
mmc: rtsx: Remove redundant suspend and resume callbacks
mmc: wbsd: Move away from using deprecated APIs
mmc: pxamci: Remove redundant suspend and resume callbacks
...
Ajit Khaparde [Mon, 18 Nov 2013 16:44:37 +0000 (10:44 -0600)]
be2net: Delete secondary unicast MAC addresses during be_close
Secondary unicast MAC addresses will get deleted only when the interface is UP.
When the interface is DOWN, though these secondary MAC addresses are unusable
and awaiting to be deleted, cause the firmware to believe that they are being used.
If the user intends to set a MAC address as primary MAC from one of these
secondary MAC addresses, the firmware returns a MAC address Collision error.
Delete these secondary MAC addresses during be_close.
The secondary MAC addresses list will be refreshed during interface open anyway.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Mon, 18 Nov 2013 16:44:24 +0000 (10:44 -0600)]
be2net: Fix unconditional enabling of Rx interface options
The driver currently requests the firmware to enable rx_interface options
without considering if the interface was created with that capability.
This could cause commands to firmware to fail.
To avoid this, enable only those options on an interface if the interface
was created with that capability.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhi Yong Wu [Mon, 18 Nov 2013 13:19:27 +0000 (21:19 +0800)]
net, virtio_net: replace the magic value
It is more appropriate to use # of queue pairs currently used by
the driver instead of a magic value.
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Mon, 18 Nov 2013 06:07:45 +0000 (07:07 +0100)]
ping: prevent NULL pointer dereference on write to msg_name
A plain read() on a socket does set msg->msg_name to NULL. So check for
NULL pointer first.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 18 Nov 2013 20:45:51 +0000 (15:45 -0500)]
Merge branch 'bnx2x'
Yuval Mintz says:
====================
bnx2x: Bug fixes patch series
This series contains several fixes, relating either to SR-IOV flows
or to critical sections protected by the rtnl lock.
Please consider applying these patches to `net'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 17 Nov 2013 06:59:29 +0000 (08:59 +0200)]
bnx2x: Prevent "timeout waiting for state X"
Current driver release rtnl lock in between DCB re-configuration.
As a result, other flows (e.g., mtu config) may enter in between and fail
due to halted tx path for dcb configuration.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 17 Nov 2013 06:59:28 +0000 (08:59 +0200)]
bnx2x: prevent CFC attention
During VF load, prior to sending messages on HW channel to PF the VF
checks its bulletin board to see whether the PF indicated it has closed;
If a closed PF is encountered, the VF skips sending the message.
Due to incorrect return values, there's a possible scenario in which the VF
finishes loading "successfully", while the PF hasn't actually fully configured
FW/HW for the VFs supposed configuration.
Once VF tries to send Tx packets, HW will raise an attention (and FW possibly
will start treat the VF as malicious).
The patch fails the loading process in such a scenario.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 17 Nov 2013 06:59:27 +0000 (08:59 +0200)]
bnx2x: Prevent panic during DMAE timeout
If chip enters a recovery flow just after the driver issues a DMAE request
the DMAE will timeout. Current code will cause a bnx2x_panic() as a result,
which means interface will no longer be usable (regardless of the recovery
results), as bnx2x_panic() is irreversible for the driver.
As this is a possible flow, the panic should be reached only when driver
is compiled with STOP_ON_ERROR.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 17 Nov 2013 06:59:26 +0000 (08:59 +0200)]
bnx2x: Clean the sp rtnl task upon unload
While unloading, bnx2x needs to clean the sp_rtnl_state to prevent
configuration made before the unload to be applied afterwards with
stale values.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Sat, 16 Nov 2013 20:17:24 +0000 (15:17 -0500)]
ipv6: Fix inet6_init() cleanup order
Commit
6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67
net: ipv6: Add IPv6 support to the ping socket
introduced a change in the cleanup logic of inet6_init and
has a bug in that ipv6_packet_cleanup() may not be called.
Fix the cleanup ordering.
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Lorenzo Colitti <lorenzo@google.com>
CC: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Sat, 17 Aug 2013 20:58:42 +0000 (13:58 -0700)]
watchdog: w83627hf: Use helper functions to access superio registers
Use helper functions named similar to other drivers to access
superio registers.
Request memory region only when needed, and use request_muxed_region().
This lets other devices (hwmon, gpio) use the same region.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Guenter Roeck [Sat, 17 Aug 2013 20:58:41 +0000 (13:58 -0700)]
watchdog: w83627hf: Enable watchdog device only if not already enabled
There is no need to enable the watchdog device if it is already enabled.
Also, when enabling the watchdog device, only set the watchdog device
enable bit and do not touch other bits; depending on the chip type,
those bits may enable other functionality.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Guenter Roeck [Sat, 17 Aug 2013 20:58:40 +0000 (13:58 -0700)]
watchdog: w83627hf: Enable watchdog only once
It is unnecessary to enable the logical device and WDT0 each time
the watchdog is accessed. Do it only once during initialization.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Guenter Roeck [Tue, 29 Oct 2013 02:43:57 +0000 (19:43 -0700)]
watchdog: w83627hf: Convert to watchdog infrastructure
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Johannes Berg [Mon, 18 Nov 2013 19:54:58 +0000 (20:54 +0100)]
genetlink: rename shadowed variable
Sparse pointed out that the new flags variable I had added
shadowed an existing one, rename the new one to avoid that,
making the code clearer.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Mon, 18 Nov 2013 03:20:45 +0000 (04:20 +0100)]
inet: prevent leakage of uninitialized memory to user in recv syscalls
Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.
If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.
Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Sat, 16 Nov 2013 02:52:08 +0000 (00:52 -0200)]
net: ipv6: ndisc: Fix warning when CONFIG_SYSCTL=n
When CONFIG_SYSCTL=n the following build warning happens:
net/ipv6/ndisc.c:1730:1: warning: label 'out' defined but not used [-Wunused-label]
The 'out' label is only used when CONFIG_SYSCTL=y, so move it inside the
'ifdef CONFIG_SYSCTL' block.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Akinobu Mita [Mon, 18 Nov 2013 13:11:42 +0000 (22:11 +0900)]
bio: fix argument of __bio_add_page() for max_sectors > 0xffff
The data type of max_sectors and max_hw_sectors in queue settings are
unsigned int. But these values are passed to __bio_add_page() as an
argument whose data type is unsigned short. In the worst case such as
max_sectors is 0x10000, bio_add_page() can't add a page and IOs can't
proceed.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Wei Yongjun [Mon, 18 Nov 2013 13:03:08 +0000 (21:03 +0800)]
i2c: bcm-kona: fix error return code in bcm_kona_i2c_probe()
Fix to return a negative error code from the bus speed parse
error handling case instead of 0.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Tim Kryger <tim.kryger@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Ma JieYue [Fri, 15 Nov 2013 04:26:13 +0000 (12:26 +0800)]
xen-netfront: fix missing rx_refill_timer when allocate memory failed
There was a bug in xennet_alloc_rx_buffers, when allocating page or
sk_buff failed, and at the same time rx_batch queue not empty,
the rx_refill_timer timer won't be scheduled. If finally the remaining
request buffers in rx ring less than what backend driver expected,
the backend driver would think of rx ring as full and start dropping packets.
In such situation, there is no way for the netfront driver to recover
automatically, so that the device can not work properly.
The patch fixes the problem by always scheduling rx_refill_timer timer when
alloc_page or __netdev_alloc_skb fails, no matter whether rx_batch queue is
empty or not. It ensures that the rx ring request buffers will finally meet
the backend needs.
Signed-off-by: Ma JieYue <jieyue.majy@alibaba-inc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
H. Peter Anvin [Fri, 15 Nov 2013 05:43:47 +0000 (21:43 -0800)]
Revert "init/Kconfig: add option to disable kernel compression"
This reverts commit
69f0554ec261fd686ac7fa1c598cc9eb27b83a80.
This patch breaks randconfig on at least the x86-64 architecture, and
most likely on others. There is work underway to support uncompressed
kernels in a generic way, but it looks like it will amount to
rewriting the support from scratch; see the LKML thread in the Link:
for info.
Therefore, revert this change and wait for the fix.
Reported-by: Pavel Roskin <proski@gnu.org>
Cc: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131113113418.167b8ffd@IRBT4585
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Thumshirn [Sun, 17 Nov 2013 18:25:14 +0000 (19:25 +0100)]
edac/85xx: Remove mpc85xx_pci_err_remove
Remove mpc85xx_pci_err_remove(...) which is obsolete, this removes the
compiler warning which can be seen when building the driver either
statically or as a module.
Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com>
Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Johannes Thumshirn [Sun, 17 Nov 2013 18:25:12 +0000 (19:25 +0100)]
EDAC: Add edac-mpc85xx driver to MAINTAINERS
Add drivers/edac/mpc85xx_edac.[ch] to MAINTAINERS file and me as
maintainer.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Dave Jiang <dave.jiang@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Victor Kamensky [Sat, 16 Nov 2013 00:01:05 +0000 (02:01 +0200)]
watchdog: omap_wdt: raw read and write endian fix
All OMAP IP blocks expect LE data, but CPU may operate in BE mode.
Need to use endian neutral functions to read/write h/w registers.
I.e instead of __raw_read[lw] and __raw_write[lw] functions code
need to use read[lw]_relaxed and write[lw]_relaxed functions.
If the first simply reads/writes register, the second will byteswap
it if host operates in BE mode.
Changes are trivial sed like replacement of __raw_xxx functions
with xxx_relaxed variant.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Uwe Kleine-König [Mon, 11 Nov 2013 20:33:44 +0000 (21:33 +0100)]
watchdog: sirf: don't depend on dummy value of CLOCK_TICK_RATE
As CSR SiRF is converted to multi platform CLOCK_TICK_RATE is a dummy
value that seems to match the right value is used.
(arch/arm/mach-prima2/include/mach/timex.h which defined CLOCK_TICK_RATE
to
1000000 was removed in commit
cf82e0e (ARM: sirf: enable
multiplatform support); marco used the same file.)
To not depend on that dummy value use a local #define instead.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Dan Carpenter [Fri, 8 Nov 2013 09:24:19 +0000 (01:24 -0800)]
watchdog: pcwd_usb: overflow in usb_pcwd_send_command()
We changed "buf" from being an array of 6 chars to being a pointer this
sizeof(buf) needs to be updated as well.
Fixes:
2ddb8089a7e5 ('watchdog: pcwd_usb: Use allocated buffer for usb_control_msg')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>