David Sterba [Fri, 26 Feb 2016 14:38:32 +0000 (15:38 +0100)]
Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6
David Sterba [Fri, 26 Feb 2016 14:38:31 +0000 (15:38 +0100)]
Merge branch 'foreign/josef/space-updates' into for-chris-4.6
David Sterba [Fri, 26 Feb 2016 14:38:30 +0000 (15:38 +0100)]
Merge branch 'foreign/zhaolei/reada' into for-chris-4.6
David Sterba [Fri, 26 Feb 2016 14:38:30 +0000 (15:38 +0100)]
Merge branch 'foreign/qu/norecovery-v7' into for-chris-4.6
David Sterba [Fri, 26 Feb 2016 14:38:29 +0000 (15:38 +0100)]
Merge branch 'dev/rename-keys' into for-chris-4.6
David Sterba [Fri, 26 Feb 2016 14:38:28 +0000 (15:38 +0100)]
Merge branch 'dev/gfp-flags' into for-chris-4.6
David Sterba [Fri, 26 Feb 2016 14:38:28 +0000 (15:38 +0100)]
Merge branch 'chandan/prep-subpage-blocksize' into for-chris-4.6
# Conflicts:
# fs/btrfs/file.c
Liu Bo [Fri, 17 Jul 2015 08:49:19 +0000 (16:49 +0800)]
Btrfs: fix lockdep deadlock warning due to dev_replace
Xfstests btrfs/011 complains about a deadlock warning,
[ 1226.649039] =========================================================
[ 1226.649039] [ INFO: possible irq lock inversion dependency detected ]
[ 1226.649039] 4.1.0+ #270 Not tainted
[ 1226.649039] ---------------------------------------------------------
[ 1226.652955] kswapd0/46 just changed the state of lock:
[ 1226.652955] (&delayed_node->mutex){+.+.-.}, at: [<
ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
[ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[ 1226.652955] (&fs_info->dev_replace.lock){+.+.+.}
and interrupts could create inverse lock ordering between them.
[ 1226.652955]
other info that might help us debug this:
[ 1226.652955] Chain exists of:
&delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
[ 1226.652955] Possible interrupt unsafe locking scenario:
[ 1226.652955] CPU0 CPU1
[ 1226.652955] ---- ----
[ 1226.652955] lock(&fs_info->dev_replace.lock);
[ 1226.652955] local_irq_disable();
[ 1226.652955] lock(&delayed_node->mutex);
[ 1226.652955] lock(&found->groups_sem);
[ 1226.652955] <Interrupt>
[ 1226.652955] lock(&delayed_node->mutex);
[ 1226.652955]
*** DEADLOCK ***
Commit
084b6e7c7607 ("btrfs: Fix a lockdep warning when running xfstest.") tried
to fix a similar one that has the exactly same warning, but with that, we still
run to this.
The above lock chain comes from
btrfs_commit_transaction
->btrfs_run_delayed_items
...
->__btrfs_update_delayed_inode
...
->__btrfs_cow_block
...
->find_free_extent
->cache_block_group
->load_free_space_cache
->btrfs_readpages
->submit_one_bio
...
->__btrfs_map_block
->btrfs_dev_replace_lock
However, with high memory pressure, tasks which hold dev_replace.lock can
be interrupted by kswapd and then kswapd is intended to release memory occupied
by superblock, inodes and dentries, where we may call evict_inode, and it comes
to
[ 1226.652955] [<
ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
[ 1226.652955] [<
ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30
[ 1226.652955] [<
ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700
delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), and it leads
to a ABBA deadlock.
To fix this, we can use "blocking rwlock" used in the case of extent_buffer, but
things are simpler here since we only needs read's spinlock to blocking lock.
With this, btrfs/011 no more produces warnings in dmesg.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 26 Jan 2016 14:35:38 +0000 (09:35 -0500)]
Btrfs: check reserved when deciding to background flush
We will sometimes start background flushing the various enospc related things
(delayed nodes, delalloc, etc) if we are getting close to reserving all of our
available space. We don't want to do this however when we are actually using
this space as it causes unneeded thrashing. We currently try to do this by
checking bytes_used >= thresh, but bytes_used is only part of the equation, we
need to use bytes_reserved as well as this represents space that is very likely
to become bytes_used in the future.
My tracing tool will keep count of the number of times we kick off the async
flusher, the following are counts for the entire run of generic/027
No Patch Patch
avg: 5385 5009
median: 5500 4916
We skewed lower than the average with my patch and higher than the average with
the patch, overall it cuts the flushing from anywhere from 5-10%, which in the
case of actual ENOSPC is quite helpful. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 13 Jan 2016 18:21:20 +0000 (13:21 -0500)]
Btrfs: add transaction space reservation tracepoints
There are a few places where we add to trans->bytes_reserved but don't have the
corresponding trace point. With these added my tool no longer sees transaction
leaks.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 13 Jan 2016 16:48:06 +0000 (11:48 -0500)]
Btrfs: fix truncate_space_check
truncate_space_check is using btrfs_csum_bytes_to_leaves() but forgetting to
multiply by nodesize so we get an actual byte count. We need a tracepoint here
so that we have the matching reserve for the release that will come later. Also
add a comment to make clear what the intent of truncate_space_check is.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Mon, 11 Jan 2016 22:28:38 +0000 (17:28 -0500)]
Btrfs: change how we update the global block rsv
I'm writing a tool to visualize the enospc system in order to help debug enospc
bugs and I found weird data and ran it down to when we update the global block
rsv. We add all of the remaining free space to the block rsv, do a trace event,
then remove the extra and do another trace event. This makes my visualization
look silly and is unintuitive code as well. Fix this stuff to only add the
amount we are missing, or free the amount we are missing. This is less clean to
read but more explicit in what it is doing, as well as only emitting events for
values that make sense. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 14 Jan 2016 10:39:00 +0000 (18:39 +0800)]
btrfs: reada: ignore creating reada_extent for a non-existent device
For a non-existent device, old code bypasses adding it in dev's reada
queue.
And to solve problem of unfinished waitting in raid5/6,
commit
5fbc7c59fd22 ("Btrfs: fix unfinished readahead thread for
raid5/6 degraded mounting")
adding an exception for the first stripe, in short, the first
stripe will always be processed whether the device exists or not.
Actually we have a better way for the above request: just bypass
creation of the reada_extent for non-existent device, it will make
code simple and effective.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Tue, 26 Jan 2016 10:42:40 +0000 (18:42 +0800)]
btrfs: reada: avoid undone reada extents in btrfs_reada_wait
Reada background works is not designed to finish all jobs
completely, it will break in following case:
1: When a device reaches workload limit (MAX_IN_FLIGHT)
2: Total reads reach max limit (10000)
3: All devices don't have queued more jobs, often happened in DUP case
And if all background works exit with remaining jobs,
btrfs_reada_wait() will wait indefinetelly.
Above problem is rarely happened in old code, because:
1: Every work queues 2x new works
So many works reduced chances of undone jobs.
2: One work will continue 10000 times loop in case of no-jobs
It reduced no-thread window time.
But after we fixed above case, the "undone reada extents" frequently
happened.
Fix:
Check to ensure we have at least one thread if there are undone jobs
in btrfs_reada_wait().
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 7 Jan 2016 10:38:48 +0000 (18:38 +0800)]
btrfs: reada: limit max works count
Reada creates 2 works for each level of tree recursively.
In case of a tree having many levels, the number of created works
is 2^level_of_tree.
Actually we don't need so many works in parallel, this patch limits
max works to BTRFS_MAX_MIRRORS * 2.
The per-fs works_counter will be also used for btrfs_reada_wait() to
check is there are background workers.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Tue, 12 Jan 2016 06:58:39 +0000 (14:58 +0800)]
btrfs: reada: simplify dev->reada_in_flight processing
No need to decrease dev->reada_in_flight in __readahead_hook()'s
internal and reada_extent_put().
reada_extent_put() have no chance to decrease dev->reada_in_flight
in free operation, because reada_extent have additional refcnt when
scheduled to a dev.
We can put inc and dec operation for dev->reada_in_flight to one
place instead to make logic simple and safe, and move useless
reada_extent->scheduled_for to a bool flag instead.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 14:28:51 +0000 (22:28 +0800)]
btrfs: reada: Fix a debug code typo
Remove one copy of loop to fix the typo of iterate zones.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 14:20:59 +0000 (22:20 +0800)]
btrfs: reada: Jump into cleanup in direct way for __readahead_hook()
Current code set nritems to 0 to make for_loop useless to bypass it,
and set generation's value which is not necessary.
Jump into cleanup directly is better choise.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 14:46:45 +0000 (22:46 +0800)]
btrfs: reada: Use fs_info instead of root in __readahead_hook's argument
What __readahead_hook() need exactly is fs_info, no need to convert
fs_info to root in caller and convert back in __readahead_hook()
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 14:09:05 +0000 (22:09 +0800)]
btrfs: reada: Pass reada_extent into __readahead_hook directly
reada_start_machine_dev() already have reada_extent pointer, pass
it into __readahead_hook() directly instead of search radix_tree
will make code run faster.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 13:07:17 +0000 (21:07 +0800)]
btrfs: reada: move reada_extent_put to place after __readahead_hook()
We can't release reada_extent earlier than __readahead_hook(), because
__readahead_hook() still need to use it, it is necessary to hode a refcnt
to avoid it be freed.
Actually it is not a problem after my patch named:
Avoid many times of empty loop
It make reada_extent in above line include at least one reada_extctl,
which keeps additional one refcnt for reada_extent.
But we still need this patch to make the code in pretty logic.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 12:30:00 +0000 (20:30 +0800)]
btrfs: reada: Remove level argument in severial functions
level is not used in severial functions, remove them from arguments,
and remove relative code for get its value.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 10:48:54 +0000 (18:48 +0800)]
btrfs: reada: bypass adding extent when all zone failed
When failed adding all dev_zones for a reada_extent, the extent
will have no chance to be selected to run, and keep in memory
for ever.
We should bypass this extent to avoid above case.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 10:15:47 +0000 (18:15 +0800)]
btrfs: reada: add all reachable mirrors into reada device list
If some device is not reachable, we should bypass and continus addingb
next, instead of break on bad device.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 14:57:52 +0000 (22:57 +0800)]
btrfs: reada: Move is_need_to_readahead contition earlier
Move is_need_to_readahead contition earlier to avoid useless loop
to get relative data for readahead.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Thu, 31 Dec 2015 08:57:21 +0000 (16:57 +0800)]
btrfs: reada: Avoid many times of empty loop
We can see following loop(10000 times) in trace_log:
[ 75.416137] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt
ffff88003741e0c0 1 -> 2
[ 75.417413] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re =
ffff88003741e0c0, refcnt = 2 -> 1
[ 75.418611] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt
ffff88003741e0c0 1 -> 2
[ 75.419793] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re =
ffff88003741e0c0, refcnt = 2 -> 1
[ 75.421016] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt
ffff88003741e0c0 1 -> 2
[ 75.422324] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re =
ffff88003741e0c0, refcnt = 2 -> 1
[ 75.423661] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt
ffff88003741e0c0 1 -> 2
[ 75.424882] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re =
ffff88003741e0c0, refcnt = 2 -> 1
...(10000 times)
[ 124.101672] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt
ffff88003741e0c0 1 -> 2
[ 124.102850] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re =
ffff88003741e0c0, refcnt = 2 -> 1
[ 124.104008] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt
ffff88003741e0c0 1 -> 2
[ 124.105121] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re =
ffff88003741e0c0, refcnt = 2 -> 1
Reason:
If more than one user trigger reada in same extent, the first task
finished setting of reada data struct and call reada_start_machine()
to start, and the second task only add a ref_count but have not
add reada_extctl struct completely, the reada_extent can not finished
all jobs, and will be selected in __reada_start_machine() for 10000
times(total times in __reada_start_machine()).
Fix:
For a reada_extent without job, we don't need to run it, just return
0 to let caller break.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Fri, 18 Dec 2015 13:56:08 +0000 (21:56 +0800)]
btrfs: reada: Add missed segment checking in reada_find_zone
In rechecking zone-in-tree, we still need to check zone include
our logical address.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Fri, 18 Dec 2015 13:48:48 +0000 (21:48 +0800)]
btrfs: reada: reduce additional fs_info->reada_lock in reada_find_zone
We can avoid additional locking-acquirment and one pair of
kref_get/put by combine two condition.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zhao Lei [Fri, 18 Dec 2015 13:33:05 +0000 (21:33 +0800)]
btrfs: reada: Fix in-segment calculation for reada
reada_zone->end is end pos of segment:
end = start + cache->key.offset - 1;
So we need to use "<=" in condition to judge is a pos in the
segment.
The problem happened rearly, because logical pos rarely pointed
to last 4k of a blockgroup, but we need to fix it to make code
right in logic.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 14 Feb 2016 21:05:20 +0000 (13:05 -0800)]
Linux 4.5-rc4
Linus Torvalds [Sun, 14 Feb 2016 20:47:45 +0000 (12:47 -0800)]
Merge tag 'char-misc-4.5-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are 3 fixes for some reported issues. Two nvmem driver fixes,
and one mei fix. All have been in linux-next just fine"
* tag 'char-misc-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
nvmem: qfprom: Specify LE device endianness
nvmem: core: return error for non word aligned access
mei: validate request value in client notify request ioctl
Linus Torvalds [Sun, 14 Feb 2016 20:34:53 +0000 (12:34 -0800)]
Merge tag 'driver-core-4.5-rc4' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is one driver core, well klist, fix for 4.5-rc4.
It fixes a problem found in the scsi device list traversal that
probably also could be triggered by other subsystems.
The fix has been in linux-next for a while with no reported problems"
* tag 'driver-core-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
klist: fix starting point removed bug in klist iterators
Linus Torvalds [Sun, 14 Feb 2016 20:29:59 +0000 (12:29 -0800)]
Merge tag 'tty-4.5-rc4' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for 4.5-rc4
that resolve some reported issues.
One of them got reverted as it wasn't correct based on testing, and
all have been in linux-next for a while"
* tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "8250: uniphier: allow modular build with 8250 console"
pty: make sure super_block is still valid in final /dev/tty close
pty: fix possible use after free of tty->driver_data
tty: Add support for PCIe WCH382 2S multi-IO card
serial/omap: mark wait_for_xmitr as __maybe_unused
serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
8250: uniphier: allow modular build with 8250 console
tty: Drop krefs for interrupted tty lock
Linus Torvalds [Sun, 14 Feb 2016 20:24:28 +0000 (12:24 -0800)]
Merge tag 'usb-4.5-rc4' of git://git./linux/kernel/git/gregkh/usb
Pull PHY fixes from Greg KH:
"Here are a couple of PHY driver fixes for 4.5-rc4.
A few small phy issues. All have been in linux-next with no reported
issues"
* tag 'usb-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload
phy: twl4030-usb: Relase usb phy on unload
phy: core: fix wrong err handle for phy_power_on
phy: Restrict phy-hi6220-usb to HiSilicon arm64
Linus Torvalds [Sun, 14 Feb 2016 20:07:55 +0000 (12:07 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:
"Another round of fixes for the perf tooling side:
- Prevent a NULL pointer dereference in tracepoint error handling
- Fix a thread handling bug in the intel_pt error handling code
- Search both .eh_frame and .debug_frame sections as toolchains seem
to have random choices of storing the CFI information
- Fix the perf state interval output values, which got broken when
fixing the overall output"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf stat: Fix interval output values
perf probe: Search both .eh_frame and .debug_frame sections for probe location
perf tools: Fix thread lifetime related segfaut in intel_pt
perf tools: tracepoint_error() can receive e=NULL, robustify it
Linus Torvalds [Sun, 14 Feb 2016 20:02:05 +0000 (12:02 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull lockdep fix from Thomas Gleixner:
"A single fix for the stack trace caching logic in lockdep, where the
duplicate avoidance managed to store no back trace at all"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Fix stack trace caching logic
Linus Torvalds [Sun, 14 Feb 2016 19:57:24 +0000 (11:57 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix preventing a 32bit overflow in timespec/val to cputime
conversions on 32bit machines"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()
Linus Torvalds [Sun, 14 Feb 2016 19:49:30 +0000 (11:49 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irqchip fixes from Thomas Gleixner:
"Another set of ARM SoC related irqchip fixes:
- Plug a memory leak in gicv3-its
- Limit features to the root gic interrupt controller
- Add a missing barrier in the gic-v3 IAR access
- Another compile test fix for sun4i"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor
irqchip/gic: Only set the EOImodeNS bit for the root controller
irqchip/gic: Only populate set_affinity for the root controller
irqchip/gicv3-its: Fix memory leak in its_free_tables()
irqchip/sun4i: Fix compilation outside of arch/arm
Linus Torvalds [Sun, 14 Feb 2016 18:50:26 +0000 (10:50 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two small fixlets for x86:
- Prevent a KASAN false positive in thread_saved_pc()
- Fix a 32-bit truncation problem in the x86 numa code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels
x86: Fix KASAN false positives in thread_saved_pc()
Linus Torvalds [Sun, 14 Feb 2016 18:49:01 +0000 (10:49 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Here's the first round of MIPS fixes after the merge window:
- Detect Octeon III's PCI correctly.
- Fix return value of the MT7620 probing function.
- Wire up the copy_file_range syscall.
- Fix 64k page support on 32 bit kernels.
- Fix the early Coherency Manager probe.
- Allow only hardware-supported page sizes to be selected for R6000.
- Fix corner cases for the RDHWR nstruction emulation on old hardware.
- Fix FPU handling corner cases.
- Remove stale entry for BCM33xx from the MAINTAINERS file.
- 32 and 64 bit ELF headers are different, handle them correctly"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
mips: Differentiate between 32 and 64 bit ELF header
MIPS: Octeon: Update OCTEON_FEATURE_PCIE for Octeon III
MIPS: pci-mt7620: Fix return value check in mt7620_pci_probe()
MIPS: Fix early CM probing
MIPS: Wire up copy_file_range syscall.
MIPS: Fix 64k page support for 32 bit kernels.
MIPS: R6000: Don't allow 64k pages for R6000.
MIPS: traps.c: Correct microMIPS RDHWR emulation
MIPS: traps.c: Don't emulate RDHWR in the CpU #0 exception handler
MAINTAINERS: Remove stale entry for BCM33xx chips
MIPS: Fix FPU disable with preemption
MIPS: Properly disable FPU in start_thread()
MIPS: Fix buffer overflow in syscall_get_arguments()
Linus Torvalds [Sun, 14 Feb 2016 18:46:47 +0000 (10:46 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A couple of ARM fixes from Linus for the ICST clock generator code"
[ "Linus" here is Linus Walleij. Name-stealer.
Linus "there can be only one" Torvalds ]
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8519/1: ICST: try other dividends than 1
ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()
Linus Torvalds [Sun, 14 Feb 2016 18:40:21 +0000 (10:40 -0800)]
Merge branch 'component' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull component helper fixes from Russell King:
"A few fixes for problems people have encountered with the recent
update to the component helpers"
* 'component' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
component: remove device from master match list on failed add
component: Detach components when deleting master struct
component: fix crash on x86_64 with hda audio drivers
Linus Torvalds [Sun, 14 Feb 2016 01:35:23 +0000 (17:35 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull more rdma fixes from Doug Ledford:
"I think we are getting pretty close to done now. There are four
one-off fixes in this update:
- fix ipoib multicast joins
- fix mlx4 error handling
- fix mlx5 size computation
- fix a thinko in core code"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/mlx5: Fix RC transport send queue overhead computation
IB/ipoib: fix for rare multicast join race condition
IB/core: Fix reading capability mask of the port info class
net/mlx4: fix some error handling in mlx4_multi_func_init()
Linus Torvalds [Sun, 14 Feb 2016 00:39:27 +0000 (16:39 -0800)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"This includes the long awaited series to address a set of bugs around
active I/O remote-port LUN_RESET, as well as properly handling this
same case with concurrent fabric driver session disconnect ->
reconnect.
Note this set of LUN_RESET bug-fixes has been surviving extended
testing on both v4.5-rc1 and v3.14.y code over the last weeks, and is
CC'ed for stable as it's something folks using multiple ESX connected
hosts with slow backends can certainly trigger.
The highlights also include:
- Fix WRITE_SAME/DISCARD emulation 4k sector conversion in
target/iblock (Mike Christie)
- Fix TMR abort interaction and AIO type TMR response in qla2xxx
target (Quinn Tran + Swapnil Nagle)
- Fix >= v3.17 stale descriptor pointer regression in qla2xxx target
(Quinn Tran)
- Fix >= v4.5-rc1 return regression with unmap_zeros_data_store new
configfs store handler (nab)
- Add CPU affinity flag + convert qla2xxx to use bit (Quinn + HCH +
Bart)"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
qla2xxx: use TARGET_SCF_USE_CPUID flag to indiate CPU Affinity
target/transport: add flag to indicate CPU Affinity is observed
target: Fix incorrect unmap_zeroes_data_store return
qla2xxx: Use ATIO type to send correct tmr response
qla2xxx: Fix stale pointer access.
target/user: Fix cast from pointer to phys_addr_t
target: Drop legacy se_cmd->task_stop_comp + REQUEST_STOP usage
target: Fix race with SCF_SEND_DELAYED_TAS handling
target: Fix remote-port TMR ABORT + se_cmd fabric stop
target: Fix TAS handling for multi-session se_node_acls
target: Fix LUN_RESET active TMR descriptor handling
target: Fix LUN_RESET active I/O handling for ACK_KREF
qla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM
qla2xxx: Fix warning reported by static checker
target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors
Linus Torvalds [Sat, 13 Feb 2016 21:05:56 +0000 (13:05 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal management fixes from Eduardo Valentin:
"Specifics in this pull request:
- Compilation fixes on SPEAR, and U8500 thermal drivers.
- RCAR thermal driver now recognizes OF-thermal based thermal zones.
- Small code rework on OF-thermal.
- These change have been CI tested using KernelCI bot [1,2]. \o/
I am taking over on Rui's behalf while he is out. Happy New Chinese
Year!
[1] - https://kernelci.org/build/evalenti/kernel/
v4.5-rc3-16-ga53b8394ec3c/
[2] - https://kernelci.org/boot/all/job/evalenti/kernel/
v4.5-rc3-16-ga53b8394ec3c/"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: cpu_cooling: fix out of bounds access in time_in_idle
thermal: allow u8500-thermal driver to be a module
thermal: allow spear-thermal driver to be a module
thermal: spear: use __maybe_unused for PM functions
thermal: rcar: enable to use thermal-zone on DT
thermal: of: use for_each_available_child_of_node for child iterator
Linus Torvalds [Sat, 13 Feb 2016 21:04:47 +0000 (13:04 -0800)]
Merge tag 'sound-fix-4.5-rc4' of git://git./linux/kernel/git/tiwai/sound
Pull another sound fix from Takashi Iwai:
"This contains a fix for the double-free of usb-audio MIDI device at
probe failure"
* tag 'sound-fix-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: avoid freeing umidi object twice
Linus Torvalds [Sat, 13 Feb 2016 16:18:21 +0000 (08:18 -0800)]
Merge tag 'arc-4.5-fixes' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"I've been sitting on some of these fixes for a while.
- Corner case of returning to delay slot from interrupt
- Changing default interrupt prioiry level
- Kconfig'ize support for super pages
- Other minor fixes"
* tag 'arc-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: mm: Introduce explicit super page size support
ARCv2: intc: Allow interruption by lowest priority interrupt
ARCv2: Check for LL-SC livelock only if LLSC is enabled
ARC: shrink cpuinfo by not saving full timer BCR
ARCv2: clocksource: Rename GRTC -> GFRC ...
ARCv2: STAR
9000950267: Handle return from intr to Delay Slot #2
Andrey Konovalov [Sat, 13 Feb 2016 08:08:06 +0000 (11:08 +0300)]
ALSA: usb-audio: avoid freeing umidi object twice
The 'umidi' object will be free'd on the error path by snd_usbmidi_free()
when tearing down the rawmidi interface. So we shouldn't try to free it
in snd_usbmidi_create() after having registered the rawmidi interface.
Found by KASAN.
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Fri, 12 Feb 2016 23:31:22 +0000 (15:31 -0800)]
Merge tag 'pci-v4.5-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These are some Renesas binding updates for PCI host controllers, a
Broadcom fix for a regression we added in v4.5-rc1, and a fix for an
AER use-after-free problem that can cause memory corruption.
Summary:
AER:
Flush workqueue on device remove to avoid use-after-free (Sebastian Andrzej Siewior)
Broadcom iProc host bridge driver:
Allow multiple devices except on PAXC (Ray Jui)
Renesas R-Car host bridge driver:
Add gen2 device tree support for r8a7793 (Simon Horman)
Add device tree support for r8a7793 (Simon Horman)"
* tag 'pci-v4.5-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: rcar: Add device tree support for r8a7793
PCI: rcar: Add gen2 device tree support for r8a7793
PCI: iproc: Allow multiple devices except on PAXC
PCI/AER: Flush workqueue on device remove to avoid use-after-free
Linus Torvalds [Fri, 12 Feb 2016 21:12:27 +0000 (13:12 -0800)]
Merge branch 'akpm'(patches from Andrew)
Merge fixes from Andrew Morton:
"10 fixes"
The lockdep hlist conversion is in the locking tree too, waiting for the
next merge window. Andrew thought it should go in now. I'll take it,
since it fixes a real problem and looks trivially correct (famous last
words).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI
mm: fix pfn_t vs highmem
kernel/locking/lockdep.c: convert hash tables to hlists
mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
mm,thp: khugepaged: call pte flush at the time of collapse
mm/backing-dev.c: fix error path in wb_init()
mm, dax: check for pmd_none() after split_huge_pmd()
vsprintf: kptr_restrict is okay in IRQ when 2
mm: fix filemap.c kernel doc warning
ubsan: cosmetic fix to Kconfig text
Leon Romanovsky [Thu, 11 Feb 2016 19:09:57 +0000 (21:09 +0200)]
IB/mlx5: Fix RC transport send queue overhead computation
Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.
The ATOMIC and UMR segments can't coexist together, so chose maximum out
of them.
The commit
9e65dc371b5c ("IB/mlx5: Fix RC transport send queue overhead
computation") was intended to update RC transport as commit messages
states, but added the code to UC transport.
Fixes:
9e65dc371b5c ("IB/mlx5: Fix RC transport send queue overhead computation")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Estrin [Thu, 11 Feb 2016 21:30:51 +0000 (16:30 -0500)]
IB/ipoib: fix for rare multicast join race condition
A narrow window for race condition still exist between
multicast join thread and *dev_flush workers.
A kernel crash caused by prolong erratic link state changes
was observed (most likely a faulty cabling):
[167275.656270] BUG: unable to handle kernel NULL pointer dereference at
0000000000000020
[167275.665973] IP: [<
ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib]
[167275.674443] PGD 0
[167275.677373] Oops: 0000 [#1] SMP
...
[167275.977530] Call Trace:
[167275.982225] [<
ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib]
[167275.992024] [<
ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490
[ib_ipoib]
[167276.002149] [<
ffffffff8109d5fb>] process_one_work+0x17b/0x470
[167276.010754] [<
ffffffff8109e3cb>] worker_thread+0x11b/0x400
[167276.019088] [<
ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[167276.027737] [<
ffffffff810a5aef>] kthread+0xcf/0xe0
Here was a hit spot:
ipoib_mcast_join() {
..............
rec.qkey = priv->broadcast->mcmember.qkey;
^^^^^^^
.....
}
Proposed patch should prevent multicast join task to continue
if link state change is detected.
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Changes from v4:
- as suggested by Doug Ledford, optimized spinlock usage,
i.e. ipoib_mcast_join() is called with lock held.
Changes from v3:
- sync with priv->lock before flag check.
Chages from v2:
- Move check for OPER_UP flag state to mcast_join() to
ensure no event worker is in progress.
- minor style fixes.
Changes from v1:
- No need to lock again if error detected.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Fri, 12 Feb 2016 17:48:55 +0000 (09:48 -0800)]
Merge tag 'mmc-v4.5-rc2' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are some mmc fixes intended for v4.5 rc4.
MMC core:
- Fix an sysfs ABI regression
- Return an error in a specific error path dealing with mmc ioctls
MMC host:
- sdhci-pci|acpi: Fix card detect race for Intel BXT/APL
- sh_mmcif: Correct TX DMA channel allocation
- mmc_spi: Fix error handling for dma mapping errors
- sdhci-of-at91: Fix an unbalance issue for the runtime PM usage count
- pxamci: Fix the device-tree probe deferral path
- pxamci: Fix read-only GPIO polarity"
* tag 'mmc-v4.5-rc2' of git://git.linaro.org/people/ulf.hansson/mmc:
Revert "mmc: block: don't use parameter prefix if built as module"
mmc: sdhci-acpi: Fix card detect race for Intel BXT/APL
mmc: sdhci-pci: Fix card detect race for Intel BXT/APL
mmc: sdhci: Allow override of get_cd() called from sdhci_request()
mmc: sdhci: Allow override of mmc host operations
mmc: sh_mmcif: Correct TX DMA channel allocation
mmc: block: return error on failed mmc_blk_get()
mmc: pxamci: fix the device-tree probe deferral path
mmc: mmc_spi: add checks for dma mapping error
mmc: sdhci-of-at91: fix pm runtime unbalanced issue in error path
mmc: pxamci: fix again read-only gpio detection polarity
Linus Torvalds [Fri, 12 Feb 2016 17:42:05 +0000 (09:42 -0800)]
Merge tag 'sound-4.5-rc4' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"In this rc, we've got more volume than previous rc, unsurprisingly;
the majority of updates in ASoC are about Intel drivers, and another
major changes are the continued plumbing of ALSA timer bugs revealed
by syzkaller fuzzer. Hopefully both settle down now.
Other than that, HD-audio received a couple of code fixes as well as
the usual quirks, and various small fixes are found for FireWire
devices, ASoC codecs and drivers"
* tag 'sound-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (50 commits)
ASoC: arizona: fref must be limited in pseudo-fractional mode
ASoC: sigmadsp: Fix missleading return value
ALSA: timer: Fix race at concurrent reads
ALSA: firewire-digi00x: Drop bogus const type qualifier on dot_scrt()
ALSA: hda - Fix bad dereference of jack object
ALSA: timer: Fix race between stop and interrupt
ALSA: timer: Fix wrong instance passed to slave callbacks
ASoC: Intel: Add module tags for common match module
ASoC: Intel: Load the atom DPCM driver only
ASoC: Intel: Create independent acpi match module
ASoC: Intel: Revert "ASoC: Intel: fix ACPI probe regression with Atom DPCM driver"
ALSA: dummy: Implement timer backend switching more safely
ALSA: hda - Fix speaker output from VAIO AiO machines
Revert "ALSA: hda - Fix noise on Gigabyte Z170X mobo"
ALSA: firewire-tascam: remove needless member for control and status message
ALSA: firewire-tascam: remove a flag for controller
ALSA: firewire-tascam: add support for FW-1804
ALSA: firewire-tascam: fix NULL pointer dereference when model identification fails
ALSA: hda - Fix static checker warning in patch_hdmi.c
ASoC: Intel: Skylake: Remove autosuspend delay
...
Linus Torvalds [Fri, 12 Feb 2016 17:39:34 +0000 (09:39 -0800)]
Merge tag 'fbdev-fixes-4.5' of git://git./linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:
- fix omap2plus_defconfig to enable omapfb as it was in v4.4
- ocfb: fix timings for margins
- s6e8ax0, da8xx-fb: fix compile warnings
- mmp: fix build failure caused by bad printk parameters
- imxfb: fix clock issue which kept the display off
* tag 'fbdev-fixes-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
video: fbdev: imxfb: Provide a reset mechanism
fbdev: mmp: print IRQ resource using %pR format string
fbdev: da8xx-fb: remove incorrect type cast
fbdev: s6e8ax0: avoid unused function warnings
ocfb: fix tgdel and tvdel timing parameters
ARM: omap2plus_defconfig: update display configs
Linus Torvalds [Fri, 12 Feb 2016 17:32:37 +0000 (09:32 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of seven fixes:
Two regressions in the new hisi_sas arm driver, a blacklist entry for
the marvell console which was causing a reset cascade without it, a
race fix in the WRITE_SAME/DISCARD routines, a retry fix for the rdac
driver, without which, it would prematurely return EIO and a couple of
fixes for the hyper-v storvsc driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
SCSI: Add Marvell Console to VPD blacklist
scsi_dh_rdac: always retry MODE SELECT on command lock violation
storvsc: Use the specified target ID in device lookup
storvsc: Install the storvsc specific timeout handler for FC devices
hisi_sas: fix v1 hw check for slot error
hisi_sas: add dependency for HAS_IOMEM
Linus Torvalds [Fri, 12 Feb 2016 17:27:31 +0000 (09:27 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm amd fixes from Dave Airlie:
"Been pretty quiet.
This is an amdgpu fixes pull from AMD, a bunch of powerplay stability
fixes, race fix, hibernate fix, and a possible circular locking fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (21 commits)
drm/amdgpu: fix issue with overlapping userptrs
drm/radeon: hold reference to fences in radeon_sa_bo_new
drm/amdgpu: remove unnecessary forward declaration
drm/amdgpu: hold reference to fences in amdgpu_sa_bo_new (v2)
drm/amdgpu: fix s4 resume
drm/amdgpu/cz: plumb pg flags through to powerplay
drm/amdgpu/tonga: plumb pg flags through to powerplay
drma/dmgpu: move cg and pg flags into shared headers
drm/amdgpu: remove unused cg defines
drm/amdgpu: add a cgs interface to fetch cg and pg flags
drm/amd/powerplay/tonga: disable vce pg
drm/amd/powerplay/tonga: disable uvd pg
drm/amd/powerplay/cz: disable vce pg
drm/amd/powerplay/cz: disable uvd pg
drm/amdgpu: be consistent with uvd cg flags
drm/amdgpu: clean up vce pg flags for cz/st
drm/amdgpu: handle vce pg flags properly
drm/amdgpu: handle uvd pg flags properly
drm/amdgpu/dpm/ci: switch over to the common pcie caps interface
drm/amdgpu/cik: don't mess with aspm if gpu is root bus
...
Linus Torvalds [Fri, 12 Feb 2016 17:24:48 +0000 (09:24 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull crypto fix from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
EVM: Use crypto_memneq() for digest comparisons
Linus Torvalds [Fri, 12 Feb 2016 17:21:28 +0000 (09:21 -0800)]
Merge branch 'for-linus-4.5' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This has a few fixes from Filipe, along with a readdir fix from Dave
that we've been testing for some time"
* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: properly set the termination value of ctx->pos in readdir
Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
Btrfs: remove no longer used function extent_read_full_page_nolock()
Btrfs: fix page reading in extent_same ioctl leading to csum errors
Btrfs: fix invalid page accesses in extent_same (dedup) ioctl
Linus Torvalds [Fri, 12 Feb 2016 17:17:03 +0000 (09:17 -0800)]
Merge tag 'xfs-fixes-for-linus-4.5' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs fix from Dve Chinner:
"This contains a fix for an endian conversion issue in new CRC
validation in log recovery that was discovered on a ppc64 platform"
* tag 'xfs-fixes-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: fix endianness error when checking log block crc on big endian platforms
Qu Wenruo [Tue, 19 Jan 2016 02:23:04 +0000 (10:23 +0800)]
btrfs: Introduce new mount option alias for nologreplay
Introduce new mount option alias "norecovery" for nologreplay, to keep
"norecovery" behavior the same with other filesystems.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 19 Jan 2016 02:23:03 +0000 (10:23 +0800)]
btrfs: Introduce new mount option to disable tree log replay
Introduce a new mount option "nologreplay" to co-operate with "ro" mount
option to get real readonly mount, like "norecovery" in ext* and xfs.
Since the new parse_options() need to check new flags at remount time,
so add a new parameter for parse_options().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 19 Jan 2016 02:23:02 +0000 (10:23 +0800)]
btrfs: Introduce new mount option usebackuproot to replace recovery
Current "recovery" mount option will only try to use backup root.
However the word "recovery" is too generic and may be confusing for some
users.
Here introduce a new and more specific mount option, "usebackuproot" to
replace "recovery" mount option.
"Recovery" will be kept for compatibility reason, but will be
deprecated.
Also, since "usebackuproot" will only affect mount behavior and after
open_ctree() it has nothing to do with the filesystem, so clear the flag
after mount succeeded.
This provides the basis for later unified "norecovery" mount option.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ dropped usebackuproot from show_mount, added note about 'recovery' to
docs ]
Signed-off-by: David Sterba <dsterba@suse.com>
Takashi Iwai [Fri, 12 Feb 2016 08:48:51 +0000 (09:48 +0100)]
Merge tag 'asoc-fix-v4.5-rc4' of git://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v4.5
A rather large batch of fixes here, almost all in the Intel driver.
The changes that got merged in this merge window for Skylake were rather
large and as well as issues that you'd expect in a large block of new
code there were some problems created for older processors which needed
fixing up. Things are largely settling down now hopefully.
Ryan Ware [Thu, 11 Feb 2016 23:58:44 +0000 (15:58 -0800)]
EVM: Use crypto_memneq() for digest comparisons
This patch fixes vulnerability CVE-2016-2085. The problem exists
because the vm_verify_hmac() function includes a use of memcmp().
Unfortunately, this allows timing side channel attacks; specifically
a MAC forgery complexity drop from 2^128 to 2^12. This patch changes
the memcmp() to the cryptographically safe crypto_memneq().
Reported-by: Xiaofei Rex Guo <xiaofei.rex.guo@intel.com>
Signed-off-by: Ryan Ware <ware@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Vineet Gupta [Wed, 10 Feb 2016 01:22:07 +0000 (06:52 +0530)]
ARC: mm: Introduce explicit super page size support
MMUv4 supports 2 concurrent page sizes: Normal and Super [4K to 16M]
So far Linux supported a single super page size for a given Normal page,
depending on the software page walking address split.
e.g. we had 11:8:13 address split for 8K page, which meant super page
was 2 ^(8+13) = 2M (given that THP size has to be PMD_SHIFT)
Now we turn this around, by allowing multiple Super Pages in Kconfig
(currently 2M and 16M only) and forcing page walker address split to
PGDIR_SHIFT and PAGE_SHIFT
For configs without Super page, things are same as before and
PGDIR_SHIFT can be hacked to get non default address split
The motivation for this change is a customer who needs 16M super page
and a 8K Normal page combo.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Greg Kroah-Hartman [Fri, 12 Feb 2016 04:10:58 +0000 (20:10 -0800)]
Merge tag 'phy-for-4.5-rc' of git://git./linux/kernel/git/kishon/linux-phy into usb-linus
Kishon writes:
phy: for 4.5-rc
*) Fix error handling code in phy core [phy_power_on()]
*) phy-twl4030-usb fixes for unloading the module
*) Restrict phy-hi6220-usb to HiSilicon arm64
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Andrew Morton [Fri, 12 Feb 2016 00:13:20 +0000 (16:13 -0800)]
arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI
arch/x86/built-in.o: In function `uv_bios_call':
(.text+0xeba00): undefined reference to `efi_call'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Fri, 12 Feb 2016 00:13:17 +0000 (16:13 -0800)]
mm: fix pfn_t vs highmem
The pfn_t type uses an unsigned long to store a pfn + flags value. On a
64-bit platform the upper 12 bits of an unsigned long are never used for
storing the value of a pfn. However, this is not true on highmem
platforms, all 32-bits of a pfn value are used to address a 44-bit
physical address space. A pfn_t needs to store a 64-bit value.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211
Fixes:
01c8f1c44b83 ("mm, dax, gpu: convert vm_insert_mixed to pfn_t")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Stuart Foster <smf.linux@ntlworld.com>
Reported-by: Julian Margetson <runaway@candw.ms>
Tested-by: Julian Margetson <runaway@candw.ms>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 12 Feb 2016 00:13:14 +0000 (16:13 -0800)]
kernel/locking/lockdep.c: convert hash tables to hlists
Mike said:
: CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i. e
: kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error
: message.
:
: The problem is that ubsan callbacks use spinlocks and might be called
: before lockdep is initialized. Particularly this line in the
: reserve_ebda_region function causes problem:
:
: lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
:
: If i put lockdep_init() before reserve_ebda_region call in
: x86_64_start_reservations kernel loads well.
Fix this ordering issue permanently: change lockdep so that it uses
hlists for the hash tables. Unlike a list_head, an hlist_head is in its
initialized state when it is all-zeroes, so lockdep is ready for
operation immediately upon boot - lockdep_init() need not have run.
The patch will also save some memory.
lockdep_init() and lockdep_initialized can be done away with now - a 4.6
patch has been prepared to do this.
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vineet Gupta [Fri, 12 Feb 2016 00:13:11 +0000 (16:13 -0800)]
mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
[akpm@linux-foundation.org: s/threshhold/threshold/]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vineet Gupta [Fri, 12 Feb 2016 00:13:09 +0000 (16:13 -0800)]
mm,thp: khugepaged: call pte flush at the time of collapse
This showed up on ARC when running LMBench bw_mem tests as Overlapping
TLB Machine Check Exception triggered due to STLB entry (2M pages)
overlapping some NTLB entry (regular 8K page).
bw_mem 2m touches a large chunk of vaddr creating NTLB entries. In the
interim khugepaged kicks in, collapsing the contiguous ptes into a
single pmd. pmdp_collapse_flush()->flush_pmd_tlb_range() is called to
flush out NTLB entries for the ptes. This for ARC (by design) can only
shootdown STLB entries (for pmd). The stray NTLB entries cause the
overlap with the subsequent STLB entry for collapsed page. So make
pmdp_collapse_flush() call pte flush interface not pmd flush.
Note that originally all thp flush call sites in generic code called
flush_tlb_range() leaving it to architecture to implement the flush for
pte and/or pmd. Commit
12ebc1581ad11454 changed this by calling a new
opt-in API flush_pmd_tlb_range() which made the semantics more explicit
but failed to distinguish the pte vs pmd flush in generic code, which is
what this patch fixes.
Note that ARC can fixed w/o touching the generic pmdp_collapse_flush()
by defining a ARC version, but that defeats the purpose of generic
version, plus sementically this is the right thing to do.
Fixes STAR
9000961194: LMBench on AXS103 triggering duplicate TLB
exceptions with super pages
Fixes:
12ebc1581ad11454 ("mm,thp: introduce flush_pmd_tlb_range")
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.4]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 12 Feb 2016 00:13:06 +0000 (16:13 -0800)]
mm/backing-dev.c: fix error path in wb_init()
We need to use post-decrement to get percpu_counter_destroy() called on
&wb->stat[0]. Moreover, the pre-decremebt would cause infinite
out-of-bounds accesses if the setup code failed at i==0.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 12 Feb 2016 00:13:03 +0000 (16:13 -0800)]
mm, dax: check for pmd_none() after split_huge_pmd()
DAX implements split_huge_pmd() by clearing pmd. This simple approach
reduces memory overhead, as we don't need to deposit page table on huge
page mapping to make split_huge_pmd() never-fail. PTE table can be
allocated and populated later on page fault from backing store.
But one side effect is that have to check if pmd is pmd_none() after
split_huge_pmd(). In most places we do this already to deal with
parallel MADV_DONTNEED.
But I found two call sites which is not affected by MADV_DONTNEED (due
down_write(mmap_sem)), but need to have the check to work with DAX
properly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason A. Donenfeld [Fri, 12 Feb 2016 00:13:00 +0000 (16:13 -0800)]
vsprintf: kptr_restrict is okay in IRQ when 2
The kptr_restrict flag, when set to 1, only prints the kernel address
when the user has CAP_SYSLOG. When it is set to 2, the kernel address
is always printed as zero. When set to 1, this needs to check whether
or not we're in IRQ.
However, when set to 2, this check is unneccessary, and produces
confusing results in dmesg. Thus, only make sure we're not in IRQ when
mode 1 is used, but not mode 2.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 12 Feb 2016 00:12:58 +0000 (16:12 -0800)]
mm: fix filemap.c kernel doc warning
Add missing kernel-doc notation for function parameter 'gfp_mask' to fix
kernel-doc warning.
mm/filemap.c:1898: warning: No description found for parameter 'gfp_mask'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Fri, 12 Feb 2016 00:12:55 +0000 (16:12 -0800)]
ubsan: cosmetic fix to Kconfig text
When enabling UBSAN_SANITIZE_ALL, the kernel image size gets increased
significantly (~3x). So, it sounds better to have some note in Kconfig.
And, fixed a typo.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 11 Feb 2016 19:25:55 +0000 (11:25 -0800)]
Merge tag 'gpio-v4.5-2' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
- Probe errorpath fix for the Altera
- irqchip ofnode pointer added to the DaVinci driver
- controller instance number correction for DaVinci
* tag 'gpio-v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: davinci: Fix the number of controllers allocated
gpio: davinci: Add the missing of-node pointer
gpio: gpio-altera: Remove gpiochip on probe failure.
Linus Torvalds [Thu, 11 Feb 2016 19:17:19 +0000 (11:17 -0800)]
Merge tag 'platform-drivers-x86-v4.5-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
"Just two small fixes for the 4.5-rc cycle:
intel_scu_ipcutil:
- underflow in scu_reg_access()
intel-hid:
- fix incorrect entries in intel_hid_keymap"
* tag 'platform-drivers-x86-v4.5-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
intel_scu_ipcutil: underflow in scu_reg_access()
intel-hid: fix incorrect entries in intel_hid_keymap
Linus Torvalds [Thu, 11 Feb 2016 19:00:34 +0000 (11:00 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix BPF handling of branch offset adjustmnets on backjumps, from
Daniel Borkmann.
2) Make sure selinux knows about SOCK_DESTROY netlink messages, from
Lorenzo Colitti.
3) Fix openvswitch tunnel mtu regression, from David Wragg.
4) Fix ICMP handling of TCP sockets in syn_recv state, from Eric
Dumazet.
5) Fix SCTP user hmacid byte ordering bug, from Xin Long.
6) Fix recursive locking in ipv6 addrconf, from Subash Abhinov
Kasiviswanathan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
bpf: fix branch offset adjustment on backjumps after patching ctx expansion
vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
geneve: Relax MTU constraints
vxlan: Relax MTU constraints
flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
of: of_mdio: Add marvell,
88e1145 to whitelist of PHY compatibilities.
selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables
sctp: translate network order to host order when users get a hmacid
enic: increment devcmd2 result ring in case of timeout
tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
net:Add sysctl_max_skb_frags
tcp: do not drop syn_recv on all icmp reports
ipv6: fix a lockdep splat
unix: correctly track in-flight fds in sending process user_struct
update be2net maintainers' email addresses
dwc_eth_qos: Reset hardware before PHY start
ipv6: addrconf: Fix recursive spin lock call
Eran Ben Elisha [Thu, 11 Feb 2016 08:24:42 +0000 (10:24 +0200)]
IB/core: Fix reading capability mask of the port info class
When checking specific attribute from a bit mask, need to use bitwise
AND and not logical AND, fixed that.
Fixes:
145d9c541032 ('IB/core: Display extended counter set if
available')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Rasmus Villemoes [Tue, 9 Feb 2016 20:11:14 +0000 (21:11 +0100)]
net/mlx4: fix some error handling in mlx4_multi_func_init()
The while loop after err_slaves should use post-decrement; otherwise
we'll fail to do the kfrees for i==0, and will run into out-of-bounds
accesses if the setup above failed already at i==0.
[I'm not sure why one even bothers populating the ->vlan_filter array:
mlx4.h isn't #included by anything outside
drivers/net/ethernet/mellanox/mlx4/, and "git grep -C2 -w vlan_filter
drivers/net/ethernet/mellanox/mlx4/" seems to suggest that the
vlan_filter elements aren't used at all.]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ulf Hansson [Thu, 11 Feb 2016 15:42:58 +0000 (16:42 +0100)]
Revert "mmc: block: don't use parameter prefix if built as module"
This reverts commit
829b6962f7e3cfc06f7c5c26269fd47ad48cf503.
Revert this change as it causes a sysfs path to change and therefore
introduces and ABI regression. More precisely Android's vold is not being
able to access /sys/module/mmcblk/parameters/perdev_minors any more, since
the path becomes changed to: "/sys/module/mmc_block/..."
Fixes:
829b6962f7e3 ("mmc: block: don't use parameter prefix if built as
module")
Reported-by: John Stultz <john.stultz@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
David Sterba [Mon, 25 Jan 2016 17:44:13 +0000 (18:44 +0100)]
btrfs: teach print_leaf about temporary item subtypes
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 25 Jan 2016 17:44:13 +0000 (18:44 +0100)]
btrfs: teach print_leaf about permanent item subtypes
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 25 Jan 2016 16:51:31 +0000 (17:51 +0100)]
btrfs: switch dev stats item to the permanent item key
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 25 Jan 2016 16:32:11 +0000 (17:32 +0100)]
btrfs: introduce key type for persistent permanent items
The number of distinct key types is not that big that we could waste one
for something new we want to store in the tree.
Similar to the temporary items, we'll introduce a new name for an
existing key value and use the objectid for further extension. The
victim is the BTRFS_DEV_STATS_KEY (248).
The device stats are an example of a permanent item.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 25 Jan 2016 16:51:31 +0000 (17:51 +0100)]
btrfs: switch balance item to the temporary item key
No visible change.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 25 Jan 2016 16:32:11 +0000 (17:32 +0100)]
btrfs: introduce key type for persistent temporary items
The number of distinct key types is not that big that we could waste one
for something new we want to store in the tree. We'll introduce a new
name for an existing key value and use the objectid for further
extension. The victim is the BTRFS_BALANCE_ITEM_KEY (248).
The nature of the balance status item is a good example of the temporary
item. It exists from beginning of the balance, keeps the status until it
finishes.
Signed-off-by: David Sterba <dsterba@suse.com>
Javi Merino [Thu, 11 Feb 2016 12:00:51 +0000 (12:00 +0000)]
thermal: cpu_cooling: fix out of bounds access in time_in_idle
In __cpufreq_cooling_register() we allocate the arrays for time_in_idle
and time_in_idle_timestamp to be as big as the number of cpus in this
cpufreq device. However, in get_load() we access this array using the
cpu number as index, which can result in an out of bound access.
Index time_in_idle{,_timestamp} using the index in the cpufreq_device's
allowed_cpus mask, as we do for the load_cpu array in
cpufreq_get_requested_power()
Reported-by: Nicolas Boichat <drinkcat@chromium.org>
Cc: Amit Daniel Kachhap <amit.kachhap@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Tested-by: Nicolas Boichat <drinkcat@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
David Sterba [Fri, 13 Nov 2015 12:44:28 +0000 (13:44 +0100)]
btrfs: properly set the termination value of ctx->pos in readdir
The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.
There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.
We can get to that situation like that:
* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX
Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.
The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:
Overflow: e
Overflow:
7fffffff
Overflow:
7fffffffffffffff
PAX: size overflow detected in function btrfs_real_readdir
fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
context: dir_context;
CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
Call Trace:
[<
ffffffff81742f0f>] dump_stack+0x4c/0x7f
[<
ffffffff811cb706>] report_size_overflow+0x36/0x40
[<
ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
[<
ffffffff811dafc8>] iterate_dir+0xa8/0x150
[<
ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
[<
ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
Overflow: 1a
[<
ffffffff811db070>] ? iterate_dir+0x150/0x150
[<
ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83
The jump from
7fffffff to
7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.
The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.
References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor <services@swwu.com>
CC: stable@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
David Sterba [Thu, 11 Feb 2016 14:01:38 +0000 (15:01 +0100)]
btrfs: switch to kcalloc in btrfs_cmp_data_prepare
Kcalloc is functionally equivalent and does overflow checks.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 11 Feb 2016 13:25:38 +0000 (14:25 +0100)]
btrfs: extent same: use GFP_KERNEL for page array allocations
We can safely use GFP_KERNEL in the functions called from the ioctl
handlers. Here we can allocate up to 32k so less pressure to the
allocator could help.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 11 Feb 2016 13:25:38 +0000 (14:25 +0100)]
btrfs: device add and remove: use GFP_KERNEL
We can safely use GFP_KERNEL in the functions called from the ioctl
handlers.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 11 Feb 2016 13:25:38 +0000 (14:25 +0100)]
btrfs: readdir: use GFP_KERNEL
Readdir is initiated from userspace and is not on the critical
writeback path, we don't need to use GFP_NOFS for allocations.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 11 Feb 2016 13:25:38 +0000 (14:25 +0100)]
btrfs: fallocate: use GFP_KERNEL
Fallocate is initiated from userspace and is not on the critical
writeback path, we don't need to use GFP_NOFS for allocations.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 11 Feb 2016 10:01:55 +0000 (11:01 +0100)]
btrfs: let callers of btrfs_alloc_root pass gfp flags
We don't need to use GFP_NOFS in all contexts, eg. during mount or for
dummy root tree, but we might for the the log tree creation.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 11 Feb 2016 09:49:42 +0000 (10:49 +0100)]
btrfs: scrub: use GFP_KERNEL on the submission path
Scrub is not on the critical writeback path we don't need to use
GFP_NOFS for all allocations. The failures are handled and stats passed
back to userspace.
Let's use GFP_KERNEL on the paths where everything is ok, ie. setup the
global structures and the IO submission paths.
Functions that do the repair and fixups still use GFP_NOFS as we might
want to skip any other filesystem activity if we encounter an error.
This could turn out to be unnecessary, but requires more review compared
to the easy cases in this patch.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 18 Jan 2016 17:42:13 +0000 (18:42 +0100)]
btrfs: reada: use GFP_KERNEL everywhere
The readahead framework is not on the critical writeback path we don't
need to use GFP_NOFS for allocations. All error paths are handled and
the readahead failures are not fatal. The actual users (scrub,
dev-replace) will trigger reads if the blocks are not found in cache.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 18 Jan 2016 17:42:13 +0000 (18:42 +0100)]
btrfs: send: use GFP_KERNEL everywhere
The send operation is not on the critical writeback path we don't need
to use GFP_NOFS for allocations. All error paths are handled and the
whole operation is restartable.
Signed-off-by: David Sterba <dsterba@suse.com>