Vincent Stehlé [Tue, 10 May 2016 12:56:20 +0000 (14:56 +0200)]
Btrfs: fix fspath error deallocation
Make sure to deallocate fspath with vfree() in case of error in
init_ipath().
fspath is allocated with vmalloc() in init_data_container() since
commit
425d17a290c0 ("Btrfs: use larger limit for translation of logical to
inode").
Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 27 Apr 2016 01:07:39 +0000 (03:07 +0200)]
btrfs: make find_workspace warn if there are no workspaces
Be verbose if there are no workspaces at all, ie. the module init time
preallocation failed.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 27 Apr 2016 00:41:17 +0000 (02:41 +0200)]
btrfs: make find_workspace always succeed
With just one preallocated workspace we can guarantee forward progress
even if there's no memory available for new workspaces. The cost is more
waiting but we also get rid of several error paths.
On average, there will be several idle workspaces, so the waiting
penalty won't be so bad.
In the worst case, all cpus will compete for one workspace until there's
some memory. Attempts to allocate a new one are done each time the
waiters are woken up.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 27 Apr 2016 00:55:15 +0000 (02:55 +0200)]
btrfs: preallocate compression workspaces
Preallocate one workspace for each compression type so we can guarantee
forward progress in the worst case. A failure cannot be a hard error as
we might not use compression at all on the filesystem. If we can't
allocate the workspaces later when need them, it might actually
deadlock, but in such situation the system has effectively not enough
memory to operate properly.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 27 Apr 2016 00:15:15 +0000 (02:15 +0200)]
btrfs: rename and document compression workspace members
The names are confusing, pick more fitting names and add comments.
Signed-off-by: David Sterba <dsterba@suse.com>
Adam Borowski [Sun, 8 May 2016 13:08:00 +0000 (15:08 +0200)]
btrfs: fix int32 overflow in shrink_delalloc().
UBSAN: Undefined behaviour in fs/btrfs/extent-tree.c:4623:21
signed integer overflow:
10808 * 262144 cannot be represented in type 'int [8]'
If 8192<=items<16384, we request a writeback of an insane number of pages
which is benign (everything will be written). But if items>=16384, the
space reservation won't be enough.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zygo Blaxell [Thu, 5 May 2016 04:23:49 +0000 (00:23 -0400)]
btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes
During a mount, we start the cleaner kthread first because the transaction
kthread wants to wake up the cleaner kthread. We start the transaction
kthread next because everything in btrfs wants transactions. We do reloc
recovery in the thread that was doing the original mount call once the
transaction kthread is running. This means that the cleaner kthread
could already be running when reloc recovery happens (e.g. if a snapshot
delete was started before a crash).
Relocation does not play well with the cleaner kthread, so a mutex was
added in commit
5f3164813b90f7dbcb5c3ab9006906222ce471b7 "Btrfs: fix
race between balance recovery and root deletion" to prevent both from
being active at the same time.
If the cleaner kthread is already holding the mutex by the time we get
to btrfs_recover_relocation, the mount will be blocked until at least
one deleted subvolume is cleaned (possibly more if the mount process
doesn't get the lock right away). During this time (which could be an
arbitrarily long time on a large/slow filesystem), the mount process is
stuck and the filesystem is unnecessarily inaccessible.
Fix this by locking cleaner_mutex before we start cleaner_kthread, and
unlocking the mutex after mount no longer requires it. This ensures
that the mounting process will not be blocked by the cleaner kthread.
The cleaner kthread is already prepared for mutex contention and will
just go to sleep until the mutex is available.
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 4 May 2016 12:10:47 +0000 (14:10 +0200)]
btrfs: ioctl: reorder exclusive op check in RM_DEV
Move the op exclusivity check before the other code (same as in
ADD_DEV).
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 4 May 2016 09:32:00 +0000 (11:32 +0200)]
btrfs: add write protection to SET_FEATURES ioctl
Perform the want_write check if we get far enough to do any writes.
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Tue, 12 Apr 2016 13:36:16 +0000 (21:36 +0800)]
btrfs: fix lock dep warning move scratch super outside of chunk_mutex
Move scratch super outside of the chunk lock to avoid below
lockdep warning. The better place to scratch super is in
the function btrfs_rm_dev_replace_free_srcdev() just before
free_device, which is outside of the chunk lock as well.
To reproduce:
(fresh boot)
mkfs.btrfs -f -draid5 -mraid5 /dev/sdc /dev/sdd /dev/sde
mount /dev/sdc /btrfs
dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100
(get devmgt from https://github.com/asj/devmgt.git)
devmgt detach /dev/sde
dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100
sync
btrfs replace start -Brf 3 /dev/sdf /btrfs <--
devmgt attach host7
======================================================
[ INFO: possible circular locking dependency detected ]
4.6.0-rc2asj+ #1 Not tainted
---------------------------------------------------
btrfs/2174 is trying to acquire lock:
(sb_writers){.+.+.+}, at:
[<
ffffffff812449b4>] __sb_start_write+0xb4/0xf0
but task is already holding lock:
(&fs_info->chunk_mutex){+.+.+.}, at:
[<
ffffffffa05c5f55>] btrfs_dev_replace_finishing+0x145/0x980 [btrfs]
which lock already depends on the new lock.
Chain exists of:
sb_writers --> &fs_devs->device_list_mutex --> &fs_info->chunk_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_info->chunk_mutex);
lock(&fs_devs->device_list_mutex);
lock(&fs_info->chunk_mutex);
lock(sb_writers);
*** DEADLOCK ***
-> #0 (sb_writers){.+.+.+}:
[<
ffffffff810e6415>] __lock_acquire+0x1bc5/0x1ee0
[<
ffffffff810e707e>] lock_acquire+0xbe/0x210
[<
ffffffff810df49a>] percpu_down_read+0x4a/0xa0
[<
ffffffff812449b4>] __sb_start_write+0xb4/0xf0
[<
ffffffff81265534>] mnt_want_write+0x24/0x50
[<
ffffffff812508a2>] path_openat+0x952/0x1190
[<
ffffffff81252451>] do_filp_open+0x91/0x100
[<
ffffffff8123f5cc>] file_open_name+0xfc/0x140
[<
ffffffff8123f643>] filp_open+0x33/0x60
[<
ffffffffa0572bb6>] update_dev_time+0x16/0x40 [btrfs]
[<
ffffffffa057f60d>] btrfs_scratch_superblocks+0x5d/0xb0 [btrfs]
[<
ffffffffa057f70e>] btrfs_rm_dev_replace_remove_srcdev+0xae/0xd0 [btrfs]
[<
ffffffffa05c62c5>] btrfs_dev_replace_finishing+0x4b5/0x980 [btrfs]
[<
ffffffffa05c6ae8>] btrfs_dev_replace_start+0x358/0x530 [btrfs]
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Ashish Samant [Sat, 30 Apr 2016 01:33:59 +0000 (18:33 -0700)]
btrfs: Fix BUG_ON condition in scrub_setup_recheck_block()
pagev array in scrub_block{} is of size SCRUB_MAX_PAGES_PER_BLOCK.
page_index should be checked with the same to trigger BUG_ON().
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 12 Apr 2016 16:54:40 +0000 (12:54 -0400)]
Btrfs: remove BUG_ON()'s in btrfs_map_block
btrfs_map_block can go horribly wrong in the face of fs corruption, lets agree
to not be assholes and panic at any possible chance things are all fucked up.
Signed-off-by: Josef Bacik <jbacik@fb.com>
[ removed type casts ]
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Wed, 27 Apr 2016 00:53:31 +0000 (17:53 -0700)]
Btrfs: fix divide error upon chunk's stripe_len
The struct 'map_lookup' uses type int for @stripe_len, while
btrfs_chunk_stripe_len() can return a u64 value, and it may end up with
@stripe_len being undefined value and it can lead to 'divide error' in
__btrfs_map_block().
This changes 'map_lookup' to use type u64 for stripe_len, also right now
we only use BTRFS_STRIPE_LEN for stripe_len, so this adds a valid checker for
BTRFS_STRIPE_LEN.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ folded division fix to scrub_raid56_parity ]
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 26 Apr 2016 14:22:06 +0000 (16:22 +0200)]
btrfs: sysfs: protect reading label by lock
If the label setting ioctl races with sysfs label handler, we could get
mixed result in the output, part old part new. We should either get the
old or new label. The chances to hit this race are low.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 26 Apr 2016 14:03:57 +0000 (16:03 +0200)]
btrfs: add check to sysfs handler of label
Add a sanity check for the fs_info as we will dereference it, similar to
what the 'store features' handler does.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jan 2015 17:43:31 +0000 (18:43 +0100)]
btrfs: add read-only check to sysfs handler of features
We don't want to trigger the change on a read-only filesystem, similar
to what the label handler does.
Signed-off-by: David Sterba <dsterba@suse.cz>
David Sterba [Thu, 24 Mar 2016 17:00:53 +0000 (18:00 +0100)]
btrfs: reuse existing variable in scrub_stripe, reduce stack usage
The key variable occupies 17 bytes, the key_start is used once, we can
simply reuse existing 'key' for that purpose. As the key is not a simple
type, compiler doest not do it on itself.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 24 Mar 2016 16:49:22 +0000 (17:49 +0100)]
btrfs: use dynamic allocation for root item in create_subvol
The size of root item is more than 400 bytes, which is quite a lot of
stack space. As we do IO from inside the subvolume ioctls, we should
keep the stack usage low in case the filesystem is on top of other
layers (NFS, device mapper, iscsi, etc).
Reviewed-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 11 Apr 2016 16:40:08 +0000 (18:40 +0200)]
btrfs: clone: use vmalloc only as fallback for nodesize bufer
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 11 Apr 2016 16:40:08 +0000 (18:40 +0200)]
btrfs: send: use vmalloc only as fallback for clone_sources_tmp
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 11 Apr 2016 16:40:08 +0000 (18:40 +0200)]
btrfs: send: use vmalloc only as fallback for clone_roots
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 11 Apr 2016 16:52:02 +0000 (18:52 +0200)]
btrfs: send: use temporary variable to store allocation size
We're going to use the argument multiple times later.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 11 Apr 2016 16:40:08 +0000 (18:40 +0200)]
btrfs: send: use vmalloc only as fallback for read_buf
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 11 Apr 2016 16:40:08 +0000 (18:40 +0200)]
btrfs: send: use vmalloc only as fallback for send_buf
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 18 Apr 2016 08:51:23 +0000 (16:51 +0800)]
btrfs: fix lock dep warning, move scratch dev out of device_list_mutex and uuid_mutex
When the replace target fails, the target device will be taken
out of fs device list, scratch + update_dev_time and freed. However
we could do the scratch + update_dev_time and free part after the
device has been taken out of device list, so that we don't have to
hold the device_list_mutex and uuid_mutex locks.
Reported issue:
[ 5375.718845] ======================================================
[ 5375.718846] [ INFO: possible circular locking dependency detected ]
[ 5375.718849] 4.4.5-scst31x-debug-11+ #40 Not tainted
[ 5375.718849] -------------------------------------------------------
[ 5375.718851] btrfs-health/4662 is trying to acquire lock:
[ 5375.718861] (sb_writers){.+.+.+}, at: [<
ffffffff812214f7>] __sb_start_write+0xb7/0xf0
[ 5375.718862]
[ 5375.718862] but task is already holding lock:
[ 5375.718907] (&fs_devs->device_list_mutex){+.+.+.}, at: [<
ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs]
[ 5375.718907]
[ 5375.718907] which lock already depends on the new lock.
[ 5375.718907]
[ 5375.718908]
[ 5375.718908] the existing dependency chain (in reverse order) is:
[ 5375.718911]
[ 5375.718911] -> #3 (&fs_devs->device_list_mutex){+.+.+.}:
[ 5375.718917] [<
ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.718921] [<
ffffffff81633949>] mutex_lock_nested+0x69/0x3c0
[ 5375.718940] [<
ffffffffa0219bf6>] btrfs_show_devname+0x36/0x210 [btrfs]
[ 5375.718945] [<
ffffffff81267079>] show_vfsmnt+0x49/0x150
[ 5375.718948] [<
ffffffff81240b07>] m_show+0x17/0x20
[ 5375.718951] [<
ffffffff81246868>] seq_read+0x2d8/0x3b0
[ 5375.718955] [<
ffffffff8121df28>] __vfs_read+0x28/0xd0
[ 5375.718959] [<
ffffffff8121e806>] vfs_read+0x86/0x130
[ 5375.718962] [<
ffffffff8121f4c9>] SyS_read+0x49/0xa0
[ 5375.718966] [<
ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 5375.718968]
[ 5375.718968] -> #2 (namespace_sem){+++++.}:
[ 5375.718971] [<
ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.718974] [<
ffffffff81635199>] down_write+0x49/0x80
[ 5375.718977] [<
ffffffff81243593>] lock_mount+0x43/0x1c0
[ 5375.718979] [<
ffffffff81243c13>] do_add_mount+0x23/0xd0
[ 5375.718982] [<
ffffffff81244afb>] do_mount+0x27b/0xe30
[ 5375.718985] [<
ffffffff812459dc>] SyS_mount+0x8c/0xd0
[ 5375.718988] [<
ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 5375.718991]
[ 5375.718991] -> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}:
[ 5375.718994] [<
ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.718996] [<
ffffffff81633949>] mutex_lock_nested+0x69/0x3c0
[ 5375.719001] [<
ffffffff8122d608>] path_openat+0x468/0x1360
[ 5375.719004] [<
ffffffff8122f86e>] do_filp_open+0x7e/0xe0
[ 5375.719007] [<
ffffffff8121da7b>] do_sys_open+0x12b/0x210
[ 5375.719010] [<
ffffffff8121db7e>] SyS_open+0x1e/0x20
[ 5375.719013] [<
ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 5375.719015]
[ 5375.719015] -> #0 (sb_writers){.+.+.+}:
[ 5375.719018] [<
ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0
[ 5375.719021] [<
ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.719026] [<
ffffffff810d3bef>] percpu_down_read+0x4f/0xa0
[ 5375.719028] [<
ffffffff812214f7>] __sb_start_write+0xb7/0xf0
[ 5375.719031] [<
ffffffff81242eb4>] mnt_want_write+0x24/0x50
[ 5375.719035] [<
ffffffff8122ded2>] path_openat+0xd32/0x1360
[ 5375.719037] [<
ffffffff8122f86e>] do_filp_open+0x7e/0xe0
[ 5375.719040] [<
ffffffff8121d8a4>] file_open_name+0xe4/0x130
[ 5375.719043] [<
ffffffff8121d923>] filp_open+0x33/0x60
[ 5375.719073] [<
ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs]
[ 5375.719099] [<
ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs]
[ 5375.719123] [<
ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs]
[ 5375.719150] [<
ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs]
[ 5375.719175] [<
ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs]
[ 5375.719199] [<
ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs]
[ 5375.719222] [<
ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs]
[ 5375.719225] [<
ffffffff810a70df>] kthread+0xef/0x110
[ 5375.719229] [<
ffffffff81637d2f>] ret_from_fork+0x3f/0x70
[ 5375.719230]
[ 5375.719230] other info that might help us debug this:
[ 5375.719230]
[ 5375.719233] Chain exists of:
[ 5375.719233] sb_writers --> namespace_sem --> &fs_devs->device_list_mutex
[ 5375.719233]
[ 5375.719234] Possible unsafe locking scenario:
[ 5375.719234]
[ 5375.719234] CPU0 CPU1
[ 5375.719235] ---- ----
[ 5375.719236] lock(&fs_devs->device_list_mutex);
[ 5375.719238] lock(namespace_sem);
[ 5375.719239] lock(&fs_devs->device_list_mutex);
[ 5375.719241] lock(sb_writers);
[ 5375.719241]
[ 5375.719241] *** DEADLOCK ***
[ 5375.719241]
[ 5375.719243] 4 locks held by btrfs-health/4662:
[ 5375.719266] #0: (&fs_info->health_mutex){+.+.+.}, at: [<
ffffffffa0246303>] health_kthread+0x63/0x490 [btrfs]
[ 5375.719293] #1: (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.+.}, at: [<
ffffffffa02c6611>] btrfs_dev_replace_finishing+0x41/0x990 [btrfs]
[ 5375.719319] #2: (uuid_mutex){+.+.+.}, at: [<
ffffffffa0282620>] btrfs_destroy_dev_replace_tgtdev+0x20/0x150 [btrfs]
[ 5375.719343] #3: (&fs_devs->device_list_mutex){+.+.+.}, at: [<
ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs]
[ 5375.719343]
[ 5375.719343] stack backtrace:
[ 5375.719347] CPU: 2 PID: 4662 Comm: btrfs-health Not tainted 4.4.5-scst31x-debug-11+ #40
[ 5375.719348] Hardware name: Supermicro SYS-6018R-WTRT/X10DRW-iT, BIOS 1.0c 01/07/2015
[ 5375.719352]
0000000000000000 ffff880856f73880 ffffffff813529e3 ffffffff826182a0
[ 5375.719354]
ffffffff8260c090 ffff880856f738c0 ffffffff810d667c ffff880856f73930
[ 5375.719357]
ffff880861f32b40 ffff880861f32b68 0000000000000003 0000000000000004
[ 5375.719357] Call Trace:
[ 5375.719363] [<
ffffffff813529e3>] dump_stack+0x85/0xc2
[ 5375.719366] [<
ffffffff810d667c>] print_circular_bug+0x1ec/0x260
[ 5375.719369] [<
ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0
[ 5375.719373] [<
ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[ 5375.719376] [<
ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.719378] [<
ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0
[ 5375.719383] [<
ffffffff810d3bef>] percpu_down_read+0x4f/0xa0
[ 5375.719385] [<
ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0
[ 5375.719387] [<
ffffffff812214f7>] __sb_start_write+0xb7/0xf0
[ 5375.719389] [<
ffffffff81242eb4>] mnt_want_write+0x24/0x50
[ 5375.719393] [<
ffffffff8122ded2>] path_openat+0xd32/0x1360
[ 5375.719415] [<
ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs]
[ 5375.719418] [<
ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[ 5375.719420] [<
ffffffff8122f86e>] do_filp_open+0x7e/0xe0
[ 5375.719423] [<
ffffffff810f615d>] ? rcu_read_lock_sched_held+0x6d/0x80
[ 5375.719426] [<
ffffffff81201a9b>] ? kmem_cache_alloc+0x26b/0x5d0
[ 5375.719430] [<
ffffffff8122e7d4>] ? getname_kernel+0x34/0x120
[ 5375.719433] [<
ffffffff8121d8a4>] file_open_name+0xe4/0x130
[ 5375.719436] [<
ffffffff8121d923>] filp_open+0x33/0x60
[ 5375.719462] [<
ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs]
[ 5375.719485] [<
ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs]
[ 5375.719506] [<
ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs]
[ 5375.719530] [<
ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs]
[ 5375.719554] [<
ffffffffa02c6b23>] ? btrfs_dev_replace_finishing+0x553/0x990 [btrfs]
[ 5375.719576] [<
ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs]
[ 5375.719598] [<
ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs]
[ 5375.719621] [<
ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs]
[ 5375.719641] [<
ffffffffa02463d8>] ? health_kthread+0x138/0x490 [btrfs]
[ 5375.719661] [<
ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs]
[ 5375.719663] [<
ffffffff810a70df>] kthread+0xef/0x110
[ 5375.719666] [<
ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200
[ 5375.719669] [<
ffffffff81637d2f>] ret_from_fork+0x3f/0x70
[ 5375.719672] [<
ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200
[ 5375.719697] ------------[ cut here ]------------
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Dan Carpenter [Wed, 13 Apr 2016 06:40:59 +0000 (09:40 +0300)]
btrfs: send: silence an integer overflow warning
The "sizeof(*arg->clone_sources) * arg->clone_sources_count" expression
can overflow. It causes several static checker warnings. It's all
under CAP_SYS_ADMIN so it's not that serious but lets silence the
warnings.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Luis de Bethencourt [Wed, 30 Mar 2016 22:18:14 +0000 (23:18 +0100)]
btrfs: avoid overflowing f_bfree
Since mixed block groups accounting isn't byte-accurate and f_bree is an
unsigned integer, it could overflow. Avoid this.
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Suggested-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Luis de Bethencourt [Wed, 30 Mar 2016 20:53:38 +0000 (21:53 +0100)]
btrfs: fix mixed block count of available space
Metadata for mixed block is already accounted in total data and should not
be counted as part of the free metadata space.
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=114281
Signed-off-by: David Sterba <dsterba@suse.com>
Austin S. Hemmelgarn [Wed, 23 Mar 2016 18:22:59 +0000 (14:22 -0400)]
btrfs: allow balancing to dup with multi-device
Currently, we don't allow the user to try and rebalance to a dup profile
on a multi-device filesystem. In most cases, this is a perfectly sensible
restriction as raid1 uses the same amount of space and provides better
protection.
However, when reshaping a multi-device filesystem down to a single device
filesystem, this requires the user to convert metadata and system chunks
to single profile before deleting devices, and then convert again to dup,
which leaves a period of time where metadata integrity is reduced.
This patch removes the single-device-only restriction from converting to
dup profile to remove this potential data integrity reduction.
Signed-off-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 15 Dec 2015 02:29:32 +0000 (18:29 -0800)]
Btrfs: do not create empty block group if we have allocated data
Now we force to create empty block group to keep data profile alive,
however, in the below example, we eventually get an empty block group
while we're trying to get more space for other types (metadata/system),
- Before,
block group "A": size=2G, used=1.2G
block group "B": size=2G, used=512M
- After "btrfs balance start -dusage=50 mount_point",
block group "A": size=2G, used=(1.2+0.5)G
block group "C": size=2G, used=0
Since there is no data in block group C, it won't be deleted
automatically and we have to get the unused 2G until the next mount.
Balance itself just moves data and doesn't remove data, so it's safe
to not create such a empty block group if we already have data
allocated in other block groups.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chandan Rajendra [Sun, 3 Apr 2016 21:23:06 +0000 (02:53 +0530)]
Btrfs: __btrfs_buffered_write: Pass valid file offset when releasing delalloc space
The delalloc reserved space is calculated in terms of number of bytes
used by an integral number of blocks. This is done by rounding down the
value of 'pos' to the nearest multiple of sectorsize.
The file offset value held by 'pos' variable may not be aligned to
sectorsize and hence when passing it as an argument to
btrfs_delalloc_release_space(), we may end up releasing larger delalloc
space than we originally had reserved.
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 8 Mar 2016 00:56:22 +0000 (16:56 -0800)]
Btrfs: cleanup error handling in extent_write_cached_pages
Now that we bail out immediately if ->writepage() returns an error,
we don't need an extra error to retain the error code.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 8 Mar 2016 00:56:21 +0000 (16:56 -0800)]
Btrfs: make mapping->writeback_index point to the last written page
If sequential writer is writing in the middle of the page and it just redirties
the last written page by continuing from it.
In the above case this can end up with seeking back to that firstly redirtied
page after writing all the pages at the end of file because btrfs updates
mapping->writeback_index to 1 past the current one.
For non-cow filesystems, the cost is only about extra seek, while for cow
filesystems such as btrfs, it means unnecessary fragments.
To avoid it, we just need to continue writeback from the last written page.
This also updates btrfs to behave like what write_cache_pages() does, ie, bail
out immediately if there is an error in writepage().
<Ref: https://www.spinics.net/lists/linux-btrfs/msg52628.html>
Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Luke Dashjr [Thu, 29 Oct 2015 08:22:21 +0000 (08:22 +0000)]
btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl
32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can
be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr
fail.
Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org>
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 24 Apr 2016 23:17:05 +0000 (16:17 -0700)]
Linux 4.6-rc5
Linus Torvalds [Sun, 24 Apr 2016 00:15:39 +0000 (17:15 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
"Specifics in this pull request:
- Fixes in mediatek and OF thermal drivers
- Fixes in power_allocator governor
- More fixes of unsigned to int type change in thermal_core.c.
These change have been CI tested using KernelCI bot. \o/"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: fix Mediatek thermal controller build
thermal: consistently use int for trip temp
thermal: fix mtk_thermal build dependency
thermal: minor mtk_thermal.c cleanups
thermal: power_allocator: req_range multiplication should be a 64 bit type
thermal: of: add __init attribute
Linus Torvalds [Sat, 23 Apr 2016 21:53:11 +0000 (14:53 -0700)]
Merge tag 'asm-generic-4.6' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic update from Arnd Bergmann:
"Here is one patch to wire up the preadv/pwritev system calls in the
generic system call table, which is required for all architectures
that were merged in the last few years, including arm64.
Usually these get merged along with the syscall implementation or one
of the architecture trees, but this time that did not happen.
Andre and Christoph both sent a version of this patch, I picked the
one I got first"
* tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
generic syscalls: wire up preadv2 and pwritev2 syscalls
Andre Przywara [Mon, 11 Apr 2016 09:17:46 +0000 (10:17 +0100)]
generic syscalls: wire up preadv2 and pwritev2 syscalls
These new syscalls are implemented as generic code, so enable them for
architectures like arm64 which use the generic syscall table.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Sat, 23 Apr 2016 19:07:29 +0000 (12:07 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: two EDAC driver fixes, a Xen crash fix, a HyperV log spam
fix and a documentation fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86 EDAC, sb_edac.c: Take account of channel hashing when needed
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
x86/mm/xen: Suppress hugetlbfs in PV guests
x86/doc: Correct limits in Documentation/x86/x86_64/mm.txt
x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances
Linus Torvalds [Sat, 23 Apr 2016 18:45:52 +0000 (11:45 -0700)]
Merge branches 'perf-urgent-for-linus', 'smp-urgent-for-linus' and 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf, cpu hotplug and timer fixes from Ingo Molnar:
"perf:
- A single tooling fix for a user-triggerable segfault.
CPU hotplug:
- Fix a CPU hotplug corner case regression, introduced by the recent
hotplug rework
timers:
- Fix a boot hang in the ARM based Tango SoC clocksource driver"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf intel-pt: Fix segfault tracing transactions
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Fix rollback during error-out in __cpu_disable()
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/tango-xtal: Fix boot hang due to incorrect test
Linus Torvalds [Sat, 23 Apr 2016 18:39:48 +0000 (11:39 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Misc fixes:
pvqspinlocks:
- an instrumentation fix
futexes:
- preempt-count vs pagefault_disable decouple corner case fix
- futex requeue plist race window fix
- futex UNLOCK_PI transaction fix for a corner case"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()
futex: Acknowledge a new waiter in counter before plist
futex: Handle unlock_pi race gracefully
locking/pvqspinlock: Fix division by zero in qstat_read()
Linus Torvalds [Sat, 23 Apr 2016 18:34:39 +0000 (11:34 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"A core irq affinity masks related fix and a MIPS irqchip driver fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Don't overrun pcpu_masks array
genirq: Dont allow affinity mask to be updated on IPIs
Linus Torvalds [Sat, 23 Apr 2016 18:25:01 +0000 (11:25 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"A handful of objtool fixes: two improvements to how warnings are
printed plus a false positive warning fix, and build environment fix"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix Makefile to properly see if libelf is supported
objtool: Detect falling through to the next function
objtool: Add workaround for GCC switch jump table bug
Linus Torvalds [Sat, 23 Apr 2016 18:20:03 +0000 (11:20 -0700)]
Merge tag 'usb-4.6-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB / PHY driver fixes from Greg KH:
"Here are two small sets of patches, both from subsystem trees, USB
gadget and PHY drivers.
Full details are in the shortlog, and they have all been in linux-next
for a while (before I merged them to the USB tree)"
* tag 'usb-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: f_fs: Fix use-after-free
usb: dwc3: gadget: Fix suspend/resume during device mode
usb: dwc3: fix memory leak of dwc->regset
usb: dwc3: core: fix PHY handling during suspend
usb: dwc3: omap: fix up error path on probe()
usb: gadget: composite: Clear reserved fields of SSP Dev Cap
phy: rockchip-emmc: adapt binding to specifiy register offset and length
phy: rockchip-emmc: should be a child device of the GRF
phy: rockchip-dp: should be a child device of the GRF
Linus Torvalds [Sat, 23 Apr 2016 18:13:46 +0000 (11:13 -0700)]
Merge tag 'tty-4.6-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull serial fixes from Greg KH:
"Here are 3 serial driver fixes for issues that have been reported.
Two are reverts, fixing problems that were in the big TTY/Serial
driver merge in 4.6-rc1, and the last one is a simple bugfix for a
regression that showed up in 4.6-rc1 as well.
All have been in linux-next with no reported issues"
* tag 'tty-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: 8250: Add hardware dependency to RT288X option"
tty/serial/8250: fix RS485 half-duplex RX
Revert "serial-uartlite: Constify uartlite_be/uartlite_le"
Linus Torvalds [Sat, 23 Apr 2016 18:04:26 +0000 (11:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
"Just minor driver fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: twl4030-vibra - do not reparent to grandparent
Input: twl6040-vibra - do not reparent to grandparent
Input: twl6040-vibra - ignore return value of schedule_work
Input: twl6040-vibra - fix NULL pointer dereference by removing workqueue
Input: pmic8xxx-pwrkey - fix algorithm for converting trigger delay
Input: arizona-haptic - don't assign input_dev parent
Input: clarify we want BTN_TOOL_<name> on proximity
Input: xpad - add Mad Catz FightStick TE 2 VID/PID
Input: gtco - fix crash on detecting device without endpoints
Linus Torvalds [Fri, 22 Apr 2016 18:52:49 +0000 (11:52 -0700)]
Merge tag 'pinctrl-v4.6-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some pin control driver fixes came in. One headed for stable and the
other two are just ordinary merge window fixes.
- Make the i.MX driver select REGMAP as a dependency
- Fix up the Mediatek debounce time unit
- Fix a real hairy ffs vs __ffs issue in the Single pinctrl driver"
* tag 'pinctrl-v4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: single: Fix pcs_parse_bits_in_pinctrl_entry to use __ffs than ffs
pinctrl: mediatek: correct debounce time unit in mtk_gpio_set_debounce
pinctrl: imx: Kconfig: PINCTRL_IMX select REGMAP
Linus Torvalds [Fri, 22 Apr 2016 18:11:15 +0000 (11:11 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Cache invalidation fix for early CPU boot status update (incorrect
cacheline)
- of_put_node() missing in the spin_table code
- EL1/El2 early init inconsistency when Virtualisation Host Extensions
are present
- RCU warning fix in the arm_pmu.c driver
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix EL1/EL2 early init inconsistencies with VHE
drivers/perf: arm-pmu: fix RCU usage on pmu resume from low-power
arm64: spin-table: add missing of_node_put()
arm64: fix invalidation of wrong __early_cpu_boot_status cacheline
Linus Torvalds [Fri, 22 Apr 2016 17:53:12 +0000 (10:53 -0700)]
Merge tag 'powerpc-4.6-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Three powerpc cpu feature fixes from Anton Blanchard:
- scan_features() updated incorrect bits for REAL_LE
- update cpu_user_features2 in scan_features()
- update TM user feature bits in scan_features()"
* tag 'powerpc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Update TM user feature bits in scan_features()
powerpc: Update cpu_user_features2 in scan_features()
powerpc: scan_features() updates incorrect bits for REAL_LE
Linus Torvalds [Fri, 22 Apr 2016 17:41:31 +0000 (10:41 -0700)]
Merge tag 'iommu-fixes-v4.6-rc4' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"The fixes include:
- Two patches to revert the use of default domains in the ARM SMMU
driver. Enabling this caused regressions which need more thorough
fixing. So the regressions are fixed for now by disabling the use
of default domains.
- A fix for a v4.4 regression in the AMD IOMMU driver which broke
devices behind invisible PCIe-to-PCI bridges with IOMMU enabled"
* tag 'iommu-fixes-v4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/arm-smmu: Don't allocate resources for bypass domains
iommu/arm-smmu: Fix stream-match conflict with IOMMU_DOMAIN_DMA
iommu/amd: Fix checking of pci dma aliases
Linus Torvalds [Fri, 22 Apr 2016 17:29:52 +0000 (10:29 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"i915, nouveau and amdgpu/radeon fixes in this:
nouveau:
Two fixes, one for a regression with dithering and one for a bug
hit by the userspace drivers.
i915:
A few fixes, mostly things heading for stable, two important
skylake GT3/4 hangs.
radeon/amdgpu:
Some audio, suspend/resume and some runtime PM fixes, along with
two patches to harden the userptr ABI a bit"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (24 commits)
drm: Loongson-3 doesn't fully support wc memory
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
amdgpu/uvd: add uvd fw version for amdgpu
drm/amdgpu: forbid mapping of userptr bo through radeon device file
drm/radeon: forbid mapping of userptr bo through radeon device file
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
drm/amdgpu: use defines for CRTCs and AMFT blocks
drm/dp/mst: Validate port in drm_dp_payload_send_msg()
drm/nouveau/kms: fix setting of default values for dithering properties
drm/radeon: print a message if ATPX dGPU power control is missing
Revert "drm/radeon: disable runtime pm on PX laptops without dGPU power control"
drm/amdgpu/acp: fix resume on CZ systems with AZ audio
drm/radeon: add a quirk for a XFX R9 270X
drm/radeon: print pci revision as well as pci ids on driver load
drm/i915: Use fw_domains_put_with_fifo() on HSW
drm/i915: Force ringbuffers to not be at offset 0
drm/i915: Adjust size of PIPE_CONTROL used for gen8 render seqno write
drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs
drm/i915/skl: Fix rc6 based gpu/system hang
drm/i915/userptr: Hold mmref whilst calling get-user-pages
...
Linus Torvalds [Fri, 22 Apr 2016 17:17:18 +0000 (10:17 -0700)]
Merge tag 'sound-4.6-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Again a relatively calm week without surprise: most of fixes are about
HD-audio, including fixes for Cirrus codec regression and a race over
regmap access. Although both change are slightly unintuitive, the
risk of further breakage is quite low, I hope.
Other than that, all the rest are trivial"
* tag 'sound-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix possible race on regmap bypass flip
ALSA: pcxhr: Fix missing mutex unlock
ALSA: hda - add PCI ID for Intel Broxton-T
ALSA: hda - Keep powering up ADCs on Cirrus codecs
ALSA: hda/realtek - Add ALC3234 headset mode for Optiplex 9020m
ALSA - hda: hdmi check NULL pointer in hdmi_set_chmap
ALSA: hda - Don't trust the reported actual power state
Greg Kroah-Hartman [Fri, 22 Apr 2016 08:13:24 +0000 (17:13 +0900)]
Merge tag 'phy-for-4.6-rc' of git://git./linux/kernel/git/kishon/linux-phy into usb-linus
Kishon writes:
phy: for 4.6-rc
*) make rockchip-dp and rockchip-emmc PHY child device of
GRF
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Tony Luck [Thu, 14 Apr 2016 17:22:02 +0000 (10:22 -0700)]
x86 EDAC, sb_edac.c: Take account of channel hashing when needed
Haswell and Broadwell can be configured to hash the channel
interleave function using bits [27:12] of the physical address.
On those processor models we must check to see if hashing is
enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and
act accordingly.
Based on a patch by patrickg <patrickg@supermicro.com>
Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tony Luck [Thu, 14 Apr 2016 17:21:52 +0000 (10:21 -0700)]
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
In commit:
eb1af3b71f9d ("Fix computation of channel address")
I switched the "sck_way" variable from holding the log2 value read
from the h/w to instead be the actual number. Unfortunately it
is needed in log2 form when used to shift the address.
Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes:
eb1af3b71f9d ("Fix computation of channel address")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Greg Kroah-Hartman [Fri, 22 Apr 2016 08:09:05 +0000 (17:09 +0900)]
Merge tag 'fixes-for-v4.6-rc5' of git://git./linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
usb: fixes for v4.6-rc5
No more major fixes left. Out of the 6 fixes we have
here, 4 are on dwc3.
The most important is the memory leak fix in
dwc3/debugfs.c. We also have a fix for PHY handling
in suspend/resume and a fix for dwc3-omap's error
handling.
Suspend/resume also had the potential to trigger a
NULL pointer dereference on dwc3; that's also fixed
now.
Our good ol' ffs function gets a use-after-free fix
while the generic composite.c layer has a robustness
fix by making sure reserved fields of a possible SSP
device capability descriptor is cleared to 0.
Jan Beulich [Thu, 21 Apr 2016 06:27:04 +0000 (00:27 -0600)]
x86/mm/xen: Suppress hugetlbfs in PV guests
Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:
kernel BUG at .../fs/hugetlbfs/inode.c:428!
invalid opcode: 0000 [#1] SMP
...
RIP: e030:[<
ffffffff811c333b>] [<
ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
...
Call Trace:
[<
ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
[<
ffffffff81167b3d>] evict+0xbd/0x1b0
[<
ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
[<
ffffffff81165b0e>] dput+0x1fe/0x220
[<
ffffffff81150535>] __fput+0x155/0x200
[<
ffffffff81079fc0>] task_work_run+0x60/0xa0
[<
ffffffff81063510>] do_exit+0x160/0x400
[<
ffffffff810637eb>] do_group_exit+0x3b/0xa0
[<
ffffffff8106e8bd>] get_signal+0x1ed/0x470
[<
ffffffff8100f854>] do_signal+0x14/0x110
[<
ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
[<
ffffffff814178a5>] retint_user+0x8/0x13
This is CVE-2016-3961 / XSA-174.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <JGross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: stable@vger.kernel.org
Cc: xen-devel <xen-devel@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Juergen Gross [Fri, 22 Apr 2016 07:35:04 +0000 (09:35 +0200)]
x86/doc: Correct limits in Documentation/x86/x86_64/mm.txt
Correct the size of the module mapping space and the maximum available
physical memory size of current processors.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1461310504-15977-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sebastian Andrzej Siewior [Fri, 8 Apr 2016 12:40:15 +0000 (14:40 +0200)]
cpu/hotplug: Fix rollback during error-out in __cpu_disable()
The recent introduction of the hotplug thread which invokes the callbacks on
the plugged cpu, cased the following regression:
If takedown_cpu() fails, then we run into several issues:
1) The rollback of the target cpu states is not invoked. That leaves the smp
threads and the hotplug thread in disabled state.
2) notify_online() is executed due to a missing skip_onerr flag. That causes
that both CPU_DOWN_FAILED and CPU_ONLINE notifications are invoked which
confuses quite some notifiers.
3) The CPU_DOWN_FAILED notification is not invoked on the target CPU. That's
not an issue per se, but it is inconsistent and in consequence blocks the
patches which rely on these states being invoked on the target CPU and not
on the controlling cpu. It also does not preserve the strict call order on
rollback which is problematic for the ongoing state machine conversion as
well.
To fix this we add a rollback flag to the remote callback machinery and invoke
the rollback including the CPU_DOWN_FAILED notification on the remote
cpu. Further mark the notify online state with 'skip_onerr' so we don't get a
double invokation.
This workaround will go away once we moved the unplug invocation to the target
cpu itself.
[ tglx: Massaged changelog and moved the CPU_DOWN_FAILED notifiaction to the
target cpu ]
Fixes:
4cb28ced23c4 ("cpu/hotplug: Create hotplug threads")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-s390@vger.kernel.org
Cc: rt@linutronix.de
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: http://lkml.kernel.org/r/20160408124015.GA21960@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Daniel Lezcano [Tue, 19 Apr 2016 13:43:02 +0000 (15:43 +0200)]
clocksource/drivers/tango-xtal: Fix boot hang due to incorrect test
Commit
0881841f7e78 introduced a regression by inverting a test check
after calling clocksource_mmio_init(). That results on the system to
hang at boot time.
Fix it by inverting the test again.
Fixes:
0881841f7e78 ("Replace code by clocksource_mmio_init")
Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Steven Rostedt [Wed, 20 Apr 2016 15:32:35 +0000 (11:32 -0400)]
objtool: Fix Makefile to properly see if libelf is supported
When doing a make allmodconfig, I hit the following compile error:
In file included from builtin-check.c:32:0:
elf.h:22:18: fatal error: gelf.h: No such file or directory
compilation terminated.
...
Digging into it, it appears that the $(shell ..) command in the Makefile does
not give the proper result when it fails to find -lelf, and continues to
compile objtool.
Instead, use the "try-run" makefile macro to perform the test. This gives a
proper result for both cases.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Fixes:
442f04c34a1a4 ("objtool: Add tool to perform compile-time stack metadata validation")
Link: http://lkml.kernel.org/r/20160420153234.GA24032@home.goodmis.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Fri, 22 Apr 2016 00:39:26 +0000 (10:39 +1000)]
Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Misc radeon and amdgpu bug fixes for 4.6.
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
amdgpu/uvd: add uvd fw version for amdgpu
drm/amdgpu: forbid mapping of userptr bo through radeon device file
drm/radeon: forbid mapping of userptr bo through radeon device file
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
drm/amdgpu: use defines for CRTCs and AMFT blocks
drm/radeon: print a message if ATPX dGPU power control is missing
Revert "drm/radeon: disable runtime pm on PX laptops without dGPU power control"
drm/amdgpu/acp: fix resume on CZ systems with AZ audio
drm/radeon: add a quirk for a XFX R9 270X
drm/radeon: print pci revision as well as pci ids on driver load
drm/amdgpu: when suspending, if uvd/vce was running. need to cancel delay work.
drm/radeon: fix initial connector audio value
Huacai Chen [Tue, 19 Apr 2016 11:19:11 +0000 (19:19 +0800)]
drm: Loongson-3 doesn't fully support wc memory
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 22 Apr 2016 00:09:33 +0000 (10:09 +1000)]
Merge branch 'linux-4.6' of git://github.com/skeggsb/linux into drm-fixes
transform feedback fix.
* 'linux-4.6' of git://github.com/skeggsb/linux:
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
Ben Skeggs [Fri, 22 Apr 2016 00:05:21 +0000 (10:05 +1000)]
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
Sonny Jiang [Mon, 18 Apr 2016 20:05:04 +0000 (16:05 -0400)]
amdgpu/uvd: add uvd fw version for amdgpu
Was previously always hardcoded to 0.
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Jérôme Glisse [Tue, 19 Apr 2016 13:07:51 +0000 (09:07 -0400)]
drm/amdgpu: forbid mapping of userptr bo through radeon device file
Allowing userptr bo which are basicly a list of page from some vma
(so either anonymous page or file backed page) would lead to serious
corruption of kernel structures and counters (because we overwrite
the page->mapping field when mapping buffer).
This will already block if the buffer was populated before anyone does
try to mmap it because then TTM_PAGE_FLAG_SG would be set in in the
ttm_tt flags. But that flag is check before ttm_tt_populate in the ttm
vm fault handler.
So to be safe just add a check to verify_access() callback.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jérôme Glisse [Tue, 19 Apr 2016 13:07:50 +0000 (09:07 -0400)]
drm/radeon: forbid mapping of userptr bo through radeon device file
Allowing userptr bo which are basicly a list of page from some vma
(so either anonymous page or file backed page) would lead to serious
corruption of kernel structures and counters (because we overwrite
the page->mapping field when mapping buffer).
This will already block if the buffer was populated before anyone does
try to mmap it because then TTM_PAGE_FLAG_SG would be set in in the
ttm_tt flags. But that flag is check before ttm_tt_populate in the ttm
vm fault handler.
So to be safe just add a check to verify_access() callback.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 18 Apr 2016 22:25:34 +0000 (18:25 -0400)]
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
Fixes array overflow on these chips.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Alex Deucher [Mon, 18 Apr 2016 22:09:57 +0000 (18:09 -0400)]
drm/amdgpu: use defines for CRTCs and AMFT blocks
Prerequiste for the next patch which ups the limits.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
cpaul@redhat.com [Mon, 4 Apr 2016 23:58:47 +0000 (19:58 -0400)]
drm/dp/mst: Validate port in drm_dp_payload_send_msg()
With the joys of things running concurrently, there's always a chance
that the port we get passed in drm_dp_payload_send_msg() isn't actually
valid anymore. Because of this, we need to make sure we validate the
reference to the port before we use it otherwise we risk running into
various race conditions. For instance, on the Dell MST monitor I have
here for testing, hotplugging it enough times causes us to kernel panic:
[drm:intel_mst_enable_dp] 1
[drm:drm_dp_update_payload_part2] payload 0 1
[drm:intel_get_hpd_pins] hotplug event received, stat 0x00200000, dig 0x10101011, pins 0x00000020
[drm:intel_hpd_irq_handler] digital hpd port B - short
[drm:intel_dp_hpd_pulse] got hpd irq on port B - short
[drm:intel_dp_check_mst_status] got esi 00 10 00
[drm:drm_dp_update_payload_part2] payload 1 1
general protection fault: 0000 [#1] SMP
…
Call Trace:
[<
ffffffffa012b632>] drm_dp_update_payload_part2+0xc2/0x130 [drm_kms_helper]
[<
ffffffffa032ef08>] intel_mst_enable_dp+0xf8/0x180 [i915]
[<
ffffffffa0310dbd>] haswell_crtc_enable+0x3ed/0x8c0 [i915]
[<
ffffffffa030c84d>] intel_atomic_commit+0x5ad/0x1590 [i915]
[<
ffffffffa01db877>] ? drm_atomic_set_crtc_for_connector+0x57/0xe0 [drm]
[<
ffffffffa01dc4e7>] drm_atomic_commit+0x37/0x60 [drm]
[<
ffffffffa0130a3a>] drm_atomic_helper_set_config+0x7a/0xb0 [drm_kms_helper]
[<
ffffffffa01cc482>] drm_mode_set_config_internal+0x62/0x100 [drm]
[<
ffffffffa01d02ad>] drm_mode_setcrtc+0x3cd/0x4e0 [drm]
[<
ffffffffa01c18e3>] drm_ioctl+0x143/0x510 [drm]
[<
ffffffffa01cfee0>] ? drm_mode_setplane+0x1b0/0x1b0 [drm]
[<
ffffffff810f79a7>] ? hrtimer_start_range_ns+0x1b7/0x3a0
[<
ffffffff81212962>] do_vfs_ioctl+0x92/0x570
[<
ffffffff81590852>] ? __sys_recvmsg+0x42/0x80
[<
ffffffff81212eb9>] SyS_ioctl+0x79/0x90
[<
ffffffff816b4e32>] entry_SYSCALL_64_fastpath+0x1a/0xa4
RIP [<
ffffffffa012b026>] drm_dp_payload_send_msg+0x146/0x1f0 [drm_kms_helper]
Which occurs because of the hotplug event shown in the log, which ends
up causing DRM's dp helpers to drop the port we're updating the payload
on and panic.
CC: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: David Airlie <airlied@linux.ie>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 21 Apr 2016 23:53:07 +0000 (09:53 +1000)]
Merge branch 'linux-4.6' of git://github.com/skeggsb/linux into drm-fixes
Single nouveau regression fix
* 'linux-4.6' of git://github.com/skeggsb/linux:
drm/nouveau/kms: fix setting of default values for dithering properties
Ben Skeggs [Thu, 21 Apr 2016 23:09:41 +0000 (09:09 +1000)]
drm/nouveau/kms: fix setting of default values for dithering properties
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Dave Airlie [Thu, 21 Apr 2016 23:09:11 +0000 (09:09 +1000)]
Merge tag 'drm-intel-fixes-2016-04-21' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Hi Dave, fixes all around, all but one are cc: stable material, the most
important ones are likely the Skylake hang fixes from Mika.
* tag 'drm-intel-fixes-2016-04-21' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Use fw_domains_put_with_fifo() on HSW
drm/i915: Force ringbuffers to not be at offset 0
drm/i915: Adjust size of PIPE_CONTROL used for gen8 render seqno write
drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs
drm/i915/skl: Fix rc6 based gpu/system hang
drm/i915/userptr: Hold mmref whilst calling get-user-pages
drm/i915: Fixup the free space logic in ring_prepare
drm/i915/skl+: Use plane size for relative data rate calculation
Linus Torvalds [Thu, 21 Apr 2016 22:41:13 +0000 (15:41 -0700)]
Merge tag 'rtc-4.6-3' of git://git./linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"A few fixes for the RTC subsystem. The documentation fix already
missed 4.5 so I think it is worth taking it now:
A documentation fix for s3c and two fixes for the ds1307"
* tag 'rtc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: ds1307: Use irq when available for wakeup-source device
rtc: ds1307: ds3231 temperature s16 overflow
rtc: s3c: Document in binding that only s3c6410 needs a src clk
Linus Torvalds [Thu, 21 Apr 2016 21:29:34 +0000 (14:29 -0700)]
Merge tag 'pm+acpi-4.6-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Two fixes for issues introduced recently, one for an intel_pstate
driver problem uncovered by the recent switch over from using timers
and the other one for a potential cpufreq core problem related to
system suspend/resume.
Specifics:
- Fix an intel_pstate driver problem causing CPUs to get stuck in the
highest P-state when completely idle uncovered by the recent switch
over from using timers (Rafael Wysocki).
- Avoid attempts to get the current CPU frequency when all devices
(like I2C controllers that may be nedded for that purpose) have
been suspended during system suspend/resume (Rafael Wysocki)"
* tag 'pm+acpi-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
intel_pstate: Avoid getting stuck in high P-states when idle
Nishanth Menon [Tue, 19 Apr 2016 16:23:54 +0000 (11:23 -0500)]
rtc: ds1307: Use irq when available for wakeup-source device
With commit
8bc2a40730ec ("rtc: ds1307: add support for the
DT property 'wakeup-source'") we lost the ability for rtc irq
functionality for devices that are actually hooked on a real IRQ
line and have capability to wakeup as well. This is not an expected
behavior. So, instead of just not requesting IRQ, skip the IRQ
requirement only if interrupts are not defined for the device.
Fixes:
8bc2a40730ec ("rtc: ds1307: add support for the DT property 'wakeup-source'")
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Michael Lange <linuxstuff@milaw.biz>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Zhuang Yuyao [Mon, 18 Apr 2016 00:21:42 +0000 (09:21 +0900)]
rtc: ds1307: ds3231 temperature s16 overflow
while retrieving temperature from ds3231, the result may be overflow
since s16 is too small for a multiplication with 250.
ie. if temp_buf[0] == 0x2d, the result (s16 temp) will be negative.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Michael Tatarinov <kukabu@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Linus Torvalds [Thu, 21 Apr 2016 19:57:34 +0000 (12:57 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix memory leak in iwlwifi, from Matti Gottlieb.
2) Add missing registration of netfilter arp_tables into initial
namespace, from Florian Westphal.
3) Fix potential NULL deref in DecNET routing code.
4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry
Ivanov.
5) Fix dst ref counting in VRF, from David Ahern.
6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck.
7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause.
8) Ravalidate IPV6 datagram socket cached routes properly, particularly
with UDP, from Martin KaFai Lau.
9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang.
10) Fix stats typing in bcmgenet driver, from Eric Dumazet.
11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing,
from Joe Stringer.
12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown.
13) atl2 doesn't actually support scatter-gather, so don't advertise the
feature. From Ben Hucthings.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
openvswitch: use flow protocol when recalculating ipv6 checksums
Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
atl2: Disable unimplemented scatter/gather feature
net/mlx4_en: Split SW RX dropped counter per RX ring
net/mlx4_core: Don't allow to VF change global pause settings
net/mlx4_core: Avoid repeated calls to pci enable/disable
net/mlx4_core: Implement pci_resume callback
net: phy: spi_ks8895: Don't leak references to SPI devices
net: ethernet: davinci_emac: Fix platform_data overwrite
net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
qede: Fix single MTU sized packet from firmware GRO flow
qede: Fix setting Skb network header
qede: Fix various memory allocation error flows for fastpath
tcp: Merge tx_flags and tskey in tcp_shifted_skb
tcp: Merge tx_flags and tskey in tcp_collapse_retrans
drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
openvswitch: Orphan skbs before IPv6 defrag
Revert "Prevent NUll pointer dereference with two PHYs on cpsw"
VSOCK: Only check error on skb_recv_datagram when skb is NULL
...
Simon Horman [Thu, 21 Apr 2016 01:49:15 +0000 (11:49 +1000)]
openvswitch: use flow protocol when recalculating ipv6 checksums
When using masked actions the ipv6_proto field of an action
to set IPv6 fields may be zero rather than the prevailing protocol
which will result in skipping checksum recalculation.
This patch resolves the problem by relying on the protocol
in the flow key rather than that in the set field action.
Fixes:
83d2b9ba1abc ("net: openvswitch: Support masked set actions.")
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shrikrishna Khare [Thu, 21 Apr 2016 01:12:29 +0000 (18:12 -0700)]
Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
For IPv6, if the device indicates that the checksum is correct, set
CHECKSUM_UNNECESSARY.
Reported-by: Subbarao Narahari <snarahari@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 20 Apr 2016 22:23:08 +0000 (23:23 +0100)]
atl2: Disable unimplemented scatter/gather feature
atl2 includes NETIF_F_SG in hw_features even though it has no support
for non-linear skbs. This bug was originally harmless since the
driver does not claim to implement checksum offload and that used to
be a requirement for SG.
Now that SG and checksum offload are independent features, if you
explicitly enable SG *and* use one of the rare protocols that can use
SG without checkusm offload, this potentially leaks sensitive
information (before you notice that it just isn't working). Therefore
this obscure bug has been designated CVE-2016-2117.
Reported-by: Justin Yackoski <jyackoski@crypto-nite.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes:
ec5f06156423 ("net: Kill link between CSUM and SG features.")
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Burton [Thu, 21 Apr 2016 10:31:54 +0000 (11:31 +0100)]
irqchip/mips-gic: Don't overrun pcpu_masks array
Commit
2a0787051182 ("irqchip/mips-gic: Use gic_vpes instead of
NR_CPUS") & commit
78930f09b940 ("irqchip/mips-gic: Clear percpu_masks
correctly when mapping") both introduce code which accesses gic_vpes
entries in the pcpu_masks array. However, this array has length NR_CPUS.
If NR_CPUS is less than gic_vpes (ie. the kernel supports use of less
CPUs than are present in the system) then we overrun the array, clobber
some other data & generally die pretty promptly.
Most notably this affects uniprocessor kernels running on any multicore
or multithreaded Malta with a GIC (ie. the vast majority of real Malta
boards).
Fix this by only accessing up to min(gic_vpes, NR_CPUS) entries in the
pcpu_masks array, preventing the array overrun.
Fixes:
2a0787051182 ("irqchip/mips-gic: Use gic_vpes instead of NR_CPUS")
Fixes:
78930f09b940 ("irqchip/mips-gic: Clear percpu_masks correctly when mapping")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Link: http://lkml.kernel.org/r/1461234714-9975-1-git-send-email-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David S. Miller [Thu, 21 Apr 2016 19:02:41 +0000 (15:02 -0400)]
Merge branch 'mlx4-fixes'
Or Gerlitz says:
====================
Mellaox 40G driver fixes for 4.6-rc
With the fix for ARM bug being under the works, these are
few other fixes for mlx4 we have ready to go.
Eran addressed the problematic/wrong reporting of dropped packets, Daniel
fixed some matters related to PPC EEH's and Jenny's patch makes sure
VFs can't change the port's pause settings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Wed, 20 Apr 2016 13:01:18 +0000 (16:01 +0300)]
net/mlx4_en: Split SW RX dropped counter per RX ring
Count SW packet drops per RX ring instead of a global counter. This
will allow monitoring the number of rx drops per ring.
In addition, SW rx_dropped counter was overwritten by HW rx_dropped
counter, sum both of them instead to show the accurate value.
Fixes:
a3333b35da16 ('net/mlx4_en: Moderate ethtool callback to [...] ')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Wed, 20 Apr 2016 13:01:17 +0000 (16:01 +0300)]
net/mlx4_core: Don't allow to VF change global pause settings
Currently changing global pause settings is done via SET_PORT
command with input modifier GENERAL. This command is allowed
for each VF since MTU setting is done via the same command.
Change the above to the following scheme: before passing the
request to the FW, the PF will check whether it was issued
by a slave. If yes, don't change global pause and warn,
otherwise change to the requested value and store for
further reference.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Jurgens [Wed, 20 Apr 2016 13:01:16 +0000 (16:01 +0300)]
net/mlx4_core: Avoid repeated calls to pci enable/disable
Maintain the PCI status and provide wrappers for enabling and disabling
the PCI device. Performing the actions more than once without doing
its opposite results in warning logs.
This occurred when EEH hotplugged the device causing a warning for
disabling an already disabled device.
Fixes:
2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Jurgens [Wed, 20 Apr 2016 13:01:15 +0000 (16:01 +0300)]
net/mlx4_core: Implement pci_resume callback
Move resume related activities to a new pci_resume function instead of
performing them in mlx4_pci_slot_reset. This change is needed to avoid
a hotplug during EEH recovery due to commit
f2da4ccf8bd4 ("powerpc/eeh:
More relaxed hotplug criterion").
Fixes:
2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Brown [Wed, 20 Apr 2016 11:54:05 +0000 (12:54 +0100)]
net: phy: spi_ks8895: Don't leak references to SPI devices
The ks8895 driver is using spi_dev_get() apparently just to take a copy
of the SPI device used to instantiate it but never calls spi_dev_put()
to free it. Since the device is guaranteed to exist between probe() and
remove() there should be no need for the driver to take an extra
reference to it so fix the leak by just using a straight assignment.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Wed, 20 Apr 2016 08:56:45 +0000 (10:56 +0200)]
net: ethernet: davinci_emac: Fix platform_data overwrite
When the DaVinci emac driver is removed and re-probed, the actual
pdev->dev.platform_data is populated with an unwanted valid pointer saved by
the previous davinci_emac_of_get_pdata() call, causing a kernel crash when
calling priv->int_disable() in emac_int_disable().
Unable to handle kernel paging request at virtual address
c8622a80
...
[<
c0426fb4>] (emac_int_disable) from [<
c0427700>] (emac_dev_open+0x290/0x5f8)
[<
c0427700>] (emac_dev_open) from [<
c04c00ec>] (__dev_open+0xb8/0x120)
[<
c04c00ec>] (__dev_open) from [<
c04c0370>] (__dev_change_flags+0x88/0x14c)
[<
c04c0370>] (__dev_change_flags) from [<
c04c044c>] (dev_change_flags+0x18/0x48)
[<
c04c044c>] (dev_change_flags) from [<
c052bafc>] (devinet_ioctl+0x6b4/0x7ac)
[<
c052bafc>] (devinet_ioctl) from [<
c04a1428>] (sock_ioctl+0x1d8/0x2c0)
[<
c04a1428>] (sock_ioctl) from [<
c014f054>] (do_vfs_ioctl+0x41c/0x600)
[<
c014f054>] (do_vfs_ioctl) from [<
c014f2a4>] (SyS_ioctl+0x6c/0x7c)
[<
c014f2a4>] (SyS_ioctl) from [<
c000ff60>] (ret_fast_syscall+0x0/0x1c)
Fixes:
42f59967a091 ("net: ethernet: davinci_emac: add OF support")
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Wed, 20 Apr 2016 08:56:13 +0000 (10:56 +0200)]
net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
In order to avoid an Unbalanced pm_runtime_enable in the DaVinci
emac driver when the device is removed and re-probed, and a
pm_runtime_disable() call in davinci_emac_remove().
Actually, using unbind/bind on a TI DM8168 SoC gives :
$ echo
4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/unbind
net eth1: DaVinci EMAC: davinci_emac_remove()
$ echo
4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/bind
davinci_emac
4a120000.ethernet: Unbalanced pm_runtime_enable
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Fixes:
3ba97381343b ("net: ethernet: davinci_emac: add pm_runtime support")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Thu, 21 Apr 2016 18:57:46 +0000 (20:57 +0200)]
Merge branch 'pm-cpufreq-fixes'
* pm-cpufreq-fixes:
cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
intel_pstate: Avoid getting stuck in high P-states when idle
David S. Miller [Thu, 21 Apr 2016 18:51:29 +0000 (14:51 -0400)]
Merge branch 'qed-fixes'
Manish Chopra says:
====================
qede: Bug fixes
This series fixes -
* various memory allocation failure flows for fastpath
* issues with respect to driver GRO packets handling
V1->V2
* Send series against net instead of net-next.
Please consider applying this series to "net"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Wed, 20 Apr 2016 07:03:29 +0000 (03:03 -0400)]
qede: Fix single MTU sized packet from firmware GRO flow
In firmware assisted GRO flow there could be a single MTU sized
segment arriving due to firmware aggregation timeout/last segment
in an aggregation flow, which is not expected to be an actual gro
packet. So If a skb has zero frags from the GRO flow then simply
push it in the stack as non gso skb.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Wed, 20 Apr 2016 07:03:28 +0000 (03:03 -0400)]
qede: Fix setting Skb network header
Skb's network header needs to be set before extracting IPv4/IPv6
headers from it.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Wed, 20 Apr 2016 07:03:27 +0000 (03:03 -0400)]
qede: Fix various memory allocation error flows for fastpath
This patch handles memory allocation failures for fastpath
gracefully in the driver.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Apr 2016 18:40:56 +0000 (14:40 -0400)]
Merge branch 'tcp-coalesce-merge-timestamps'
Martin KaFai Lau says:
====================
tcp: Merge timestamp info when coalescing skbs
This series is separated from the RFC series related to
tcp_sendmsg(MSG_EOR) and it is targeting for the net branch.
This patchset is focusing on fixing cases where TCP
timestamp could be lost after coalescing skbs.
A BPF prog is used to kprobe to sock_queue_err_skb()
and print out the value of serr->ee.ee_data. The BPF
prog (run-able from bcc) is attached here:
BPF prog used for testing:
~~~~~
from __future__ import print_function
from bcc import BPF
bpf_text = """
int trace_err_skb(struct pt_regs *ctx)
{
struct sk_buff *skb = (struct sk_buff *)ctx->si;
struct sock *sk = (struct sock *)ctx->di;
struct sock_exterr_skb *serr;
u32 ee_data = 0;
if (!sk || !skb)
return 0;
serr = SKB_EXT_ERR(skb);
bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
bpf_trace_printk("ee_data:%u\\n", ee_data);
return 0;
};
"""
b = BPF(text=bpf_text)
b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
print("Attached to kprobe")
b.trace_print()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Wed, 20 Apr 2016 05:39:29 +0000 (22:39 -0700)]
tcp: Merge tx_flags and tskey in tcp_shifted_skb
After receiving sacks, tcp_shifted_skb() will collapse
skbs if possible. tx_flags and tskey also have to be
merged.
This patch reuses the tcp_skb_collapse_tstamp() to handle
them.
BPF Output Before:
~~~~~
<no-output-due-to-missing-tstamp-event>
BPF Output After:
~~~~~
<...>-2024 [007] d.s. 88.644374: : ee_data:14599
Packetdrill Script:
~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
0.200 write(4, ..., 1460) = 1460
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 13140) = 13140
0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257
0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Wed, 20 Apr 2016 05:39:28 +0000 (22:39 -0700)]
tcp: Merge tx_flags and tskey in tcp_collapse_retrans
If two skbs are merged/collapsed during retransmission, the current
logic does not merge the tx_flags and tskey. The end result is
the SCM_TSTAMP_ACK timestamp could be missing for a packet.
The patch:
1. Merge the tx_flags
2. Overwrite the prev_skb's tskey with the next_skb's tskey
BPF Output Before:
~~~~~~
<no-output-due-to-missing-tstamp-event>
BPF Output After:
~~~~~~
packetdrill-2092 [001] d.s. 453.998486: : ee_data:1459
Packetdrill Script:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
0.200 write(4, ..., 11680) = 11680
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 > P. 1:731(730) ack 1
0.200 > P. 731:1461(730) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:13141(4380) ack 1
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 13141 win 257
0.400 close(4) = 0
0.400 > F. 13141:13141(0) ack 1
0.500 < F. 1:1(0) ack 13142 win 257
0.500 > . 13142:13142(0) ack 2
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Tue, 19 Apr 2016 18:09:49 +0000 (21:09 +0300)]
drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
The cpsw_ndo_open() could try to access CPSW registers before
calling pm_runtime_get_sync(). This will trigger L3 error:
WARNING: CPU: 0 PID: 21 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x220/0x34c()
44000000.ocp:L3 Custom Error: MASTER M2 (64-bit) TARGET L4_FAST (Idle): Data Access in Supervisor mode during Functional access
and CPSW will stop functioning.
Hence, fix it by moving pm_runtime_get_sync() before the first access
to CPSW registers in cpsw_ndo_open().
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>