GitHub/moto-9609/android_kernel_motorola_exynos9610.git
8 years agobtrfs: opencode chunk locking, remove helpers
David Sterba [Tue, 4 Oct 2016 17:34:27 +0000 (19:34 +0200)]
btrfs: opencode chunk locking, remove helpers

The helpers are trivial and we don't use them consistently.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: remove root parameter from transaction commit/end routines
Jeff Mahoney [Sat, 10 Sep 2016 01:39:03 +0000 (21:39 -0400)]
btrfs: remove root parameter from transaction commit/end routines

Now we only use the root parameter to print the root objectid in
a tracepoint.  We can use the root parameter from the transaction
handle for that.  It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: split btrfs_wait_marked_extents into normal and tree log functions
Jeff Mahoney [Sat, 10 Sep 2016 00:42:44 +0000 (20:42 -0400)]
btrfs: split btrfs_wait_marked_extents into normal and tree log functions

btrfs_write_and_wait_marked_extents and btrfs_sync_log both call
btrfs_wait_marked_extents, which provides a core loop and then handles
errors differently based on whether it's it's a log root or not.

This means that btrfs_write_and_wait_marked_extents needs to take a root
because btrfs_wait_marked_extents requires one, even though it's only
used to determine whether the root is a log root.  The log root code
won't ever call into the transaction commit code using a log root, so we
can factor out the core loop and provide the error handling appropriate
to each waiter in new routines.  This allows us to eventually remove
the root argument from btrfs_commit_transaction, and as a result,
btrfs_end_transaction.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: take an fs_info directly when the root is not used otherwise
Jeff Mahoney [Wed, 22 Jun 2016 22:54:24 +0000 (18:54 -0400)]
btrfs: take an fs_info directly when the root is not used otherwise

There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer.  Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: simplify btrfs_wait_cache_io prototype
Jeff Mahoney [Fri, 9 Sep 2016 16:09:35 +0000 (12:09 -0400)]
btrfs: simplify btrfs_wait_cache_io prototype

With the exception of the one case where btrfs_wait_cache_io is called
without a block group, it's called with the same arguments.  The root
argument is only used in the special case, so let's factor out the core
and simplify the call in the normal case to require a trans, block group,
and path.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: convert extent-tree tracepoints to use fs_info
Jeff Mahoney [Tue, 6 Sep 2016 20:00:42 +0000 (16:00 -0400)]
btrfs: convert extent-tree tracepoints to use fs_info

The extent-tree tracepoints all operate on the extent root, regardless of
which root is passed in.  Let's just use the extent root objectid instead.
If it turns out that nobody is depending on the format of this tracepoint,
we can drop the root printing entirely.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: root->fs_info cleanup, access fs_info->delayed_root directly
Jeff Mahoney [Wed, 22 Jun 2016 22:54:23 +0000 (18:54 -0400)]
btrfs: root->fs_info cleanup, access fs_info->delayed_root directly

This results in btrfs_assert_delayed_root_empty and
btrfs_destroy_delayed_inode taking an fs_info instead of a root.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: root->fs_info cleanup, add fs_info convenience variables
Jeff Mahoney [Wed, 22 Jun 2016 22:54:23 +0000 (18:54 -0400)]
btrfs: root->fs_info cleanup, add fs_info convenience variables

In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable.  This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: root->fs_info cleanup, update_block_group{,flags}
Jeff Mahoney [Wed, 22 Jun 2016 22:54:22 +0000 (18:54 -0400)]
btrfs: root->fs_info cleanup, update_block_group{,flags}

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: root->fs_info cleanup, lock/unlock_chunks
Jeff Mahoney [Thu, 16 Jun 2016 15:30:29 +0000 (11:30 -0400)]
btrfs: root->fs_info cleanup, lock/unlock_chunks

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size
Jeff Mahoney [Thu, 16 Jun 2016 15:07:27 +0000 (11:07 -0400)]
btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: pull node/sector/stripe sizes out of root and into fs_info
Jeff Mahoney [Wed, 15 Jun 2016 13:22:56 +0000 (09:22 -0400)]
btrfs: pull node/sector/stripe sizes out of root and into fs_info

We track the node sizes per-root, but they never vary from the values
in the superblock.  This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: root->fs_info cleanup, io_ctl_init
Jeff Mahoney [Wed, 22 Jun 2016 22:56:18 +0000 (18:56 -0400)]
btrfs: root->fs_info cleanup, io_ctl_init

The io_ctl->root member was only being used to access root->fs_info.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: root->fs_info cleanup, use fs_info->dev_root everywhere
Jeff Mahoney [Wed, 22 Jun 2016 22:54:56 +0000 (18:54 -0400)]
btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: struct reada_control.root -> reada_control.fs_info
Jeff Mahoney [Wed, 22 Jun 2016 22:56:44 +0000 (18:56 -0400)]
btrfs: struct reada_control.root -> reada_control.fs_info

The root is never used.  We substitute extent_root in for the
reada_find_extent call, since it's only ever used to obtain the node
size.  This call site will be changed to use fs_info in a later patch.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: struct btrfsic_state->root should be an fs_info
Jeff Mahoney [Wed, 22 Jun 2016 22:54:36 +0000 (18:54 -0400)]
btrfs: struct btrfsic_state->root should be an fs_info

The root member is never used except for obtaining an fs_info pointer.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: alloc_reserved_file_extent trace point should use extent_root
Jeff Mahoney [Wed, 22 Jun 2016 22:54:27 +0000 (18:54 -0400)]
btrfs: alloc_reserved_file_extent trace point should use extent_root

Even though a separate root is passed in, we're still operating on the
extent root.  Let's use that for the trace point.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: btrfs_init_new_device should use fs_info->dev_root
Jeff Mahoney [Wed, 22 Jun 2016 00:16:08 +0000 (20:16 -0400)]
btrfs: btrfs_init_new_device should use fs_info->dev_root

btrfs_init_new_device only uses the root passed in via the ioctl to
start the transaction.  Nothing else that happens is related to whatever
root the user used to initiate the ioctl.  We can drop the root requirement
and just use fs_info->dev_root instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: call functions that always use the same root with fs_info instead
Jeff Mahoney [Wed, 22 Jun 2016 01:16:51 +0000 (21:16 -0400)]
btrfs: call functions that always use the same root with fs_info instead

There are many functions that are always called with the same root
argument.  Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: call functions that overwrite their root parameter with fs_info
Jeff Mahoney [Tue, 21 Jun 2016 14:40:19 +0000 (10:40 -0400)]
btrfs: call functions that overwrite their root parameter with fs_info

There are 11 functions that accept a root parameter and immediately
overwrite it.  We can pass those an fs_info pointer instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agoMerge branch 'misc-4.10' into for-chris-4.10-20161130
David Sterba [Wed, 30 Nov 2016 13:02:20 +0000 (14:02 +0100)]
Merge branch 'misc-4.10' into for-chris-4.10-20161130

8 years agobtrfs: improve delayed refs iterations
Wang Xiaoguang [Wed, 26 Oct 2016 10:07:33 +0000 (18:07 +0800)]
btrfs: improve delayed refs iterations

This issue was found when I tried to delete a heavily reflinked file,
when deleting such files, other transaction operation will not have a
chance to make progress, for example, start_transaction() will blocked
in wait_current_trans(root) for long time, sometimes it even triggers
soft lockups, and the time taken to delete such heavily reflinked file
is also very large, often hundreds of seconds. Using perf top, it reports
that:

PerfTop:    7416 irqs/sec  kernel:99.8%  exact:  0.0% [4000Hz cpu-clock],  (all, 4 CPUs)
---------------------------------------------------------------------------------------
    84.37%  [btrfs]             [k] __btrfs_run_delayed_refs.constprop.80
    11.02%  [kernel]            [k] delay_tsc
     0.79%  [kernel]            [k] _raw_spin_unlock_irq
     0.78%  [kernel]            [k] _raw_spin_unlock_irqrestore
     0.45%  [kernel]            [k] do_raw_spin_lock
     0.18%  [kernel]            [k] __slab_alloc
It seems __btrfs_run_delayed_refs() took most cpu time, after some debug
work, I found it's select_delayed_ref() causing this issue, for a delayed
head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but
select_delayed_ref() will firstly try to iterate node list to find
BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and
waste much time.

To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head,
then in select_delayed_ref(), if this list is not empty, we can directly use
nodes in this list. With this patch, it just took about 10~15 seconds to
delte the same file. Now using perf top, it reports that:

PerfTop:    2734 irqs/sec  kernel:99.5%  exact:  0.0% [4000Hz cpu-clock],  (all, 4 CPUs)
----------------------------------------------------------------------------------------

    20.74%  [kernel]          [k] _raw_spin_unlock_irqrestore
    16.33%  [kernel]          [k] __slab_alloc
     5.41%  [kernel]          [k] lock_acquired
     4.42%  [kernel]          [k] lock_acquire
     4.05%  [kernel]          [k] lock_release
     3.37%  [kernel]          [k] _raw_spin_unlock_irq

For normal files, this patch also gives help, at least we do not need to
iterate whole list to found BTRFS_ADD_DELAYED_REF nodes.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: qgroup: Fix qgroup data leaking by using subtree tracing
Qu Wenruo [Tue, 18 Oct 2016 01:31:29 +0000 (09:31 +0800)]
btrfs: qgroup: Fix qgroup data leaking by using subtree tracing

Commit 62b99540a1d91e464 (btrfs: relocation: Fix leaking qgroups numbers
on data extents) only fixes the problem partly.

The previous fix is to trace all new data extents at transaction commit
time when balance finishes.

However balance is not done in a large transaction, every path
replacement can happen in its own transaction.
This makes the fix useless if transaction commits during relocation.

For example:
relocate_block_group()
|-merge_reloc_roots()
|  |- merge_reloc_root()
|     |- btrfs_start_transaction()         <- Trans X
|     |- replace_path()                    <- Cause leak
|     |- btrfs_end_transaction_throttle()  <- Trans X commits here
|     |                                       Leak not fixed
|     |
|     |- btrfs_start_transaction()         <- Trans Y
|     |- replace_path()                    <- Cause leak
|     |- btrfs_end_transaction_throttle()  <- Trans Y ends
|                                             but not committed
|-btrfs_join_transaction()                 <- Still trans Y
|-qgroup_fix()                             <- Only fixes data leak
|                                             in trans Y
|-btrfs_commit_transaction()               <- Trans Y commits

In that case, qgroup fixup can only fix data leak in trans Y, data leak
in trans X is out of fix.

So the correct fix should happen in the same transaction of
replace_path().

This patch fixes it by tracing both subtrees of tree block swap, so it
can fix the problem and ensure all leaking and fix are in the same
transaction, so no leak again.

Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: Export and move leaf/subtree qgroup helpers to qgroup.c
Qu Wenruo [Tue, 18 Oct 2016 01:31:28 +0000 (09:31 +0800)]
btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c

Move account_shared_subtree() to qgroup.c and rename it to
btrfs_qgroup_trace_subtree().

Do the same thing for account_leaf_items() and rename it to
btrfs_qgroup_trace_leaf_items().

Since all these functions are only for qgroup, move them to qgroup.c and
export them is more appropriate.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: qgroup: Rename functions to make it follow reserve,trace,account steps
Qu Wenruo [Tue, 18 Oct 2016 01:31:27 +0000 (09:31 +0800)]
btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps

Rename btrfs_qgroup_insert_dirty_extent(_nolock) to
btrfs_qgroup_trace_extent(_nolock), according to the new
reserve/trace/account naming schema.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: qgroup: Add comments explaining how btrfs qgroup works
Qu Wenruo [Tue, 18 Oct 2016 01:31:26 +0000 (09:31 +0800)]
btrfs: qgroup: Add comments explaining how btrfs qgroup works

Add explaination how btrfs qgroups work.

Qgroup is split into 3 main phrases:
1) Reserve
   To ensure qgroup doesn't exceed its limit

2) Trace
   To info qgroup to trace which extent

3) Account
   Calculate qgroup number change for each traced extent.

This should save quite some time for new developers.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: use bio_for_each_segment_all in __btrfsic_submit_bio
Christoph Hellwig [Fri, 25 Nov 2016 08:07:53 +0000 (09:07 +0100)]
btrfs: use bio_for_each_segment_all in __btrfsic_submit_bio

And remove the bogus check for a NULL return value from kmap, which
can't happen.  While we're at it: I don't think that kmapping up to 256
will work without deadlocks on highmem machines, a better idea would
be to use vm_map_ram to map all of them into a single virtual address
range.  Incidentally that would also simplify the code a lot.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: refactor __btrfs_lookup_bio_sums to use bio_for_each_segment_all
Christoph Hellwig [Fri, 25 Nov 2016 08:07:52 +0000 (09:07 +0100)]
btrfs: refactor __btrfs_lookup_bio_sums to use bio_for_each_segment_all

Rework the loop a little bit to use the generic bio_for_each_segment_all
helper for iterating over the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: calculate end of bio offset properly
Christoph Hellwig [Fri, 25 Nov 2016 08:07:51 +0000 (09:07 +0100)]
btrfs: calculate end of bio offset properly

Use the bvec offset and len members to prepare for multipage bvecs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: use bi_size
Christoph Hellwig [Fri, 25 Nov 2016 08:07:50 +0000 (09:07 +0100)]
btrfs: use bi_size

Instead of using bi_vcnt to calculate it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: don't access the bio directly in btrfs_csum_one_bio
Christoph Hellwig [Fri, 25 Nov 2016 08:07:49 +0000 (09:07 +0100)]
btrfs: don't access the bio directly in btrfs_csum_one_bio

Use bio_for_each_segment_all to iterate over the segments instead.
This requires a bit of reshuffling so that we only lookup up the ordered
item once inside the bio_for_each_segment_all loop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: don't access the bio directly in the direct I/O code
Christoph Hellwig [Fri, 25 Nov 2016 08:07:48 +0000 (09:07 +0100)]
btrfs: don't access the bio directly in the direct I/O code

Just use bio_for_each_segment_all to iterate over all segments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: don't access the bio directly in the raid5/6 code
Christoph Hellwig [Fri, 25 Nov 2016 08:07:47 +0000 (09:07 +0100)]
btrfs: don't access the bio directly in the raid5/6 code

Just use bio_for_each_segment_all to iterate over all segments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: use bio iterators for the decompression handlers
Christoph Hellwig [Fri, 25 Nov 2016 08:07:46 +0000 (09:07 +0100)]
btrfs: use bio iterators for the decompression handlers

Pass the full bio to the decompression routines and use bio iterators
to iterate over the data in the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: Ensure proper sector alignment for btrfs_free_reserved_data_space
Jeff Mahoney [Sat, 19 Nov 2016 02:52:40 +0000 (21:52 -0500)]
btrfs: Ensure proper sector alignment for btrfs_free_reserved_data_space

This fixes the WARN_ON on BTRFS_I(inode)->reserved_extents in
btrfs_destroy_inode and the WARN_ON on nonzero delalloc bytes on umount
with qgroups enabled.

I was able to reproduce this by setting up a small (~500kb) quota limit
and writing a file one byte at a time until I hit the limit.  The warnings
would all hit on umount.

The root cause is that we would reserve a block-sized range in both
the reservation and the quota in btrfs_check_data_free_space, but if we
encountered a problem (like e.g. EDQUOT), we would only release the single
byte in the qgroup reservation.  That caused an iotree state split, which
increased the number of outstanding extents, in turn disallowing releasing
the metadata reservation.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agoBtrfs: abort transaction if fill_holes() fails
Josef Bacik [Mon, 14 Nov 2016 19:06:22 +0000 (14:06 -0500)]
Btrfs: abort transaction if fill_holes() fails

At this point we will have dropped extent entries from the file, so if we fail
to insert the new hole entries then we are leaving the fs in a corrupt state
(albeit an easily fixed one).  Abort the transaciton if this happens so we can
avoid corrupting the fs.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agoBtrfs: fix file extent corruption
Josef Bacik [Wed, 16 Nov 2016 14:13:39 +0000 (09:13 -0500)]
Btrfs: fix file extent corruption

In order to do hole punching we have a block reserve to hold the reservation we
need to drop the extents in our range.  Since we could end up dropping a lot of
extents we set rsv->failfast so we can just loop around again and drop the
remaining of the range.  Unfortunately we unconditionally fill the hole extents
in and start from the last extent we encountered, which we may or may not have
dropped.  So this can result in overlapping file extent entries, which can be
tripped over in a variety of ways, either by hitting BUG_ON(!ret) in
fill_holes() after the search, or in btrfs_set_item_key_safe() in
btrfs_drop_extent() at a later time by an unrelated task.  Fix this by only
setting drop_end to the last extent we did actually drop.  This way our holes
are filled in properly for the range that we did drop, and the rest of the range
that remains to be dropped is actually dropped.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: increment ctx->pos for every emitted or skipped dirent in readdir
Jeff Mahoney [Sat, 5 Nov 2016 17:26:35 +0000 (13:26 -0400)]
btrfs: increment ctx->pos for every emitted or skipped dirent in readdir

If we process the last item in the leaf and hit an I/O error while
reading the next leaf, we return -EIO without having adjusted the
position.  Since we have emitted dirents, getdents() will return
the byte count to the user instead of the error.  Subsequent callers
will emit the last successful dirent again, and return -EIO again,
with the same result.  Callers loop forever.

Instead, if we always increment ctx->pos after emitting or skipping
the dirent, we'll be sure that we won't hit the same one again.  When
we go to process the next leaf, we won't have emitted any dirents
and the -EIO will be returned to the user properly.  We also don't
need to track if we've emitted a dirent already or if we've changed
the position yet.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: remove old tree_root dirent processing in btrfs_real_readdir()
Jeff Mahoney [Mon, 21 Nov 2016 14:59:04 +0000 (15:59 +0100)]
btrfs: remove old tree_root dirent processing in btrfs_real_readdir()

Commit 3de4586c527 (Btrfs: Allow subvolumes and snapshots anywhere
in the directory tree) introduced the current system of placing
snapshots in the directory tree.  It also introduced the behavior of
creating the snapshot and then creating the directory entries for it.

We've kept this code around for compatibility reasons, but it turns
out that no file systems with the old tree_root based snapshots can
be mounted on newer (>= 2009) kernels anyway.  About a month after the
above commit, commit 2a7108ad89e (Btrfs: rev the disk format for the
inode compat and csum selection changes) landed, changing the superblock
magic number.

As a result, we know that we'll never encounter tree_root-based dirents
or have to deal with skipping our own snapshot dirents.  Since that
also means that we're now only iterating over DIR_INDEX items, which only
contain one directory entry per leaf item, we don't need to loop over
the leaf item contents anymore either.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: Call kunmap if zlib_inflateInit2 fails
Nick Terrell [Wed, 2 Nov 2016 03:25:27 +0000 (20:25 -0700)]
btrfs: Call kunmap if zlib_inflateInit2 fails

If zlib_inflateInit2 fails, the input page is never unmapped.
Add a call to kunmap when it fails.

Signed-off-by: Nick Terrell <nickrterrell@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: store and load values of stripes_min/stripes_max in balance status item
David Sterba [Tue, 1 Nov 2016 13:21:23 +0000 (14:21 +0100)]
btrfs: store and load values of stripes_min/stripes_max in balance status item

The balance status item contains currently known filter values, but the
stripes filter was unintentionally not among them. This would mean, that
interrupted and automatically restarted balance does not apply the
stripe filters.

Fixes: dee32d0ac3719ef8d640efaf0884111df444730f
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: remove redundant check of btrfs_iget return value
Christophe JAILLET [Tue, 1 Nov 2016 10:26:06 +0000 (11:26 +0100)]
btrfs: remove redundant check of btrfs_iget return value

'btrfs_iget()' can not return NULL, so this test can be removed.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: change btrfs_csum_final result param type to u8
Domagoj Tršan [Thu, 27 Oct 2016 07:52:33 +0000 (08:52 +0100)]
btrfs: change btrfs_csum_final result param type to u8

csum member of struct btrfs_super_block has array type of u8. It makes
sense that function btrfs_csum_final should be also declared to accept
u8 *. I changed the declaration of method void btrfs_csum_final(u32 crc,
char *result); to void btrfs_csum_final(u32 crc, u8 *result);

Signed-off-by: Domagoj Tršan <domagoj.trsan@gmail.com>
[ changed cast to u8 at several call sites ]
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agoBtrfs: adjust len of writes if following a preallocated extent
Liu Bo [Fri, 4 Nov 2016 19:20:54 +0000 (12:20 -0700)]
Btrfs: adjust len of writes if following a preallocated extent

If we have

|0--hole--4095||4096--preallocate--12287|

instead of using preallocated space, a 8K direct write will just
create a new 8K extent and it'll end up with

|0--new extent--8191||8192--preallocate--12287|

It's because we find a hole em and then go to create a new 8K
extent directly without adjusting @len.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: return early from failed memory allocations in ioctl handlers
Shailendra Verma [Thu, 10 Nov 2016 09:47:41 +0000 (15:17 +0530)]
btrfs: return early from failed memory allocations in ioctl handlers

There is no need to call kfree() if memdup_user() fails, as no memory
was allocated and the error in the error-valued pointer should be returned.

Signed-off-by: Shailendra Verma <shailendra.v@samsung.com>
[ edit subject ]
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: add optimized version of eb to eb copy
David Sterba [Tue, 8 Nov 2016 17:30:31 +0000 (18:30 +0100)]
btrfs: add optimized version of eb to eb copy

Using copy_extent_buffer is suitable for copying betwenn buffers from an
arbitrary offset and deals with page boundaries. This is not necessary
when doing a full extent_buffer-to-extent_buffer copy. We can utilize
the copy_page helper as well.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: remove constant parameter to memset_extent_buffer and rename it
David Sterba [Tue, 8 Nov 2016 17:09:03 +0000 (18:09 +0100)]
btrfs: remove constant parameter to memset_extent_buffer and rename it

The only memset we do is to 0, so sink the parameter to the function and
simplify all calls. Rename the function to reflect the behaviour.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: use specialized page copying helpers in btrfs_clone_extent_buffer
David Sterba [Tue, 8 Nov 2016 16:56:24 +0000 (17:56 +0100)]
btrfs: use specialized page copying helpers in btrfs_clone_extent_buffer

The copy_page is usually optimized and can be faster than memcpy.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: use new helpers to set uuids in eb
David Sterba [Wed, 9 Nov 2016 16:44:25 +0000 (17:44 +0100)]
btrfs: use new helpers to set uuids in eb

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: introduce helpers for updating eb uuids
David Sterba [Wed, 9 Nov 2016 16:43:38 +0000 (17:43 +0100)]
btrfs: introduce helpers for updating eb uuids

The fsid and chunk tree uuid are always located in the first page,
we don't need the to use write_extent_buffer.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: delete unused member from superblock
David Sterba [Tue, 8 Nov 2016 23:03:12 +0000 (00:03 +0100)]
btrfs: delete unused member from superblock

 __bdev' has never been used since
 0b86a832a1f38abec695864ec2eaedc9d2383f1b (2008).

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: remove trivial helper btrfs_find_tree_block
David Sterba [Tue, 8 Nov 2016 22:21:05 +0000 (23:21 +0100)]
btrfs: remove trivial helper btrfs_find_tree_block

During the time, the function has been shrunk to the point that it just
calls find_extent_buffer, just passing the parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: reada, remove pointless BUG_ON check for fs_info
David Sterba [Tue, 8 Nov 2016 16:18:35 +0000 (17:18 +0100)]
btrfs: reada, remove pointless BUG_ON check for fs_info

We dereference fs_info several times, besides that post-mount functions
should never see a NULL fs_info.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: reada, remove pointless BUG_ON in reada_find_extent
David Sterba [Tue, 8 Nov 2016 16:11:27 +0000 (17:11 +0100)]
btrfs: reada, remove pointless BUG_ON in reada_find_extent

The lock is held, we make the same lookup that previously failed with
EEXIST and we don't insert NULL pointers.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: reada, sink start parameter to btree_readahead_hook
David Sterba [Tue, 8 Nov 2016 12:50:03 +0000 (13:50 +0100)]
btrfs: reada, sink start parameter to btree_readahead_hook

Originally, the eb and start were passed separately in case eb is NULL.
Since the readahead has been refactored in 4.6, this is not true anymore
and we can get rid of the parameter.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: reada, remove unused parameter from __readahead_hook
David Sterba [Tue, 8 Nov 2016 12:39:05 +0000 (13:39 +0100)]
btrfs: reada, remove unused parameter from __readahead_hook

'start' is not used since "btrfs: reada: Pass reada_extent into
__readahead_hook directly" (6e39dbe8b9e55280c).

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: reada, cleanup remove unneeded variable in __readahead_hook
David Sterba [Tue, 8 Nov 2016 12:32:43 +0000 (13:32 +0100)]
btrfs: reada, cleanup remove unneeded variable in __readahead_hook

We can't touch the eb directly in case the function is called with a
non-zero error, so we can read the eb level when needed.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: rename helper macros for qgroup and aux data casts
David Sterba [Wed, 26 Oct 2016 14:23:50 +0000 (16:23 +0200)]
btrfs: rename helper macros for qgroup and aux data casts

The helpers are not meant to be generic, the name is misleading. Convert
them to static inlines for type checking.

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: remove stale comment from btrfs_statfs
David Sterba [Wed, 5 Oct 2016 12:23:06 +0000 (14:23 +0200)]
btrfs: remove stale comment from btrfs_statfs

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: remove unused headers, statfs.h
David Sterba [Wed, 5 Oct 2016 12:23:05 +0000 (14:23 +0200)]
btrfs: remove unused headers, statfs.h

Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: remove useless comments
Xiaoguang Wang [Fri, 23 Sep 2016 04:38:50 +0000 (12:38 +0800)]
btrfs: remove useless comments

Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: make block group flags in balance printks human-readable
Adam Borowski [Mon, 14 Nov 2016 17:44:34 +0000 (18:44 +0100)]
btrfs: make block group flags in balance printks human-readable

They're not even documented anywhere, letting users with no recourse but
to RTFS.  It's no big burden to output the bitfield as words.

Also, display unknown flags as hex.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agoBtrfs: deal with existing encompassing extent map in btrfs_get_extent()
Omar Sandoval [Wed, 9 Nov 2016 23:26:50 +0000 (15:26 -0800)]
Btrfs: deal with existing encompassing extent map in btrfs_get_extent()

My QEMU VM was seeing inexplicable I/O errors that I tracked down to
errors coming from the qcow2 virtual drive in the host system. The qcow2
file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT.
Every once in awhile, pread() or pwrite() would return EEXIST, which
makes no sense. This turned out to be a bug in btrfs_get_extent().

Commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map
insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where
two threads race on adding the same extent map to an inode's extent map
tree. However, if the added em is merged with an adjacent em in the
extent tree, then we'll end up with an existing extent that is not
identical to but instead encompasses the extent we tried to add. When we
call merge_extent_mapping() to find the nonoverlapping part of the new
em, the arithmetic overflows because there is no such thing. We then end
up trying to add a bogus em to the em_tree, which results in a EEXIST
that can bubble all the way up to userspace.

Fix it by extending the identical extent map special case.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: add necessary comments about tickets_id
Wang Xiaoguang [Mon, 7 Nov 2016 07:59:16 +0000 (15:59 +0800)]
btrfs: add necessary comments about tickets_id

Tickets_id's name may result in some misunderstandings,  it just indicates
the next ticket will be handled and is not stored per ticket.

Fixes: ce12965 ("btrfs: introduce tickets_id to determine whether
asynchronous metadata reclaim work makes progress")
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: cleanup: use already calculated value in btrfs_should_throttle_delayed_refs()
Wang Xiaoguang [Wed, 26 Oct 2016 07:23:01 +0000 (15:23 +0800)]
btrfs: cleanup: use already calculated value in btrfs_should_throttle_delayed_refs()

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agobtrfs: don't abuse REQ_OP_* flags for btrfs_map_block
Christoph Hellwig [Thu, 27 Oct 2016 07:27:36 +0000 (09:27 +0200)]
btrfs: don't abuse REQ_OP_* flags for btrfs_map_block

btrfs_map_block supports different types of mappings, which to a large
extent resemble block layer operations.  But they don't always do, and
currently btrfs dangerously overlays it's own flag over the block layer
flags.  This is just asking for a conflict, so introduce a different
map flags enum inside of btrfs instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
8 years agoLinux 4.9-rc7
Linus Torvalds [Sun, 27 Nov 2016 21:08:04 +0000 (13:08 -0800)]
Linux 4.9-rc7

8 years agoMerge git://git.infradead.org/intel-iommu
Linus Torvalds [Sun, 27 Nov 2016 16:24:46 +0000 (08:24 -0800)]
Merge git://git.infradead.org/intel-iommu

Pull IOMMU fixes from David Woodhouse:
 "Two minor fixes.

  The first fixes the assignment of SR-IOV virtual functions to the
  correct IOMMU unit, and the second fixes the excessively large (and
  physically contiguous) PASID tables used with SVM"

* git://git.infradead.org/intel-iommu:
  iommu/vt-d: Fix PASID table allocation
  iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions

8 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sun, 27 Nov 2016 16:22:59 +0000 (08:22 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "Another round of MIPS fixes for 4.9:

   - Fix unreadable output in __do_page_fault due to the KERN_CONT
     patchset

   - Correctly handle MIPS R6 fixes to the c0_wired register"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: mm: Fix output of __do_page_fault
  MIPS: Mask out limit field when calculating wired entry count

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 27 Nov 2016 01:21:13 +0000 (17:21 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs splice fix from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix default_file_splice_read()

8 years agofix default_file_splice_read()
Al Viro [Sun, 27 Nov 2016 01:05:42 +0000 (20:05 -0500)]
fix default_file_splice_read()

Botched calculation of number of pages.  As the result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 26 Nov 2016 23:28:34 +0000 (15:28 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Here is a revert and two bugfixes for the I2C designware driver.

  Please note that we are still hunting down a regression for the
  i2c-octeon driver. While there is a fix pending, we have unclear
  feedback from the testers currently. An rc8 would be quite helpful
  for this case"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Revert "i2c: designware: do not disable adapter after transfer"
  i2c: designware: fix rx fifo depth tracking
  i2c: designware: report short transfers

8 years agoMerge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Sat, 26 Nov 2016 23:26:20 +0000 (15:26 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fix from Russell King:
 "This resolves the ksyms issues by reverting the commit which
  introduced the breakage"

There was what I consider to be a better fix, but it's late in the rc
game, so I'll take the revert.

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  Revert "arm: move exports to definitions"

8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 26 Nov 2016 21:05:05 +0000 (13:05 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix leak in fsl/fman driver, from Dan Carpenter.

 2) Call flow dissector initcall earlier than any networking driver can
    register and start to use it, from Eric Dumazet.

 3) Some dup header fixes from Geliang Tang.

 4) TIPC link monitoring compat fix from Jon Paul Maloy.

 5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
    Florian Fainelli.

 6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman
    Mashak.

 7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  tipc: resolve connection flow control compatibility problem
  mvpp2: use correct size for memset
  net/mlx5: drop duplicate header delay.h
  net: ieee802154: drop duplicate header delay.h
  ibmvnic: drop duplicate header seq_file.h
  fsl/fman: fix a leak in tgec_free()
  net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
  tipc: improve sanity check for received domain records
  tipc: fix compatibility bug in link monitoring
  net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
  dwc_eth_qos: drop duplicate headers
  net sched filters: fix filter handle ID in tfilter_notify_chain()
  net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
  bnxt: do not busy-poll when link is down
  udplite: call proper backlog handlers
  ipv6: bump genid when the IFA_F_TENTATIVE flag is clear
  net/mlx4_en: Free netdev resources under state lock
  net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"
  rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()
  bnxt_en: Fix a VXLAN vs GENEVE issue
  ...

8 years agoMerge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdim...
Linus Torvalds [Sat, 26 Nov 2016 20:24:47 +0000 (12:24 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:

 - Fix a crash that occurs at driver initialization if the memory region
   is already busy (request_mem_region() fails).

 - Fix a vma validation check that mistakenly allows a private device-
   dax mapping to be established. Device-dax explicitly forbids private
   mappings so it can guarantee a given fault granularity and backing
   memory type.

 Both of these fixes have soaked in -next and are tagged for -stable.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: fail all private mapping attempts
  device-dax: check devm_nsio_enable() return value

8 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 26 Nov 2016 20:18:59 +0000 (12:18 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "Four fixes for bugs found by syzkaller on x86, all for stable"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: check for pic and ioapic presence before use
  KVM: x86: fix out-of-bounds accesses of rtc_eoi map
  KVM: x86: drop error recovery in em_jmp_far and em_ret_far
  KVM: x86: fix out-of-bounds access in lapic

8 years agoMerge tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 26 Nov 2016 19:24:03 +0000 (11:24 -0800)]
Merge tag 'powerpc-4.9-6' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fixes marked for stable:
   - Set missing wakeup bit in LPCR on POWER9
   - Fix the early OPAL console wrappers
   - Fixup kernel read only mapping

  Fixes for code merged this cycle:
   - Fix missing CRCs, add more asm-prototypes.h declarations"

* tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Fixup kernel read only mapping
  powerpc/boot: Fix the early OPAL console wrappers
  powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
  powerpc: Set missing wakeup bit in LPCR on POWER9

8 years agotipc: resolve connection flow control compatibility problem
Jon Paul Maloy [Thu, 24 Nov 2016 23:47:07 +0000 (18:47 -0500)]
tipc: resolve connection flow control compatibility problem

In commit 10724cc7bb78 ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using message as base unit when it senses that the peer
doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement) is 512. This was tested against
the previous version, which uses an acknowledge frequency of on ack per
256 received message, and found to work fine.

However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for acknowledge. This would also work fine if
it weren't for the fact that if the first sent message on a 4.6+ server
side is an empty SYNACK, this one is also is counted as a sent message,
while it is not counted as a received message on a legacy 3.15-receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver only has registered 511 read messages. Hence, the legacy
receiver is not trigged to send an acknowledge, with a permanently
blocked sender as result.

We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by a making minimal change to the
condition used for determining connection congestion.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomvpp2: use correct size for memset
Arnd Bergmann [Thu, 24 Nov 2016 16:28:12 +0000 (17:28 +0100)]
mvpp2: use correct size for memset

gcc-7 detects a short memset in mvpp2, introduced in the original
merge of the driver:

drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_cls_init':
drivers/net/ethernet/marvell/mvpp2.c:3296:2: error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]

The result seems to be that we write uninitialized data into the
flow table registers, although we did not get any warning about
that uninitialized data usage.

Using sizeof() lets us initialize then entire array instead.

Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: drop duplicate header delay.h
Geliang Tang [Thu, 24 Nov 2016 13:58:33 +0000 (21:58 +0800)]
net/mlx5: drop duplicate header delay.h

Drop duplicate header delay.h from mlx5/core/main.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ieee802154: drop duplicate header delay.h
Geliang Tang [Thu, 24 Nov 2016 13:58:32 +0000 (21:58 +0800)]
net: ieee802154: drop duplicate header delay.h

Drop duplicate header delay.h from adf7242.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoibmvnic: drop duplicate header seq_file.h
Geliang Tang [Thu, 24 Nov 2016 13:58:29 +0000 (21:58 +0800)]
ibmvnic: drop duplicate header seq_file.h

Drop duplicate header seq_file.h from ibmvnic.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofsl/fman: fix a leak in tgec_free()
Dan Carpenter [Thu, 24 Nov 2016 11:20:43 +0000 (14:20 +0300)]
fsl/fman: fix a leak in tgec_free()

We set "tgec->cfg" to NULL before passing it to kfree().  There is no
need to set it to NULL at all.  Let's just delete it.

Fixes: 57ba4c9b56d8 ("fsl/fman: Add FMan MAC support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
Miroslav Lichvar [Thu, 24 Nov 2016 09:55:06 +0000 (10:55 +0100)]
net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS

The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
command and likewise it shouldn't require the CAP_NET_ADMIN capability.

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotipc: improve sanity check for received domain records
Jon Paul Maloy [Thu, 24 Nov 2016 04:46:09 +0000 (23:46 -0500)]
tipc: improve sanity check for received domain records

In commit 35c55c9877f8 ("tipc: add neighbor monitoring framework") we
added a data area to the link monitor STATE messages under the
assumption that previous versions did not use any such data area.

For versions older than Linux 4.3 this assumption is not correct. In
those version, all STATE messages sent out from a node inadvertently
contain a 16 byte data area containing a string; -a leftover from
previous RESET messages which were using this during the setup phase.
This string serves no purpose in STATE messages, and should no be there.

Unfortunately, this data area is delivered to the link monitor
framework, where a sanity check catches that it is not a correct domain
record, and drops it. It also issues a rate limited warning about the
event.

Since such events occur much more frequently than anticipated, we now
choose to remove the warning in order to not fill the kernel log with
useless contents. We also make the sanity check stricter, to further
reduce the risk that such data is inavertently admitted.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotipc: fix compatibility bug in link monitoring
Jon Paul Maloy [Thu, 24 Nov 2016 02:05:26 +0000 (21:05 -0500)]
tipc: fix compatibility bug in link monitoring

commit 817298102b0b ("tipc: fix link priority propagation") introduced a
compatibility problem between TIPC versions newer than Linux 4.6 and
those older than Linux 4.4. In versions later than 4.4, link STATE
messages only contain a non-zero link priority value when the sender
wants the receiver to change its priority. This has the effect that the
receiver resets itself in order to apply the new priority. This works
well, and is consistent with the said commit.

However, in versions older than 4.4 a valid link priority is present in
all sent link STATE messages, leading to cyclic link establishment and
reset on the 4.6+ node.

We fix this by adding a test that the received value should not only
be valid, but also differ from the current value in order to cause the
receiving link endpoint to reset.

Reported-by: Amar Nv <amar.nv005@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
Andrew Lunn [Wed, 23 Nov 2016 23:08:13 +0000 (00:08 +0100)]
net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented

The mvneta driver advertises it supports IFF_UNICAST_FLT. However, it
actually does not. The hardware probably does support it, but there is
no code to configure the filter. As a quick and simple fix, remove the
flag. This will cause the core to fall back to promiscuous mode.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: b50b72de2f2f ("net: mvneta: enable features before registering the driver")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'parisc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sat, 26 Nov 2016 00:47:15 +0000 (16:47 -0800)]
Merge branch 'parisc-4.9-4' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "On parisc we were still seeing occasional random segmentation faults
  and memory corruption on SMP machines. Dave Anglin then looked again
  at the TLB related code and found two issues in the PCI DMA and
  generic TLB flush functions.

  Then, in our startup code we had some timing of the cache and TLB
  functions to calculate a threshold when to use a complete TLB/cache
  flush or just to flush a specific range. This code produced a race
  with newly started CPUs and thus lead to occasional kernel crashes
  (due to stale TLB/cache entries). The patch by Dave fixes this issue
  by flushing the local caches before starting secondary CPUs and by
  removing the race.

  The last problem fixed by this series is that we quite often suffered
  from hung tasks and self-detected stalls on the CPUs. It was somehow
  clear that this was related to the (in v4.7) newly introduced cr16
  clocksource and the own implementation of sched_clock(). I replaced
  the open-coded sched_clock() function and switched to the generic
  sched_clock() implementation which seems to have fixed this isse as
  well.

  All patches have been sucessfully tested on a variety of machines,
  including our debian buildd servers.

  All patches (beside the small pr_cont fix) are tagged for stable
  releases"

* 'parisc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Also flush data TLB in flush_icache_page_asm
  parisc: Fix race in pci-dma.c
  parisc: Switch to generic sched_clock implementation
  parisc: Fix races in parisc_setup_cache_timing()
  parisc: Fix printk continuations in system detection

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Fri, 25 Nov 2016 23:53:45 +0000 (15:53 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull keys fixes from James Morris:
 "From David:

   - Fix mpi_powm()'s handling of a number with a zero exponent
     [CVE-2016-8650].

     Integrate my and Andrey's patches for mpi_powm() and use
     mpi_resize() instead of RESIZE_IF_NEEDED() - the latter adds a
     duplicate check into the execution path of a trivial case we
     don't normally expect to be taken.

   - Fix double free in X.509 error handling"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
  X.509: Fix double free in x509_cert_parse() [ver #3]

8 years agoFix subtle CONFIG_MODVERSIONS problems
Linus Torvalds [Fri, 25 Nov 2016 23:44:47 +0000 (15:44 -0800)]
Fix subtle CONFIG_MODVERSIONS problems

CONFIG_MODVERSIONS has been broken for pretty much the whole 4.9 series,
and quite frankly, nobody has cared very deeply.  We absolutely know how
to fix it, and it's not _complicated_, but it's not exactly pretty
either.

This oneliner fixes it without the ugliness, and allows for further
future cleanups.

  "We've secretly replaced their regular MODVERSIONS with nothing at
   all, let's see if they notice"

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge tag 'acpi-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 25 Nov 2016 23:16:51 +0000 (15:16 -0800)]
Merge tag 'acpi-4.9-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "Two ACPI fixes for 4.9-rc7.

  One of them reverts a recent ACPI commit that attempted to improve
  reboot/power-off on some systems, but introduced problems elsewhere,
  and the other one fixes kernel builds with the new WDAT watchdog
  driver enabled in some configurations.

  Specifics:

   - Revert the recent commit that caused the ACPI _PTS method to be
     executed in the power-off/reboot code path (as per the
     specification) in an attempt to improve things on some systems
     (apparently expecting _PTS to be executed in that code path), but
     broke power-off/reboot on at least one other machine (Rafael
     Wysocki).

   - Fix kernel builds with the new WDAT watchdog driver enabled in some
     configurations by explicitly selecting WATCHDOG_CORE when enabling
     the WDAT watchdog driver (Mika Westerberg)"

* tag 'acpi-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  watchdog: wdat_wdt: Select WATCHDOG_CORE
  Revert "ACPI: Execute _PTS before system reboot"

8 years agoMAINTAINERS: Add bug tracking system location entry type
Rafael J. Wysocki [Thu, 24 Nov 2016 23:13:56 +0000 (00:13 +0100)]
MAINTAINERS: Add bug tracking system location entry type

Following the kernel Bugzilla discussion during the Kernel Summit
(https://lwn.net/Articles/705245/), add bug tracking system location
entry type (B) to MAINTAINERS and populate it for several subsystems
known to be using the kernel BZ actively (and add the upstream BZ for
ACPICA too).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoRevert "i2c: designware: do not disable adapter after transfer"
Jarkko Nikula [Fri, 25 Nov 2016 15:22:27 +0000 (17:22 +0200)]
Revert "i2c: designware: do not disable adapter after transfer"

This reverts commit 0317e6c0f1dc1ba86b8d9dccc010c5e77b8355fa.

Srinivas reported recently touchscreen and touchpad stopped working in
Haswell based machine in Linux 4.9-rc series with timeout errors from
i2c_designware:

[   16.508013] i2c_designware INT33C3:00: controller timed out
[   16.508302] i2c_hid i2c-MSFT0001:02: failed to change power setting.
[   17.532016] i2c_designware INT33C3:00: controller timed out
[   18.556022] i2c_designware INT33C3:00: controller timed out
[   18.556315] i2c_hid i2c-ATML1000:00: failed to retrieve report from device.

I managed to reproduce similar errors on another Haswell based machine
where touchscreen initialization fails maybe in every 1/5 - 1/2 boots.
Since root cause for these errors is not clear yet and debugging is
ongoing it's better to revert this commit as we are near to release.

Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
8 years agoMerge branches 'acpi-sleep-fixes' and 'acpi-wdat-fixes'
Rafael J. Wysocki [Fri, 25 Nov 2016 21:24:07 +0000 (22:24 +0100)]
Merge branches 'acpi-sleep-fixes' and 'acpi-wdat-fixes'

* acpi-sleep-fixes:
  Revert "ACPI: Execute _PTS before system reboot"

* acpi-wdat-fixes:
  watchdog: wdat_wdt: Select WATCHDOG_CORE

8 years agoMerge tag 'linux-can-fixes-for-4.9-20161123' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Fri, 25 Nov 2016 21:17:12 +0000 (16:17 -0500)]
Merge tag 'linux-can-fixes-for-4.9-20161123' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-11-23

this is a pull request for net/master.

The patch by Oliver Hartkopp for the broadcast manager (bcm) fixes the
CAN-FD support, which may cause an out-of-bounds access otherwise.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodwc_eth_qos: drop duplicate headers
Geliang Tang [Wed, 23 Nov 2016 14:24:35 +0000 (22:24 +0800)]
dwc_eth_qos: drop duplicate headers

Drop duplicate headers types.h and delay.h from dwc_eth_qos.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'mfd-fixes-4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Linus Torvalds [Fri, 25 Nov 2016 19:36:35 +0000 (11:36 -0800)]
Merge tag 'mfd-fixes-4.9.1' of git://git./linux/kernel/git/lee/mfd

Pull MFD fixes from Lee Jones:
 "Received a copule of last minute fixes for v4.9.

  The patches from Viresh are fixing issues displayed in KernelCI"

* tag 'mfd-fixes-4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: wm8994-core: Don't use managed regulator bulk get API
  mfd: wm8994-core: Disable regulators before removing them
  mfd: syscon: Support native-endian regmaps

8 years agoMerge tag 'media/v4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Fri, 25 Nov 2016 19:31:01 +0000 (11:31 -0800)]
Merge tag 'media/v4.9-4' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fix from Mauro Carvalho Chehab:
 "Fix for the firmware load logic of the tuner-xc2028 driver"

* tag 'media/v4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  xc2028: Fix use-after-free bug properly

8 years agoMerge tag 'drm-fixes-for-v4.9-rc7' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 25 Nov 2016 18:51:35 +0000 (10:51 -0800)]
Merge tag 'drm-fixes-for-v4.9-rc7' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Seems to be quietening down nicely, a few mediatek, one exynos and one
  hdlcd fix, along with two amd fixes"

* tag 'drm-fixes-for-v4.9-rc7' of git://people.freedesktop.org/~airlied/linux:
  gpu/drm/exynos/exynos_hdmi - Unmap region obtained by of_iomap
  drm/mediatek: fix null pointer dereference
  drm/mediatek: fixed the calc method of data rate per lane
  drm/mediatek: fix a typo of DISP_OD_CFG to OD_RELAYMODE
  drm/radeon: fix power state when port pm is unavailable (v2)
  drm/amdgpu: fix power state when port pm is unavailable
  drm/arm: hdlcd: fix plane base address update
  drm/amd/powerplay: avoid out of bounds access on array ps.

8 years agoparisc: Also flush data TLB in flush_icache_page_asm
John David Anglin [Fri, 25 Nov 2016 01:18:14 +0000 (20:18 -0500)]
parisc: Also flush data TLB in flush_icache_page_asm

This is the second issue I noticed in reviewing the parisc TLB code.

The fic instruction may use either the instruction or data TLB in
flushing the instruction cache.  Thus, on machines with a split TLB, we
should also flush the data TLB after setting up the temporary alias
registers.

Although this has no functional impact, I changed the pdtlb and pitlb
instructions to consistently use the index register %r0.  These
instructions do not support integer displacements.

Tested on rp3440 and c8000.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Helge Deller <deller@gmx.de>