Al Viro [Sun, 4 Dec 2016 17:33:17 +0000 (17:33 +0000)]
ovl: clean up kstat usage
FWIW, there's a bit of abuse of struct kstat in overlayfs object
creation paths - for one thing, it ends up with a very small subset
of struct kstat (mode + rdev), for another it also needs link in
case of symlinks and ends up passing it separately.
IMO it would be better to introduce a separate object for that.
In principle, we might even lift that thing into general API and switch
->mkdir()/->mknod()/->symlink() to identical calling conventions. Hell
knows, perhaps ->create() as well...
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Sat, 12 Nov 2016 19:36:03 +0000 (21:36 +0200)]
ovl: fold ovl_copy_up_truncate() into ovl_copy_up()
This removes code duplication.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Mon, 21 Nov 2016 16:57:34 +0000 (18:57 +0200)]
ovl: create directories inside merged parent opaque
The benefit of making directories opaque on creation is that lookups can
stop short when they reach the original created directory, instead of
continuing the lookup through the entire depth of the parent directory stack.
The best case is overlay with N layers, performing lookup for first level
directory, which exists only in upper. In that case, there will be only
one lookup instead of N.
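
The effect on lookup count can be illustrated with a toy model. This is only a sketch: the layer dictionaries and the helper names count_lookups() and make_stack() are hypothetical and do not correspond to overlayfs code.

```python
def count_lookups(layers, name):
    """Return how many layers are consulted when looking up a directory.

    `layers` is ordered from upper to lowest. Each layer maps a name to a
    dict carrying an "opaque" flag. Toy model only, not overlayfs code.
    """
    lookups = 0
    for layer in layers:
        lookups += 1
        entry = layer.get(name)
        if entry is not None and entry["opaque"]:
            break  # an opaque directory hides everything below it
    return lookups


def make_stack(n_layers, opaque):
    """Build N layers where "newdir" exists only in the upper layer."""
    upper = {"newdir": {"opaque": opaque}}
    lowers = [{} for _ in range(n_layers - 1)]
    return [upper] + lowers
```

With five layers, the opaque variant stops after the upper layer, while the non-opaque variant walks all five.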
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:57 +0000 (11:02 +0100)]
ovl: opaque cleanup
oe->opaque is set for
a) whiteouts
b) directories having the "trusted.overlay.opaque" xattr
Case b can be simplified, since setting the xattr always implies setting
oe->opaque. Also once set, the opaque flag is never cleared.
Don't need to set opaque flag for non-directories.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Tue, 22 Nov 2016 09:47:09 +0000 (11:47 +0200)]
ovl: show redirect_dir mount option
Show the value of redirect_dir in /proc/mounts.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:57 +0000 (11:02 +0100)]
ovl: allow setting max size of redirect
Add a module option to allow tuning the max size of absolute redirects.
Default is 256.
Size of relative redirects is naturally limited by the underlying
filesystem's max filename length (usually 255).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:57 +0000 (11:02 +0100)]
ovl: allow redirect_dir to default to "on"
This patch introduces a kernel config option and a module param. Both can
be used independently to turn the default value of redirect_dir on or off.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 26 Oct 2016 09:34:06 +0000 (12:34 +0300)]
ovl: check for emptiness of redirect dir
Before introducing redirect_dir feature, the condition
!ovl_lower_positive(dentry) for a directory, implied that it is a pure
upper directory, which may be removed if empty.
Now that a directory can be redirected, it is possible that upper does not
cover any lower (i.e. !ovl_lower_positive(dentry)), but the directory is a
merge (with redirected path) and may be non-empty.
Check for this case in ovl_remove_upper().
This change fixes the following test case from rename-pop-dir.py
of unionmount-testsuite:
"""Remove dir and rename old name"""
d = ctx.non_empty_dir()
d2 = ctx.no_dir()
ctx.rmdir(d, err=ENOTEMPTY)
ctx.rename(d, d2)
ctx.rmdir(d, err=ENOENT)
ctx.rmdir(d2, err=ENOTEMPTY)
./run --ov rename-pop-dir
/mnt/a/no_dir103: Expected error (Directory not empty) was not produced
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:56 +0000 (11:02 +0100)]
ovl: redirect on rename-dir
Current code returns EXDEV when a directory would need to be copied up to
move. We could copy up the directory tree in this case, but there's
another, simpler solution: point to old lower directory from moved upper
directory.
This is achieved with a "trusted.overlay.redirect" xattr storing the path
relative to the root of the overlay. After such attribute has been set,
the directory can be moved without further actions required.
This is a backward incompatible feature, old kernels won't be able to
correctly mount an overlay containing redirected directories.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:56 +0000 (11:02 +0100)]
ovl: lookup redirects
If a directory has the "trusted.overlay.redirect" xattr, it means that the
value of the xattr should be used to find the underlying directory on the
next lower layer.
The redirect may be relative or absolute. Absolute redirects begin with a
slash.
A relative redirect means: instead of the current dentry's name use the
value of the redirect to find the directory in the next lower
layer. Relative redirects must not contain a slash.
An absolute redirect means: look up the directory relative to the root of
the overlay using the value of the redirect in the next lower layer.
Redirects work on lower layers as well.
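
The relative/absolute rules above can be sketched as a small path helper. The function name next_lower_path() and its arguments are hypothetical, and paths are expressed relative to the layer root purely for illustration.

```python
import posixpath


def next_lower_path(parent_path, name, redirect=None):
    """Return the path to look up in the next lower layer.

    parent_path: path of the parent directory, relative to the layer root
    name:        name of the current dentry
    redirect:    value of the redirect xattr, or None if it is not set
    """
    if redirect is None:
        # No redirect: use the dentry's own name under the same parent.
        return posixpath.join(parent_path, name)
    if redirect.startswith("/"):
        # Absolute redirect: resolved from the root of the overlay.
        return redirect.lstrip("/")
    if "/" in redirect:
        raise ValueError("relative redirect must not contain a slash")
    # Relative redirect: replaces only the last path component.
    return posixpath.join(parent_path, redirect)
```

For example, a directory "new" under "a/b" with a relative redirect "old" is looked up as "a/b/old" in the next lower layer.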
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:56 +0000 (11:02 +0100)]
ovl: consolidate lookup for underlying layers
Use a common helper for lookup of upper and lower layers. This paves the
way for looking up directory redirects.
No functional change.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 16 Nov 2016 09:22:39 +0000 (11:22 +0200)]
ovl: fix nested overlayfs mount
When the upper overlayfs checks "trusted.overlay.*" xattr on the underlying
overlayfs mount, it gets -EPERM, which confuses the upper overlayfs.
Fix this by returning -EOPNOTSUPP instead of -EPERM from
ovl_own_xattr_get() and ovl_own_xattr_set(). This behavior is consistent
with the behavior of ovl_listxattr(), which filters out the private
overlayfs xattrs.
Note: nested overlays are deprecated. But this change makes sense
regardless: these xattrs are private to the overlay and should always be
hidden. Hence getting and setting them should indicate this.
[SzMi: Use EOPNOTSUPP instead of ENODATA and use it for both getting and
setting "trusted.overlay." xattrs. This is a perfectly valid error code
for "we don't support this prefix", which is the case here.]
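
A toy model of the resulting behavior, assuming xattrs are held in a plain dictionary. The helpers toy_listxattr() and toy_getxattr() are hypothetical stand-ins, not the kernel handlers.

```python
import errno
import os

OVL_XATTR_PREFIX = "trusted.overlay."


def is_private(name):
    """True for xattrs that are private to the overlay."""
    return name.startswith(OVL_XATTR_PREFIX)


def toy_listxattr(xattrs):
    """List xattr names, filtering out the private overlay ones."""
    return [n for n in xattrs if not is_private(n)]


def toy_getxattr(xattrs, name):
    """Get an xattr; private overlay names report 'not supported'."""
    if is_private(name):
        raise OSError(errno.EOPNOTSUPP, os.strerror(errno.EOPNOTSUPP))
    return xattrs[name]
```

Listing and getting are then consistent: a private xattr is neither listed nor readable.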
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:56 +0000 (11:02 +0100)]
ovl: check namelen
We already calculate f_namelen in statfs as the maximum of the name lengths
provided by the filesystems taking part in the overlay.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:56 +0000 (11:02 +0100)]
ovl: split super.c
fs/overlayfs/super.c is the biggest of the overlayfs source files and it
contains various utility functions as well as the rather complicated lookup
code. Split these parts out to separate files.
Before:
1446 fs/overlayfs/super.c
After:
919 fs/overlayfs/super.c
267 fs/overlayfs/namei.c
235 fs/overlayfs/util.c
51 fs/overlayfs/ovl_entry.h
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:56 +0000 (11:02 +0100)]
ovl: use d_is_dir()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:56 +0000 (11:02 +0100)]
ovl: simplify lookup
If encountering a non-directory, then stop looking at lower layers.
In this case the oe->opaque flag is not set anymore, which doesn't matter
since existence of lower file is now checked at remove/rename time.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: check lower existence of rename target
Check if something exists on the lower layer(s) under the target of rename
to decide if directory needs to be marked "opaque".
Marking opaque is done before the rename, and on failure the marking was
undone. Also the opaque xattr was removed if the target didn't cover
anything.
This patch changes behavior so that removal of "opaque" is not done in
either of the above cases. This means that directory may have the opaque
flag even if it doesn't cover anything. However this shouldn't affect the
performance or semantics of the overlay, while simplifying the code.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: rename: simplify handling of lower/merged directory
d_is_dir() is safe to call on a negative dentry. Use this fact to simplify
handling of the lower or merged directories.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: get rid of PURE type
The remaining uses of __OVL_PATH_PURE can be replaced by
ovl_dentry_is_opaque().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: check lower existence when removing
Currently ovl_lookup() checks existence of lower file even if there's a
non-directory on upper (which is always opaque). This is done so that
remove can decide whether a whiteout is needed or not.
It would be better to defer this check to unlink, since most of the time
the gathered information about opaqueness will be unused.
This adds a helper ovl_lower_positive() that checks if there's anything on
the lower layer(s).
The following patches also introduce changes to how the "opaque" attribute
is updated on directories: this attribute is added when the directory is
created or moved over a whiteout or object covering something on the lower
layer. However following changes will allow the attribute to remain on the
directory after being moved, even if the new location doesn't cover
anything. Because of this, we need to check lower layers even for opaque
directories, so that whiteout is only created when necessary.
This function will later be also used to decide about marking a directory
opaque, so deal with negative dentries as well. When dealing with
negative, it's enough to check for being a whiteout.
If the dentry is positive but not upper then it also obviously needs
whiteout/opaque.
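
The decision order described above can be sketched as follows. ToyDentry and its fields are hypothetical; in particular, exists_on_lower stands in for an actual lookup on the lower layer(s).

```python
from dataclasses import dataclass


@dataclass
class ToyDentry:
    positive: bool                 # does the overlay dentry have an inode?
    has_upper: bool = False        # is there an upper object?
    is_whiteout: bool = False      # negative dentry backed by a whiteout
    exists_on_lower: bool = False  # stand-in for a real lower-layer lookup


def lower_positive(d):
    """Return True if removing/covering `d` needs a whiteout or opaque mark."""
    if not d.positive:
        # Negative dentry: it is enough to check for being a whiteout.
        return d.is_whiteout
    if not d.has_upper:
        # Positive but not upper: the object itself lives on a lower layer.
        return True
    # Positive with upper: only now is a check of the lower layers needed.
    return d.exists_on_lower
```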
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: add ovl_dentry_is_whiteout()
And use it instead of ovl_dentry_is_opaque() where appropriate.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: don't check sticky
Since commit 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to
overlay inode") sticky checking on overlay inode is performed by the vfs,
so checking against sticky on underlying inode is not needed.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: don't check rename to self
This is redundant, the vfs already performed this check (and was broken,
see commit 9409e22acdfc ("vfs: rename: check backing inode being equal")).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: treat special files like a regular fs
No sense in opening special files on the underlying layers, they work just
as well if opened on the overlay.
Side effect is that it's no longer possible to connect one side of a pipe
opened on overlayfs with the other side opened on the underlying layer.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:55 +0000 (11:02 +0100)]
ovl: rename ovl_rename2() to ovl_rename()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Fri, 23 Sep 2016 08:38:12 +0000 (11:38 +0300)]
ovl: use vfs_clone_file_range() for copy up if possible
When copying up within the same fs, try to use vfs_clone_file_range().
This is very efficient when lower and upper are on the same fs
with file reflink support. If vfs_clone_file_range() fails for any
reason, copy up falls back to the regular data copy code.
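
A user-space analogue of the "try clone, fall back to copy" pattern, as a sketch only. It uses the FICLONE ioctl from <linux/fs.h>; the helper name clone_or_copy() is hypothetical and this is not the overlayfs copy-up code.

```python
import shutil

FICLONE = 0x40049409  # _IOW(0x94, 9, int), from <linux/fs.h>


def clone_or_copy(src_path, dst_path):
    """Try a reflink clone first; fall back to a regular data copy.

    Returns "clone" or "copy" to report which path was taken.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            import fcntl  # not available on every platform
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "clone"
        except (ImportError, OSError):
            # Clone failed for any reason: copy the data instead.
            dst.seek(0)
            dst.truncate()
            shutil.copyfileobj(src, dst)
            return "copy"
```

On a filesystem without reflink support the ioctl fails and the helper silently takes the copy path, mirroring the fallback described above.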
Tested correct behavior when lower and upper are on:
1. same ext4 (copy)
2. same xfs + reflink patches + mkfs.xfs (copy)
3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
For comparison, on my laptop, xfstest overlay/001 (copy up of large
sparse files) takes less than 1 second in the xfs reflink setup vs.
25 seconds on the rest of the setups.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 12 Oct 2016 14:28:11 +0000 (16:28 +0200)]
Revert "ovl: get_write_access() in truncate"
This reverts commit 03bea60409328de54e4ff7ec41672e12a9cb0908.
Commit 4d0c5ba2ff79 ("vfs: do get_write_access() on upper layer of
overlayfs") makes the writecount checks inside overlayfs superfluous, the
file is already copied up and write access acquired on the upper inode when
ovl_setattr is called with ATTR_SIZE.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:54 +0000 (11:02 +0100)]
ovl: update doc
The quirk for file locks and leases no longer applies.
Add missing info about renaming directory residing on lower layer.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Wed, 26 Oct 2016 19:34:01 +0000 (22:34 +0300)]
vfs: fix vfs_clone_file_range() for overlayfs files
With overlayfs, it is wrong to compare file_inode(inode)->i_sb
of regular files with those of non-regular files, because the
former reference the real (upper/lower) sb and the latter reference
the overlayfs sb.
Move the test for same super block after the sanity tests for
clone range of directory and non-regular file.
This change fixes xfstest generic/157, which returned EXDEV instead
of EISDIR/EINVAL in the following test cases over overlayfs:
echo "Try to reflink a dir"
_reflink_range $testdir1/dir1 0 $testdir1/file2 0 $blksz
echo "Try to reflink a device"
_reflink_range $testdir1/dev1 0 $testdir1/file2 0 $blksz
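
The effect of the reordering can be sketched as below. clone_check() is a hypothetical model: the super blocks are represented by arbitrary comparable values, and the mapping of file types to EISDIR/EINVAL follows the test expectations quoted above.

```python
import errno
import stat


def clone_check(src_mode, dst_mode, src_sb, dst_sb):
    """Return 0 or a negative errno, with file-type checks done first."""
    if stat.S_ISDIR(src_mode) or stat.S_ISDIR(dst_mode):
        return -errno.EISDIR
    if not stat.S_ISREG(src_mode) or not stat.S_ISREG(dst_mode):
        return -errno.EINVAL
    # Only regular files reach this point, so the comparison is meaningful.
    if src_sb != dst_sb:
        return -errno.EXDEV
    return 0
```

A directory or device source now yields EISDIR or EINVAL even when its super block differs from that of the destination.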
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Fri, 23 Sep 2016 08:38:11 +0000 (11:38 +0300)]
vfs: call vfs_clone_file_range() under freeze protection
Move sb_start_write()/sb_end_write() out of the vfs helper and up into the
ioctl handler.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Amir Goldstein [Fri, 23 Sep 2016 08:38:10 +0000 (11:38 +0300)]
vfs: allow vfs_clone_file_range() across mount points
FICLONE/FICLONERANGE ioctls return -EXDEV if src and dest
files are not on the same mount point.
Practically, clone only requires that src and dest files
are on the same file system.
Move the check for same mount point to ioctl handler and keep
only the check for same super block in the vfs helper.
A following patch is going to use the vfs_clone_file_range()
helper in overlayfs to copy up between lower and upper
mount points on the same file system.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:54 +0000 (11:02 +0100)]
vfs: no mnt_want_write_file() in vfs_{copy,clone}_file_range()
We've checked for file_out being opened for write. This ensures that we
already have mnt_want_write() on target.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:54 +0000 (11:02 +0100)]
Revert "vfs: rename: check backing inode being equal"
This reverts commit 9409e22acdfc9153f88d9b1ed2bd2a5b34d2d3ca.
Since commit 51f7e52dc943 ("ovl: share inode for hard link") there's no
need to call d_real_inode() to check two overlay inodes for equality.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 16 Dec 2016 10:02:53 +0000 (11:02 +0100)]
Revert "af_unix: fix hard linked sockets on overlay"
This reverts commit eb0a4a47ae89aaa0674ab3180de6a162f3be2ddf.
Since commit 51f7e52dc943 ("ovl: share inode for hard link") there's no
need to call d_real_inode() to check two overlay inodes for equality.
Side effect of this revert is that it's no longer possible to connect one
socket on overlayfs to one on the underlying layer (something which didn't
make sense anyway).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Linus Torvalds [Sun, 4 Dec 2016 20:50:51 +0000 (12:50 -0800)]
Linux 4.9-rc8
Linus Torvalds [Sun, 4 Dec 2016 00:40:21 +0000 (16:40 -0800)]
Merge tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A pretty small pull request: a couple of AMD powerxpress regression
fixes and a power management fix, a couple of i915 fixes and one hdlcd
fix, along with one core don't oops because of incorrect API usage fix"
* tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux:
drm/i915: drop the struct_mutex when wedged or trying to reset
drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
drm: Don't call drm_for_each_crtc with a non-KMS driver
drm/radeon: fix check for port PM availability
drm/amdgpu: fix check for port PM availability
drm/amd/powerplay: initialize the soft_regs offset in struct smu7_hwmgr
drm: hdlcd: Fix cleanup order
Dave Airlie [Sat, 3 Dec 2016 20:31:26 +0000 (06:31 +1000)]
Merge tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
2 intel fixes.
* tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: drop the struct_mutex when wedged or trying to reset
drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
Linus Torvalds [Sat, 3 Dec 2016 02:48:11 +0000 (18:48 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge more fixes from Andrew Morton:
"2 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, vmscan: add cond_resched() into shrink_node_memcg()
mm: workingset: fix NULL ptr in count_shadow_nodes
Michal Hocko [Sat, 3 Dec 2016 01:26:48 +0000 (17:26 -0800)]
mm, vmscan: add cond_resched() into shrink_node_memcg()
Boris Zhmurov has reported RCU stalls during the kswapd reclaim:
INFO: rcu_sched detected stalls on CPUs/tasks:
23-...: (22 ticks this GP) idle=92f/140000000000000/0 softirq=2638404/2638404 fqs=23
(detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115)
Task dump for CPU 23:
kswapd1 R running task 0 148 2 0x00000008
Call Trace:
shrink_node+0xd2/0x2f0
kswapd+0x2cb/0x6a0
mem_cgroup_shrink_node+0x160/0x160
kthread+0xbd/0xe0
__switch_to+0x1fa/0x5c0
ret_from_fork+0x1f/0x40
kthread_create_on_node+0x180/0x180
A closer code inspection has shown that we might indeed miss all the
scheduling points in the reclaim path if no pages can be isolated from
the LRU list. This is a pathological case but other reports from Donald
Buczek have shown that we might indeed hit such a path:
clusterd-989 [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193
kswapd1-86 [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239830 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239844 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239858 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239872 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239886 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239900 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239914 nr_taken=0 file=1
[...]
kswapd1-86 [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4241133 nr_taken=0 file=1
This is a minute-long snapshot which didn't take a single page from the
LRU. It is not entirely clear why only 1303 pages have been scanned
during that time (maybe there was heavy IRQ activity interfering).
In any case it looks like we can really hit long periods without
scheduling on non-preemptive kernels, so an explicit cond_resched() in
shrink_node_memcg which is independent of the reclaim operation is due.
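
The 1303 figure and the length of the snapshot can be rechecked from the first and last trace lines quoted above. The variable names are only for illustration.

```python
# Timestamp (seconds) and nr_scanned from the first and last
# mm_vmscan_lru_isolate trace lines in the report.
first_ts, first_scanned = 118023.987475, 4239830
last_ts, last_scanned = 118084.274403, 4241133

pages_scanned = last_scanned - first_scanned   # pages scanned in the window
window_seconds = last_ts - first_ts            # length of the snapshot
pages_per_second = pages_scanned / window_seconds

print(pages_scanned, round(window_seconds, 1), round(pages_per_second, 1))
```

That is roughly 22 pages per second over about a minute, with nr_taken=0 throughout.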
Link: http://lkml.kernel.org/r/20161202095841.16648-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Boris Zhmurov <bb@kernelpanic.ru>
Tested-by: Boris Zhmurov <bb@kernelpanic.ru>
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Reported-by: "Christopher S. Aker" <caker@theshore.net>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Sat, 3 Dec 2016 01:26:45 +0000 (17:26 -0800)]
mm: workingset: fix NULL ptr in count_shadow_nodes
Commit 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg
aware") has made the workingset shadow nodes shrinker memcg aware. The
implementation is not correct though because memcg_kmem_enabled() might
become true while we are doing a global reclaim when the sc->memcg might
be NULL which is exactly what Marek has seen:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000400
IP: [<ffffffff8122d520>] mem_cgroup_node_nr_lru_pages+0x20/0x40
PGD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 60 Comm: kswapd0 Tainted: G O 4.8.10-12.pvops.qubes.x86_64 #1
task: ffff880011863b00 task.stack: ffff880011868000
RIP: mem_cgroup_node_nr_lru_pages+0x20/0x40
RSP: e02b:ffff88001186bc70 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88001186bd20 RCX: 0000000000000002
RDX: 000000000000000c RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88001186bc70 R08: 28f5c28f5c28f5c3 R09: 0000000000000000
R10: 0000000000006c34 R11: 0000000000000333 R12: 00000000000001f6
R13: ffffffff81c6f6a0 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880013c00000(0000) knlGS:ffff880013d00000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000400 CR3: 00000000122f2000 CR4: 0000000000042660
Call Trace:
count_shadow_nodes+0x9a/0xa0
shrink_slab.part.42+0x119/0x3e0
shrink_node+0x22c/0x320
kswapd+0x32c/0x700
kthread+0xd8/0xf0
ret_from_fork+0x1f/0x40
Code: 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 3b 35 dd eb b1 00 55 48 89 e5 73 2c 89 d2 31 c9 31 c0 4c 63 ce 48 0f a3 ca 73 13 <4a> 8b b4 cf 00 04 00 00 41 89 c8 4a 03 84 c6 80 00 00 00 83 c1
RIP mem_cgroup_node_nr_lru_pages+0x20/0x40
RSP <ffff88001186bc70>
CR2: 0000000000000400
---[ end trace 100494b9edbdfc4d ]---
This patch fixes the issue by checking sc->memcg rather than
memcg_kmem_enabled() which is sufficient because shrink_slab makes sure
that only memcg aware shrinkers will get non-NULL memcgs and only if
memcg_kmem_enabled is true.
Fixes: 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware")
Link: http://lkml.kernel.org/r/20161201132156.21450-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
Tested-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Pitre [Fri, 2 Dec 2016 20:11:50 +0000 (15:11 -0500)]
kbuild: fix building bzImage with CONFIG_TRIM_UNUSED_KSYMS enabled
When building a specific target such as bzImage, modules aren't normally
built. However if CONFIG_TRIM_UNUSED_KSYMS is enabled, no built modules
means none of the exported symbols are used and therefore they will all
be trimmed away from the final kernel. A subsequent "make modules" will
fail because modpost cannot find the needed symbols for those modules in
the kernel binary.
Let's make sure modules are also built whenever CONFIG_TRIM_UNUSED_KSYMS
is enabled and that the kernel binary is properly rebuilt accordingly.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 2 Dec 2016 21:34:37 +0000 (13:34 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"This should be the last set of bugfixes for arm-soc in v4.9. None of
these are critical regressions, but it would be nice to still get them
merged.
- On the Juno platform, the idle latency was described wrong, leading
to suboptimal cpuidle tuning.
- Also on the same platform, PCI I/O space was set up incorrectly and
could not work.
- On the sti platform, a syntactically incorrect DT entry caused
warnings.
- The newly added 'gr8' platform has somewhat confusing file names,
which we rename for consistency"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
arm64: dts: juno: Correct PCI IO window
ARM: dts: STiH407-family: fix i2c nodes
ARM: gr8: Rename the DTSI and relevant DTS
Linus Torvalds [Fri, 2 Dec 2016 19:45:27 +0000 (11:45 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Lots more phydev and probe error path leaks in various drivers by
Johan Hovold.
2) Fix race in packet_set_ring(), from Philip Pettersson.
3) Use after free in dccp_invalid_packet(), from Eric Dumazet.
4) Signedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric
Dumazet.
5) When tunneling between ipv4 and ipv6 we can be left with the wrong
skb->protocol value as we enter the IPSEC engine and this causes all
kinds of problems. Set it before the output path does any
dst_output() calls, from Eli Cooper.
6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from
Florian Fainelli.
7) Various netfilter nat bug fixes from Florian Westphal.
8) Fix memory leak in ipvlan_link_new(), from Gao Feng.
9) Locking fixes, particularly wrt. socket lookups, in l2tp from
Guillaume Nault.
10) Avoid invoking rhash teardowns in atomic context by moving netlink
cb->done() dump completion from a worker thread. Fix from Herbert
Xu.
11) Buffer refcount problems in tun and macvtap on errors, from Jason
Wang.
12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user
selects BBR. Fix from Julian Wollrath.
13) Fix deadlock in transmit path on altera TSE driver, from Lino
Sanfilippo.
14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita
Yushchenko.
15) tc_tunnel_key needs to be properly exported to userspace via uapi,
fix from Roi Dayan.
16) rds_tcp_init_net() doesn't unregister notifier in error path, fix
from Sowmini Varadhan.
17) Stale packet header pointer access after pskb_expand_head() in
geneve driver, fix from Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
geneve: avoid use-after-free of skb->data
tipc: check minimum bearer MTU
net: renesas: ravb: unintialized return value
sh_eth: remove unchecked interrupts for RZ/A1
net: bcmgenet: Utilize correct struct device for all DMA operations
NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
cdc_ether: Fix handling connection notification
ip6_offload: check segs for NULL in ipv6_gso_segment.
RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
ipv6: Set skb->protocol properly for local output
ipv4: Set skb->protocol properly for local output
packet: fix race condition in packet_set_ring
net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
net: ethernet: stmmac: platform: fix outdated function header
net: ethernet: stmmac: dwmac-meson8b: fix probe error path
net: ethernet: stmmac: dwmac-generic: fix probe error path
...
Eric Dumazet [Fri, 2 Dec 2016 17:44:53 +0000 (09:44 -0800)]
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...
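
How a negative request can end up as a negative buffer size is sketched below with emulated 32-bit arithmetic. The model assumes the requested value is doubled and then compared as an unsigned quantity against a floor; that is an assumption about the code, not something stated in this message. MIN_BUF and forced_bufsize() are hypothetical.

```python
def to_u32(x):
    """Interpret x as an unsigned 32-bit value."""
    return x & 0xFFFFFFFF


def to_s32(x):
    """Interpret x as a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


MIN_BUF = 2048  # hypothetical floor, for illustration only


def forced_bufsize(val, clamp_negative):
    """Model of a forced buffer-size update stored in a signed int."""
    if clamp_negative and val < 0:
        val = 0  # reject negative requests before any arithmetic
    doubled = to_s32(val * 2)
    # Unsigned max: a negative `doubled` looks like a huge value here.
    chosen = max(to_u32(doubled), MIN_BUF)
    return to_s32(chosen)  # stored back into a signed field
```

Without the clamp, a request of -1 leaves a negative size in the signed field; with it, the floor is used instead.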
Note that before commit 82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.
This needs to be backported to all known linux kernels.
Again, many thanks to syzkaller team for discovering this gem.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 2 Dec 2016 15:49:29 +0000 (16:49 +0100)]
geneve: avoid use-after-free of skb->data
geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
only needed as an argument to ip_tunnel_ecn_encap(), move this
directly in the function call.
Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Fri, 2 Dec 2016 08:33:41 +0000 (09:33 +0100)]
tipc: check minimum bearer MTU
Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.
As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.
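
A sketch of checking once at attach time, including the UDP case. TOY_MIN_BEARER_MTU is a hypothetical threshold and the helper names are made up; the 20-byte IPv4 header (without options) and 8-byte UDP header are the standard sizes.

```python
IPV4_HDR_LEN = 20         # IPv4 header without options
UDP_HDR_LEN = 8           # UDP header
TOY_MIN_BEARER_MTU = 100  # hypothetical minimum, for illustration only


def udp_bearer_mtu(dev_mtu):
    """Bearer MTU left over after IPv4 and UDP encapsulation."""
    return dev_mtu - IPV4_HDR_LEN - UDP_HDR_LEN


def can_attach_bearer(bearer_mtu):
    """Check done when a bearer is attached or its MTU changes."""
    return bearer_mtu >= TOY_MIN_BEARER_MTU
```

A tiny device MTU makes the UDP bearer MTU go negative, which the attach-time check rejects before any packet is built.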
Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Dec 2016 19:02:13 +0000 (14:02 -0500)]
Merge tag 'linux-can-fixes-for-4.9-20161201' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2016-12-02
this is a pull request for net/master.
There are two patches by Stephane Grosjean, who adds support for the new
PCAN-USB X6 USB interface to the pcan_usb driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 1 Dec 2016 20:57:44 +0000 (23:57 +0300)]
net: renesas: ravb: unintialized return value
We want to set the other "err" variable here so that we can return it
later. My version of GCC misses this issue but I caught it with a
static checker.
Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Brandt [Thu, 1 Dec 2016 18:32:14 +0000 (13:32 -0500)]
sh_eth: remove unchecked interrupts for RZ/A1
When streaming a lot of data and the RZ/A1 can't keep up, some status bits
that are neither checked nor cleared get set. This causes the following
messages and makes the Ethernet driver stop working. This patch fixes
that issue.
irq 21: nobody cared (try booting with the "irqpoll" option)
handlers:
[<c036b71c>] sh_eth_interrupt
Disabling IRQ #21
Fixes: db893473d313a4ad ("sh_eth: Add support for r7s72100")
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 1 Dec 2016 17:45:45 +0000 (09:45 -0800)]
net: bcmgenet: Utilize correct struct device for all DMA operations
__bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
same struct device during unmap that was used for the map operation,
which makes DMA-API debugging warn about it. Fix this by always using
&priv->pdev->dev throughout the driver, using an identical device
reference for all map/unmap calls.
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Dec 2016 18:48:50 +0000 (10:48 -0800)]
Fix up a couple of field names in the CREDITS file
Ozgur Karatas reported that the very first entry in the CREDITS file had
the wrong tag for name (M: instead of N: - it happened when moving the
entry from the MAINTAINERS file, where 'M:' stands for "Maintainer").
And when I went looking, I found a couple of other cases of wrong
tagging too.
Reported-by: Ozgur Karatas <mueddib@yandex.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniele Palmas [Thu, 1 Dec 2016 15:52:05 +0000 (16:52 +0100)]
NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
This patch adds support for PID 0x1040 of Telit LE922A.
The QMI adapter requires DTR to be set in order to work properly,
so QMI_WWAN_QUIRK_DTR has been enabled.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kristian Evensen [Thu, 1 Dec 2016 13:23:17 +0000 (14:23 +0100)]
cdc_ether: Fix handling connection notification
Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduced a work-around in usbnet_cdc_status() for devices that reported
CDC carrier on twice on connect. Before the commit, this behavior caused
the link state to be incorrect. It was assumed that all CDC Ethernet
devices would either exhibit this behavior, or send one off and then one on
notification (which seems to be the default behavior).
Unfortunately, it turns out that multiple devices send a connection
notification multiple times per second (via an interrupt), even when
connection state does not change. This has been observed with several
different USB LAN dongles (at least), for example 13b1:0041 (Linksys).
After bfe9b9d2df66, the link state has been set as down and then up for
each notification. This has caused a flood of Netlink NEWLINK messages and
has flooded syslog with messages similar to:
cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped
This commit fixes the behavior by reverting usbnet_cdc_status() to how it
was before bfe9b9d2df66. The work-around has been moved to a separate
status function which is only called when a known, affected device is
detected.
v1->v2:
* Do not open-code netif_carrier_ok() (thanks Henning Schild).
* Call netif_carrier_off() instead of usb_link_change(). This prevents
calling schedule_work() twice without giving the work queue a chance to be
processed (thanks Bjørn Mork).
Fixes: bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Artem Savkov [Thu, 1 Dec 2016 13:06:04 +0000 (14:06 +0100)]
ip6_offload: check segs for NULL in ipv6_gso_segment.
segs needs to be checked for being NULL in ipv6_gso_segment() before calling
skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
[ 97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[ 97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 97.825214] PGD 0 [ 97.827047]
[ 97.828540] Oops: 0000 [#1] SMP
[ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
[ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
[ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
[ 97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000
[ 97.947720] RIP: 0010:[<ffffffff816e52f9>] [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 97.956251] RSP: 0018:ffff88012fc43a10 EFLAGS: 00010207
[ 97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594
[ 97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000
[ 97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000
[ 97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3
[ 97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000
[ 97.997198] FS: 0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000
[ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0
[ 98.018149] Stack:
[ 98.020157] 00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
[ 98.027584] 0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
[ 98.035010] ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
[ 98.042437] Call Trace:
[ 98.044879] <IRQ> [ 98.046803] [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3]
[ 98.053156] [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130
[ 98.059158] [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110
[ 98.064985] [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0
[ 98.070899] [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70
[ 98.077073] [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0
[ 98.082726] [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690
[ 98.088554] [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50
[ 98.094380] [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20
[ 98.099863] [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
[ 98.106907] [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge]
[ 98.113430] [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60
[ 98.118735] [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0
[ 98.124216] [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge]
[ 98.130480] [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
[ 98.137785] [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge]
[ 98.143701] [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge]
[ 98.150834] [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge]
[ 98.157355] [<ffffffff8102fb89>] ? sched_clock+0x9/0x10
[ 98.162662] [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0
[ 98.168403] [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20
[ 98.174926] [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0
[ 98.180580] [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60
[ 98.186494] [<ffffffff815ee625>] process_backlog+0x95/0x140
[ 98.192145] [<ffffffff815edccd>] net_rx_action+0x16d/0x380
[ 98.197713] [<ffffffff8170cff1>] __do_softirq+0xd1/0x283
[ 98.203106] [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30
[ 98.209107] <EOI> [ 98.211029] [<ffffffff8108a5c0>] do_softirq+0x50/0x60
[ 98.216166] [<ffffffff815ec853>] netif_rx_ni+0x33/0x80
[ 98.221386] [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun]
[ 98.227388] [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun]
[ 98.233129] [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net]
[ 98.239392] [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net]
[ 98.245916] [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost]
[ 98.251919] [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost]
[ 98.258440] [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180
[ 98.264094] [<ffffffff810a44d9>] kthread+0xd9/0xf0
[ 98.268965] [<ffffffff810a4400>] ? kthread_park+0x60/0x60
[ 98.274444] [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30
[ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
[ 98.299425] RIP [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 98.305612] RSP <ffff88012fc43a10>
[ 98.309094] CR2: 00000000000000cc
[ 98.312406] ---[ end trace 726a2c7a2d2d78d0 ]---
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 1 Dec 2016 12:44:43 +0000 (04:44 -0800)]
RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
If some error is encountered in rds_tcp_init_net, make sure to
unregister_netdevice_notifier(), else we could trigger a panic
later on, when the modprobe from a netns fails.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cooper [Thu, 1 Dec 2016 02:05:12 +0000 (10:05 +0800)]
Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
This reverts commit ae148b085876fa771d9ef2c05f85d4b4bf09ce0d
("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
skb->protocol is now set in __ip_local_out() and __ip6_local_out() before
dst_output() is called. It is no longer necessary to do it for each tunnel.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cooper [Thu, 1 Dec 2016 02:05:11 +0000 (10:05 +0800)]
ipv6: Set skb->protocol properly for local output
When xfrm is applied to TSO/GSO packets, it follows this path:
xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
where skb_gso_segment() relies on skb->protocol to function properly.
This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
when xfrm is involved.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cooper [Thu, 1 Dec 2016 02:05:10 +0000 (10:05 +0800)]
ipv4: Set skb->protocol properly for local output
When xfrm is applied to TSO/GSO packets, it follows this path:
xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
where skb_gso_segment() relies on skb->protocol to function properly.
This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
fixing a bug where GSO packets sent through a sit tunnel are dropped
when xfrm is involved.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philip Pettersson [Wed, 30 Nov 2016 22:55:36 +0000 (14:55 -0800)]
packet: fix race condition in packet_set_ring
When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.
This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.
The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.
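The locking scheme can be modelled in user space as follows. This is a toy sketch, not the kernel code: the `PacketSock` class, its fields, and the choice to refuse a version change while a ring exists are assumptions of the example, and a `threading.Lock` stands in for lock_sock(sk).

```python
import threading

EBUSY = 16  # errno value, used here only for illustration


class PacketSock:
    """Toy stand-in for a packet socket; not the kernel data structure."""

    def __init__(self):
        self.lock = threading.Lock()   # plays the role of lock_sock(sk)
        self.version = 1               # 1 = TPACKET_V1, 3 = TPACKET_V3
        self.ring = None
        self.timer_armed = False

    def set_version(self, version: int) -> int:
        # The version change is serialized against ring setup and, in this
        # sketch, refused while a ring exists.
        with self.lock:
            if self.ring is not None:
                return -EBUSY
            self.version = version
            return 0

    def set_ring(self, nblocks: int) -> int:
        # The version is read and acted upon under the same lock, so it
        # cannot change between the check and the timer setup.
        with self.lock:
            self.ring = [bytearray(16) for _ in range(nblocks)]
            self.timer_armed = (self.version == 3)
            return 0

    def close(self) -> None:
        with self.lock:
            if self.version == 3 and self.timer_armed:
                self.timer_armed = False   # timer removed before teardown
            self.ring = None
```

Because both paths hold the same lock, the version seen at close time is the one the timer was set up for, so the timer is always removed.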
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Dec 2016 17:15:26 +0000 (09:15 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"All architectures avoid memory corruption in an error path. ARM
prevents bogus acknowledgement of interrupts"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: use after free in kvm_ioctl_create_device()
KVM: arm/arm64: vgic: Don't notify EOI for non-SPIs
Linus Torvalds [Fri, 2 Dec 2016 17:12:44 +0000 (09:12 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"Here is the revert for the regression of the i2c-octeon driver I
mentioned last time. I wished for a bit more feedback, but all people
working actively on it are in need of this patch, so here it goes"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: octeon: thunderx: Limit register access retries"
Lino Sanfilippo [Wed, 30 Nov 2016 22:48:32 +0000 (23:48 +0100)]
net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
The driver already uses its private lock for synchronization between xmit
and xmit completion handler making the additional use of the xmit_lock
unnecessary.
Furthermore, the driver does not set NETIF_F_LLTX, so xmit is called with
the xmit_lock held and then takes the private lock, while the xmit
completion handler does the reverse: it first takes the private lock, then
the xmit_lock.
Fix these issues by not taking the xmit_lock in the tx completion handler.
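The lock-order inversion described above can be pictured with a small check. This is only a sketch: the path and lock names are labels taken from the commit text, and `find_inversions` is a hypothetical helper written for this example, not a kernel facility.

```python
def find_inversions(paths):
    """Given {path_name: [locks in acquisition order]}, return the set of
    lock pairs that different paths take in opposite orders."""
    seen = {}            # (first, second) -> path that took them in that order
    inversions = set()
    for name, locks in paths.items():
        for i, first in enumerate(locks):
            for second in locks[i + 1:]:
                if (second, first) in seen:
                    inversions.add(frozenset((first, second)))
                seen.setdefault((first, second), name)
    return inversions


# Lock order before the patch: the two paths nest the locks oppositely.
before = {
    "xmit": ["xmit_lock", "private_lock"],
    "tx_completion": ["private_lock", "xmit_lock"],
}

# Lock order after the patch: the completion handler takes one lock only.
after = {
    "xmit": ["xmit_lock", "private_lock"],
    "tx_completion": ["private_lock"],
}
```

With the `before` table the helper reports the xmit_lock/private_lock pair; with the `after` table it reports nothing.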
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lino Sanfilippo [Wed, 30 Nov 2016 22:48:31 +0000 (23:48 +0100)]
net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
An explicit dma sync for device directly after mapping as well as an
explicit dma sync for cpu directly before unmapping is unnecessary and
costly on the hotpath. So remove these calls.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 2 Dec 2016 12:40:27 +0000 (13:40 +0100)]
default exported asm symbols to zero
With binutils-2.26 and before, a weak missing symbol was kept during the
final link, and a missing CRC for an export would lead to that CRC being
treated as zero implicitly. With binutils-2.27, the crc symbol gets
dropped, and any module trying to use it will fail to load.
This sets the weak CRC symbol to zero explicitly, making it defined in
vmlinux, which in turn lets us load the modules referring to that CRC.
The comment above the __CRC_SYMBOL macro suggests that this was always
the intention, although it also seems that all symbols defined in C have
a correct CRC these days, and only the exports that are now done in
assembly need this.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudeep Holla [Wed, 16 Nov 2016 17:31:31 +0000 (17:31 +0000)]
arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
The core and the cluster sleep state entry latencies can't be same as
cluster sleep involves more work compared to core level e.g. shared
cache maintenance.
Experiments have shown on average about 100us more latency for the
cluster sleep state compared to the core level sleep. This patch fixes
the entry latency for the cluster sleep state.
Fixes: 28e10a8f3a03 ("arm64: dts: juno: Add idle-states to device tree")
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: "Jon Medhurst (Tixy)" <tixy@linaro.org>
Reviewed-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
David S. Miller [Fri, 2 Dec 2016 15:42:47 +0000 (10:42 -0500)]
Merge branch 'stmmac-probe-error-handling-and-phydev-leaks'
Johan Hovold says:
====================
net: stmmac: fix probe error handling and phydev leaks
This series fixes a number of issues with the stmmac-driver probe error
handling, which for example left clocks enabled after probe failures.
The final patch fixes a failure to deregister and free any fixed-link
PHYs that were registered during probe on probe errors and on driver
unbind. It also fixes a related of-node leak on late probe errors.
This series depends on the of_phy_deregister_fixed_link() helper that
was just merged to net.
As mentioned earlier, one staging driver also suffers from a similar
leak and can be fixed up once the above mentioned helper hits mainline.
Note that these patches have only been compile tested.
====================
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:55 +0000 (15:29 +0100)]
net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
Make sure to deregister and free any fixed-link phy registered during
probe on probe errors and on driver unbind by adding a new glue helper
function.
Drop the of-node reference taken in the same path also on late probe
errors (and not just on driver unbind) by moving the put from
stmmac_dvr_remove() to the new helper.
Fixes: 277323814e49 ("stmmac: add fixed-link device-tree support")
Fixes: 4613b279bee7 ("ethernet: stmicro: stmmac: add missing of_node_put
after calling of_parse_phandle")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:54 +0000 (15:29 +0100)]
net: ethernet: stmmac: platform: fix outdated function header
Fix the OF-helper function header to reflect that the function no longer
has a platform-data parameter.
Fixes: b0003ead75f3 ("stmmac: make stmmac_probe_config_dt return the
platform data struct")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:53 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-meson8b: fix probe error path
Make sure to disable clocks before returning on late probe errors.
Fixes: 566e82516253 ("net: stmmac: add a glue driver for the Amlogic
Meson 8b / GXBB DWMAC")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:52 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-generic: fix probe error path
Make sure to call any exit() callback to undo the effect of init()
before returning on late probe errors.
Fixes: cf3f047b9af4 ("stmmac: move hw init in the probe (v2)")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:51 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-rk: fix probe error path
Make sure to disable runtime PM, power down the PHY, and disable clocks
before returning on late probe errors.
Fixes: 27ffefd2d109 ("stmmac: dwmac-rk: create a new probe function")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:50 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-sti: fix probe error path
Make sure to disable clocks before returning on late probe errors.
Fixes: 8387ee21f972 ("stmmac: dwmac-sti: turn setup callback into a
probe function")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:49 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-socfpga: fix use-after-free on probe errors
Make sure to call stmmac_dvr_remove() before returning on late probe
errors so that memory is freed, clocks are disabled, and the netdev is
deregistered before its resources go away.
Fixes: 3c201b5a84ed ("net: stmmac: socfpga: Remove re-registration of
reset controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Wed, 30 Nov 2016 13:30:37 +0000 (14:30 +0100)]
net/rtnetlink: fix attribute name in nlmsg_size() comments
Use the correct attribute constant names IFLA_GSO_MAX_{SEGS,SIZE}
instead of IFLA_MAX_GSO_{SEGS,SIZE} for the comments in nlmsg_size().
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Dec 2016 00:44:42 +0000 (16:44 -0800)]
Merge tag 'pci-v4.9-fixes-4' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"PCI fixes:
- Fix Read Completion Boundary setting, which fixes a boot failure on
IBM x3850 with Mellanox MT27500 ConnectX-3
- Update some MAINTAINERS entries and email addresses"
* tag 'pci-v4.9-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)
PCI: Export pcie_find_root_port
PCI: designware-plat: Update author email
PCI: designware: Change maintainer to Joao Pinto
MAINTAINERS: Add devicetree binding to PCI i.MX6 entry
MAINTAINERS: Update Richard Zhu's email address
Alexander Duyck [Mon, 28 Nov 2016 15:42:29 +0000 (10:42 -0500)]
ixgbe/ixgbevf: Don't use lco_csum to compute IPv4 checksum
In the case of IPIP and SIT tunnel frames the outer transport header
offset is actually set to the same offset as the inner transport header.
This results in the lco_csum call not doing any checksum computation over
the inner IPv4/v6 header data.
In order to account for that I am updating the code so that we determine
the location to start the checksum ourselves based on the location of the
IPv4 header and the length.
Fixes: b83e30104bd9 ("ixgbe/ixgbevf: Add support for GSO partial")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 28 Nov 2016 15:42:23 +0000 (10:42 -0500)]
igb/igbvf: Don't use lco_csum to compute IPv4 checksum
In the case of IPIP and SIT tunnel frames the outer transport header
offset is actually set to the same offset as the inner transport header.
This results in the lco_csum call not doing any checksum computation over
the inner IPv4/v6 header data.
In order to account for that I am updating the code so that we determine
the location to start the checksum ourselves based on the location of the
IPv4 header and the length.
Fixes: e10715d3e961 ("igb/igbvf: Add support for GSO partial")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
allan [Wed, 30 Nov 2016 08:29:08 +0000 (16:29 +0800)]
net: asix: Fix AX88772_suspend() USB vendor commands failure issues
The change fixes AX88772_suspend() USB vendor commands failure issues.
Signed-off-by: Allan Chou <allan@asix.com.tw>
Tested-by: Allan Chou <allan@asix.com.tw>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 1 Dec 2016 18:31:53 +0000 (10:31 -0800)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fix from Miklos Szeredi:
"This fixes a regression introduced in 4.8"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix d_real() for stacked fs
Linus Torvalds [Thu, 1 Dec 2016 18:29:41 +0000 (10:29 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov: "We are disabling automatic
probing of BYD touchpads as it results in too many false positives,
and the hardware is not terribly popular and having the protocol
support does not result in significantly improved user experience.
We also change keycode for KEY_DATA to avoid clashing with
KEY_FASTREVERSE. Luckily this newish code is used by CEC framework
that is still in staging, so it is extremely unlikely that someone has
already started using this keycode"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: change KEY_DATA from 0x275 to 0x277
Input: psmouse - disable automatic probing of BYD touchpads
Nicolas Pitre [Wed, 30 Nov 2016 22:41:58 +0000 (17:41 -0500)]
kbuild: make sure autoksyms.h exists early
Some people are able to trigger a race where autoksyms.h is used before
its empty version is even created. Let's create it at the same time as
the directory holding it is created.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Thu, 1 Dec 2016 16:35:49 +0000 (11:35 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2016-12-01
1) Change the error value when someone tries to run 32bit
userspace on a 64bit host from -ENOTSUPP to the userspace
exported -EOPNOTSUPP. Fix from Yi Zhao.
2) On inbound, ESN sequence numbers are already in network
byte order. So don't try to convert it again, this fixes
integrity verification for ESN. Fixes from Tobias Brunner.
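To see why a second conversion breaks verification, consider this sketch. It assumes a little-endian host, and it uses HMAC-SHA256 purely as a stand-in for whatever integrity algorithm the SA is configured with; the key, packet and sequence value are made up.

```python
import hashlib
import hmac


def icv(key: bytes, packet: bytes, seq_hi_wire: bytes) -> bytes:
    """Integrity check value over the packet plus the high-order 32 bits
    of the extended sequence number, in network byte order."""
    return hmac.new(key, packet + seq_hi_wire, hashlib.sha256).digest()


key = b"k" * 32                      # made-up key
packet = b"example ESP packet"       # made-up payload
seq_hi = 1                           # made-up high-order sequence bits

wire = seq_hi.to_bytes(4, "big")     # network byte order, as the sender used
swapped_again = wire[::-1]           # result of a second swap on little-endian

sender_icv = icv(key, packet, wire)
ok = hmac.compare_digest(sender_icv, icv(key, packet, wire))
broken = hmac.compare_digest(sender_icv, icv(key, packet, swapped_again))
```

Here `ok` is true and `broken` is false: the receiver only reproduces the sender's value when it feeds in the bytes exactly as they are, without converting them again.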
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Dec 2016 16:04:41 +0000 (11:04 -0500)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This is a large batch of Netfilter fixes for net, they are:
1) Three patches to fix NAT conversion to rhashtable: Switch to rhlist
structure that allows several objects to share the same key.
Moreover, fix wrong comparison logic in nf_nat_bysource_cmp() as this is
expecting a return value similar to memcmp(). Change location of
the nat_bysource field in the nf_conn structure to avoid zeroing
this as it breaks interaction with SLAB_DESTROY_BY_RCU and leads
to crashes. From Florian Westphal.
2) Don't allow malformed fragments to go through in IPv6, drop them,
otherwise we hit GPF, patch from Florian Westphal.
3) Fix crash if attributes are missing in nft_range, from Liping Zhang.
4) Fix arptables 32-bits userspace 64-bits kernel compat, from Hongxu Jia.
5) Two patches from David Ahern to fix netfilter interaction with vrf.
6) Fix element timeout calculation in nf_tables, we take milliseconds
from userspace, but we use jiffies from kernelspace. Patch from
Anders K. Pedersen.
7) Missing length validation of the netlink attribute for nft_hash, from
Laura Garcia.
8) Fix nf_conntrack_helper documentation, we don't default to off
anymore for a bit of time so let's get this in sync with the code.
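Item 6 above comes down to a unit mismatch, which the following sketch illustrates. The tick rate `HZ` is an assumed value (in a real kernel it is a build-time option), and the helpers only imitate the kernel functions of the same name.

```python
HZ = 250  # assumed tick rate; the real value is a kernel build option


def msecs_to_jiffies(ms: int, hz: int = HZ) -> int:
    """Convert milliseconds to timer ticks, rounding up."""
    return (ms * hz + 999) // 1000


def jiffies_to_msecs(j: int, hz: int = HZ) -> int:
    """Convert timer ticks back to milliseconds."""
    return j * 1000 // hz


timeout_ms = 30000                                   # 30 s from userspace
correct_ticks = msecs_to_jiffies(timeout_ms)         # 7500 ticks at HZ=250
misread_seconds = jiffies_to_msecs(timeout_ms) // 1000   # ms taken as ticks
```

With these assumptions, treating the millisecond value as if it were already in jiffies turns a 30 second timeout into 120 seconds.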
I know this is late but I think these are important, specifically the NAT
bits, as they are mostly addressing fallout from recent changes. I also
read there are chances of an -rc8; if that is the case, that would
also give us a bit more time to test this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 30 Nov 2016 19:21:05 +0000 (22:21 +0300)]
KVM: use after free in kvm_ioctl_create_device()
We should move the ops->destroy(dev) after the list_del(&dev->vm_node)
so that we don't use "dev" after freeing it.
Fixes: a28ebea2adc4 ("KVM: Protect device ops->create and list_add with kvm->lock")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Radim Krčmář [Thu, 1 Dec 2016 13:56:34 +0000 (14:56 +0100)]
Merge tag 'kvm-arm-for-4.9-rc7' of git://git./linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for v4.9-rc7
- Do not call kvm_notify_acked for PPIs
Stephane Grosjean [Thu, 1 Dec 2016 10:41:12 +0000 (11:41 +0100)]
can: peak: Add support for PCAN-USB X6 USB interface
This adds support for PEAK-System PCAN-USB X6 USB to CAN interface.
The CAN FD adapter PCAN-USB X6 allows the connection of up to 6 CAN FD
or CAN networks to a computer via USB. The interface is installed in an
aluminum profile casing and is shipped in versions with D-Sub connectors
or M12 circular connectors.
The PCAN-USB X6 registers in the USB sub-system as if 3x PCAN-USB-Pro FD
adapters were plugged in. So, this patch:
- updates the PEAK_USB entry of the corresponding Kconfig file
- defines and adds the device id. of the PCAN-USB X6 (0x0014) into the
table of supported device ids
- defines and adds the new software structure implementing the PCAN-USB X6,
which is obviously a clone of the software structure implementing the
PCAN-USB Pro FD.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Stephane Grosjean [Thu, 1 Dec 2016 10:41:11 +0000 (11:41 +0100)]
can: peak: Fix bittiming fields size in bits
This fixes the bittiming field ranges supported by all the CAN-FD USB
interfaces of the PEAK-System CAN-FD adapters.
Very first development versions of the IP core API defined smaller TSGEx
and SJW fields for both nominal and data bittimings records than the
production versions. This patch fixes them by enlarging their sizes to
the actual values:
field:            old size:   fixed size:
nominal TSGEG1    6           8
nominal TSGEG2    4           7
nominal SJW       4           7
data TSGEG1       4           5
data TSGEG2       3           4
data SJW          2           4
Note that this has no other consequence than offering a larger choice of
bitrate encodings.
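The effect of the wider fields can be worked out directly from the table. The sketch below only does that arithmetic; the dictionary layout and the helper name are choices made for this example, and the widths are copied from the table above.

```python
# Field widths in bits, copied from the table above: (old, fixed).
WIDTHS = {
    ("nominal", "TSGEG1"): (6, 8),
    ("nominal", "TSGEG2"): (4, 7),
    ("nominal", "SJW"): (4, 7),
    ("data", "TSGEG1"): (4, 5),
    ("data", "TSGEG2"): (3, 4),
    ("data", "SJW"): (2, 4),
}


def encodable_values(bits: int) -> int:
    """Number of distinct values an unsigned field of `bits` bits can hold."""
    return 1 << bits


# Number of additional encodings each field gains from the fix.
gained = {
    field: encodable_values(new) - encodable_values(old)
    for field, (old, new) in WIDTHS.items()
}
```

For instance, the nominal TSGEG1 field goes from 64 to 256 encodable values, which is what widens the set of bitrates that can be expressed.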
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Linus Torvalds [Thu, 1 Dec 2016 00:33:41 +0000 (16:33 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlb
kasan: support use-after-scope detection
kasan: update kasan_global for gcc 7
lib/debugobjects: export for use in modules
zram: fix unbalanced idr management at hot removal
thp: fix corner case of munlock() of PTE-mapped THPs
mm, thp: propagation of conditional compilation in khugepaged.c
Kirill A. Shutemov [Wed, 30 Nov 2016 23:54:19 +0000 (15:54 -0800)]
mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlb
Hugetlb pages store ->index in units of the huge page size (PMD_SIZE or
PUD_SIZE), not in units of PAGE_SIZE like other types of pages. This means
we cannot use page_to_pgoff() to check whether we've got the right page
for the radix-tree index.
Let's introduce page_to_index(), which returns the radix-tree index for a
given page.
We will be able to get rid of this once hugetlb is switched to
multi-order entries.
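The unit mismatch can be shown with plain arithmetic. The page sizes below are assumptions (4 KiB base pages and 2 MiB PMD-sized huge pages), and the two helpers are hypothetical names for this sketch, not the kernel functions.

```python
PAGE_SHIFT = 12        # assumed 4 KiB base pages
HPAGE_SHIFT = 21       # assumed 2 MiB (PMD-sized) huge pages
HPAGE_ORDER = HPAGE_SHIFT - PAGE_SHIFT


def hugetlb_index(file_offset: int) -> int:
    """Radix-tree index of a hugetlb page: counted in huge pages."""
    return file_offset >> HPAGE_SHIFT


def pgoff(file_offset: int) -> int:
    """Page offset counted in PAGE_SIZE units, as for ordinary pages."""
    return file_offset >> PAGE_SHIFT


offset = 3 * (1 << HPAGE_SHIFT)    # start of the fourth huge page in a file
```

For that offset the huge-page index is 3 while the PAGE_SIZE-based offset is 1536, so comparing one against the other reports a mismatch even though the page is the right one.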
Fixes: fc127da085c2 ("truncate: handle file thp")
Link: http://lkml.kernel.org/r/20161123093053.mjbnvn5zwxw5e6lk@black.fi.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Doug Nelson <doug.nelson@intel.com>
Tested-by: Doug Nelson <doug.nelson@intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Vyukov [Wed, 30 Nov 2016 23:54:16 +0000 (15:54 -0800)]
kasan: support use-after-scope detection
GCC revision 241896 implements use-after-scope detection, which will be
available in GCC 7. Support it in KASAN.
GCC emits 2 new callbacks to poison/unpoison large stack objects when
they go in/out of scope. Implement the callbacks and add a test.
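The poison/unpoison idea can be illustrated with a toy userspace model. This is only a sketch under simplifying assumptions: it keeps one shadow byte per byte of a small fake "stack" (real KASAN shadow memory is coarser), and the names model_poison(), model_unpoison() and model_access_ok() are hypothetical, not the callbacks the compiler emits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MEM_SIZE 64

/* One shadow byte per byte of the toy stack: 1 = poisoned, 0 = accessible. */
static unsigned char model_shadow[MODEL_MEM_SIZE];

static bool model_in_range(size_t off, size_t size)
{
	return off <= MODEL_MEM_SIZE && size <= MODEL_MEM_SIZE - off;
}

/* Object goes out of scope: any later access to it is a bug. */
static void model_poison(size_t off, size_t size)
{
	if (model_in_range(off, size))
		memset(model_shadow + off, 1, size);
}

/* Object comes into scope: accesses are valid again. */
static void model_unpoison(size_t off, size_t size)
{
	if (model_in_range(off, size))
		memset(model_shadow + off, 0, size);
}

/* Check that an instrumented access touches no poisoned byte. */
static bool model_access_ok(size_t off, size_t size)
{
	if (!model_in_range(off, size))
		return false;
	for (size_t i = 0; i < size; i++)
		if (model_shadow[off + i])
			return false;
	return true;
}
```

In this model, an access through a stale pointer after model_poison() has run for the object fails the check, which is the class of bug use-after-scope detection is meant to report.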
[dvyukov@google.com: v3]
Link: http://lkml.kernel.org/r/1479998292-144502-1-git-send-email-dvyukov@google.com
Link: http://lkml.kernel.org/r/1479226045-145148-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Vyukov [Wed, 30 Nov 2016 23:54:13 +0000 (15:54 -0800)]
kasan: update kasan_global for gcc 7
The kasan_global struct is part of the compiler/runtime ABI. GCC revision
241983 added a new field to the kasan_global struct. Update the kernel
definition of the kasan_global struct to include the new field.
Without this patch KASAN is broken with GCC 7.
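Why a single added field breaks things can be sketched as follows. The two structs are hypothetical and do not reproduce the real kasan_global layout; they only show that once the compiler emits larger descriptors, a runtime that walks an array of them with the old definition computes the wrong element offsets.

```c
#include <stddef.h>

/* Hypothetical descriptor as an older runtime definition describes it. */
struct desc_old {
	const void *beg;
	size_t size;
	const char *name;
};

/* Same descriptor after the compiler appends one more field. */
struct desc_new {
	const void *beg;
	size_t size;
	const char *name;
	const void *extra;  /* hypothetical name for the added field */
};

/* Byte offset of element i when walking an array with the given stride. */
static size_t element_offset(size_t stride, size_t i)
{
	return stride * i;
}
```

Element 0 is found at the same place under both definitions, but from element 1 onward the offsets computed with sizeof(struct desc_old) no longer match what a compiler using struct desc_new emitted.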
Link: http://lkml.kernel.org/r/1479219743-28682-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Wilson [Wed, 30 Nov 2016 23:54:10 +0000 (15:54 -0800)]
lib/debugobjects: export for use in modules
Drivers, or other modules, that use a mixture of objects (especially
objects embedded within other objects) would like to take advantage of
the debugobjects facilities to help catch misuse. Currently, the
debugobjects interface is only available to builtin drivers and requires
a set of EXPORT_SYMBOL_GPL for use by modules.
I am using the debugobjects in i915.ko to try and catch some invalid
operations on embedded objects. The problem currently only presents
itself across module unload so forcing i915 to be builtin is not an
option.
Link: http://lkml.kernel.org/r/20161122143039.6433-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Du, Changbin" <changbin.du@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takashi Iwai [Wed, 30 Nov 2016 23:54:08 +0000 (15:54 -0800)]
zram: fix unbalanced idr management at hot removal
The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY). This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.
As described in the bug report below, the following procedure would
cause an Oops with zram:
  - provision three zram devices via modprobe zram num_devices=3
  - configure a size for each device
    + echo "1G" > /sys/block/$zram_name/disksize
  - mkfs and mount zram0 only
  - attempt to hot remove all three devices
    + echo 2 > /sys/class/zram-control/hot_remove
    + echo 1 > /sys/class/zram-control/hot_remove
    + echo 0 > /sys/class/zram-control/hot_remove
      - zram0 removal fails with EBUSY, as expected
  - unmount zram0
  - try zram0 hot remove again
    + echo 0 > /sys/class/zram-control/hot_remove
      - fails with ENODEV (unexpected)
  - unload zram kernel module
    + completes successfully
  - zram0 device node still exists
  - attempt to mount /dev/zram0
    + mount command is killed
    + following BUG is encountered
    BUG: unable to handle kernel paging request at ffffffffa0002ba0
    IP: get_disk+0x16/0x50
    Oops: 0000 [#1] SMP
    CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
    Call Trace:
     exact_lock+0xc/0x20
     kobj_lookup+0xdc/0x160
     get_gendisk+0x2f/0x110
     __blkdev_get+0x10c/0x3c0
     blkdev_get+0x19d/0x2e0
     blkdev_open+0x56/0x70
     do_dentry_open.isra.19+0x1ff/0x310
     vfs_open+0x43/0x60
     path_openat+0x2c9/0xf30
     do_filp_open+0x79/0xd0
     do_sys_open+0x114/0x1e0
     SyS_open+0x19/0x20
     entry_SYSCALL_64_fastpath+0x13/0x94
This patch adds the proper error check in hot_remove_store() so that
idr_remove() is no longer called unconditionally.
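The corrected control flow can be sketched in a small userspace model. This is an illustration under stated assumptions, not the driver code: a fixed array with a "registered" flag stands in for the idr, and model_remove() and model_hot_remove() are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_DEVS 4

struct model_dev {
	bool registered;  /* stands in for the id being present in the idr */
	bool busy;        /* e.g. the device is still mounted */
	bool torn_down;
};

static struct model_dev model_devs[MODEL_MAX_DEVS];

/* Tear the device down, or refuse while it is in use. */
static int model_remove(struct model_dev *dev)
{
	if (dev->busy)
		return -EBUSY;
	dev->torn_down = true;
	return 0;
}

/* Hot-remove handler: drop the id only after a successful removal. */
static int model_hot_remove(int id)
{
	int ret;

	if (id < 0 || id >= MODEL_MAX_DEVS || !model_devs[id].registered)
		return -ENODEV;

	ret = model_remove(&model_devs[id]);
	if (!ret)
		model_devs[id].registered = false;
	return ret;
}
```

With this ordering, a failed removal leaves the id registered, so a retry after unmounting succeeds instead of failing with ENODEV as in the reproduction steps above.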
Fixes: 17ec4cd98578 ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/20161121132140.12683-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reported-by: David Disseldorp <ddiss@suse.de>
Tested-by: David Disseldorp <ddiss@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 30 Nov 2016 23:54:05 +0000 (15:54 -0800)]
thp: fix corner case of munlock() of PTE-mapped THPs
The following program triggers BUG() in munlock_vma_pages_range():
    // autogenerated by syzkaller (http://github.com/google/syzkaller)
    #include <sys/mman.h>
    int main()
    {
            mmap((void*)0x20105000ul, 0xc00000ul, 0x2ul, 0x2172ul, -1, 0);
            mremap((void*)0x201fd000ul, 0x4000ul, 0xc00000ul, 0x3ul, 0x203f0000ul);
            return 0;
    }
The test case constructs a situation in which munlock_vma_pages_range()
finds a PTE-mapped THP head in the middle of a page table and, by mistake,
skips HPAGE_PMD_NR pages after it.
As a result, on the next iteration it hits the middle of a PMD-mapped THP
and gets upset on seeing an mlocked tail page.
The solution is to skip HPAGE_PMD_NR pages only if the THP was mlocked
during munlock_vma_page(). This guarantees that the page is
PMD-mapped, as we never mlock PTE-mapped THPs.
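The skip rule described above can be sketched as a small step function. This is a simplified userspace model, not the kernel implementation: struct model_thp_page and model_munlock_step() are hypothetical names, and the value 512 for MODEL_HPAGE_PMD_NR is an assumption (2 MiB huge pages with 4 KiB base pages).

```c
#include <stdbool.h>

#define MODEL_HPAGE_PMD_NR 512  /* assumption: 2 MiB THP, 4 KiB base pages */

/* Minimal stand-in for the page state the walk looks at. */
struct model_thp_page {
	bool is_thp_head;
	bool mlocked;
};

/*
 * Clear the mlocked flag and return how many base pages the caller
 * may advance by. A whole huge page is skipped only when the head
 * was actually mlocked, which implies it is PMD-mapped.
 */
static unsigned long model_munlock_step(struct model_thp_page *page)
{
	bool was_mlocked = page->mlocked;

	page->mlocked = false;
	if (page->is_thp_head && was_mlocked)
		return MODEL_HPAGE_PMD_NR;
	return 1;
}
```

A PTE-mapped THP head is never mlocked, so the model advances by a single page for it and the walk cannot land in the middle of a neighbouring PMD-mapped THP.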
Fixes: e90309c9f772 ("thp: allow mlocked THP again")
Link: http://lkml.kernel.org/r/20161115132703.7s7rrgmwttegcdh4@black.fi.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jérémy Lefaure [Wed, 30 Nov 2016 23:54:02 +0000 (15:54 -0800)]
mm, thp: propagation of conditional compilation in khugepaged.c
Commit b46e756f5e47 ("thp: extract khugepaged from mm/huge_memory.c")
moved code from huge_memory.c to khugepaged.c. Some of this code should
be compiled only when CONFIG_SYSFS is enabled but the condition around
this code was not moved into khugepaged.c.
The result is a compilation error when CONFIG_SYSFS is disabled:
    mm/built-in.o: In function `khugepaged_defrag_store':
    khugepaged.c:(.text+0x2d095): undefined reference to `single_hugepage_flag_store'
    mm/built-in.o: In function `khugepaged_defrag_show':
    khugepaged.c:(.text+0x2d0ab): undefined reference to `single_hugepage_flag_show'
This commit adds the #ifdef CONFIG_SYSFS around the code related to
sysfs.
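The pattern can be sketched in a single file. This is only an illustration: the function names are hypothetical, and in the kernel the helper and its callers live in different files, which is why the missing guard shows up as a link error rather than a compile error.

```c
/* Toggle to mimic the kernel option; left undefined here on purpose. */
/* #define CONFIG_SYSFS 1 */

#ifdef CONFIG_SYSFS
/* Helper that only exists when sysfs support is built. */
static int model_flag_show(int flag)
{
	return flag ? 1 : 0;
}

/* Attribute handler: it must sit under the same guard as its helper. */
static int model_defrag_show(int flag)
{
	return model_flag_show(flag);
}
#endif /* CONFIG_SYSFS */

/* Reports whether the sysfs handlers were compiled in. */
static int model_sysfs_handlers_built(void)
{
#ifdef CONFIG_SYSFS
	return 1;
#else
	return 0;
#endif
}
```

If model_defrag_show() were left outside the guard while model_flag_show() stayed inside it, a build without CONFIG_SYSFS would reference a function that was never compiled.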
Link: http://lkml.kernel.org/r/20161114203448.24197-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Thu, 1 Dec 2016 00:00:14 +0000 (10:00 +1000)]
Merge tag 'drm-misc-fixes-2016-11-30' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes
single drm fix.
* tag 'drm-misc-fixes-2016-11-30' of git://anongit.freedesktop.org/git/drm-misc:
drm: Don't call drm_for_each_crtc with a non-KMS driver
Linus Torvalds [Wed, 30 Nov 2016 23:15:49 +0000 (15:15 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Two small fixes for MIPI PLLs on sunxi devices and a build fix for a
Broadcom clk driver having unmet dependencies"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: bcm: Fix unmet Kconfig dependencies for CLK_BCM_63XX
clk: sunxi-ng: enable so-said LDOs for A33 SoC's pll-mipi clock
clk: sunxi-ng: sun6i-a31: Enable PLL-MIPI LDOs when ungating it
Jeremy Linton [Tue, 29 Nov 2016 20:45:10 +0000 (14:45 -0600)]
arm64: dts: juno: Correct PCI IO window
The PCIe root complex on Juno translates the MMIO mapped
at 0x5f800000 to the PIO address range starting at 0
(which is common because PIO addresses are generally < 64k).
Correct the DT to reflect this.
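The translation described above amounts to subtracting the MMIO base. A minimal sketch follows, assuming the caller supplies the window size, since the text does not state it; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IO_CPU_BASE UINT64_C(0x5f800000)  /* MMIO base from the text */
#define MODEL_IO_PIO_BASE UINT64_C(0x0)         /* PIO range starts at 0 */

/*
 * Translate a CPU physical address inside the I/O window to a PIO
 * address. window_size is an assumption supplied by the caller.
 * Returns false when the address lies outside the window.
 */
static bool model_cpu_to_pio(uint64_t cpu_addr, uint64_t window_size,
			     uint64_t *pio)
{
	if (cpu_addr < MODEL_IO_CPU_BASE ||
	    cpu_addr - MODEL_IO_CPU_BASE >= window_size)
		return false;

	*pio = MODEL_IO_PIO_BASE + (cpu_addr - MODEL_IO_CPU_BASE);
	return true;
}
```

For instance, with an assumed 64k window, CPU address 0x5f8003f8 maps to PIO address 0x3f8, which stays below 64k as the text expects.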
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Jason Wang [Wed, 30 Nov 2016 05:17:52 +0000 (13:17 +0800)]
macvtap: handle ubuf refcount correctly when meet errors
We trigger uarg->callback() immediately after deciding to copy the data,
even if the caller wants zerocopy. This causes the callback
(vhost_net_zerocopy_callback) to decrease the refcount. But when we hit
an error afterwards, the error handling in vhost handle_tx() tries
to decrease it again. This is wrong; fix it by delaying
uarg->callback() until we're sure there are no errors.
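The double decrement and its fix can be sketched with a toy reference counter. This is a userspace model under simplifying assumptions, not the macvtap or vhost code: all names are hypothetical, and the early_callback flag exists only to compare the old ordering with the fixed one.

```c
#include <stdbool.h>

struct model_ubuf {
	int refcount;
};

/* Stands in for the zerocopy completion callback: drops one reference. */
static void model_callback(struct model_ubuf *ubuf)
{
	ubuf->refcount--;
}

/*
 * Device-side transmit path that falls back to copying the data.
 * Returns 0 on success, -1 when a later step fails.
 */
static int model_xmit(struct model_ubuf *ubuf, bool fail_later,
		      bool early_callback)
{
	if (early_callback)
		model_callback(ubuf);  /* old ordering: errors not yet known */
	if (fail_later)
		return -1;
	if (!early_callback)
		model_callback(ubuf);  /* fixed ordering: nothing can fail now */
	return 0;
}

/* Caller-side path: takes a reference and drops it itself on error. */
static int model_handle_tx(struct model_ubuf *ubuf, bool fail_later,
			   bool early_callback)
{
	int ret;

	ubuf->refcount++;
	ret = model_xmit(ubuf, fail_later, early_callback);
	if (ret < 0)
		ubuf->refcount--;      /* error handling undoes the reference */
	return ret;
}
```

Starting from a count of 1, a failing transmit with the old ordering ends at 0 because both the callback and the error path drop the reference, while the fixed ordering ends back at 1.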
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 30 Nov 2016 05:17:51 +0000 (13:17 +0800)]
tun: handle ubuf refcount correctly when meet errors
We trigger uarg->callback() immediately after deciding to copy the data,
even if the caller wants zerocopy. This causes the callback
(vhost_net_zerocopy_callback) to decrease the refcount. But when we hit
an error afterwards, the error handling in vhost handle_tx() tries
to decrease it again. This is wrong; fix it by delaying
uarg->callback() until we're sure there are no errors.
Reported-by: wangyunjian <wangyunjian@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>